Using statistical models to help New Zealand prepare for large earthquakes
The case study below shares some of the technical details and outcomes of the scientific and HPC-focused programming support provided to a research project through NeSI’s Consultancy Service.
This service supports projects across a range of domains, with the aim of lifting researchers’ productivity, efficiency, and skills in research computing. If you are interested in learning more or applying for Consultancy support, visit our Consultancy Service page.
University of Otago researchers Ting Wang and Amina Shahzadi are investigating ways to build and test statistical models that could one day provide better estimates of when a large earthquake might strike. This research was part of a Marsden Fund Fast-Start project entitled "Developing Inversion Methods for Non-stationary Thinning of Point Processes".
Estimating the hazard from earthquake and volcanic activity remains a formidable challenge, not least because the observational records are often incomplete, and their completeness varies over time. Estimating hazard from such time-inhomogeneous, incomplete records is therefore complicated, and the resulting estimates can be biased.
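To make the bias concrete, here is a minimal C++ sketch. It is not one of the project's models: it assumes events occur as a homogeneous Poisson process with a known rate, and that detection temporarily drops after a large event and then recovers. The names `detection_prob`, `simulate_thinned` and `naive_rate`, and the detection curve itself, are hypothetical. Counting only the detected events and dividing by the time span underestimates the true rate.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Hypothetical detection probability: after a large event at time t_main,
// 90% of small events are initially missed, and detection recovers
// exponentially with time scale `recovery`.
double detection_prob(double t, double t_main, double recovery) {
    if (t < t_main) return 1.0;
    return 1.0 - 0.9 * std::exp(-(t - t_main) / recovery);
}

// Simulate a homogeneous Poisson process with rate `lambda` on [0, T] and
// keep each event independently with probability detection_prob(t).
// This is the "thinning" that turns the true catalogue into the observed one.
std::vector<double> simulate_thinned(double lambda, double T, double t_main,
                                     double recovery, unsigned seed) {
    assert(lambda > 0.0 && T > 0.0);
    std::mt19937 rng(seed);
    std::exponential_distribution<double> gap(lambda);  // inter-event times
    std::uniform_real_distribution<double> u(0.0, 1.0);
    std::vector<double> observed;
    for (double t = gap(rng); t < T; t += gap(rng)) {
        if (u(rng) < detection_prob(t, t_main, recovery)) {
            observed.push_back(t);
        }
    }
    return observed;
}

// Naive rate estimate that ignores the missing events.
double naive_rate(const std::vector<double>& events, double T) {
    return static_cast<double>(events.size()) / T;
}
```

For example, `naive_rate(simulate_thinned(1.0, 10000.0, 2000.0, 2000.0, 42u), 10000.0)` has an expected value of about 0.82 events per unit time rather than the true 1.0, because roughly 1,770 events after the large event go undetected on average.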
As part of her PhD project, Amina proposed several types of models for dealing with incompletely observed point processes. Her code was written in R, a language that is well suited to statistical analysis and rapid prototyping. That flexibility comes at a price, however: R can be slow compared with compiled languages such as Fortran and C++. The code took 10 minutes to run for 500 observations but would have taken days for 10,000 observations.
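The two timings above imply that the run time grew faster than linearly with the number of observations. Assuming, purely for illustration, a power-law scaling with exponent $k$ (the case study does not state the actual cost of the models), a 20-fold increase in $n$ gives:

```latex
\begin{align*}
t(n) &\approx t(500)\left(\frac{n}{500}\right)^{k}, \qquad t(500) = 10~\text{min},\\
k = 1:\quad t(10\,000) &\approx 10 \times 20 = 200~\text{min} \approx 3.3~\text{h},\\
k = 2:\quad t(10\,000) &\approx 10 \times 20^{2} = 4000~\text{min} \approx 2.8~\text{days}.
\end{align*}
```

Only the superlinear case reaches "days", which is typical when each event's likelihood contribution depends on all earlier events.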
For real data analysis, Amina needed to try at least 10,000 combinations of initial values for each data set to find the global maximum of a likelihood function. At the original speed, this would have taken longer than her PhD itself. Clearly, her code had to run faster.
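The multi-start strategy described here can be sketched in C++ on a toy problem. This is not Amina's R code: `toy_loglik`, `hill_climb`, `multi_start` and `MultiStartResult` are hypothetical names, the objective is a one-parameter function with two local maxima, and a fixed-step gradient ascent stands in for whatever local optimiser the real analysis used.

```cpp
#include <cassert>
#include <cmath>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical toy log-likelihood with two local maxima, near x = -2 and
// x = +2. Only the maximum near +2 is global.
double toy_loglik(double x) {
    return -(x * x - 4.0) * (x * x - 4.0) + x;
}

// Local hill climb: central-difference gradient, fixed step size.
// A simplification; it is only stable for starting values close enough
// to a maximum (here, roughly within [-3, 3]).
double hill_climb(const std::function<double(double)>& f, double x0,
                  double step = 0.01, int max_iter = 10000,
                  double tol = 1e-10) {
    const double h = 1e-6;
    double x = x0;
    for (int i = 0; i < max_iter; ++i) {
        double grad = (f(x + h) - f(x - h)) / (2.0 * h);
        double next = x + step * grad;
        if (std::fabs(next - x) < tol) return next;
        x = next;
    }
    return x;
}

struct MultiStartResult {
    double best_x;
    double best_value;
};

// Multi-start search: run the local climber from every starting value and
// keep the local maximum with the highest objective value.
MultiStartResult multi_start(const std::function<double(double)>& f,
                             const std::vector<double>& starts) {
    assert(!starts.empty());
    MultiStartResult best{0.0, -std::numeric_limits<double>::infinity()};
    for (double x0 : starts) {
        double x = hill_climb(f, x0);
        double value = f(x);
        if (value > best.best_value) best = {x, value};
    }
    return best;
}
```

Starting from -3 alone, `hill_climb` stops at the local maximum near -2. Only by trying many starting values does the search reliably find the global maximum near 2, which is why the real analysis needed so many of them. Each start is independent, so this outer loop could also be spread across threads or jobs. However, the case study describes parallelising loops inside the likelihood functions instead.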
What was done
NeSI helped the researcher move her code into version control and profiled it to identify the performance bottlenecks. Two functions quickly stood out as accounting for most of the execution time.
NeSI and the researcher re-implemented these two functions in C++ using RcppArmadillo, a package that makes the Armadillo C++ linear algebra library available in R through Rcpp. This lets the programmer embed C++ code within R without incurring a high maintenance cost.
OpenMP directives were then added to the most numerically intensive loops.
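The case study does not say which loops were parallelised, so here is a hedged C++ sketch of the general pattern. It assumes an O(n²) loop that sums log-intensities over events. As a stand-in model it uses the log-likelihood of a self-exciting (Hawkes) point process with an exponential kernel, which is not necessarily one of Amina's models. `hawkes_loglik` is a hypothetical name. In the project, the C++ functions were called from R through RcppArmadillo; this standalone version uses `std::vector` and omits the R interface.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log-likelihood of a Hawkes process with exponential kernel on [0, T]:
//   lambda(t) = mu + sum_{t_j < t} alpha * beta * exp(-beta * (t - t_j))
//   loglik    = sum_i log lambda(t_i) - integral_0^T lambda(t) dt
// `times` must be sorted in increasing order and lie in [0, T].
double hawkes_loglik(const std::vector<double>& times, double T,
                     double mu, double alpha, double beta) {
    assert(mu > 0.0 && alpha >= 0.0 && beta > 0.0);
    const long n = static_cast<long>(times.size());
    double sum_log_intensity = 0.0;

    // Each outer iteration is independent, so OpenMP can split it across
    // threads; reduction(+) combines the per-thread partial sums.
    // schedule(dynamic) balances the load because iteration i costs O(i).
    // Without OpenMP enabled at compile time, the pragma is ignored and the
    // loop runs serially with the same result (up to rounding order).
    #pragma omp parallel for reduction(+ : sum_log_intensity) schedule(dynamic, 64)
    for (long i = 0; i < n; ++i) {
        double intensity = mu;
        for (long j = 0; j < i; ++j) {
            intensity += alpha * beta * std::exp(-beta * (times[i] - times[j]));
        }
        sum_log_intensity += std::log(intensity);
    }

    // Compensator: the closed-form integral of lambda(t) over [0, T].
    double compensator = mu * T;
    for (long j = 0; j < n; ++j) {
        compensator += alpha * (1.0 - std::exp(-beta * (T - times[j])));
    }
    return sum_log_intensity - compensator;
}
```

With GCC, compiling with `-fopenmp` enables the directive, and the `OMP_NUM_THREADS` environment variable (for example, `OMP_NUM_THREADS=8`) sets the number of threads. Setting `alpha = 0` reduces the model to a homogeneous Poisson process, which is a handy correctness check.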
Together, the OpenMP parallelisation and other code improvements delivered speedups of 3-20x on the supplied test cases. For a problem involving 6,000 observations, Amina reported that the wall-clock time fell from 7-8 days to 7-8 hours using 8 threads, a speedup of about 20x.
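The reported times can be checked with simple arithmetic. The second line below rests on an assumption: that the overall speedup roughly factors into a serial part (the C++ rewrite and other improvements) and a parallel part, and that 8 threads give at most about 8x. Cache effects can occasionally push parallel scaling slightly beyond this.

```latex
\begin{align*}
S &= \frac{T_{\text{before}}}{T_{\text{after}}}:\qquad
\frac{7 \times 24~\text{h}}{8~\text{h}} = 21,\qquad
\frac{8 \times 24~\text{h}}{7~\text{h}} \approx 27.4,\\
S &\approx S_{\text{serial}} \times S_{\text{OpenMP}},\quad S_{\text{OpenMP}} \le 8
\;\Rightarrow\; S_{\text{serial}} \gtrsim \frac{20}{8} = 2.5.
\end{align*}
```

On this reading, the quoted 20x is a conservative figure, and threads alone cannot explain it. At least a factor of about 2.5 must come from the move from R to compiled C++ and the other code improvements.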
"The help that we received from this NeSI consultancy project was crucial to our research project. It enabled Amina to finish her PhD on time and speed up our research progress. The NeSI team looked at our programs step by step and guided us on how we could change the R and C++ code and run these using OpenMP. The programs taking 7-8 days to run were made to produce the results within 7-8 hours. The NeSI support made it possible for us to complete this research within a limited time frame."
- Ting Wang, Senior Lecturer, Mathematics & Statistics, University of Otago
Do you have a research project that could benefit from working with NeSI research software engineers? Learn more about the kind of support they can offer and get in touch by emailing firstname.lastname@example.org.