OpenACC for tropical circulation model
The case study below shares some of the technical details and outcomes of the scientific and HPC-focused programming support provided to a research project through NeSI’s Consultancy Service.
This service supports projects across a range of domains, with an aim to lift researchers’ productivity, efficiency, and skills in research computing. If you are interested to learn more or apply for Consultancy support, visit our Consultancy Service page.
University of Auckland researcher Dr Gilles Bellon has developed a quasi-equilibrium model (QTCM2) to describe climate and rainfall in tropical regions. The problem involves solving 2D, time-dependent fluid equations on a beta plane. Of interest is the steady-state solution, which takes 5 million time steps to reach and 10,000 CPU hours to execute.
Different versions of the model, each with different physical approximations, need to be tested, and sometimes many runs with different initial conditions need to be submitted. In addition, the model currently runs at 24 km resolution but would ideally run at 10 km resolution. This makes the model very expensive to run, so Gilles called on NeSI to see if we could help him reduce the wall clock time.
NeSI’s Research Software Engineers had previously worked with Gilles to achieve a 20X speedup compared to the original version of the code. A new opportunity arose when A100 GPUs were installed on NeSI. It raised a question: would offloading some of the computations to the GPU further improve the performance of the code?
What was done
NeSI Research Software Engineers Alex Pletzer and Chris Scott modified the source code to tell the compiler to offload loops to run on the GPU device using OpenACC directives, and leveraged the cuBLAS library to solve the Poisson equation.
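To give a flavour of what an OpenACC directive looks like, here is a minimal sketch in C. It is not taken from the QTCM2 source: the function name `diffuse_step`, the grid layout, and the simple diffusion stencil are all assumptions chosen for illustration. The point is only that a single pragma placed above an existing loop nest asks the compiler to run that loop on the GPU and to move the named arrays to and from the device.

```c
#include <stddef.h>

/* Hypothetical example, not from QTCM2: one explicit diffusion step
   on an nx-by-ny grid stored row-major in 1D arrays. */
void diffuse_step(const double *u, double *unew, int nx, int ny, double alpha)
{
    /* Ask the compiler to offload the loop nest to the GPU.
       copyin: u is only read on the device.
       copy:   unew is sent to the device and brought back, so the
               boundary values that the loop does not touch are kept. */
    #pragma acc parallel loop collapse(2) copyin(u[0:nx*ny]) copy(unew[0:nx*ny])
    for (int j = 1; j < ny - 1; j++) {
        for (int i = 1; i < nx - 1; i++) {
            int k = j * nx + i;
            /* Five-point stencil applied to interior points only. */
            unew[k] = u[k] + alpha * (u[k - 1] + u[k + 1]
                                    + u[k - nx] + u[k + nx] - 4.0 * u[k]);
        }
    }
}
```

A compiler without OpenACC support simply ignores the pragma and runs the loop on the CPU, so the same source serves both cases. With the NVIDIA HPC SDK, OpenACC is enabled with the `-acc` flag of `nvc`. In a real time-stepping code one would usually also keep the arrays resident on the GPU across time steps (for example with an `acc data` region) rather than copying them on every call, since data transfers can otherwise dominate the run time.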
The GPU version achieved a 31X speedup on a P100 GPU and a 66X speedup on an A100 GPU compared to the single-threaded executable. While OpenMP can accelerate the code on a CPU, running on the GPU is still 16X (P100) or 35X (A100) faster than using 16 threads. Moreover, computing climate and rainfall is always cheaper on the GPU than on any number of CPUs.
The final results mean Gilles is now able to crunch more numbers faster and cheaper.
Do you want to bring your research to the next level? Send an email to firstname.lastname@example.org.