Researcher Dr Gilles Bellon is using NeSI computing power to improve a model to describe climate and rainfall in tropical regions. Stock image source: Luda Kot, Pixabay.

OpenACC for tropical circulation model

When new A100 GPU resources came online at NeSI, they raised an interesting question for Gilles Bellon: would offloading computations onto GPUs further improve the performance of his code?
The case study below shares some of the technical details and outcomes of the scientific and HPC-focused programming support provided to a research project through NeSI’s Consultancy Service.
This service supports projects across a range of domains, with the aim of lifting researchers’ productivity, efficiency, and skills in research computing. If you are interested in learning more or applying for Consultancy support, visit our Consultancy Service page.


Research background

University of Auckland researcher Dr Gilles Bellon has developed a quasi-equilibrium model (QTCM2) to describe climate and rainfall in tropical regions. The problem involves solving 2D, time-dependent fluid equations on a beta plane. Of interest is the steady-state solution, which takes 5 million time steps to reach and 10,000 CPU hours to compute.

Different versions of the model, with differing physical approximations, need to be tested, and sometimes many runs with different initial conditions need to be submitted. In addition, the model currently runs at 24 km resolution; ideally it would run at 10 km resolution. This makes the model very expensive to run, so Gilles called on NeSI to see if we could help him reduce the wall clock time.


Project challenges

NeSI’s Research Software Engineers had previously worked with Gilles to achieve a 20X speedup compared to the original version of the code. A new opportunity arose when A100 GPUs were installed at NeSI. It raised a question: would offloading some of the computations to the GPU further improve the performance of the code?


What was done

NeSI Research Software Engineers Alex Pletzer and Chris Scott added OpenACC directives to the source code, telling the compiler to offload loops to the GPU, and used the cuBLAS library to solve the Poisson equation.
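To give a flavour of what offloading a loop with an OpenACC directive looks like, here is a minimal sketch in C. It is not taken from QTCM2: the function name `diffuse_step`, the grid layout, and the simple diffusion update are all hypothetical, chosen only to show a 2D loop nest of the kind found in time-stepping fluid codes. A compiler without OpenACC support simply ignores the pragma and runs the loop on the CPU.

```c
#include <assert.h>

/* Hypothetical example, not QTCM2 code: one explicit diffusion update
 * on an nx-by-ny grid stored row by row. Interior points are updated
 * from their four neighbours; boundary points of u_new are untouched. */
void diffuse_step(int nx, int ny, double alpha,
                  const double *restrict u, double *restrict u_new)
{
    assert(nx >= 3 && ny >= 3);

    /* Ask the compiler to run the loop nest on the GPU.
     * copyin: u is only read on the device.
     * copy:   u_new is sent and brought back, so its boundary
     *         values survive the round trip. */
    #pragma acc parallel loop collapse(2) copyin(u[0:nx*ny]) copy(u_new[0:nx*ny])
    for (int j = 1; j < ny - 1; ++j) {
        for (int i = 1; i < nx - 1; ++i) {
            int k = j * nx + i;
            u_new[k] = u[k] + alpha * (u[k - 1] + u[k + 1]
                                       + u[k - nx] + u[k + nx]
                                       - 4.0 * u[k]);
        }
    }
}
```

In a run of millions of time steps, copying arrays to and from the GPU at every step would likely dominate the run time, so codes of this kind typically wrap the whole time loop in an OpenACC `data` region to keep the arrays resident on the device.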


Main outcomes 

The code achieved a 31X speedup on a P100 GPU and a 66X speedup on an A100 GPU compared to the single-threaded executable. While OpenMP can accelerate the code on a CPU, running on the GPU still gives a 16X (P100) or 35X (A100) performance advantage over using 16 threads. Moreover, the cost of computing climate and rainfall is always lower on the GPU than on any number of CPUs.

The final results mean Gilles is now able to crunch more numbers faster and cheaper. 

A chart showing the speedup of the code.

Do you want to bring your research to the next level? Send an email to

