Automating workflows to help scientists address crucial carbon cycle questions
The case study below shares some of the technical details and outcomes of the scientific and HPC-focused programming support provided to a research project through NeSI’s Consultancy Service.
This service supports projects across a range of domains, with the aim of lifting researchers’ productivity, efficiency, and skills in research computing. If you are interested in learning more or applying for Consultancy support, visit our Consultancy Service page.
The CarbonWatchNZ inversion system is a world-leading method for national-scale, top-down carbon accounting. The overall goal of the CarbonWatchNZ project is to constrain fluxes of the carbon greenhouse gases (CO2 and CH4) at national, regional and urban scales in New Zealand. This is done with an atmospheric inverse modelling system that uses atmospheric measurements of greenhouse gases to infer their emissions and uptake. The resulting fluxes are an important complement to the fluxes reported in New Zealand's National Inventory.
The inversion system uses output from a Lagrangian dispersion model, the Numerical Atmospheric-dispersion Modelling Environment (NAME III). NAME III is driven by meteorological input from the New Zealand Limited Area Model (NZLAM) and the New Zealand Convective Scale Model (NZCSM), both numerical weather prediction (NWP) models based on a local configuration of the UK Met Office Unified Model. All of these models run on NeSI’s High Performance Computing platform. In addition to the national-scale modelling, a number of smaller, regional simulations are performed at higher spatial resolution to better understand carbon exchange in different environments.
Pictured above is a NAME III model output for different sites across New Zealand. Image provided by Dr Beata Bukosa, NIWA.
Input data is large and archived (i.e. not readily accessible for computation), and must be converted to the format required by the NAME III simulation code
Retrieving archived data is tedious: it can take a long time and often fails, requiring many manual restarts before it eventually completes successfully
Large quantities of output data are generated and need to be stored in an efficient manner
The Cylc workflow management tool was already being used, but running a simulation still required some manual input, i.e. the workflow was not fully automated
What was done
NeSI Research Software Engineer Chris Scott and Application Support Specialist Anthony Shaw worked with NIWA scientists Beata Bukosa, Stijn Naus and Daemon Kennett to develop and improve their Cylc workflow for running the NAME III model.
First, the workflow was upgraded to run with the latest major release of Cylc (v8). New tasks were then added to the workflow to automate the retrieval of input data from the storage archive and its conversion to the input format required by NAME III.
Files that had already been retrieved, e.g. by a recent run, are skipped, and the retrieval task was made robust to failures: it automatically re-runs the retrieval on failure without requiring manual intervention. Other improvements included extracting parameters out of scripts and defining them in a single place, which makes workflow runs much easier to configure.
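The skip-and-retry behaviour described above can be sketched in Python as follows. This is a minimal illustration, not the project's actual task script: the `fetch` callable stands in for whatever command pulls a file from the archive, and the function name, the `.part` temporary-file convention, and the retry parameters are all hypothetical. Within Cylc itself, task-level retries can also be configured with the `execution retry delays` setting.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Tuple, Type


def retrieve_with_retry(
    files: Iterable[str],
    dest_dir: Path,
    fetch: Callable[[str, Path], None],  # hypothetical: copies one archived file to a path
    max_attempts: int = 5,
    delay_seconds: float = 60.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
) -> List[str]:
    """Fetch each file into dest_dir, skipping files that are already present
    and retrying failed fetches up to max_attempts times. Returns the names
    of the files that were actually fetched in this call."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    fetched: List[str] = []
    for name in files:
        target = dest_dir / name
        # Skip files already retrieved (e.g. by a recent run).
        if target.exists() and target.stat().st_size > 0:
            continue
        # Fetch into a temporary name and rename on success, so an
        # interrupted transfer is never mistaken for a complete file.
        partial = target.with_name(target.name + ".part")
        for attempt in range(1, max_attempts + 1):
            try:
                fetch(name, partial)
                partial.replace(target)
                fetched.append(name)
                break
            except retry_on:
                if attempt == max_attempts:
                    raise  # give up and let the workflow report the failure
                time.sleep(delay_seconds)
    return fetched
```

Writing to a temporary file and renaming it is one way to make the "already retrieved" check trustworthy; checking file sizes or checksums against the archive would be stricter alternatives.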
In addition to the workflow changes, different compression algorithms, such as lz4, gzip, bzip2 and xz, were compared for compressing the output data. The aim was to find the best trade-off between file size and decompression time (which matters for post-processing), while maintaining compatibility with the post-processing scripts that need to load the output data.
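A comparison of this kind can be sketched with Python's standard library, which provides gzip, bzip2 (`bz2`) and xz (`lzma`) codecs; lz4 requires the third-party `lz4` package and is left out here. The function below is a hypothetical illustration that measures compression ratio and the best-of-N decompression time on a sample of bytes; it does not reproduce the project's benchmark or its results.

```python
import bz2
import gzip
import lzma
import time
from typing import Callable, Dict, Tuple

# Standard-library codecs only; lz4 would need the third-party "lz4" package.
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),
}


def compare_codecs(data: bytes, repeats: int = 3) -> Dict[str, Dict[str, float]]:
    """Return the compression ratio and best decompression time (seconds)
    for each codec on the given sample of data."""
    results: Dict[str, Dict[str, float]] = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            unpacked = decompress(packed)
            best = min(best, time.perf_counter() - start)
        # Check the round trip so a faulty codec cannot "win" the comparison.
        if unpacked != data:
            raise ValueError(f"{name}: round trip did not reproduce the input")
        results[name] = {
            "ratio": len(data) / len(packed),
            "decompress_s": best,
        }
    return results


if __name__ == "__main__":
    sample = b"site,time,value\n" + b"LAU,2021-01-01T00:00,0.0123\n" * 50_000
    for codec, stats in compare_codecs(sample).items():
        print(f"{codec:6s} ratio={stats['ratio']:.1f} decompress={stats['decompress_s']:.4f}s")
```

Timing decompression rather than compression reflects the priority stated above: output is written once but read repeatedly during post-processing. Results depend heavily on the data, so a real comparison should use actual model output.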
The main source of the input data is the NIWA data archive, which is used by the new retrieval tasks described above. However, some already-converted input data also exists on the NeSI Nearline archive. A proof-of-concept task was added to the workflow showing how data could first be retrieved from Nearline, if it exists there, and otherwise be retrieved from the NIWA data archive. This task requires further development to make it more robust and would benefit from improvements to the Nearline system.
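The Nearline-first lookup could be sketched as below. The three callables are hypothetical placeholders for "does this file exist on Nearline", "fetch it from Nearline" and "fetch it from the NIWA archive and convert it"; they are not real NeSI or NIWA commands. Falling back to the NIWA archive when a Nearline fetch raises an error is an extra design choice made here for robustness, not something the proof of concept is known to do.

```python
from typing import Callable


def retrieve_input(
    name: str,
    on_nearline: Callable[[str], bool],               # hypothetical existence check
    fetch_nearline: Callable[[str], str],             # hypothetical: returns local path
    fetch_archive_and_convert: Callable[[str], str],  # hypothetical: returns local path
) -> str:
    """Return a local path to NAME III-ready input for `name`.

    Prefer already-converted data on Nearline; otherwise retrieve raw data
    from the NIWA archive, which must also be converted."""
    if on_nearline(name):
        try:
            return fetch_nearline(name)
        except OSError:
            pass  # assumption: treat a failed Nearline fetch as "not available"
    return fetch_archive_and_convert(name)
```

Passing the retrieval steps in as functions keeps the fallback logic separate from the storage-specific commands, so either source can be swapped or mocked without touching the decision itself.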
Fully automated Cylc workflow that can retrieve required data from storage archives, convert it to the correct format and run daily simulations across a range of dates
Workflow configuration options moved into a single file, making it easier to set up workflow runs
Options for compressing output data were evaluated to ensure output data is stored efficiently
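The idea of keeping run parameters in one file and expanding them into daily simulations can be illustrated with a short sketch. The INI format, the `[run]` section and the `start_date`/`end_date` keys are hypothetical; in a Cylc 8 workflow the date range would more typically be expressed through the `flow.cylc` scheduling settings (for example `initial cycle point`, `final cycle point` and a `P1D` recurrence), with Cylc generating the daily cycles itself.

```python
import configparser
from datetime import date, timedelta
from typing import List

# Hypothetical single configuration file for a workflow run.
EXAMPLE_CONFIG = """
[run]
start_date = 2021-01-01
end_date = 2021-01-05
"""


def load_run_dates(text: str) -> List[date]:
    """Parse the [run] section and return one date per daily simulation,
    inclusive of both the start and end dates."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    start = date.fromisoformat(cfg["run"]["start_date"])
    end = date.fromisoformat(cfg["run"]["end_date"])
    if end < start:
        raise ValueError("end_date is before start_date")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


if __name__ == "__main__":
    for day in load_run_dates(EXAMPLE_CONFIG):
        print(day.isoformat())
```

Keeping the dates in one file means a new run only requires editing that file, rather than searching several scripts for hard-coded values.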
"I am pleased to share our positive feedback on the consultancy for optimizing our HPCF and CYLC workflow. The transformation of our CYLC setup into a fully automated and more flexible workflow has had a remarkable impact on our research processes. We now experience reduced time to solution, allowing us to achieve our research goals more efficiently and effectively.
This enhancement has a direct and positive influence on our productivity and outcomes. The consultancy has not only optimized our current workflow but has also laid the foundation for sustainable software practices, ensuring that it remains efficient and up-to-date in the long term. The expertise and knowledge shared by Chris and Anthony have been invaluable. Moreover, our team has upskilled and gained a deeper understanding of the HPCF.
The consultancy was completed on time and with great efficiency, and the NeSI team provided feedback and solutions to all additional questions that arose during the consultancy. The positive impact on our research capabilities and operational efficiency is evident, and we are excited about the potential for even greater achievements in the future. We are highly appreciative of the expertise and effort put into this project consultancy."
- Dr Beata Bukosa, Atmospheric Modeller, NIWA
Do you want to bring your research to the next level? We can help. Send an email to email@example.com to learn more about our Consultancy support.