Predicting precipitation using machine learning
The case study below shares some of the technical details and outcomes of the scientific and HPC-focused programming support provided to a research project through NeSI’s Consultancy Service.
This service supports projects across a range of domains, with the aim of lifting researchers’ productivity, efficiency, and skills in research computing. If you are interested in learning more or applying for Consultancy support, visit our Consultancy Service page.
Dr. Abha Sood is a NIWA climate scientist involved in a number of regional modelling and New Zealand climate projects. Recently, she has been investigating how different preprocessing methods and models can improve rainfall forecasts at Whenuapai Airport.
Downscaling global climate model (GCM) data from 150 km resolution to 12 km or finer currently requires running a computationally expensive regional climate model (RCM), using GCM forcing data at the lateral and surface boundaries. In addition, RCM output data (temperature, precipitation, etc.) are biased with respect to in-situ observations because various sub-grid scale effects are not adequately represented in the RCM. This project explores different machine learning approaches to determine to what extent precipitation can be predicted from a small set of climate indices, which represent an averaged weather pattern.
What was done
An infrastructure that allows the user to evaluate different preprocessing and model setups:
A script to preprocess the raw data. The script aligns weather index data from different months row by row to enlarge the feature space. The data can be smoothed with a rolling mean, and the seasonal mean can be subtracted so that the models predict anomalies.
A script to build and train models and to apply them for prediction. Many models are available, ranging from simple linear regression to dense and convolutional neural networks, and the infrastructure makes it easy to add a new one.
A Snakefile that applies different machine learning models and compares their performance.
Various Jupyter notebooks that show how to call the preprocessing, model training and model prediction functions.
Documentation showing how to extend the framework.
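The three preprocessing steps listed above (subtracting the seasonal mean, smoothing with a rolling mean, and aligning several months of indices in one row) can be sketched with NumPy as follows. This is a minimal illustration and not the project's actual script: the function names, the trailing-window convention, the 3-month window, the 12-month lag and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def seasonal_anomaly(x, period=12):
    """Subtract the mean of each calendar month (row index modulo `period`)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for m in range(period):
        out[m::period] -= x[m::period].mean(axis=0)
    return out

def rolling_mean(x, window):
    """Trailing rolling mean along axis 0; the first window-1 rows are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    # Cumulative sum with a leading row of zeros, so that differences of
    # the cumulative sum give the sum over each window.
    csum = np.cumsum(np.insert(x, 0, 0.0, axis=0), axis=0)
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def stack_lags(indices, n_lags):
    """Row i holds the indices of n_lags consecutive months, flattened."""
    indices = np.asarray(indices, dtype=float)
    n_months = indices.shape[0]
    rows = [indices[t - n_lags + 1:t + 1].ravel()
            for t in range(n_lags - 1, n_months)]
    return np.array(rows)

# Hypothetical input: 20 years of 3 synthetic monthly climate indices.
rng = np.random.default_rng(0)
indices = rng.normal(size=(240, 3))

anom = seasonal_anomaly(indices)          # predict anomalies, not raw values
smooth = rolling_mean(anom, window=3)     # 3-month trailing mean
X = stack_lags(smooth[2:], n_lags=12)     # drop the two NaN warm-up rows first
print(X.shape)  # (227, 36)
```

Each row of `X` then describes the preceding 12 months with 36 features, and can be paired with the rainfall anomaly of the month to be predicted. The case study mentions a model trained on 96 months of index anomalies, which in this sketch would correspond to `n_lags=96`.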
The best model is shown to perform significantly better than a standard linear regression and, in particular, is able to anticipate months with large anomalous rainfall.
The result is a machine learning infrastructure that allows researchers to quickly compare the efficacy of a number of data preprocessing approaches and machine learning algorithms. A regularised multi-layer perceptron neural network trained on weather index anomalies from the past 96 months was shown to perform significantly better than a linear regression model at Whenuapai Airport.
The figure above compares the predicted (red) with the observed (blue) monthly rainfall anomalies, in cm, for Whenuapai Airport. The multi-layer perceptron model is much better at capturing the month-to-month variability than a linear regression model.
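A comparison of this kind, a regularised multi-layer perceptron against a linear regression baseline, could be set up along the following lines. The sketch assumes scikit-learn, which the case study does not name as the library used, and runs on synthetic data. The network size, the L2 penalty `alpha`, the train/test split and the target function are placeholders, so the printed scores say nothing about the Whenuapai results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for preprocessed index anomalies (features) and
# rainfall anomalies (target); the nonlinear relationship is invented.
rng = np.random.default_rng(0)
n_months, n_features = 360, 8
X = rng.normal(size=(n_months, n_features))
y = (np.tanh(X[:, 0] * X[:, 1]) + 0.5 * X[:, 2] ** 2
     + 0.1 * rng.normal(size=n_months))

# Chronological split: train on the earlier months, test on the later ones.
split = 300
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "linear": LinearRegression(),
    # alpha is the L2 regularisation strength of the perceptron.
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), alpha=1e-2,
                     max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = float(np.sqrt(mean_squared_error(y_test, pred)))
    print(f"{name}: RMSE = {scores[name]:.3f}")
```

Keeping the models in a dictionary means a further candidate can be added with one more entry, and every model is scored on the same held-out months.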
"NeSI’s technical and computation support and expertise was pivotal in getting cutting edge machine (deep) learning algorithms implemented and streamlined processes for efficiently achieving research targets. The overall satisfaction has been excellent! NeSI experts introduced new technical ideas and tools which we will use and implement elsewhere."
- Dr Abha Sood, Climate Scientist, NIWA