High performance computing helps save the NZ stick insect
This case study originally appeared on Landcare Research's Informatics blog. Republished here with permission.
With next-generation DNA sequencing technologies now a regular part of biological research, vast amounts of DNA sequence data are being generated at an ever-increasing rate. But with this added capability and power come technical challenges around data management, processing and analysis. Genomics work carried out at Landcare Research is no exception. An example of these challenges is found in some recent work led by Landcare Research scientist Thomas Buckley and carried out by post-doctoral researcher Alice Dennis and PhD student Luke Dunning.
Alice and Luke are interested in the functional coding regions of the genomes of stick insect species native to New Zealand. Until recently, processing the insect DNA sequence data collected from next-generation sequencing-by-synthesis platforms took one whole week per individual, even on a fast multi-core desktop Linux machine with plenty of RAM.
Processing times of this length significantly slow down their research programme and limit the number of such processing steps that can be undertaken within any given project. To address this, Dan White (Informatics team) worked with Alice and Luke to shift their memory-intensive processes to the high performance computing (HPC) resources of the National e-Science Infrastructure (NeSI), which Landcare Research can access as a member of NeSI. Compared with the resources available in-house at Landcare Research, the NeSI HPC resources are considerable: they include an 80-processor computer, with each processor having 12 cores and 96 GB of RAM, and access to 200 TB of disk storage. For the stick insect case above, early tests have shown a reduction in processing time from one week to just over 3 hours, a dramatic improvement! Further improvements are expected as we explore processing multiple files simultaneously and chaining multi-step sequence analyses together in an automated process.
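To illustrate the two ideas mentioned above, processing several files at once and chaining analysis steps, here is a minimal Python sketch. It is not the workflow actually used by the team: the step functions (`trim_reads`, `assemble`, `annotate`), the sample names and the worker count are all hypothetical placeholders, and a real setup on NeSI would more likely call external sequence-analysis tools through the cluster's job scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real analysis steps. In practice each one
# would launch an external tool; here they only tag the sample name so
# the chaining is visible.
def trim_reads(sample):
    return f"{sample}.trimmed"

def assemble(trimmed):
    return f"{trimmed}.assembled"

def annotate(assembled):
    return f"{assembled}.annotated"

# The chained, multi-step analysis: each step consumes the previous output.
PIPELINE = (trim_reads, assemble, annotate)

def run_pipeline(sample):
    """Run every step in order for a single sample."""
    result = sample
    for step in PIPELINE:
        result = step(result)
    return result

def process_all(samples, max_workers=4):
    """Run the whole pipeline for several samples concurrently.

    Threads suit the assumed case where each step waits on an external
    program; results come back in the same order as the input samples.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_pipeline, samples))

if __name__ == "__main__":
    for output in process_all(["sample_A", "sample_B", "sample_C"]):
        print(output)
```

The point of the design is that each individual's data moves through the full chain without manual intervention, while different individuals are handled side by side rather than one after another.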