High performance computing’s role in biosecurity
Access to a high performance computing (HPC) cluster is enabling a researcher to identify invasive pests faster. With NeSI’s HPC infrastructure, provisional results can reach the researcher days or even weeks sooner than if the data were analysed using overseas resources.
Laura Boykin is a Postdoctoral Fellow at Lincoln University’s Bio-Protection Research Centre. Her work involves bioinformatics research into highly invasive insects, including Bemisia tabaci (Silverleaf whitefly). In 2007 Boykin and her collaborators in Australia and China began studying the pervasive whitefly, which is found on every continent except Antarctica. Although most species have similar physical characteristics, they behave differently. Some transmit viruses and others quickly become insecticide-resistant. Governments are interested in keeping out the species of whitefly that transmit viruses because, once introduced, they’re capable of doing billions of dollars’ worth of damage to crops.
Boykin received her PhD at The University of New Mexico, her MA at San Francisco State University and her BA at Occidental College. She previously worked as a research associate at Los Alamos National Laboratory with the Theoretical Biology and Biophysics Group and was involved in the Influenza Sequence Database and Hepatitis C Database.
The analytical process she describes in a paper co-authored with Karen F. Armstrong, Laura Kubatko and Paul De Barro, Species Delimitation and Global Biosecurity (doi: 10.4137/EBO.S8532), has led to the identification of new Bemisia tabaci species on two separate occasions. The study tested a series of analytical options and determined their applicability as tools for identifying and defining species for biosecurity purposes.
Dr. Boykin providing a video abstract of her recent research
“We didn’t have names for them, they were all just called Bemisia tabaci,” says Boykin. “The government didn’t want Bemisia tabaci here at all, but some of these whiteflies don’t transmit viruses and won’t cause a problem.”
Inconsistent taxonomy can lead to confusion over the species that represent a biosecurity threat. But there are strict regulations defining what may be described as separate species. “We decided to gather all the genetic data that was out there,” says Boykin. “Since they looked the same, we decided to check the genetic differences and came to find that there are lots of genetic differences between the different species found around the world.”
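The idea of checking genetic differences between look-alike specimens can be illustrated with a minimal Python sketch. It computes the uncorrected pairwise distance (the proportion of differing sites) between aligned DNA sequences. This is a generic illustration, not the method used in the paper; the sample names and the short sequences are invented for the example.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def pairwise_distances(seqs: dict) -> dict:
    """Distance for every unordered pair of named sequences."""
    return {
        (x, y): p_distance(seqs[x], seqs[y])
        for x, y in combinations(seqs, 2)
    }


# Hypothetical toy alignment; real data would be full gene sequences.
samples = {
    "sample_A": "ATGCATGCAT",
    "sample_B": "ATGCATGCAA",
    "sample_C": "TTGCTTGCAA",
}

if __name__ == "__main__":
    for pair, dist in pairwise_distances(samples).items():
        print(pair, round(dist, 3))
```

Specimens that look identical can still sit far apart on a measure like this, which is the kind of signal that motivates treating them as separate species.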
Any foodstuffs shipped to New Zealand and held at the port of entry must ideally be checked for invasive pests within 24 hours to avoid shipments spoiling. Previously any organisms found had to undergo rigorous statistical testing to identify them. “They would get some sequence data and say, ‘We think it’s Bemisia, keep it out’,” says Boykin. “We’ve done this with some Australian samples. My Australian colleague sent me the sequence data and asked if I could tell him what it was. I logged in, put it into NeSI, and 24 hours later I had the results and could tell him with confidence from these statistical tests what it was.”
This use of HPC on large data sets differentiates Boykin from other biologists. She doesn’t consider herself an expert in computing but neither does she regard herself as a traditional biologist. “I’m able to see both sides of it and speak to the computer scientists and the biologists. I’m really interested in how things change over time, the evolutionary process, and using phylogenetic tools to see if you can predict the next outbreak.”
Multiple lines of evidence are required by regulators in order for a species to be identified, but Boykin and her colleagues’ technique allows people who aren’t necessarily experts to identify species. “NeSI does it quickly and we can get the results to MAF or its Australian equivalent with 95 percent confidence.” She and her colleagues have developed their own metrics to help decide how to name a species using statistical analysis. “That’s why we needed NeSI because these calculations take a lot of computational power. It hasn’t been done on large data sets or invasive pests before.”
NeSI Auckland cluster
The University of Auckland’s HPC cluster is a significant component in the National eScience Infrastructure project (NeSI). It consists of 80 compute nodes, each with two six-core Intel Xeon X5660 2.8 GHz processors, giving 12 cores per node, and 100 GB of RAM. The nodes are connected using InfiniBand, a switched fabric communications link used in HPC and enterprise data centres. In total the cluster has 960 cores and 8 TB of RAM. The machine runs the 64-bit version of the Red Hat Enterprise Linux 5.6 operating system, and its capacity is being continuously expanded.
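The cluster totals follow directly from the per-node figures, as this short Python check shows. The constant names are illustrative, and RAM is converted using decimal units (1 TB = 1000 GB).

```python
# Per-node figures quoted for the Auckland cluster.
NODES = 80
PROCESSORS_PER_NODE = 2
CORES_PER_PROCESSOR = 6
RAM_PER_NODE_GB = 100

cores_per_node = PROCESSORS_PER_NODE * CORES_PER_PROCESSOR
total_cores = NODES * cores_per_node
total_ram_tb = NODES * RAM_PER_NODE_GB / 1000  # decimal terabytes

if __name__ == "__main__":
    print(f"{cores_per_node} cores per node")
    print(f"{total_cores} cores in total")
    print(f"{total_ram_tb:g} TB of RAM in total")
```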
Lincoln University’s Bio-Protection Research Centre specialises in agriculture but doesn’t have its own HPC cluster. When Boykin arrived there she googled ‘supercomputers in New Zealand’, which led her to Vladimir Mencl at BlueFern, the supercomputing facility based at the University of Canterbury that supports research requiring access to HPC resources. It operates in a collaborative partnership with NeSI’s nationwide supercomputer network. Mencl was instrumental in implementing the software plug-in gsi, which was written for the customisable Geneious platform for sequence analysis.
Boykin and her colleagues ran Bayesian analyses using MrBayes on BeSTGRID and the BlueFern Supercomputer at the University of Canterbury. BeSTGRID (Broadband-enabled Science and Technology GRID) was a forerunner of NeSI: a fabric connecting leading universities and organisations that enabled faster collaborative research with greater computational power and data intensity.
At the Los Alamos National Laboratory, New Mexico, Boykin was granted access to some of the world’s best HPC facilities. However, she says the degree of personal interaction with the NeSI team has made her work in New Zealand more satisfying. “The NeSI team is approachable and flexible, and that’s huge – especially for non-computer scientists. It’s intimidating when you have these machines and you don’t know exactly what you’re doing with them. You typically don’t have access to the people who are going to get the job done.”
Boykin says NeSI gives researchers at smaller institutions such as Lincoln access to people with the HPC expertise to help them with their research. Bioinformatics is likely to play an essential role in the future work of New Zealand biologists, ecologists, entomologists and botanists, as their work comes to hinge on genomic data analysis. “In the past five to seven years we’ve had this revolution of being able to get hold of genome sequences very quickly, and the bottleneck is always the analysis and the storage of the files – they’re huge; it’s beyond terabytes of data.”
Boykin likens her work to a crime scene investigation. “Sometimes you don’t even get the whole insect, just a wing or a leg found on some fruit or an ornamental plant and we need to identify it.”
Although able to log on to free HPC services in the States, Boykin prefers NeSI because there are no delays. “In the States or the cluster in Oslo there’s always a queue. That’s fine if you’re doing basic exploratory research, but when you’re identifying invasive species you need to have access to the computer nodes right away to get the results in a timely manner. NeSI is days and sometimes even weeks faster.”
Boykin recently returned from five weeks in China, where she worked on whiteflies with Chinese Bemisia researchers who have access to whole-genome data. “They wanted help analysing their data. I contacted the NeSI team and said I wanted to use the cluster to analyse data with my Chinese collaborators.” The NeSI team was able to help Boykin and her colleagues work around numerous firewall restrictions.
Boykin most enjoys using statistical analysis to solve real world problems. “Other researchers are starting to pick up on this paper we’ve written and use it. I’ve just got a paper to review where they’ve used our methods for analysing snails in Belgium. It’s starting to be used not only for invasive things and, to me, that’s the measure of success – I put it in a paper and somebody applies it to their work. Having a real application for what you do helps to justify it.”