Decoding the bovine genome

"Prompt responses from the NeSI team to our requests helped us solve our problems in a few minutes, which in turn allowed us to focus on scientific research rather than on dealing with technical issues."

Maksim Struchalin and fellow researchers at Livestock Improvement Corporation (LIC) have been conducting a world-leading study, funded by the Primary Growth Partnership (PGP), on sequencing the genome of dairy cattle. The work will be of enormous benefit to the dairy industry and the national economy, and will also help researchers better understand human genetic disorders.

The main aim of the project is to discover new regions in the bovine genome that are responsible for important dairy traits, such as milk volume and the concentration of protein and fat in milk. Knowledge of such genomic regions will allow for more accurate selection of bulls with desirable genes, which can then be passed on to their female offspring. Cows with those genes are more likely to produce milk of higher quality and/or in greater quantity, increasing the dairy industry's potential to generate income and boosting the New Zealand economy.

The bovine genome consists of three billion elementary building blocks called nucleotides, and about one percent of them vary amongst the population. Those varying nucleotides are called Single Nucleotide Polymorphisms (SNPs). SNPs are known to be partly responsible for phenotypic differences between individual animals, that is, for what makes animals different from each other. The researchers aimed to find the SNPs that are responsible for bovine performance.

LIC researchers genotyped (measured) approximately 17 million SNPs in more than 100,000 cows and bulls. The data set can therefore be represented as a matrix of about 100,000 rows and 17,000,000 columns. Because many of the animals are closely related, there is nonzero covariance between rows (individuals) of this matrix – this reflects the fact that relatives share some portion of their genetic material. Likewise, there is nonzero covariance between columns (SNPs), reflecting the presence of so-called linkage disequilibrium between SNPs and the presence of genetic strata in the sample, i.e. clusters of genetically similar animals.
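To give a feel for the scale of this matrix, here is a rough back-of-the-envelope estimate in Python. Only the two dimensions come from the figures above. The storage encodings (one byte per genotype, four genotypes packed per byte, or 64-bit floats) are assumptions made for illustration, not a description of how LIC stored its data.

```python
# Rough storage estimate for the genotype matrix described in the article.
N_ANIMALS = 100_000      # rows (individuals), from the article
N_SNPS = 17_000_000      # columns (SNPs), from the article


def gigabytes(n_bytes):
    """Convert a byte count to decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_bytes / 1e9


n_genotypes = N_ANIMALS * N_SNPS

# Assumed encodings (illustrative only, not stated in the article).
bytes_one_per_genotype = n_genotypes * 1   # one 8-bit integer per genotype
bytes_two_bit_packed = n_genotypes // 4    # four 2-bit genotypes per byte
bytes_float64 = n_genotypes * 8            # one 64-bit float per genotype

# An animal-by-animal matrix (for example, covariance between rows),
# assuming 64-bit floats.
bytes_relationship = N_ANIMALS * N_ANIMALS * 8

print(f"genotypes:               {n_genotypes:,}")
print(f"1 byte per genotype:     {gigabytes(bytes_one_per_genotype):,.0f} GB")
print(f"2 bits per genotype:     {gigabytes(bytes_two_bit_packed):,.0f} GB")
print(f"float64 per genotype:    {gigabytes(bytes_float64):,.0f} GB")
print(f"100,000 x 100,000 float64 matrix: {gigabytes(bytes_relationship):,.0f} GB")
```

Under these assumptions the full matrix holds 1.7 trillion genotypes, and even a single dense animal-by-animal matrix takes about 80 GB, which is more memory than a typical desktop machine offers.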

Such complex data demands special care. A few truly associated SNPs might result in millions of observable correlations between SNPs and a trait of interest. The vast majority of those correlations would be false positive associations, reflecting how the animals were sampled rather than an underlying, biologically relevant genetic model. The main difficulty therefore lies in selecting the small number of true positive associations from amongst millions of false positive ones.
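The toy simulation below sketches how genetic strata alone can produce such a false positive. It is not the researchers' method. The two clusters, their allele frequencies, their trait means and the sample sizes are all invented for illustration. The simulated SNP has no effect on the trait, yet pooling the two clusters yields a clear correlation that largely disappears when each cluster is analysed separately.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_cluster(rng, n_animals, allele_freq, trait_mean):
    """Simulate one stratum: genotypes (0, 1 or 2 copies of an allele) and a
    trait that depends only on the cluster mean, never on the genotype."""
    genotypes = [
        int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        for _ in range(n_animals)
    ]
    traits = [rng.gauss(trait_mean, 1.0) for _ in range(n_animals)]
    return genotypes, traits


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical strata: different allele frequencies AND different trait means.
geno_a, trait_a = simulate_cluster(rng, 2000, allele_freq=0.2, trait_mean=0.0)
geno_b, trait_b = simulate_cluster(rng, 2000, allele_freq=0.8, trait_mean=1.0)

r_pooled = pearson(geno_a + geno_b, trait_a + trait_b)
r_within_a = pearson(geno_a, trait_a)
r_within_b = pearson(geno_b, trait_b)

print(f"pooled correlation:    {r_pooled:.3f}")
print(f"within cluster A only: {r_within_a:.3f}")
print(f"within cluster B only: {r_within_b:.3f}")
```

In a real analysis the strata are not known in advance and animals are related to varying degrees, which is part of why the covariance between individuals has to be modelled explicitly.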

“The NeSI cluster provided priceless resources that allowed us to work with such complex data,” explained Dr Struchalin. “In our analysis, we had to transpose, invert and multiply large matrices, which required us to use nodes with lots of RAM. Using a large number of computational cores allowed us to run analyses in one week which otherwise would have taken maybe 100 years of computational time. Moreover, NeSI provided us with enough disk space to store countless terabytes of our data and generated outputs.”

“The technical and consulting support provided by NeSI staff was extremely important and, in some cases, crucial for our project. We received a lot of valuable advice on how to accelerate and optimise our computations. Prompt responses from the NeSI team to our requests helped us solve our problems in a few minutes, which in turn allowed us to focus on scientific research rather than on dealing with technical issues.

“Undoubtedly, the amount of data and the complexity of analysis will continue to grow in the future. This makes it very important to have such computational resources as NeSI.”

To find out how this project will affect research into human genetic disorders, you can read more information here.

The Transforming the Dairy Value Chain Primary Growth Partnership is a seven-year, $170 million research and innovation programme involving the Ministry for Primary Industries and commercial partners, including DairyNZ, Fonterra, LIC and Zespri.
