Cellulose unchained

“NeSI is an amazing resource for the New Zealand science community. Not only is our dream to unchain the structure of cellulose within reach, it looks like we will be able to achieve it more smartly and quickly than ever.”

Scion scientists can now generate hundreds of thousands of model cellulose molecules and their X-ray diffraction patterns to help reveal the structure of the most common organic polymer on the planet, thanks to NeSI’s support and parallel processing opportunities.

Dr Stefan Hill is Research Leader for advanced chemical characterisation at Scion, the Crown Research Institute for forestry, wood and bioproducts, based in Rotorua. He is intensely interested in the structure of cellulose.

“Cellulose is the most ubiquitous and abundant polymer by weight on the entire planet, but its structure, in particular the number of chains that make up the material, is still unclear. If we can understand how cellulose is put together then we can better understand how to take it apart or modify it to take advantage of its amazing properties.”

Direct interpretation of the complex X-ray diffraction patterns generated by cellulose is currently limited and rests on a number of assumptions. The Scion team is taking an alternative approach: creating model diffraction patterns and comparing them with experimental data.

“It’s a simple idea,” says Stefan, “but first we need vast numbers of input datasets, then we need to generate model X-ray diffraction patterns, and finally compare the model diffraction pattern with actual X-ray diffractograms.”

The late Dr Roger Newman started working on this problem in the 1980s. He realised that Bragg’s Law, which is normally used to predict diffraction patterns and assumes crystals are infinite in all directions, did not hold for long, nano-fibre-like crystals like cellulose. Dr Newman’s solution was to write his own software to process datasets hand-entered into Excel spreadsheets. Generating one diffraction pattern took a day or more.
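To illustrate why crystal size and shape matter: Bragg's Law (nλ = 2d sin θ) gives peak positions for an ideal, effectively infinite lattice, whereas a finite crystal can be treated by summing over every pair of scatterers. The sketch below uses the Debye scattering equation, a standard textbook approach for finite particles. It is not a reconstruction of Dr Newman's software, whose method the article does not describe. The point-scatterer lattice, the unit scattering factors and the function names are all assumptions made for illustration.

```python
import math
from itertools import product


def lattice(nx, ny, nz, a=1.0):
    """Toy crystal: point scatterers on an nx x ny x nz cubic grid (spacing a)."""
    return [(i * a, j * a, k * a)
            for i, j, k in product(range(nx), range(ny), range(nz))]


def debye_intensity(points, q):
    """Orientation-averaged intensity at scattering vector magnitude q > 0.

    Debye scattering equation with unit scattering factors (an assumption):
        I(q) = sum_i sum_j sin(q * r_ij) / (q * r_ij)
    """
    n = len(points)
    total = float(n)  # the i == j terms each contribute sin(x)/x -> 1
    for i in range(n):
        for j in range(i + 1, n):
            x = q * math.dist(points[i], points[j])
            total += 2.0 * math.sin(x) / x  # pair (i, j) and (j, i)
    return total
```

Evaluating `debye_intensity` over a range of `q` for, say, `lattice(2, 2, 2)` and `lattice(2, 2, 10)` is one way to see how a short, wide crystal and a long, thin one produce different patterns from the same lattice spacing.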

Stefan updated the program in 2015 and reduced running time to minutes, but was still left with the need to generate millions of input cellulose crystal structures.

“Coincidentally, around this time, I attended a presentation about NeSI’s capabilities given by Ben Roberts. The access to supercomputing, the ability to run the model software in parallel on a number of processors seemed to be the perfect way to do one thing many, many times,” says Stefan. “Ben was enthusiastic and encouraged us to have a go.”

The team at NeSI modified Scion’s software slightly and ran it successfully on NeSI’s FitzRoy platform at NIWA, showing it was definitely possible to create vast numbers of model cellulose X-ray diffraction patterns in a short time.

The initial proof of concept used a limited number of datasets. A Monte Carlo approach that creates many datasets in a short time has replaced the old Excel method, and the Scion team is now planning a run of tens or hundreds of thousands of structures, probably in the second half of 2016.
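A Monte Carlo generator of this kind can be sketched as drawing each candidate structure at random from allowed parameter ranges. The parameter names, ranges and chain counts below are placeholders chosen for illustration, not values used by the Scion team. Each candidate is tied to its own seed, so every draw is reproducible and independent of the others, which is what makes the work easy to spread across many processors.

```python
import random

# Hypothetical parameter ranges and chain counts, for illustration only.
PARAM_RANGES = {
    "a": (0.7, 0.9),
    "b": (0.7, 0.9),
    "gamma_deg": (90.0, 100.0),
}
CHAIN_COUNTS = (18, 24, 36)


def sample_structure(seed):
    """Draw one candidate structure; the same seed always gives the same draw."""
    rng = random.Random(seed)
    structure = {name: rng.uniform(lo, hi)
                 for name, (lo, hi) in PARAM_RANGES.items()}
    structure["n_chains"] = rng.choice(CHAIN_COUNTS)
    structure["seed"] = seed
    return structure


def generate(n, first_seed=0):
    """Generate n independent candidates, one per seed."""
    return [sample_structure(first_seed + k) for k in range(n)]
```

Because `sample_structure` depends only on its seed, a block of seeds could be handed to each worker or batch job without any communication between them.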

The larger test is the subject of a proposal for a development project undertaken by Stefan, with Ben taking care of many of the administrative actions.

Stefan expects the larger scale testing will allow the team to get a handle on the final part of unravelling cellulose’s structure – comparing model diffraction patterns with experimental data.

“Cellulose crystals do not exist alone. They are part of a matrix of hemicellulose, lignin and other components that make up trees and plants. This produces a background that complicates comparing models and actual diffraction patterns.”

Removing the background from experimental data, or adding components of it to model diffraction patterns (the approach that Stefan favours) followed by diffraction pattern comparisons, will also be a task for supercomputing.  
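As a rough illustration of the second option, adding a background to the model before comparing, the sketch below scales a known background shape so that model plus background matches the experimental pattern as closely as possible in the least-squares sense. It assumes the patterns are lists of intensities sampled at the same angles and that the background has a single adjustable weight. Both are simplifications, and the function names are hypothetical.

```python
def best_background_weight(model, background, observed):
    """Least-squares weight w minimising sum((m + w*b - o)**2).

    Setting the derivative with respect to w to zero gives
        w = sum(b * (o - m)) / sum(b * b)
    """
    numerator = sum(b * (o - m)
                    for m, b, o in zip(model, background, observed))
    denominator = sum(b * b for b in background)
    return numerator / denominator


def add_background(model, background, weight):
    """Model pattern with the weighted background added point by point."""
    return [m + weight * b for m, b in zip(model, background)]


def misfit(pattern, observed):
    """Sum of squared differences between two patterns."""
    return sum((p - o) ** 2 for p, o in zip(pattern, observed))
```

A candidate structure would then be scored by `misfit(add_background(model, background, w), observed)` with `w` taken from `best_background_weight`.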

NeSI’s expertise and computing power have also opened up the possibility of using new strategies to tackle the problem. One promising approach is the use of machine learning to remove humans (and their bias) from the generation of input datasets. With machine learning, the computer generates datasets, compares model diffraction patterns with actual diffraction patterns and ‘learns’ which datasets are a good fit and which are not. The ‘good’ datasets are then ‘bred’ and mutated to build up model structures that come closer and closer to resembling the actual structure.
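The breed-and-mutate loop described here resembles an evolutionary (genetic) search. The sketch below shows that idea on a deliberately tiny stand-in problem: each "structure" is just a peak position and width, and the "diffraction pattern" is a single Gaussian peak. The toy model, the population sizes and the mutation scales are all assumptions for illustration, not the Scion team's setup.

```python
import math
import random

Q = [0.5 + 0.05 * k for k in range(60)]  # sample points for the toy pattern


def model_pattern(params):
    """Toy pattern: one Gaussian peak described by (position, width)."""
    position, width = params
    return [math.exp(-((q - position) / width) ** 2) for q in Q]


def misfit(params, observed):
    """Sum of squared differences between model and observed patterns."""
    return sum((m - o) ** 2 for m, o in zip(model_pattern(params), observed))


def evolve(observed, generations=40, pop_size=30, keep=6, seed=0):
    """Keep the best candidates, breed and mutate them, and repeat."""
    rng = random.Random(seed)
    population = [(rng.uniform(0.5, 3.5), rng.uniform(0.05, 1.0))
                  for _ in range(pop_size)]
    history = []  # best misfit seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda p: misfit(p, observed))
        history.append(misfit(population[0], observed))
        parents = population[:keep]  # the 'good' candidates survive unchanged
        children = []
        while len(children) < pop_size - keep:
            mum, dad = rng.sample(parents, 2)
            # Breed: take each parameter from either parent, then mutate it.
            position = rng.choice((mum[0], dad[0])) + rng.gauss(0, 0.05)
            width = max(0.01, rng.choice((mum[1], dad[1])) + rng.gauss(0, 0.02))
            children.append((position, width))
        population = parents + children
    population.sort(key=lambda p: misfit(p, observed))
    history.append(misfit(population[0], observed))
    return population[0], history
```

Because the best candidates are carried over unchanged, the best misfit can never get worse from one generation to the next. In a real run, each misfit evaluation would be a full model diffraction calculation, which is where the parallel computing comes in.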

“NeSI is an amazing resource for the New Zealand science community,” says Stefan. “Not only is our dream to unchain the structure of cellulose within reach, it looks like we will be able to achieve it more smartly and quickly than ever.”
