Squeezing performance from community code SPECFEM3D
Imagine you buy a new car and take it for a test drive. The speedometer reaches 100 km/h, and then you discover a hidden switch that suddenly boosts the vehicle’s speed to 150 km/h. Something similar often happens in computing, where small tweaks can greatly improve performance without adverse side effects.
Dr. Yoshihiro Kaneko is a seismologist at GNS Science who runs computer simulations on NeSI’s platform to model the seismic waves generated by earthquakes, such as the damaging Mw 7.8 Kaikoura quake in 2016.
His bread-and-butter code is the Galerkin spectral finite-element code SPECFEM3D, which he ported from the now-decommissioned FitzRoy system to NeSI's Cray XC50 supercomputer.
Running SPECFEM3D as fast as possible is important to Dr. Kaneko because it enables him to achieve higher productivity, a larger research output, more papers published, and ultimately a better understanding of how earthquakes are triggered and propagate.
There are many ways an application’s performance can be improved. First and foremost, the execution time depends on how a “compiler” translates the program into a set of instructions that can be understood and executed by the computer. Just as each person expresses the same concept in slightly different ways, each compiler maps a program to machine instructions more or less efficiently.
There are multiple compilers, and each takes a myriad of options, only a subset of which affect performance, and not necessarily in a positive way. In addition, NeSI's Cray XC50 supercomputer has the latest Intel Skylake processors, which support wider vector instructions (AVX-512). This means a single instruction can operate on more numbers at once, provided the compiler is told to generate code for that processor.
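To make the vectorization point concrete, here is a small illustrative calculation of how many floating-point numbers fit in one vector register for three generations of x86 vector instructions. The register widths are standard figures for these instruction sets; the function name `lanes` is just a label chosen for this sketch. The result is a theoretical upper bound per instruction, not a prediction of how much faster any given code will run.

```python
# Vector register widths, in bits, for three x86 instruction-set extensions.
REGISTER_BITS = {"SSE": 128, "AVX2": 256, "AVX-512": 512}


def lanes(register_bits: int, element_bits: int) -> int:
    """Return how many values of the given size fit in one vector register."""
    return register_bits // element_bits


for name, bits in REGISTER_BITS.items():
    single = lanes(bits, 32)  # 32-bit single-precision values
    double = lanes(bits, 64)  # 64-bit double-precision values
    print(f"{name}: {single} single-precision or {double} double-precision values per instruction")
```

In practice the gain is usually smaller than the lane count suggests, because memory access and non-vectorizable parts of the code also take time.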
With the help of Alex Pletzer, HPC software engineer at NeSI, Dr. Kaneko was able to identify the compiler and target that produce the best outcome. The graph below shows a 50% speedup on the Cray XC50 supercomputer after switching from the GNU compiler 4.9.3 to the Cray compiler and applying Skylake processor optimizations.
A benchmark simulation ran about 2.5 times faster on the Cray XC50 supercomputer compared to running on FitzRoy. While the Cray compiler was found in this case to produce the fastest code, some software will perform better with either the GNU or Intel compilers. Users on Cray’s supercomputers are fortunate to have a choice of compilers, which provides them with more opportunities for higher execution efficiency.
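As a rough guide to what these figures mean for a researcher's day-to-day work, the sketch below converts a speedup factor into the fraction of the original run time that remains. It assumes that "50% speedup" means the code runs 1.5 times faster, which is one common reading of the phrase; the function name `relative_runtime` is invented for this illustration.

```python
def relative_runtime(speedup_factor: float) -> float:
    """Fraction of the original run time left after a given speedup factor."""
    if speedup_factor <= 0:
        raise ValueError("speedup factor must be positive")
    return 1.0 / speedup_factor


# Speedup factors quoted in the article (1.5 is an assumed reading of "50% speedup").
quoted = {
    "Cray compiler + Skylake target vs GNU 4.9.3": 1.5,
    "Cray XC50 vs FitzRoy": 2.5,
}

for label, factor in quoted.items():
    print(f"{label}: run time is about {relative_runtime(factor):.0%} of the original")
```

Under these assumptions, a job that previously took a full day on FitzRoy would finish in roughly ten hours on the Cray XC50.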
“The computing power available through NeSI is essential for my research,” said Dr. Kaneko. “I appreciate the NeSI staff’s assistance in helping me move to NeSI's Cray XC50 supercomputer and achieving this significant boost in productivity. I’m looking forward to continuing my projects using this powerful system.”
Further details on the speedup work performed for Dr. Kaneko are available on GitHub: https://github.com/pletzer/perf_kupe/blob/master/specfem3d/specfem3d-tuning.md
Interested in learning how NeSI’s Consultancy Team can help you? Or have an example of how NeSI platforms have supported your work? Get in touch by emailing email@example.com.