Enhance national service delivery consistency and performance to position NeSI for growth

As an enabler of research across a wide range of communities and disciplines, NeSI is well-connected and well-positioned to support, collaborate with, and grow alongside New Zealand’s research sector.

Improved reporting and monitoring were key focuses in 2019, both to position NeSI for growth and to ensure national service delivery remained reliable, consistent, and high performing.

Monitoring and collecting metrics

As part of efforts to improve NeSI’s ability to track and collect platform performance metrics, a Platform Monitoring workshop was conducted with a focus on basic machine and service availability monitoring, as well as centralised log collection and processing. Since then, a regular Cray Monthly Systems Performance Report has been used to better monitor and act upon capacity opportunities and to ensure the systems continue to run as efficiently as possible.

Further investigation is required to determine appropriate solutions for collecting metrics (both real-time trends and longitudinal data) across platform and service variables. Other reports covering utilisation and efficiency are also being prototyped on a range of resources (CPU, memory, GPU queue partitions), in order to better represent the value NeSI provides to the broader research sector.
 


Consistent and reliable availability

During 2019, platform uptime remained very high, with a few small outages related to individual compute node failures.

This consistent uptime supported rapid growth in usage during August 2019, when almost 100% of capacity was in use across both Mahuika and Māui. High usage is expected to continue, particularly as the number of users grows.

Objectives