National Platforms Framework: 2015 revision
The National Platforms Framework outlines NeSI’s plan for investment in platform assets and services that meet sector-wide research needs. The 2015 review confirmed the following key aspects of the Framework:
- Consolidation from three to two HPC platforms nationally by mid-2016,
- Replacement of these two platforms, along with any platforms required to meet national Genomics needs, by mid-2017 through an integrated procurement process,
- Optimisation across both national platforms to improve operating efficiency through fit-for-purpose use, including extension through a Cloud-burst model to meet peak demand,
- Extension of the team’s expertise to incorporate Data Analytics, supporting the anticipated broadening of researchers’ requirements.
This approach provides an opportunity to deliver an infrastructure which will:
- Significantly reduce the barriers to users moving between the Capacity and Capability Platforms, and between Genomics and HPC platforms,
- Provide users with a common development and job management environment, as well as advanced HPC, Data Analytics, and Genomics capabilities, and
- Reduce the cost of Platforms support services while providing improved resilience and support functions.
The revised National Platforms Framework has been informed by consultations with NeSI’s Collaborators and Subscribers, and with NZGL, and analyses of:
- Current NeSI Platform utilisation,
- The expected research needs of those users who responded to a research needs survey, and
- Anticipated HPC technology trends and roadmaps.
The revised National Platforms Framework will support all six NeSI Objectives:
- Support New Zealand’s research priorities.
- Grow advanced skills that can apply high-tech capabilities to challenging research questions.
- Increase fit-for-purpose use of national research infrastructure.
- Make fit-for-purpose investments aligned with sector needs.
- Enhance national service delivery consistency and performance to position NeSI for growth.
- Realise financial contributions and revenue targets to enhance NeSI’s sustainability.
Approval by NeSI Board
The 2015 review process was completed with the NeSI Board approving the 2015 revision at their meeting of March 14, 2016.
National Platforms Framework (2015 revision)
The National Platforms Framework 2015 revision is as follows (Table 1):
Table 1: 2015 Revision of the National Platforms Framework.
1) Use capacity planning to determine requirements for operational Cloud-burst service by 30 June 2016
2) Decommission University of Canterbury Platforms by 30 June 2016
3) Optimise and sustain fit-for-purpose use of the existing infrastructure
4) Recruit a Data Analytics expert by 30 June 2016
5) Agree Data services strategy and feed into platform design by 30 June 2016
6) Design platform solutions that will enable NeSI to meet its Goals and Objectives, and develop Requests for Proposals for both Capacity and Capability Systems by 15 July 2016
7) Issue Request for Proposals by 30 July 2016:
a. Initial responses due 15 October 2016
b. Best and final offers due 15 November 2016
c. Select successful vendor(s) by 30 December 2016
1) Decommission Pan and FitzRoy by 30 July 2017
2) Contracting, installation, acceptance testing, in production by 30 June 2017 with user training in July 2017
3) Optimise and ensure fit-for-purpose use of the new Platforms
4) Optimise Data services
1) Optimise and sustain fit-for-purpose use of the existing infrastructure
2) Review platform investments to inform future investment plans
Review of Current Platforms and Usage
The primary purpose of HPC is to achieve the “shortest time to solution”. To this end, NeSI invested in, and operates, two classes of HPC Platforms: Capacity (or “High Throughput Computers”) and Capability (or “Supercomputers”), whose features and application domain characteristics are described in Table 2 below.
Table 2: General features of Capability and Capacity HPC systems and of the characteristics of research applications that are executed on them.
[Table 2 content not fully recovered. Surviving fragments list the features and application domains of each system class, reference the NIWA P575/P6 platform, and note that Capability systems need a high level of Reliability, Availability and Serviceability (RAS) – including a high-availability SLA if being used to support decision making in advance of / during national emergencies.]
Fit-for-purpose use of the installed NeSI HPC platforms has focused on application domains which make the most efficient use of the key features of each architecture, as specified in Table 2. In particular:
- Pan – nearly all jobs run on one node or less, or are loosely coupled (i.e. embarrassingly parallel), and there is high demand for throughput – hence the primary investment in Pan has been in adding more and more processors.
- FitzRoy – nearly all jobs are tightly coupled, run on multiple nodes, achieve good scalability, and have very large I/O demands. It is widely used for research that demands high performance processors and interconnect.
- Foster – as a more specialised platform, it has slow processors with a (relatively) high performance interconnect, making it suitable for a number of problem classes that scale well on this architecture. It cannot efficiently execute single-core jobs.
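The distinction between loosely and tightly coupled workloads can be sketched in a few lines of Python (a hypothetical illustration, not NeSI code): an embarrassingly parallel job maps independent tasks across workers with no communication, whereas a tightly coupled job must exchange neighbour values at every step.

```python
from multiprocessing.pool import ThreadPool

def simulate_sample(x):
    """An independent task: needs no data from any other task."""
    return x * x

def embarrassingly_parallel(inputs):
    # Capacity-style workload: tasks scatter freely across workers
    # (a thread pool stands in for cluster nodes in this sketch).
    with ThreadPool(4) as pool:
        return pool.map(simulate_sample, inputs)

def tightly_coupled(grid, steps):
    # Capability-style workload: every step needs neighbour values,
    # so on a real cluster each step forces inter-node communication
    # (a "halo exchange" over the interconnect fabric).
    n = len(grid)
    for _ in range(steps):
        grid = [(grid[i - 1] + grid[i] + grid[(i + 1) % n]) / 3.0
                for i in range(n)]
    return grid
```

On a cluster, the first pattern tolerates a slow interconnect, which is why throughput systems like Pan invest mainly in processor count; the second pattern’s per-step neighbour exchange is why Capability systems like FitzRoy invest in interconnect performance.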
General utilisation of the Platforms follows naturally – with the Capacity systems executing large numbers of small jobs from a large user base ensuring relatively high levels of utilisation, while Capability systems execute a small number of large to very large jobs from a much smaller user base.
Accordingly, Capability system utilisation is much more susceptible to changes in research activity (e.g. it may drop after a long series of integrations ends and the research team moves on to data analysis and paper writing, or when a PhD student completes their research before the next student ramps up activity).
Research Needs Survey Analysis
Summary of survey responses by research domain
Key researchers completed a “Research Needs Survey”, with 46 responses received. Table 3 classifies these responses by primary research domain (in a number of cases, the respondents indicated more than one domain), and by the primary platform used by the respondents.
Table 3: Response to the survey by research domain, and by platform used (grouped by science domain).
Primary Research Domain (note: not all respondents are NeSI users):
- Cellular, Molecular and Physiological Biology
- Earth Sciences and Astronomy
- Ecology, Evolution and Behaviour
- Economics and Human and Behavioural Sciences
- Engineering and Interdisciplinary Sciences
- Mathematical and Information Sciences
- Physics, Chemistry and Biochemistry
[Per-domain response counts and platform breakdowns were not recovered.]
Key insights from the survey
The survey responses summarised in Table 3 suggest that:
- NeSI’s Capacity platform Pan is operating as planned, supporting research across a wide range of research domains. Further analysis of the detailed responses and of historic utilisation on Pan indicates that the large majority of research jobs meet the application expectations of a Capacity class system, as noted in Table 2.
- There were 46 responses to the Survey from a broad range of research groups. Further end user engagement is anticipated during the National Platforms Framework 2016 review.
- The (large) majority of respondents have few or no identified international collaborations / peers in their areas of research. This is of interest as it is increasingly complicated and challenging for individual researchers and small research groups to maintain and develop HPC software codes.
Implications for the evolution of NeSI’s services
The following points capture key requirements for NeSI’s services which were identified by respondents:
- Faster methods to transfer large datasets between research groups, onto the NeSI HPC Platforms, and to / from international peers / collaborators
- Improved data sharing services
- Access to, and management of, large datasets (i.e. the capability to host reference datasets for long periods)
HPC Compute and Analytics
- The major Earth Sciences and Astronomy research groups have a well-articulated understanding of their future needs, e.g.:
  - high performance cores (not much use yet for GPGPUs or MIC architectures);
  - very large core-hour requirements (O(100M core-hours) per annum in the case of one specific community);
  - high interconnect fabric performance to enable scalability on tightly coupled codes;
  - large data output and storage (O(1 PB) per simulation), and the need for multiple simulations.
- Researchers in Biomedical Sciences will also need access to large Capability Platform resources.
- In some science domains there are major gains to be made by transitioning to codes that can make use of GPGPUs (e.g. Molecular Dynamics codes such as AMBER) – leading to very cost effective HPC services and improved time to solution metrics.
- It is less clear whether MIC architectures (e.g. the new Knights Landing self-hosting Many Integrated Core architecture) will deliver improvements in time to solution for science codes.
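To put the stated scale in perspective, a back-of-envelope calculation (illustrative only; the utilisation figure is an assumed planning value, not a NeSI target) converts an O(100M core-hour) annual requirement into a sustained core count:

```python
# Back-of-envelope sizing for a 100M core-hour per year requirement.
core_hours_per_year = 100_000_000
hours_per_year = 365 * 24      # 8,760 wall-clock hours in a year
utilisation = 0.80             # assumed average utilisation (illustrative)

# Cores that would need to run, on average, all year round:
sustained_cores = core_hours_per_year / (hours_per_year * utilisation)
print(round(sustained_cores))  # on the order of 14,000 cores
```

A single research community consuming a sustained allocation of this order is a significant fraction of a national Capability system, which underlines why the core-hour requirement drives the procurement scale.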
HPC Data Analytics:
- The need for Data Analytics, and reduced movement of data (i.e. analytics in situ), will be an area of growth in the coming years
- In part this will be driven by the need to analyse PB-scale datasets
HPC platform operations:
- There is an expressed desire for better access to fit-for-purpose platforms, and for better management of queues / workloads on the HPC platforms.
- A number of research groups are using NeSI platforms that are not optimal for (i.e. not fit-for-purpose to) their needs.
Consultancy and Training
- Many researchers believe that they could be more effective if they had dedicated scientific programmer resources to draw on.
- Internationally, substantial effort is being applied to the issues facing a number of science domains, e.g.:
  - In genetics research, software codes are being developed that remove some of the bottlenecks imposed by current serial codes (typified by users requesting nodes with larger and larger amounts of memory – to be operated on by one core). New Zealand research groups need to begin to adopt these new methods so that they can achieve faster times to solution, and NeSI can make cost effective investments in HPC platforms.
  - In Molecular Dynamics, Materials Science and Computational Chemistry, GPGPU acceleration of some codes is already showing big gains in time to solution, as well as better benefit-to-cost metrics.
- However, researchers are typically conservative, not prioritising the time to test new methods and approaches to problem solving. NeSI’s Scientific Programmers can assist researchers to make these transitions, which is likely to lead to major benefits for New Zealand science.
- With the increasing need for Data Analytics in the future (including visualisation), NeSI will invest in recruiting “Data Analytics” expertise.
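The serial, big-memory bottleneck described above can be sketched with a toy example (hypothetical, not any specific genetics code): instead of loading an entire dataset onto one large-memory node for a single core to walk, the work is split into chunks that are processed independently and merged, so it can scatter across many ordinary nodes.

```python
from multiprocessing.pool import ThreadPool

def count_bases(chunk):
    """Per-chunk work: independent, so memory per worker stays small."""
    counts = {}
    for base in chunk:
        counts[base] = counts.get(base, 0) + 1
    return counts

def merge(partials):
    """Combine the per-chunk results into one total."""
    total = {}
    for counts in partials:
        for base, n in counts.items():
            total[base] = total.get(base, 0) + n
    return total

def parallel_base_count(sequence, chunk_size=4):
    # The serial approach keeps the whole sequence on one big-memory
    # node; here each chunk is an independent task instead, so the job
    # fits a Capacity system rather than demanding ever-larger nodes.
    chunks = [sequence[i:i + chunk_size]
              for i in range(0, len(sequence), chunk_size)]
    with ThreadPool(4) as pool:
        return merge(pool.map(count_bases, chunks))
```

This chunk-and-merge pattern is the shape of the newer genetics codes referred to above; adopting it is what allows faster times to solution without procuring ever-larger memory nodes.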
Review of technology directions
While HPC technology is always evolving, a number of potentially disruptive technologies have come into view over the past year, while others that are new now will soon reach levels of maturity appropriate for a national HPC infrastructure. The detailed analysis focused on:
- Processor architectures;
- Interconnect developments;
- The development of deeper memory hierarchies;
- Storage systems that can deliver both high IOPS and bandwidth;
- Parallel Filesystems;
- High density packaging, power efficiency and cooling;
- Software developments; and
- Cloud options.
Taking all these points together, the new technologies that will become available in 2017 make this a good time to acquire new platforms, since:
- Storage technologies that are available today, but immature, will be much improved;
- Deep memory hierarchies will be available and will have gained some maturity, as well as the software systems needed to utilise them;
- More efficient and performant processors will be available; as well as
- New interconnect technologies.
NeSI’s team reviewed international best practice for HPC procurement, noting best practice recommendations were consistent with the recently completed NeSI High Performance Computing Procurement Manual.
NeSI HPC platforms requirements
The key business requirements to be delivered through implementation of the National Platforms Framework include:
- Making it easy for users to develop and run research workloads/jobs and apply HPC compute and Data Analytics tools on either/both platforms
- Fit-for-purpose platforms that meet researcher needs – Capacity (including high throughput) and Capability
- Access to standard “big data” Data Analytics tools
- A high level of interoperability/commonality of (systems) management and monitoring systems on the platforms
- High IOPS and bandwidth for input/output operations
- High reliability and availability
- Transparent management of data on tiers (from flash to disk to tape)
- Fastest time to solution
- Minimising the Total Cost of Ownership
- Reducing diversity between platforms
Implications for infrastructure architecture
The following features are anticipated from NeSI’s renewed platforms infrastructure:
- Single sign-on (with home institution credentials)
- A uniform namespace for home filesystems – or failing that, a federated namespace
- The same development environment: compilers, linkers, development tools (e.g. profilers, debuggers)
- The same “core” software – e.g. Data Analytics tools
- The same (or very similar) systems environment on both platforms
- A common implementation of monitoring tools
- Transparent data movement (between platforms)
- Enhanced data management and archiving facilities
- Supported, highly reliable national-scale platforms
- HPC Compute and Analytics, and Data services, that are comparable in scale (per user) and quality to those available to international collaborators.