A technical look at the new tools and capabilities of Mahuika
The recently announced upgrade and extension of Mahuika will bring together new tools and technologies to keep pace with today's increasingly diverse research drivers, including growing data volumes, more complex models, and varying levels of HPC maturity across research communities.
NeSI engineers are working with Hewlett Packard Enterprise (HPE) to bring the Mahuika extension's technology design and architecture to life. Below we take a deeper dive into the technical details behind the investment.
Mahuika's additional capacity -- based on the class-leading 3rd Gen AMD EPYC Milan architecture -- will allow a wider range of research communities to adopt HPC approaches and build digital skills within their research teams.
- 64 dual-socket HPE Apollo 2000 XL225n nodes (AMD EPYC Milan 7713, 64 cores / 128 threads, 2.0GHz base clock, 3.675GHz max boost clock, 256MB L3 cache, 225W TDP)
- Of these, 56 have 512GB RAM and the remaining 8 have 1TB RAM
- All have a 1.92TB NVMe drive for swap and/or local scratch/temp space, and connect to the cluster fabric via HDR100 InfiniBand
- Two new racks and new switching, including 2 Quantum 200Gb HDR InfiniBand managed switches
- In total, this adds 8,448 new physical cores (16,896 logical cores with SMT enabled), of which 7,168 are in identically specified nodes linked via a common top-of-rack leaf switch, providing both consistently low latency and high bandwidth
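The core counts quoted above can be cross-checked with simple arithmetic; note that the 8,448 total also includes the host cores of the four single-socket GPU nodes described below. A minimal sketch:

```python
# Sanity-check of the quoted core counts (numbers taken from the spec lists).
CORES_PER_SOCKET = 64  # AMD EPYC Milan 7713

# CPU partition: 64 dual-socket Apollo 2000 XL225n nodes
cpu_cores = 64 * 2 * CORES_PER_SOCKET        # 8,192 physical cores

# GPU partition: 4 single-socket Apollo 6500 XL645d nodes (listed below)
gpu_host_cores = 4 * 1 * CORES_PER_SOCKET    # 256 physical cores

total_physical = cpu_cores + gpu_host_cores
print(total_physical)       # 8448 physical cores in total
print(total_physical * 2)   # 16896 logical cores with SMT (2 threads/core)

# 56 of the 64 CPU nodes share the identical 512GB spec behind one leaf switch
print(56 * 2 * CORES_PER_SOCKET)  # 7168
```

This also makes clear why the 7,168-core figure is smaller than the full CPU total: it counts only the 56 identically configured 512GB nodes, excluding the eight 1TB high-memory nodes.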
As part of this investment, four new 4-GPU NVIDIA A100 (80GB) systems, based on NVIDIA's HGX AI supercomputing platform, will support more analysis at scale, building on previous investments and paired with specialised software and tools for machine learning.
Also, Mahuika's expanded high-memory capabilities will allow rapid simultaneous processing for faster results and insights.
- 4 HPE Apollo 6500 XL645d nodes, each with four NVLink-connected 80GB A100 HGX GPUs, a single-socket AMD EPYC Milan 7713, 512GB RAM, two 6.4TB NVMe drives, and HDR200 InfiniBand
- In total adds 16 new 80GB NVIDIA A100 GPUs
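For researchers planning to use these GPUs, a job request might look something like the following. This is a hypothetical sketch only: the partition, GPU specifier, and module names are placeholders and assumptions, not NeSI's actual configuration, so check NeSI's documentation or contact Support for the real values.

```bash
#!/bin/bash
# Hypothetical Slurm batch script requesting one of the new 80GB A100 GPUs.
# All names below (partition, GPU type string, module name, train.py) are
# illustrative placeholders, not confirmed NeSI settings.
#SBATCH --job-name=a100-example
#SBATCH --gpus-per-node=A100:1   # one 80GB A100
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=01:00:00

module load CUDA                 # module name is an assumption
srun python train.py
```

Requesting a single GPU from a 4-way NVLink node like these leaves the remaining three GPUs available to other jobs, which is why per-GPU requests (rather than whole-node requests) are the usual pattern on shared GPU partitions.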
Driven by user insights
Earlier this year, NeSI installed the first tranche of its new NVIDIA A100 Graphics Processing Unit (GPU) cards. On 20 May, we invited research groups using machine learning applications to begin accessing these advanced GPU capabilities (if this applies to you, contact Support to request access!). It was an exciting culmination of months of preparation and technology validation activities.
That pre-launch testing phase is a story that often goes untold, but at NeSI it’s an important part of how we’re actively developing, testing, and problem-solving technology to support research and researchers' evolving needs. So, we’ve decided to share some of our ‘behind-the-scenes' experiences with rolling out the A100s as a series of ‘Tech Insights’ blog posts.
- Tech Insights Post #1: A behind-the-scenes look at rolling out new GPU resources for NZ researchers
- Tech Insights Post #2: Testing the A100 tensor cores on deep learning code
As we progress on this journey with our partners in the research sector, we look forward to delivering tools that respond to researchers' evolving needs, sharing our learnings as we go, and helping more research communities integrate NeSI resources and eResearch approaches into the way they work.
If you’re interested in learning more about NeSI's current capabilities and plans in this space, we’d love to hear from you.