Coming soon: dedicated HPC instances and hybrid functionality
Last year we introduced Amazon EC2 UltraClusters of P4d instances, which put more than 4,000 NVIDIA A100 GPUs on a petabit-scale, non-blocking network, and we made them available to anyone with a model to train and a problem to solve. The feedback from customers has been great, and reinforced the need to keep pushing the boundaries of our imagination when working with them to solve hard problems.
In recent months, we’ve been busy making our Elastic Fabric Adapter (EFA) a mainstream technology across newer EC2 instance families, so customers can truly optimize compute environments for their HPC codes by choosing from instance families with widely different characteristics. That work led us to launch the new Intel Ice Lake-based M6i and C6i instances – the C6i offers up to 15% better price-performance than C5, and comes with EFA support out of the box. We also launched the Habana-based DL1 instance, which offers up to 40% better price-performance for training deep learning models, and it supports EFA as well.
EFA has been an important enabler for us and, along with the Nitro System, is accelerating the creation of this broad selection of instance families. It’s also let us balance our efforts between seeking performance through hardware advances and driving productivity through software improvements. We’re conscious that HPC is a tool used by humans, and the productivity of those humans is the real measure of success.
All of this brings us to today’s announcements, one from each of these two important areas.
Introducing Hpc6a
While existing customers have loved our range of HPC offerings, we know they’re also focused on lowering costs. Some workloads are more sensitive to cost than others.
So today we’re excited to announce the upcoming availability of a new HPC-optimized EC2 instance, Hpc6a, with the best price-performance for running compute-intensive HPC workloads in Amazon EC2. Hpc6a uses AMD’s 3rd generation EPYC (Milan) processors, and offers up to 65% better price-performance than comparable x86-based compute-optimized instances. And of course, it comes with 100 Gb/s EFA for running MPI applications at scale.
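This is the kind of tightly coupled workload Hpc6a is aimed at: many ranks exchanging data constantly over the fabric. As a toy illustration (not Hpc6a-specific, and assuming an MPI library plus mpi4py are installed on the instances), here’s a minimal collective in Python:

```python
# Minimal mpi4py sketch: an allreduce, the kind of collective that leans on
# EFA's fast fabric when run across many nodes. Launch with, for example:
#   mpirun -n 96 python allreduce.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank contributes one partial value; allreduce sums them across all ranks.
local_value = float(rank + 1)
total = comm.allreduce(local_value, op=MPI.SUM)

if rank == 0:
    print(f"Sum over {size} ranks: {total}")
```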
There’s been a lot of engineering work behind Hpc6a to satisfy a broad range of HPC workloads. We’ll have more on that, including detailed specs, pricing, and regional availability, at launch.
NICE EnginFrame
We’re also happy to announce the upcoming availability of NICE EnginFrame with support for hybrid environments. EnginFrame customers will be able to manage their HPC workflows across both on-premises and cloud environments through a single, unified interface.
Customers have been telling us they want to maximize the returns from their existing investments in on-premises systems. For some time, EnginFrame has been helping to make HPC systems easier to use and has become a powerful productivity lever for many scientific and engineering organizations, whether they’re running on-premises or in the cloud.
As one part of this approach, EnginFrame integrates tightly with NICE DCV, our high-performance remote display protocol. DCV provides customers with a secure way to deliver remote desktops and application streaming from any server (on-premises or in the cloud) to any device, over varying network conditions. Companies as diverse as Volkswagen and Netflix use DCV to power their workforces, and it’s a powerful reminder that making HPC easier to use is crucial to applying it to a greater range of problems.
But we wanted the elasticity of the cloud to be tightly woven into EnginFrame, and we think EnginFrame’s productivity benefits should flow through to infrastructure provisioning, too – not just the application pathways. So we’ve been hard at work this year enhancing EnginFrame’s cloud capabilities, building on the new cluster API layer we launched with AWS ParallelCluster 3 in September as preparation for this.
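To give a flavor of the cluster lifecycle operations that API layer exposes (without speaking for the EnginFrame integration itself), here’s a minimal sketch that drives the ParallelCluster 3 CLI from Python. It assumes the aws-parallelcluster package is installed, and the cluster name and cluster-config.yaml file are hypothetical placeholders:

```python
import subprocess

def pcluster(*args: str) -> str:
    """Run a ParallelCluster 3 CLI command and return its stdout."""
    result = subprocess.run(
        ["pcluster", *args], check=True, capture_output=True, text=True
    )
    return result.stdout

# Create a cluster from a declarative YAML configuration file,
# then poll its provisioning status.
print(pcluster("create-cluster",
               "--cluster-name", "demo-cluster",
               "--cluster-configuration", "cluster-config.yaml"))
print(pcluster("describe-cluster", "--cluster-name", "demo-cluster"))
```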
Detailed specs, pricing, and updated installation choices will be shared at launch.
We’re not done
This year, we’ve launched a lot of new capabilities for HPC customers, making AWS the best place for the length and breadth of their workflows. EFA went mainstream in the C6i, M6i, and DL1 instances, bringing the number of instance families with fast-fabric capabilities for scaling MPI and NCCL codes to sixteen. That list includes the C6gn, powered by our own AWS Graviton2, the first HPC-optimized Arm-based instance available in the cloud. We’ve also shown the way with deep-dive studies that explore and explain the optimizations that will drive your workloads faster in the cloud than elsewhere.
We released a major new version of AWS ParallelCluster with its own API for controlling the cluster lifecycle. AWS Batch became deeply integrated into AWS Step Functions and now supports fair-share scheduling, with multiple levers to control the experience.
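As a rough sketch of what those levers look like (the policy name and share identifiers below are hypothetical), a fair-share scheduling policy can be created with boto3 and then attached to a job queue:

```python
import boto3

batch = boto3.client("batch")

# Create a fair-share scheduling policy: past usage decays over an hour,
# 10% of capacity is reserved as headroom for shares that aren't active yet,
# and a lower weightFactor gives a share identifier a larger slice.
response = batch.create_scheduling_policy(
    name="example-fair-share-policy",
    fairsharePolicy={
        "shareDecaySeconds": 3600,
        "computeReservation": 10,
        "shareDistribution": [
            {"shareIdentifier": "teamA", "weightFactor": 1.0},
            {"shareIdentifier": "teamB", "weightFactor": 0.5},
        ],
    },
)
print(response["arn"])
```

Jobs then carry a share identifier when they’re submitted to a queue that uses the policy, and the scheduler balances capacity between those shares over time.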
And today we’re signaling the arrival of a new HPC-dedicated instance family – the Hpc6a – and an enhanced EnginFrame that will bring the best of the cloud and on-premises together in a single interface.
We’re humbled that the HPC community has noticed what our customers have done with these innovations. Those customers had another great year, scooping so many of HPCwire’s Readers’ Choice Awards at SC’21, which wraps up today. But the heroes are the researchers and engineers who are saving lives and solving immensely difficult problems, by doing what we once thought impossible.
That’s why it’s important that these new features and instances were built by starting from the customers’ problems and working backwards. This frequently takes us to places we didn’t expect. But we instinctively know this is the right approach, because your feedback helps us engineer those destinations. We can’t wait to share more of our plans with you and see what kind of future you create.