Posted On: May 8, 2024
Starting today, Amazon Elastic Compute Cloud (Amazon EC2) Inf2 instances are generally available in the Asia Pacific (Sydney), Europe (London), Europe (Paris), Europe (Stockholm), and South America (São Paulo) Regions. These instances deliver high performance at the lowest cost in Amazon EC2 for generative AI models.
You can use Inf2 instances to run popular applications such as text summarization, code generation, video and image generation, speech recognition, and personalization. Inf2 instances are the first inference-optimized instances in Amazon EC2 to support scale-out distributed inference, enabled by NeuronLink, a high-speed, non-blocking interconnect between accelerators. Inf2 instances offer up to 2.3 petaflops of compute and up to 384 GB of total accelerator memory with 9.8 TB/s of aggregate bandwidth.
The AWS Neuron SDK integrates natively with popular machine learning frameworks such as PyTorch and TensorFlow, so you can continue using your existing workflows to deploy models on Inf2. Developers can get started with Inf2 instances using AWS Deep Learning AMIs, AWS Deep Learning Containers, or managed services such as Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), and Amazon SageMaker.
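As a quick illustration, here is a minimal sketch of what compiling a PyTorch model for Inf2 with the Neuron SDK's torch-neuronx package can look like. The toy model, input shape, and file name are placeholders, and the snippet assumes it runs on an Inf2 instance with the Neuron SDK already installed:

```python
import torch
import torch_neuronx  # ships with the AWS Neuron SDK (assumed installed)

# Placeholder model: any traceable torch.nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

example_input = torch.rand(1, 128)

# Compile the model for the Inferentia2 accelerators on the instance.
neuron_model = torch_neuronx.trace(model, example_input)

# The result is a TorchScript module: save it, reload it, and run inference.
torch.jit.save(neuron_model, "model_neuron.pt")
loaded = torch.jit.load("model_neuron.pt")
output = loaded(example_input)
print(output.shape)  # torch.Size([1, 10])
```

Compilation happens once at trace time; the saved artifact can then be loaded and invoked like any other TorchScript module.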
Inf2 instances are now available in four sizes (inf2.xlarge, inf2.8xlarge, inf2.24xlarge, and inf2.48xlarge) in 13 AWS Regions as On-Demand Instances, Reserved Instances, and Spot Instances, or as part of Savings Plans.
To learn more about Inf2 instances, see the Amazon EC2 Inf2 Instances webpage and the AWS Neuron Documentation.