Amazon SageMaker HyperPod customers
Top AI startups and organizations of all sizes are training and deploying foundation models at scale on SageMaker HyperPod:
Hugging Face
Perplexity AI
Articul8 AI
Thomson Reuters
Thomson Reuters, a global AI and content-driven technology company, has been testing the task governance capability in Amazon SageMaker HyperPod to address a key challenge: workload prioritization. With task governance, the company can manage customer workloads, such as inference requests, alongside its ongoing model development projects, prioritizing urgent customer requests without disrupting internal research. The result is better resource utilization and higher customer satisfaction.
Stability AI
Observea
Recursal AI
Hippocratic AI
Hippocratic AI is an AI company developing the first safety-focused large language model (LLM) for healthcare. To train its primary LLM and its supervisor models, Hippocratic AI required powerful compute resources, which were in high demand and difficult to obtain. Amazon SageMaker HyperPod flexible training plans made it easier for the company to gain access to Amazon Elastic Compute Cloud (Amazon EC2) P5 instances. Hippocratic AI also uses services such as Grafana to track important GPU utilization metrics. With Amazon EC2 P5 instances, Hippocratic AI has increased model training speed fourfold and scales its solution to accommodate hundreds of use cases. Flexible training plans helped the company secure the required compute resources and train its models quickly.
Articul8
Amazon SageMaker HyperPod task governance helps maximize GPU utilization across teams and projects. As a fast-growing GenAI startup, Articul8 AI constantly optimizes its compute environment to allocate accelerated compute resources as efficiently as possible. With automated task prioritization and resource allocation in SageMaker HyperPod, the company has seen a dramatic improvement in GPU utilization, reducing idle time and accelerating model development across tasks ranging from training and fine-tuning to inference. The ability to automatically shift resources to high-priority tasks has increased the team's productivity, allowing them to bring new GenAI innovations to market faster than ever before.
NinjaTech
NinjaTech AI, a generative AI company that provides an all-in-one SuperAgent for unlimited productivity, used Amazon SageMaker HyperPod flexible training plans to accelerate and automate fine-tuning of various internal models, including the Llama 3.1 405B model, while reducing model training costs. The company aims to provide a seamless experience for users who want access to the various AI agents powering its SuperAgent technology. To achieve this, it needed a model that could automatically predict user intent and determine which AI agent is the best fit. This mechanism requires frequent model updates that incorporate customer feedback and new features iteratively, involving 10–100 million tokens in each round of LoRA fine-tuning. As a startup, acquiring and operating high-performance compute resources is challenging because of steep costs and bandwidth constraints, especially in multi-node clusters that require fast networking and fast storage in addition to accelerated computing. The training process is also time-consuming, involving steps such as model downloading, distributed training, checkpointing, monitoring, auto-remediation, merging, and quantization. HyperPod flexible training plans gave the company reliable and affordable compute reserved in advance of the training run, matched to its specific compute and timeline requirements, while ensuring efficient model training.
OpenBabylon
Developers and data scientists at OpenBabylon, an AI company that customizes large language models for underrepresented languages, have been using SageMaker HyperPod flexible training plans for several months to streamline their access to GPU resources for large-scale experiments. Using SageMaker HyperPod's multi-node distributed training capabilities, they conducted 100 large-scale model training experiments, achieving state-of-the-art results in English-to-Ukrainian translation. This breakthrough was achieved on time and cost-effectively, demonstrating SageMaker HyperPod's ability to deliver complex projects on time and on budget.
Salesforce
Researchers at Salesforce were looking for ways to quickly get started with foundation model training and fine-tuning without having to worry about infrastructure or spend weeks optimizing their training stack for each new model. With Amazon SageMaker HyperPod recipes, researchers at Salesforce can prototype rapidly when customizing FMs. Salesforce's AI Research teams can now get started in minutes with a variety of pre-training and fine-tuning recipes and can operationalize frontier models with high performance.
Amazon SageMaker HyperPod partners
Drive innovation and unlock greater business value with AWS partners that have deep technical knowledge and proven customer success
Accenture
Slalom
Rackspace Technology