AWS Trainium Customers

See how customers are using AWS Trainium to build, train, and fine-tune deep learning models.
  • Anthropic

    At Anthropic, millions of people rely on Claude daily for their work. We're announcing two major advances with AWS: First, a new "latency-optimized mode" for Claude 3.5 Haiku, which runs 60% faster on Trainium2 via Amazon Bedrock. And second, Project Rainier—a new cluster with hundreds of thousands of Trainium2 chips delivering hundreds of exaflops, which is over five times the size of our previous cluster. Project Rainier will help power both our research and our next generation of scaling. For our customers, this means more intelligence, lower prices, and faster speeds. We're not just building faster AI; we're building trustworthy AI that scales.

    Tom Brown, Chief Compute Officer at Anthropic
  • Databricks

    Databricks’ Mosaic AI enables organizations to build and deploy quality Agent Systems. It is built natively on top of the data lakehouse, enabling customers to easily and securely customize their models with enterprise data and deliver more accurate and domain-specific outputs. Thanks to Trainium’s high performance and cost-effectiveness, customers can scale model training on Mosaic AI at low cost. Trainium2’s availability will be a major benefit to Databricks and its customers as demand for Mosaic AI continues to scale across all customer segments and around the world. Databricks, one of the largest data and AI companies in the world, plans to use Trn2 to deliver better results and lower TCO by up to 30% for its customers.

    Naveen Rao, VP of Generative AI at Databricks
  • poolside

    At poolside, we are set to build a world where AI will drive the majority of economically valuable work and scientific progress. We believe that software development will be the first major capability in neural networks to reach human-level intelligence, because it's the domain where we can best combine Search and Learning approaches. To enable that, we're building foundation models, an API, and an Assistant to bring the power of generative AI to your developers' hands (or keyboard). A major key to enabling this technology is the infrastructure we use to build and run our products. With AWS Trainium2, our customers will be able to scale their usage of poolside at a price-performance ratio unlike that of other AI accelerators. In addition, we plan to train future models on Trainium2 UltraServers, with expected savings of 40% compared to EC2 P5 instances.

    Eiso Kant, CTO & Co-founder, poolside
  • Itaú Unibanco

    Itaú Unibanco's purpose is to improve people's relationship with money, creating positive impact on their lives while expanding their opportunities for transformation. At Itaú Unibanco, we believe that each customer is unique, and we focus on meeting their needs through intuitive digital journeys that leverage the power of AI to constantly adapt to their consumer habits.

    We have tested AWS Trainium and Inferentia across various tasks, ranging from standard inference to fine-tuned applications. The performance of these AI chips has enabled us to achieve significant milestones in our research and development. For both batch and online inference tasks, we have seen a 7x improvement in throughput compared to GPUs. This enhanced performance is driving the expansion of more use cases across the organization. The latest generation of Trainium2 chips unlocks groundbreaking features for GenAI and opens the door for innovation at Itaú.

    Vitor Azeka, Head of Data Science at Itaú Unibanco
  • NinjaTech AI

    Ninja is an All-In-One AI Agent for Unlimited Productivity: one simple subscription, unlimited access to the world’s best AI models, along with top AI skills such as writing, coding, brainstorming, image generation, and online research. Ninja is an agentic platform and offers “SuperAgent,” which uses a mixture-of-agents approach with world-class accuracy comparable to (and in some categories beating) frontier foundation models. Ninja’s agentic technology demands the highest-performance accelerators to deliver the unique real-time experiences our customers expect.

    We are extremely excited for the launch of AWS Trn2, because we believe it’ll offer the best cost-per-token performance and the fastest speeds currently possible for our core model, Ninja LLM, which is based on Llama 3.1 405B. It’s amazing to see Trn2’s low latency coupled with competitive pricing and on-demand availability; we couldn’t be more excited about Trn2’s arrival!

    Babak Pahlavan, Founder & CEO, NinjaTech AI
  • Ricoh

    The RICOH machine learning team develops workplace solutions and digital transformation services designed to manage and optimize the flow of information across our enterprise solutions.

    The migration to Trn1 instances was easy and straightforward. We were able to pretrain our 13B-parameter LLM in just 8 days, utilizing a cluster of 4,096 Trainium chips! After the success we saw with our smaller model, we fine-tuned a new, larger LLM based on Llama-3-Swallow-70B, and by leveraging Trainium we were able to reduce our training costs by 50% and improve energy efficiency by 25% compared to the latest GPU machines on AWS. We are excited to leverage the latest generation of AWS AI chips, Trainium2, to continue to provide our customers with the best performance at the lowest cost.

    Yoshiaki Umetsu, Director, Digital Technology Development Center, Ricoh
  • Arcee AI

    Arcee AI offers an enterprise-grade generative AI platform, Arcee Orchestra, which is powered by our industry-leading small language models (SLMs). Arcee Orchestra makes it easy for customers to build agentic AI workflows that automatically route tasks to specialized SLMs to deliver detailed, trustworthy responses, without any data leaving their VPC. Using AWS Trainium and Inferentia instances enables us to provide customers with unmatched cost-performance. For example, when using Inferentia2-based instances, our SuperNova-Lite 8-billion-parameter model is 32% more cost-efficient for inference workloads compared to the next best GPU-based instance, without sacrificing performance. We are excited to leverage the latest generation of AWS AI chips, Trainium2, to continue to provide our customers with the best performance at the lowest cost.

    Julien Simon, Chief Evangelist, Arcee AI
  • PyTorch

    What I liked most about the AWS Neuron NxD Inference library is how seamlessly it integrates with PyTorch models. NxD's approach is straightforward and user-friendly. Our team was able to onboard Hugging Face PyTorch models with minimal code changes in a short time frame. Enabling advanced features like continuous batching and speculative decoding was straightforward. This ease of use enhances developer productivity, allowing teams to focus more on innovation and less on integration challenges.

    Hamid Shojanazeri, PyTorch Partner Engineering Lead, Meta
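
    The "minimal code changes" point above is easiest to see at the tracing level. As a rough, hypothetical illustration (not the NxD Inference API itself), here is the lower-level torch-neuronx flow for compiling a Hugging Face PyTorch model for Neuron; the checkpoint name and inputs are placeholders:

      import torch
      import torch_neuronx
      from transformers import AutoModelForSequenceClassification, AutoTokenizer

      # Placeholder encoder checkpoint; torchscript=True makes the model traceable.
      name = "bert-base-uncased"
      tokenizer = AutoTokenizer.from_pretrained(name)
      model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)
      model.eval()

      inputs = tokenizer("Neuron makes onboarding straightforward", return_tensors="pt")
      example = (inputs["input_ids"], inputs["attention_mask"])

      # The one Neuron-specific call: compile the model for NeuronCores.
      neuron_model = torch_neuronx.trace(model, example)
      logits = neuron_model(*example)
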
  • Refact.ai

    Refact.ai offers comprehensive AI tools, including code auto-completion powered by Retrieval-Augmented Generation (RAG) for more accurate suggestions, and context-aware chat using both proprietary and open-source models.

    Customers have seen up to 20% higher performance and 1.5x more tokens per dollar with EC2 Inf2 instances compared to EC2 G5 instances. Refact.ai’s fine-tuning capabilities further enhance our customers’ ability to understand and adapt to their organizations’ unique codebases and environments. We are also excited to offer the capabilities of Trainium2, which will bring even faster, more efficient processing to our workflows. This advanced technology will enable our customers to accelerate their software development process by boosting developer productivity while maintaining strict security standards for their code base.

    Oleg Klimov, CEO & Founder, Refact.ai
  • Karakuri Inc.

    KARAKURI builds AI tools to improve the efficiency of web-based customer support and simplify customer experiences. These tools include AI chatbots equipped with generative AI functions, FAQ centralization tools, and an email response tool, all of which improve the efficiency and quality of customer support.

    Utilizing AWS Trainium, we succeeded in training KARAKURI LM 8x7B Chat v0.1. For startups like ourselves, it is essential to optimize the time and cost required to train LLMs. With the support of AWS Trainium and the AWS team, we were able to develop a practical-level LLM in a short period of time. Also, by adopting AWS Inferentia, we were able to build a fast and cost-effective inference service. We're energized about Trainium2 because it will revolutionize our training process, reducing our training time by 2x and driving efficiency to new heights!

    Tomofumi Nakayama, Co-Founder, Karakuri Inc.
  • ELYZA

    ELYZA is a GenAI company developing large language models (LLMs), supporting the use of generative AI in companies, and providing AI SaaS.

    Amazon’s Inferentia2 accelerators enabled us to achieve high throughput and low latency while significantly reducing costs, which was crucial for building our LLM demo service. By combining this infrastructure with the speculative decoding technique, we successfully doubled our original inference speed. Trainium2's impressive increase in inference capabilities compared to Inferentia2 shows immense promise, and we're thrilled to see how it will drive transformative results in our work.

    Kota Kakiuchi, CTO, ELYZA
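
    For readers unfamiliar with the speculative decoding technique ELYZA mentions, here is a toy sketch of its greedy variant: a small draft model proposes a few tokens, and the large target model verifies them in a single batched forward pass, accepting the longest agreeing prefix. `draft_next` and `target_logits` are hypothetical stand-ins, not any particular library's API:

      import numpy as np

      def speculative_step(prefix, draft_next, target_logits, k=4):
          # 1) The cheap draft model proposes k tokens autoregressively.
          ctx = list(prefix)
          proposed = []
          for _ in range(k):
              t = draft_next(ctx)          # greedy next token from the draft model
              proposed.append(t)
              ctx.append(t)

          # 2) One forward pass of the big model scores all proposals at once.
          #    logits[i] are the target's logits for the token following ctx[:i+1].
          logits = target_logits(ctx)      # shape: (len(ctx), vocab_size)
          check = logits[len(prefix) - 1:] # the k proposal positions + 1 bonus slot

          # 3) Accept proposals while the target's greedy choice agrees; on the
          #    first disagreement, substitute the target's own token and stop.
          out = []
          for i, t in enumerate(proposed):
              best = int(np.argmax(check[i]))
              if best != t:
                  out.append(best)
                  return out
              out.append(t)
          out.append(int(np.argmax(check[k])))  # all accepted: one free extra token
          return out

    The output is identical to greedy decoding with the target model alone; the speedup comes from verifying several tokens per expensive forward pass.
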
  • Stockmark Inc.

    With the mission of “reinventing the mechanism of value creation and advancing humanity,” Stockmark helps many companies create and build innovative businesses by providing cutting-edge natural language processing technology.

    Stockmark’s new data analysis and gathering service, Anews, and SAT, a data structuring service that dramatically improves the use of generative AI by organizing all forms of information stored in an organization, required us to rethink how we build and deploy models to support these products. With 256 Trainium accelerators, we developed and released stockmark-13b, a large language model with 13 billion parameters, pre-trained from scratch on a Japanese corpus of 220B tokens. Trn1 instances helped us reduce our training costs by 20%. Leveraging Trainium, we successfully developed an LLM that can answer business-critical questions for professionals with unprecedented accuracy and speed. This achievement is particularly noteworthy given the widespread challenge companies face in securing adequate computational resources for model development. With the impressive speed and cost reduction of Trn1 instances, we are excited to see the additional benefits that Trainium2 will bring to our workflows and customers.

    Kosuke Arima, CTO and Co-founder, Stockmark Inc.
  • Brave

    Brave is an independent browser and search engine dedicated to prioritizing user privacy and security. With over 70 million users, we deliver industry-leading protections that make the Web safer and more user-friendly. Unlike other platforms that have shifted away from user-centric approaches, Brave remains committed to putting privacy, security, and convenience first. Key features include blocking harmful scripts and trackers, AI-assisted page summaries powered by LLMs, built-in VPN services, and more. We continually strive to enhance the speed and cost-efficiency of our search services and AI models. To support this, we’re excited to leverage the latest capabilities of AWS AI chips, including Trainium2, to improve user experience as we scale to handle billions of search queries monthly.

    Subu Sathyanarayana, VP of Engineering, Brave Software
  • Anyscale

    Anyscale is the company behind Ray, an AI compute engine that fuels ML and generative AI initiatives for enterprises. With Anyscale's unified AI platform driven by RayTurbo, customers see up to 4.5x faster data processing, 10x lower-cost batch inference with LLMs, 5x faster scaling, 12x faster iteration, and cost savings of 50% for online model inference by optimizing utilization of resources.

    At Anyscale, we’re committed to empowering enterprises with the best tools to scale AI workloads efficiently and cost-effectively. With native support for AWS Trainium and Inferentia chips, powered by our RayTurbo runtime, our customers have access to high-performing, cost-effective options for model training and serving. We are now excited to join forces with AWS on Trainium2, unlocking new opportunities for our customers to innovate rapidly and deliver high-performing, transformative AI experiences at scale.

    Robert Nishihara, Cofounder, Anyscale
  • Datadog

    Datadog, the observability and security platform for cloud applications, provides AWS Trainium and Inferentia Monitoring for customers to optimize model performance, improve efficiency, and reduce costs. Datadog’s integration provides full visibility into ML operations and underlying chip performance, enabling proactive issue resolution and seamless infrastructure scaling. We're excited to extend our partnership with AWS for the AWS Trainium2 launch, which helps users cut AI infrastructure costs by up to 50% and boost model training and deployment performance.

    Yrieix Garnier, VP of Product, Datadog
  • Hugging Face

    Hugging Face is the leading open platform for AI builders, with over 2 million models, datasets and AI applications shared by a community of more than 5 million researchers, data scientists, machine learning engineers and software developers. We have been collaborating with AWS over the last couple of years, making it easier for developers to experience the performance and cost benefits of AWS Inferentia and Trainium through the Optimum Neuron open source library, integrated in Hugging Face Inference Endpoints, and now optimized within our new HUGS self-deployment service, available on the AWS Marketplace. With the launch of Trainium2, our users will access even higher performance to develop and deploy models faster.

    Jeff Boudier, Head of Product, Hugging Face
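
    As an illustration of the Optimum Neuron flow referenced above, the familiar transformers-style API gains an export step that compiles a checkpoint for Trainium/Inferentia. A minimal sketch follows; the checkpoint name is a placeholder and the export arguments may vary by library version:

      from optimum.neuron import NeuronModelForCausalLM
      from transformers import AutoTokenizer

      # Placeholder checkpoint; export=True compiles it into a Neuron artifact
      # with fixed shapes chosen at export time.
      model_id = "meta-llama/Meta-Llama-3-8B"
      compiler_args = {"num_cores": 2, "auto_cast_type": "bf16"}
      input_shapes = {"batch_size": 1, "sequence_length": 2048}
      model = NeuronModelForCausalLM.from_pretrained(
          model_id, export=True, **compiler_args, **input_shapes
      )
      tokenizer = AutoTokenizer.from_pretrained(model_id)

      # Generation then works as it would with a plain transformers model.
      inputs = tokenizer("Trainium makes it easy to", return_tensors="pt")
      outputs = model.generate(**inputs, max_new_tokens=32)
      print(tokenizer.decode(outputs[0]))
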
  • Lightning AI

    Lightning AI, the creator of PyTorch Lightning and Lightning Studios, offers the most intuitive, all-in-one AI development platform for enterprise-grade AI. Lightning provides full-code, low-code, and no-code tools to build agents, AI applications, and generative AI solutions, Lightning fast. Designed for flexibility, it runs seamlessly on your cloud or ours, leveraging the expertise and support of a 3M+ strong developer community.

    Lightning now natively supports AWS AI chips, Trainium and Inferentia, integrated across Lightning Studios and our open-source tools like PyTorch Lightning, Fabric, and LitServe. This gives users the seamless ability to pretrain, fine-tune, and deploy at scale, optimizing cost, availability, and performance with zero switching overhead, together with the performance and cost benefits of AWS AI chips, including the latest generation of Trainium2 chips, which deliver higher performance at lower cost.

    Luca Antiga, CTO, Lightning AI
  • Domino Data Lab

    Domino’s unified AI platform gives enterprise data science teams the ability to build and operate AI at scale. Leading enterprises must balance technical complexity, costs, and governance while mastering expansive AI options to innovate. With AWS Trainium and Inferentia, we empower our customers to gain high performance and efficiency without compromise. And with the launch of AWS Trainium2, our customers are able to train and deploy models with higher performance and at lower cost. Domino’s support for the AWS Trainium2 launch provides our customers additional options to train and deploy models cost- and energy-efficiently.

    Nick Elprin, CEO and Co-founder, Domino Data Lab
  • Helixon

    At HeliXon, we build next-generation AI solutions for protein-based therapeutics. We aim to develop AI tools that empower scientists to decipher protein function and interaction, interrogate large-scale genomic datasets for target identification, and design therapeutics such as antibodies and cell therapies. Today, we use distributed training libraries like FSDP to parallelize model training over many GPU-based servers, but this still takes us weeks to train a single model. We are excited to utilize Amazon EC2 Trn1 instances, featuring the highest networking bandwidth (800 Gbps) available in AWS, to improve the performance of our distributed training jobs and reduce our model training times, while also reducing our training costs.

    Jian Peng, CEO, Helixon
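
    For context on the FSDP approach the quote describes, here is a minimal PyTorch sketch of the pattern: wrap the model so that parameters, gradients, and optimizer state are sharded across data-parallel workers. The tiny model is a stand-in for a large protein model, and the script assumes launch via torchrun on GPU servers:

      import torch
      import torch.distributed as dist
      from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

      # Launched via `torchrun --nproc_per_node=8 train.py`; one process per GPU.
      dist.init_process_group("nccl")
      torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

      # Stand-in for a large model; FSDP shards its state across all workers.
      model = torch.nn.Sequential(
          torch.nn.Linear(4096, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 4096)
      ).cuda()
      model = FSDP(model)
      optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

      x = torch.randn(8, 4096, device="cuda")  # each worker gets its own batch
      loss = model(x).pow(2).mean()
      loss.backward()   # gradients are reduced and re-sharded automatically
      optimizer.step()
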
  • Money Forward, Inc.

    Money Forward, Inc. serves businesses and individuals with an open and fair financial platform.

    We launched a large-scale AI chatbot service on Amazon EC2 Inf1 instances and reduced our inference latency by 97% over comparable GPU-based instances while also reducing costs. As we keep fine-tuning tailored NLP models periodically, reducing model training times and costs is also important. Based on our experience from the successful migration of our inference workload to Inf1 instances and our initial work on AWS Trainium-based EC2 Trn1 instances, we expect Trn1 instances will provide additional value in improving end-to-end ML performance and cost.

    Takuya Nakade, CTO, Money Forward, Inc.
  • Magic

    Magic is an integrated product and research company developing AI that feels like a colleague to make the world more productive.

    Training large autoregressive Transformer-based models is an essential component of our work. AWS Trainium-powered Trn1 instances are designed specifically for these workloads, offering near-infinite scalability, fast inter-node networking, and advanced support for 16-bit and 8-bit data types. Trn1 instances will help us train large models faster, at a lower cost. We are particularly excited about Trainium's native support for BF16 stochastic rounding, which increases performance while keeping numerical accuracy indistinguishable from full precision.

    Eric Steinberger, Cofounder and CEO, Magic
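
    A toy illustration of the stochastic rounding idea highlighted above: instead of always rounding to the nearest representable value, round up or down with probability proportional to proximity, so tiny updates survive in expectation. The unit grid below is a stand-in for the BF16 mantissa grid:

      import random

      def stochastic_round(x, step=1.0):
          # Round x onto a grid of spacing `step`: round up with probability
          # equal to the fractional distance past the lower grid point.
          lo = (x // step) * step
          frac = (x - lo) / step
          return lo + step if random.random() < frac else lo

      # 10,000 tiny updates of 0.1 on a unit grid: round-to-nearest would stay
      # at 0 forever, while stochastic rounding lands near 1,000 in expectation.
      acc = 0.0
      for _ in range(10_000):
          acc = stochastic_round(acc + 0.1)
      print(acc)

    This unbiasedness is why low-precision accumulation with stochastic rounding can track full-precision training so closely.
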
  • CACTUS LABS

    CACTUS has a suite of products and solutions for researchers and organizations that improve how research gets funded, published, communicated, and discovered.

    At Cactus Labs, we harness the power of AI, with research focused on natural language processing, ranking and recommendation, conversational AI, large language models, computer vision, AR/VR, and XAI. In line with our quest to enable faster training of machine learning models, as well as to enable our researchers to run more experiments while managing infrastructure costs, we were delighted to evaluate AWS Trainium. AWS Trainium’s out-of-the-box features like XLA optimization, multi-worker data-parallel training, and graph caching are really useful for reducing our training times and help us run more experiments faster and cheaper.

    Nishchay Shah, CTO and Head of Emerging Products, Cactus Communications
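
    As a concrete sketch of the XLA-based workflow behind those features, a minimal PyTorch/XLA training step of the kind used on Trainium might look like the following (a generic illustration, assuming the torch-xla package; on Trn1/Trn2 the XLA device resolves to a NeuronCore):

      import torch
      import torch_xla.core.xla_model as xm

      device = xm.xla_device()  # a NeuronCore when running on Trainium
      model = torch.nn.Linear(512, 512).to(device)
      optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

      for step in range(10):
          x = torch.randn(32, 512, device=device)
          loss = model(x).pow(2).mean()
          optimizer.zero_grad()
          loss.backward()
          xm.optimizer_step(optimizer)  # all-reduces grads across workers, then steps
          xm.mark_step()                # cuts and executes the lazily built XLA graph

    Because XLA compiles and caches the step graph, repeated iterations with the same shapes reuse the compiled program, which is the graph-caching benefit the quote refers to.
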
  • Watashiha

    Watashiha offers an innovative and interactive AI chatbot service, “OGIRI AI,” which incorporates humor to provide a funny answer on the spot to a question.

    We use large language models to incorporate humor and offer a more relevant and conversational experience to our customers on our AI services. This requires us to pre-train and fine-tune these models frequently. We pre-trained a GPT-based Japanese model on the EC2 Trn1.32xlarge instance, leveraging tensor and data parallelism. The training was completed within 28 days at a 33% cost reduction over our previous GPU-based infrastructure. As our models rapidly continue to grow in complexity, we are looking forward to Trn1n instances, which have double the network bandwidth of Trn1, to speed up training of larger models.

    Yohei Kobashi, CTO, Watashiha K.K.
  • Amazon

    Amazon’s product search engine indexes billions of products, serves billions of customer queries daily, and is one of the most heavily used services in the world.

    We are training large language models (LLMs) that are multi-modal (text + image), multilingual, multi-locale, pre-trained on multiple tasks, and spanning multiple entities (products, queries, brands, reviews, etc.) to improve the customer shopping experience. Trn1 instances provide a more sustainable way to train LLMs by delivering the best performance per watt compared to other accelerated machine-learning solutions, and they offer us high performance at the lowest cost. We plan to explore the new configurable FP8 data type and hardware-accelerated stochastic rounding to further increase our training efficiency and development velocity.

    Trishul Chilimbi, VP, Amazon Search