Amazon FSx for Lustre customers
-
Adobe
Adobe was founded 40 years ago on the simple idea of creating innovative products that change the world. Adobe offers groundbreaking technology that empowers everyone, everywhere to imagine, create, and bring any digital experience to life.
Challenge: Rather than rely on open source models, Adobe decided to train its own foundational generative AI models tailored for creative use cases.
Solution: Adobe created an AI superhighway on AWS to build an AI training platform and data pipelines to rapidly iterate models. Adobe built its solution with Amazon Elastic Compute Cloud (Amazon EC2) P5 and P4d instances powered by NVIDIA GPUs, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Block Store (Amazon EBS), and Amazon Elastic Fabric Adapter (EFA). Adobe also used Amazon Simple Storage Service (Amazon S3) to serve as the data lake and primary repository for the vast troves of data. Adobe used Amazon FSx for Lustre high-performance file storage for fast access to data and to make sure GPU resources are never left idle.
-
LG AI Research
Together with world-leading AI experts, LG AI Research aims to lead the next epoch of AI and realize a promising future by providing an optimal research environment and leveraging state-of-the-art AI technologies.
Challenge: LG AI Research needed to deploy its foundation model, EXAONE, to production in one year. EXAONE, which stands for “expert AI for everyone,” is a 300-billion-parameter multi-modal model that uses both image and text data.
Solution: LG AI Research used Amazon SageMaker to train its large-scale foundation model and Amazon FSx for Lustre to distribute data to instances to accelerate model training. LG AI Research successfully deployed EXAONE in one year and reduced costs by approximately 35 percent by eliminating the need for a separate infrastructure management team.
-
Paige
Paige is the leading digital pathology transformation provider, offering a full-scale, AI-enabled, web-based solution that brings efficiency and confidence to cancer diagnosis.
Challenge: Paige’s on-premises solutions were maxed out. Their goal was to train AI and ML models to help with cancer pathology. Paige discovered that the more compute capacity that they have, the faster they can train their models and help solve diagnostic problems.
Solution: To run its ML training workloads, Paige selected Amazon EC2 P4d Instances, powered by NVIDIA A100 Tensor Core GPUs, which deliver high performance for ML training and HPC applications in the cloud. Paige uses Amazon FSx for Lustre, fully managed shared storage built on a popular high-performance file system. The company connected this service with some of its Amazon S3 buckets, which helps its development teams address petabytes of ML input data without manually prestaging data on high-performance file systems. As a result, Paige can train on 10x the amount of data it could handle on premises by using AWS infrastructure for ML. Paige also experienced 72% faster internal workflows with Amazon EC2 and Amazon FSx for Lustre.
-
Toyota
Toyota Research Institute chooses FSx for Lustre to reduce object recognition machine learning training times.
Toyota Research Institute (TRI) collects and processes large amounts of sensor data from its autonomous vehicle (AV) test drives. Each training data set is staged in an on-premises NAS device and transferred to Amazon Simple Storage Service (Amazon S3) before processing on a powerful GPU compute cluster. TRI needed a high-performance file system to pair with its compute resources, speed up its ML model training, and accelerate insights for its data scientists.
-
Shell
Shell offers a dynamic portfolio of energy options, from oil, gas, and petrochemicals to wind, solar, and hydrogen, and is proud to supply the energy its customers need to power their lives.
Challenge: Shell relies on HPC for model building, testing, and validation. From 2020 to 2022, GPU utilization averaged less than 90%, resulting in project delays and limitations on new algorithm experimentation.
Solution: Shell augments its on-premises compute capacity by bursting to the cloud with Amazon EC2 clusters and Amazon FSx for Lustre. This solution lets Shell quickly scale up and down and purchase additional compute capacity only when needed. Shell’s GPUs are now fully utilized, reducing the cost of compute and accelerating machine learning model testing.
-
Storengy
Storengy, a subsidiary of the ENGIE Group, is a leading supplier of natural gas. The company offers gas storage, geothermal solutions, carbon-free energy production, and storage technologies to enterprises worldwide.
To ensure its products are properly stored, Storengy uses high-tech simulators to evaluate underground gas storage, a process that requires extensive use of high-performance computing (HPC) workloads. The company also uses HPC technology to run natural gas discovery and exploration jobs.
-
Smartronix
Smartronix leverages FSx for Lustre to deliver reliable high performance for their SAS Grid deployments.
Smartronix provides cloud solutions, cyber security, systems integration, worldwide C5ISR and data analytics, and mission-focused engineering for many of the world's leading commercial and federal organizations. Smartronix relied on SAS Grid to analyze and deliver state-wide COVID daily statistics, and found their self-managed, parallel file system difficult to administer and protect.
-
Netflix
Netflix is a streaming service that offers a wide variety of award-winning TV shows, movies, anime, documentaries, and more.
Challenge: Netflix uses large-scale distributed training for media ML models, covering post-production thumbnails, VFX, and trailer generation for thousands of videos and millions of clips. Netflix was experiencing long waits due to cross-node replication and 40% GPU idle time.
Solution: Netflix re-architected its data loading pipeline and improved its efficiency by pre-computing all video and audio clips. Netflix also chose Amazon UltraClusters (EC2 P4d instances) to accelerate compute performance. Amazon FSx for Lustre performance enables Netflix to saturate its GPUs and virtually eliminate GPU idle time. Netflix now experiences a 3-4x improvement using pre-compute and FSx for Lustre, reducing model training time from a week to 1-2 days.
-
Hyundai
Hyundai Motor Company has risen as a globally recognized automobile manufacturer that exports its branded vehicles to over 200 countries.
Challenge: One of the algorithms often used in autonomous driving is semantic segmentation, the task of annotating every pixel of an image with an object class. These classes could be road, person, car, building, vegetation, sky, and so on. Hyundai tests accuracy and gathers additional images to correct for insufficient predictive performance in specific situations. This can be a challenge, however, as there is often not enough time to prepare all the new data while leaving sufficient time to train the model and meet the scheduled deadlines.
Solution: Hyundai selected Amazon SageMaker to automate model training, and the Amazon SageMaker library for data parallelism to move from a single GPU to distributed training. The company chose Amazon FSx for Lustre to train models without waiting for data copies, and Amazon S3 for permanent data storage. Hyundai achieved up to 93% scaling efficiency with 8 GPU instances, or 64 GPUs in total. FSx for Lustre enabled Hyundai to run multiple training jobs and experiments against the same data with zero wait time.
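To make the 93% figure concrete, here is a minimal sketch of how scaling efficiency is commonly computed: measured cluster throughput divided by ideal linear throughput (GPU count times single-GPU throughput). The source does not state which definition Hyundai used, so this definition is an assumption, and the throughput values below are hypothetical numbers chosen only to reproduce the reported percentage.

```python
def scaling_efficiency(cluster_throughput: float,
                       single_gpu_throughput: float,
                       n_gpus: int) -> float:
    """Return the fraction of ideal linear speedup achieved on n_gpus.

    Assumed definition: measured throughput / (n_gpus * single-GPU throughput).
    """
    if n_gpus < 1 or single_gpu_throughput <= 0:
        raise ValueError("n_gpus must be >= 1 and single_gpu_throughput > 0")
    return cluster_throughput / (n_gpus * single_gpu_throughput)


# From the text: 8 GPU instances, 64 GPUs in total (so 8 GPUs per instance).
instances = 8
gpus_per_instance = 8
total_gpus = instances * gpus_per_instance

# Hypothetical throughputs in images per second, for illustration only.
single_gpu_throughput = 10.0
cluster_throughput = 595.2

efficiency = scaling_efficiency(cluster_throughput, single_gpu_throughput, total_gpus)
print(f"{total_gpus} GPUs, scaling efficiency: {efficiency:.0%}")
```

Under this definition, 93% efficiency on 64 GPUs corresponds to a speedup of roughly 59.5x over a single GPU, compared with an ideal of 64x.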
-
Rivian
Rivian is on a mission to keep the world adventurous forever. We believe there is a more responsible way to explore the world and are determined to make the transition to sustainable transportation an exciting one.
To meet accelerated engineering schedules and reduce the need for physical prototypes, electric vehicle manufacturer Rivian relies on advanced modeling and simulation techniques. Using high compute capacity, simulations enable engineers to test new concepts and bring their designs to market quickly.
-
DENSO
DENSO develops image sensors for advanced driver-assistance systems (ADAS), which help drivers with functions such as parking and changing lanes.
Challenge: To develop the necessary ML models for ADAS image recognition, DENSO had built GPU clusters in its on-premises environment. However, multiple ML engineers shared limited GPU resources, which impacted productivity—especially during the busy period before a new product release.
Solution: By adopting Amazon SageMaker and Amazon FSx for Lustre, DENSO was able to accelerate the creation of ADAS image recognition models by reducing the data acquisition, model development, learning, and evaluation time.
-
Joby Aviation
Joby Aviation uses AWS to revolutionize transportation.
Challenge: Joby engineers rely on high-performance computing (HPC) to conduct thousands of complex, compute-intensive Computational Fluid Dynamics (CFD) simulations that use hundreds of CPU cores each and can take many hours to complete.
Solution: Using Amazon Elastic Compute Cloud (Amazon EC2) and Amazon FSx for Lustre enabled Joby to get faster results from their CFD workloads compared to on-premises high-performance computing infrastructure.
-
T-Mobile
T-Mobile realizes $1.5M in annual savings and doubles the speed of SAS Grid workloads using Amazon FSx for Lustre.
Challenge: T-Mobile was experiencing high management overhead and performance difficulties with their self-managed SAS Grid workload.
Solution: T-Mobile deployed Amazon FSx for Lustre, a fully managed high-performance file system, to migrate and scale its SAS Grid infrastructure. T-Mobile used the tight integration of Amazon FSx and Amazon S3 to reduce its storage overhead and optimize operations.
-
Netflix
Production of the fourth season of Netflix’s episodic drama “The Crown” faced unexpected challenges, as the world went into lockdown for the COVID-19 pandemic just as post-production VFX work was slated to begin. By adopting a cloud-based workflow on AWS, including Amazon FSx for Lustre for enhanced throughput, Netflix’s in-house VFX team of 10 artists was able to seamlessly complete more than 600 VFX shots for the season’s 10-episode run in just 8 months, all while working remotely.
-
Maxar
Maxar uses AWS to deliver forecasts 58% faster than its weather supercomputer.
Challenge: Maxar Technologies, a trusted partner and innovator in Earth intelligence and space infrastructure, needed to deliver weather forecasts faster than its on-premises supercomputer could.
Solution: Maxar worked with AWS to create an HPC solution with key technologies including Amazon Elastic Compute Cloud (Amazon EC2) for secure, highly reliable compute resources, Amazon FSx for Lustre to accelerate the read/write throughput of its application, and AWS ParallelCluster to quickly build HPC compute environments on AWS.
-
INEOS TEAM UK
INEOS TEAM UK accelerates boat design for America’s Cup using AWS.
Challenge: Formed in 2018, INEOS TEAM UK aims to bring the America’s Cup—the oldest international sporting trophy in the world—to Great Britain. America’s Cup restricts on-water testing to no more than 150 days prior to the event, so high-performance computational fluid dynamics (CFD) simulations of monohulls and foils become key to a winning boat design.
Solution: Using AWS, INEOS TEAM UK can process thousands of design simulations for its America’s Cup boat in one week, versus more than a month in an on-premises environment. INEOS TEAM UK competed in the 36th edition of the America’s Cup in 2021. The team uses an HPC environment running on Amazon EC2 Spot Instances. To ensure fast disk performance for the thousands of simulations completed each week, the team also uses Amazon FSx for Lustre to provide a fast, scalable, and secure high-performance file system integrated with Amazon Simple Storage Service (Amazon S3).
-
Hive VFX
Hive VFX cuts upfront studio costs, operates as cloud VFX studio on AWS.
Challenge: Hive needed high-performance infrastructure to launch a small, independent cloud studio for remote artists around the world to create quality content.
Solution: Fully managed Amazon FSx for Lustre, integrated with Amazon S3, provided fast access to AWS compute resources without a big upfront investment or in-house IT team expertise. The seamless synchronization of file data and file permissions between FSx for Lustre and S3 enabled Hive VFX to store a large volume of images and share project data across continents.
-
Lyell
Lyell accelerates their cell-based cancer treatment research with Amazon FSx for Lustre.
Challenge: Lyell delivers curative, cell-based cancer treatments that require running large scale computational design of proteins. These workloads were traditionally run on premises, but the company needed a more scalable, cost-effective solution as they were limited to running only one experiment per month.
Solution: Since migrating its file system to FSx for Lustre, Lyell's data scientists can spin up and spin down thousands of HPC clusters consisting of EC2 instances and Amazon FSx file systems, enabling them to run processing-heavy experiments rapidly and pay for compute and storage only for the duration of the workload.
-
BlackThorn Therapeutics
BlackThorn Therapeutics accelerates time-to-insight with FSx for Lustre.
Challenge: Processing magnetic resonance imaging (MRI) data using standard DIY cloud file systems was resource- and time-intensive. BlackThorn needed a shared file storage solution for compute-intensive workloads to help simplify its data science and machine learning workflows.
Solution: Amazon FSx for Lustre is integrated with Amazon S3 and Amazon SageMaker, providing fast processing for their ML training data sets as well as seamless access to compute using Amazon EC2 instances.
-
Qubole
Qubole improves data durability while reducing cost with Amazon FSx for Lustre.
Challenge: Qubole was seeking a high-performance storage solution to process analytical and AI/ML workloads for their customers. They needed to easily store and process the intermediate data held in their EC2 Spot Fleet.
Solution: Qubole used Amazon FSx for Lustre to store and process intermediate data through its parallel, high-speed file system.