AWS Storage Blog
How Torc Robotics reduces storage costs with S3 Intelligent-Tiering
Do you manage petabytes of data or tens to hundreds of buckets on Amazon S3 across multiple business units and multiple teams? If you do, chances are that application requirements and access patterns vary widely from one business unit to another. And, if you’re like many AWS customers today, you’re looking for the easiest and fastest way to optimize your storage costs centrally without impacting application performance across your business.
In this post, we explain how S3 Storage Lens, an analytics feature built into the S3 console, and S3 Intelligent-Tiering, an S3 storage class that automatically optimizes storage costs by moving data between access tiers, helped Torc Robotics optimize its storage costs quickly across its entire business – all in a day’s work.
About Torc Robotics
Torc Robotics, headquartered in Blacksburg, Virginia, is an independent subsidiary of Daimler Truck AG, the global leader and pioneer in trucking. Founded in 2005 at the birth of the self-driving vehicle revolution, Torc has 16 years of experience in pioneering safety-critical, self-driving applications. Torc offers a complete self-driving vehicle software and integration solution and is currently focusing on commercializing self-driving trucks. “Trucking is the backbone of the US economy, delivering food and products to every community in the country,” said Torc CEO Michael Fleming. “Daimler has led innovation in trucking for more than a century, from the first truck to driver assist technology. Torc is working with Daimler Trucks to commercialize self-driving trucks to make our roads safer and better fulfilling our mission of saving lives.”
Data management challenges
Torc Robotics has a central vehicle data acquisition team that ensures that Torc Robotics’ architecture is secure, performant, and cost effective. The advancement of Torc’s autonomous driving technology relies on data-driven development leveraging multiple datasets stored in Amazon S3 from real world and synthetic world-driving outputs. These different driving scenarios are used by the various development teams running large-scale simulations, re-simulations, analytics, visualizations, and search functions on the data. The scenarios of interest are unpredictable as development progresses on various elements of the autonomous driving system.
With Torc Robotics’s rapid growth, its S3 storage quickly grew to petabytes of data in S3 buckets that its vehicle data acquisition team was looking to optimize. Justin Brown, Head of Vehicle Data Acquisition, said, “Our team has ongoing reviews to analyze our AWS Cost Explorer reports by AWS service and noticed that our Amazon S3 storage usage growth and costs were accelerating, driven by an increasing amount of data stored in the S3 Standard storage class.” As Technical Product Director of Infrastructure & Tools, I (Derek) have always said, “We prioritized optimizing our Amazon S3 usage to support future growth. However, all of the buckets were a black box and we needed to find a safe solution we could push across all of Torc Robotics without impacting performance.”
To summarize, the challenge Torc Robotics’s central vehicle data acquisition team faced was twofold:
- Torc needed a way to manage a large volume of raw autonomous vehicle data and execute metadata extraction and enrichment tasks.
- It was important to Torc to save on storage costs, but also to keep its data immediately available for future or planned analysis, such as machine learning (ML) and deep-learning model training.
Amazon S3 Storage Lens for organization-wide visibility
Torc Robotics wanted a way to identify the largest buckets and prefixes driving storage costs so that it could prioritize the largest cost-saving opportunities. This was challenging for Torc Robotics. But with the recent launch of S3 Storage Lens, Torc Robotics was able to easily get organization-wide visibility into its storage usage and activity trends. For example, S3 Storage Lens gives users visualizations that show the top buckets based on storage size, the number of objects, the number of requests, and the average object size. And for large multi-use buckets where Torc Robotics needed to go a layer deeper, S3 Storage Lens also provides prefix-level aggregations. S3 Storage Lens enabled Torc Robotics to quickly identify the largest and fastest-growing buckets and prefixes.
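To make the "largest and fastest-growing" ranking concrete, here is a minimal sketch in Python. It does not call S3 Storage Lens. It assumes you have already exported per-bucket metrics, and the `BucketMetrics` type, the bucket names, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BucketMetrics:
    """Hypothetical per-bucket metrics, e.g. taken from an exported report."""
    name: str
    size_bytes_now: int
    size_bytes_30d_ago: int
    object_count: int

    @property
    def avg_object_size(self) -> float:
        # Average object size in bytes; 0.0 for an empty bucket.
        return self.size_bytes_now / self.object_count if self.object_count else 0.0

    @property
    def growth_pct(self) -> float:
        # Percentage growth over the 30-day window.
        if self.size_bytes_30d_ago == 0:
            return float("inf") if self.size_bytes_now else 0.0
        delta = self.size_bytes_now - self.size_bytes_30d_ago
        return 100.0 * delta / self.size_bytes_30d_ago


def top_buckets(metrics, key, n=3):
    """Return the n buckets with the highest value of key(bucket)."""
    return sorted(metrics, key=key, reverse=True)[:n]


TB = 1024 ** 4
# Made-up sample data, for illustration only.
sample = [
    BucketMetrics("raw-drive-logs", 900 * TB, 600 * TB, 40_000_000),
    BucketMetrics("sim-outputs", 300 * TB, 100 * TB, 900_000_000),
    BucketMetrics("ml-training-sets", 500 * TB, 480 * TB, 5_000_000),
]

largest = [b.name for b in top_buckets(sample, key=lambda b: b.size_bytes_now, n=2)]
fastest = [b.name for b in top_buckets(sample, key=lambda b: b.growth_pct, n=2)]
print("largest:", largest)
print("fastest-growing:", fastest)
```

Ranking by size and by growth separately matters because the two lists can differ: a mid-sized bucket that triples in a month may deserve attention before a larger bucket that is barely growing.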
S3 Storage Lens summary analysis
Looking at the largest and fastest-growing buckets, Torc Robotics found that retrievals, requests, and the object sizes varied widely. When thinking about optimizing storage costs, this is important because buckets with high retrievals might lead to high retrieval fees in the S3 Standard-Infrequent Access storage class. While the usage profile across buckets is diverse, Torc Robotics was looking for a way to save on storage costs without taking on the operational overhead of customizing how each individual bucket is optimized based on its unique storage usage access patterns.
Optimizing storage costs with Amazon S3 Intelligent-Tiering
To recap, Torc Robotics wanted a way to quickly optimize storage costs across its largest and fastest-growing buckets identified using S3 Storage Lens. Torc’s event-driven architecture built on top of S3 allows for ML, AI, analysis, and enrichment pipelines to process new data in real time. This architecture, combined with elastic scaling for compute, allows for parallel and rapid processing of fresh data. However, because the storage usage patterns vary widely across its top buckets, there was no clear-cut rule Torc could safely apply without taking on some operational overhead.
Following Amazon S3’s cost optimization best practices, Torc Robotics found that the S3 Intelligent-Tiering storage class delivers automatic storage savings based on the changing access patterns of its data without any impact on performance.
Torc’s event-driven architecture
The S3 Intelligent-Tiering storage class gives you a way to save money even under changing access patterns, with no performance impact, no operational overhead, and no retrieval fees. For a small monthly object monitoring and automation charge, S3 Intelligent-Tiering monitors access patterns and automatically moves objects that have not been accessed to lower-cost access tiers. S3 Intelligent-Tiering delivers automatic storage cost savings in two low latency and high throughput access tiers. For data that can be accessed asynchronously, you can choose to activate automatic additional archiving capabilities within the S3 Intelligent-Tiering storage class. There are no retrieval charges in S3 Intelligent-Tiering. If an object in the Infrequent Access tier is accessed later, it is automatically moved back to the Frequent Access tier. No additional tiering or lifecycle charges apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.
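The movement between the two low-latency tiers can be pictured with a small toy model. This is a simplified sketch, not how S3 implements tiering: the `TieredObject` class and its day-by-day loop are our own illustration, and the 30-day and 128 KB thresholds are the ones cited elsewhere in this post.

```python
from dataclasses import dataclass

INFREQUENT_AFTER_DAYS = 30        # consecutive days without access (from this post)
MIN_MONITORED_BYTES = 128 * 1024  # smaller objects are not monitored (from this post)


@dataclass
class TieredObject:
    """Toy model of one object in the S3 Intelligent-Tiering storage class."""
    size_bytes: int
    tier: str = "FREQUENT"
    days_since_access: int = 0

    def end_of_day(self, accessed: bool) -> None:
        """Advance the model by one day."""
        if accessed:
            # Any access resets the counter and returns the object
            # to the Frequent Access tier.
            self.days_since_access = 0
            self.tier = "FREQUENT"
            return
        self.days_since_access += 1
        monitored = self.size_bytes >= MIN_MONITORED_BYTES
        if monitored and self.days_since_access >= INFREQUENT_AFTER_DAYS:
            self.tier = "INFREQUENT"


# A 5 MB object that is left untouched for 30 days, then read once.
big = TieredObject(size_bytes=5 * 1024 * 1024)
for _ in range(30):
    big.end_of_day(accessed=False)
tier_after_idle = big.tier
big.end_of_day(accessed=True)
tier_after_read = big.tier

# A 4 KB object is below the monitoring threshold and never moves.
small = TieredObject(size_bytes=4 * 1024)
for _ in range(60):
    small.end_of_day(accessed=False)

print(tier_after_idle, tier_after_read, small.tier)
```

The model leaves out the optional archive tiers and all billing details. Its only purpose is to show that tier changes are driven by access recency alone, which is why no per-bucket rules need to be written.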
S3 Intelligent-Tiering is the ideal storage class when speed and agility matter. “We had thought through several different scenarios tiering solutions but none of them were straightforward to manage,” said Zane Reynolds, Engineering Manager, Cloud, Infrastructure & Tools. “S3 Intelligent-Tiering was our ‘easy’ button and helped us move at the speed we needed without adding development cycles.”
As you can see in the following graph, after Torc transitioned its data from S3 Standard to S3 Intelligent-Tiering, that data was initially stored in the Frequent Access tier. Then, data that was not accessed for 30 consecutive days moved down to the S3 Intelligent-Tiering Infrequent Access tier. This allowed Torc to store its data efficiently while sustaining the high throughput that its highly variable simulations require.
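This post does not describe the exact mechanism Torc used to transition its existing data. One common approach is an S3 Lifecycle rule, and the sketch below builds such a rule as a plain Python dictionary. The rule ID, the `raw/` prefix, and the helper name are hypothetical. The dictionary follows the shape accepted by the `put_bucket_lifecycle_configuration` operation in the AWS SDK for Python (boto3), but the call itself is left as a comment so that the example runs without credentials.

```python
import json


def intelligent_tiering_lifecycle(prefix: str = "", after_days: int = 0) -> dict:
    """Build a lifecycle configuration that transitions objects under
    `prefix` to the INTELLIGENT_TIERING storage class."""
    if after_days < 0:
        raise ValueError("after_days must be zero or positive")
    return {
        "Rules": [
            {
                "ID": "move-to-intelligent-tiering",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},         # "" matches the whole bucket
                "Transitions": [
                    {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


config = intelligent_tiering_lifecycle(prefix="raw/")
print(json.dumps(config, indent=2))

# Applying it with boto3 would look roughly like this (bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Scoping the rule with a prefix lets you roll the change out to one dataset first and widen it once you are comfortable with the results.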
Torc Robotics’ use of S3 Intelligent-Tiering is realizing automatic storage cost savings of 24% per month without impacting application performance or adding development work. Now, Torc Robotics’s vehicle data acquisition team is focused on helping teams design the data architecture for new products and services that will most likely lead to even more data growth. Looking ahead, the Torc Robotics team plans to enable the S3 Intelligent-Tiering asynchronous archive access tiers to further reduce its storage costs on data that can be accessed within minutes to hours.
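For readers considering the same next step, the following sketch builds an archive configuration for a bucket. It is an illustration rather than Torc's configuration: the configuration ID, prefix, and helper name are hypothetical, and the minimum day counts (90 for Archive Access, 180 for Deep Archive Access) reflect the documented lower bounds as we understand them, so verify them against the current S3 documentation. The dictionary matches the shape expected by the boto3 `put_bucket_intelligent_tiering_configuration` operation, which is shown only as a comment.

```python
from typing import Optional

# Assumed minimum number of days without access before each archive tier applies.
MIN_DAYS = {"ARCHIVE_ACCESS": 90, "DEEP_ARCHIVE_ACCESS": 180}


def archive_tiering_config(
    config_id: str,
    archive_days: int = 90,
    deep_archive_days: Optional[int] = None,
    prefix: Optional[str] = None,
) -> dict:
    """Build an S3 Intelligent-Tiering archive configuration for one bucket."""
    tierings = [("ARCHIVE_ACCESS", archive_days)]
    if deep_archive_days is not None:
        tierings.append(("DEEP_ARCHIVE_ACCESS", deep_archive_days))

    for tier, days in tierings:
        if days < MIN_DAYS[tier]:
            raise ValueError(f"{tier} requires at least {MIN_DAYS[tier]} days")

    config = {
        "Id": config_id,
        "Status": "Enabled",
        "Tierings": [{"Days": days, "AccessTier": tier} for tier, days in tierings],
    }
    if prefix is not None:
        # Limit archiving to data that can tolerate asynchronous access.
        config["Filter"] = {"Prefix": prefix}
    return config


config = archive_tiering_config("archive-cold-logs", archive_days=90,
                                deep_archive_days=180, prefix="cold/")
print(config)

# Applying it with boto3 would look roughly like this (bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_intelligent_tiering_configuration(
#       Bucket="example-bucket", Id=config["Id"],
#       IntelligentTieringConfiguration=config)
```

Because archived objects are no longer immediately retrievable, restricting the configuration to a prefix keeps latency-sensitive data in the two low-latency tiers.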
Some things to keep in mind
- Torc, as an AWS Enterprise Support customer, leveraged AWS Infrastructure Event Management throughout this storage journey to the cloud. This enabled the vehicle data acquisition team to be confident in the architecture changes while maintaining throughput and latency.
- Object size: You can use S3 Intelligent-Tiering for objects of any size, but objects smaller than 128 KB will not be monitored or auto-tiered. For each object archived to the Archive Access tier or Deep Archive Access tier, Amazon S3 uses 8 KB of storage for the name of the object and other metadata (billed at S3 Standard storage rates) and 32 KB of storage for index and related metadata (billed at S3 Glacier and S3 Glacier Deep Archive storage rates). This enables you to get a real-time list of all of your S3 objects from S3 Inventory.
- S3 Object Lock: You can use S3 Object Lock to store objects using a write-once-read-many (WORM) model. Object Lock can help prevent objects from being deleted or overwritten for a fixed amount of time or indefinitely. You can use S3 Object Lock to meet regulatory requirements that require WORM storage, or add an extra layer of protection against object changes and deletion.
- S3 Access Points: You can access the objects in an Amazon S3 bucket with an S3 Access Point using the AWS Management Console, AWS CLI, AWS SDKs, or the S3 REST APIs.
- Durability and availability: S3 Intelligent-Tiering is designed for 99.9% availability and 99.999999999% durability.
- Pricing: You pay for monthly storage, requests, and data transfers. When using S3 Intelligent-Tiering, you also pay a small monthly per-object fee for monitoring and automation. There is no retrieval fee in S3 Intelligent-Tiering and no fee for moving data between tiers.
- Objects in the Frequent Access tier are billed at the same rate as S3 Standard.
- Objects stored in the Infrequent Access tier are billed at the same rate as S3 Standard-Infrequent Access.
- Objects stored in the Archive Access tier are billed at the same rate as S3 Glacier.
- Objects stored in the Deep Archive access tier are billed at the same rate as S3 Glacier Deep Archive.
- API and CLI access: You can use S3 Intelligent-Tiering through the Amazon S3 CLI and S3 API operations with the INTELLIGENT_TIERING storage class. You can also configure the S3 Intelligent-Tiering archive using PUT, GET, and Delete configuration APIs for a specific bucket.
- Feature support: S3 Intelligent-Tiering supports features like S3 Inventory to report on the access tier of objects, and S3 Replication to replicate data to any AWS Region.
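The pricing points above can be combined into a rough back-of-the-envelope estimate. The sketch below is a simplified model under stated assumptions: it covers only the two low-latency tiers plus the monitoring fee, ignores request and data transfer charges, and takes all prices as inputs. The prices and volumes in the example are placeholders, not actual AWS rates and not Torc's figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Monthly unit prices. Supply current values from the S3 pricing page."""
    frequent_per_gb: float              # same rate as S3 Standard
    infrequent_per_gb: float            # same rate as S3 Standard-Infrequent Access
    monitoring_per_1000_objects: float  # monitoring and automation fee


def monthly_cost(frequent_gb: float, infrequent_gb: float,
                 monitored_objects: int, prices: Prices) -> float:
    """Storage plus monitoring cost for one month in S3 Intelligent-Tiering."""
    storage = (frequent_gb * prices.frequent_per_gb
               + infrequent_gb * prices.infrequent_per_gb)
    monitoring = monitored_objects / 1000 * prices.monitoring_per_1000_objects
    return storage + monitoring


def savings_pct(total_gb: float, infrequent_fraction: float,
                monitored_objects: int, prices: Prices) -> float:
    """Percentage saved versus keeping everything in S3 Standard."""
    baseline = total_gb * prices.frequent_per_gb
    tiered = monthly_cost(total_gb * (1 - infrequent_fraction),
                          total_gb * infrequent_fraction,
                          monitored_objects, prices)
    return 100.0 * (baseline - tiered) / baseline


# Placeholder prices and volumes, for illustration only.
example_prices = Prices(frequent_per_gb=0.02, infrequent_per_gb=0.01,
                        monitoring_per_1000_objects=0.0025)
estimate = savings_pct(total_gb=1_000_000, infrequent_fraction=0.4,
                       monitored_objects=10_000_000, prices=example_prices)
print(f"estimated savings: {estimate:.1f}%")
```

Two things fall out of the model. Savings scale with the share of data that sits untouched, and for buckets holding very many small objects the per-object monitoring fee can offset part of that saving, so average object size is worth checking before you migrate.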
Key takeaways and conclusion
To recap, Torc needed a cost-effective way to manage a large volume of raw autonomous vehicle data and execute metadata extraction and enrichment tasks. At Torc, data access patterns for analyses and building machine learning models vary across the organization. Torc also wants its data to be immediately retrievable. S3 Intelligent-Tiering was the ideal solution for enabling Torc to optimize its storage costs without impacting application performance across its organization, and without needing to manage retrieval fees when data access patterns changed.
We recommend using S3 Storage Lens when:
- You need visibility into your S3 storage and activity metrics across hundreds of buckets and accounts.
- You want an easy way to quickly identify cost efficiency opportunities.
- You have large multi-use buckets where you need prefix-level aggregations.
We recommend using S3 Intelligent-Tiering when:
- You have workloads with changing access patterns such as autonomous-driving development and deployment with simulations.
- You want an easy and automatic way to save on your Amazon S3 bill.
- You need the same low latency and high throughput performance as S3 Standard.