Amazon S3 Metadata (Preview)

Accelerate data discovery with near real-time object metadata

Find and organize the data you need in S3

Amazon S3 Metadata (Preview) taps into the full potential of your S3 data by making object metadata readily accessible and easier to query. Surface, store, and query rich metadata for your objects stored in S3, so you can quickly find the data you need for business analytics, real-time inference applications, and more. S3 Metadata supports both system-defined object metadata, such as the size and source of an object, and custom metadata, which lets you use tags to annotate your objects with information like product SKU, transaction ID, or content rating.

Benefits

Quickly find and retrieve the data you need across up to trillions of objects in S3.

Use tags to annotate your objects with business-specific metadata to improve data organization and searchability.

S3 Metadata is designed to automatically capture and organize object metadata in managed S3 Tables with built-in support for Apache Iceberg.

Analyze metadata using familiar AWS services like Amazon Athena, Redshift, EMR, and QuickSight through the S3 Tables preview integration with AWS Glue Data Catalog. S3 Metadata is compatible with popular open source tools.
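As a rough illustration of the kind of query the benefits above describe, the sketch below builds SQL text that lists the largest objects recorded in a metadata table. It is a minimal example under stated assumptions, not an official sample: the helper name `build_largest_objects_query`, the table name `my_namespace.my_metadata_table`, and the column names `key`, `size`, and `last_modified_date` are all illustrative, so check them against the schema of your own metadata table before use.

```python
def build_largest_objects_query(table: str, min_size_bytes: int, limit: int = 10) -> str:
    """Return SQL text that lists the largest objects at or above a size threshold.

    Assumptions (hypothetical, verify against your own table):
      - `table` is a dotted name such as "my_namespace.my_metadata_table"
      - the table exposes `key`, `size`, and `last_modified_date` columns
    """
    if min_size_bytes < 0:
        raise ValueError("min_size_bytes must be non-negative")
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Accept only simple dotted identifiers so the table name cannot inject SQL.
    if not table or not all(part.isidentifier() for part in table.split(".")):
        raise ValueError(f"unexpected table name: {table!r}")

    return (
        "SELECT key, size, last_modified_date\n"
        f"FROM {table}\n"
        f"WHERE size >= {int(min_size_bytes)}\n"
        "ORDER BY size DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Print a query for the five largest objects of at least 1 MB.
    print(build_largest_objects_query("my_namespace.my_metadata_table", 1_000_000, limit=5))
```

The resulting text could then be submitted through Amazon Athena or another engine that reads Apache Iceberg tables. The query is built as a plain string here so the sketch stays independent of any particular client library.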

Use cases

Use rich metadata to catalog stored data for easier discovery and utilization.

Track and manage AI-generated videos, including their origin, creation time, and the AI model used with Amazon Bedrock.

Analyze object metadata to identify opportunities for cost savings and performance improvements.

Quickly identify and analyze relevant datasets for business intelligence and decision-making.

Improve data organization and compliance with custom metadata annotations.

Customers

  • Cambridge Mobile Telematics

    Cambridge Mobile Telematics (CMT) is the world’s largest telematics service provider. Its mission is to make the world’s roads and drivers safer. The company’s AI-driven platform, DriveWell Fusion®, gathers sensor data from millions of IoT devices — including smartphones, proprietary Tags, connected vehicles, dashcams, and third-party devices — and fuses them with contextual data to create a unified view of vehicle and driver behavior.

    At CMT, we store and analyze multiple petabytes of data from mobile IoT devices worldwide to enhance driver and road safety. As we scale, locating specific data for developing new insights and models becomes increasingly challenging. S3 Metadata, particularly its custom metadata capability, allows us to annotate all our data and maintain the metadata in a managed, queryable table. Now, finding relevant data requires just one efficient and cost-effective SQL query. This makes S3 Metadata a game-changer, enabling us to bring new capabilities to our customers.

    Tim Vogel, Chief Information Officer - Cambridge Mobile Telematics
  • PayPal

    PayPal has been revolutionizing commerce globally for more than 25 years. Creating innovative experiences that make moving money, selling, and shopping simple, personalized, and secure, PayPal empowers consumers and businesses in approximately 200 markets to join and thrive in the global economy.

    S3 Metadata provides us with a simple, straightforward mechanism to analyze trillions of S3 objects using standard tools like Amazon Athena and Amazon QuickSight. With this functionality, we can spend our time making decisions rather than building our complex data pipelines to access and query S3 object metadata.

    Jon Southall, VP Engineering, Large Enterprise Platforms - PayPal
  • Roche

    Roche is a biotech company that combines pharmaceuticals and diagnostics to achieve advances in personalized healthcare and improve people’s lives.

    S3 Metadata accelerates our generative AI initiatives. As we build LLM applications such as internal chatbots for our teams, unstructured data like PDFs are becoming increasingly valuable. We need to ingest lots of domain-specific documents to a Retrieval Augmented Generation (RAG) application so that the chatbot can tailor to Roche’s specific business contexts. However, this also means that we have more and more unstructured data that we need to manage. We need a metadata system to efficiently describe our unstructured data so that our users can quickly sift through our large data lake to identify the relevant datasets for the particular generative AI application that they are building. With S3 Metadata, building a robust metadata system has been simplified to a few clicks in the AWS Management Console. As we continuously ingest more unstructured data, S3 Metadata automatically surfaces the metadata and keeps the metadata up-to-date. We also employ our own Lambda to extract business-specific metadata, such as classifying documents based on a taxonomy relevant to Roche, and store this metadata in the same glue catalog alongside the S3 Metadata table so that with a simple SQL join we can have all the metadata we need. S3 Metadata helps us build generative AI applications faster, which allows us to focus on building rather than organizing our data.

    Yannick Misteli, Head of Pharma Commercial Engineering - Roche
  • SmugMug / Flickr

    SmugMug and Flickr provide online platforms where photographers can upload and share photos and videos. The company stores billions of photos and videos on its application.

    Imagine flying a time machine through your Amazon S3 data. At SmugMug and Flickr, we’ve stored over 22 years of our customers’ photos, hundreds of billions of objects, in S3. The new S3 Metadata feature helps us explore our S3 object metadata easily and affordably, querying across metadata such as object size over time to understand how our data has evolved, which previously involved joining expensive database queries with object inventories. Understanding how our photographers use our storage helps further our commitment to build a better world through the power of photography.

    Andrew Shieh, Principal Engineer - SmugMug
  • Solink

    Solink offers trusted cloud video security systems for businesses of all sizes. Its hardware and software help give visibility to IT, loss prevention, operations, and security teams at tens of thousands of locations in more than 40 countries.

    Solink processes over 500 million hours of video monthly, integrating security footage with critical business data from over 350 sources. AWS supports the infrastructure we rely on and Amazon S3 Metadata will take that further—delivering real-time insights that enhance our content management, from monitoring storage and usage to tracking real-time effects of customer configuration changes.

    Martin Soukup, Chief Technical Officer - Solink
  • Commvault

    Commvault is the gold standard in cyber resilience, helping more than 100,000 organizations keep data safe and businesses resilient and moving forward. Today, Commvault offers the only cyber resilience platform that combines the best data security and rapid recovery at enterprise scale across any workload, anywhere—at the lowest TCO.

    Amazon S3 has emerged as a leading cloud storage provider for various data types. Amazon S3 Metadata will enable vendors like Commvault to proactively help identify and safeguard sensitive information, while also helping to automate elements like data tiering, and enhance outcomes for our shared customers. S3 Metadata facilitates efficient data organization and helps streamline data discovery, allowing for detailed annotation of objects, which is crucial for cloud-first cyber resilience.

    Pranay Ahlawat, Chief Technology and AI Officer - Commvault
  • New Relic

    The New Relic Intelligent Observability Platform gives customers deep performance analytics for every part of their software environment. Customers can easily view and analyze massive amounts of data, and gain actionable insights in real time.

    As a leader in observability, New Relic’s data engine processes approximately 1.3 exabytes of Amazon S3 data daily. S3 Metadata will accelerate our innovation by automatically generating rich object metadata, thereby simplifying data exploration needed by our teams to run product experiments and build proofs of concept, such as developing new metrics beneficial for our customers. S3 Metadata will reduce our effort to build and maintain a robust metadata system from hundreds of hours to just a few clicks in the S3 Management Console, enabling our engineers to focus on data analysis rather than data organization.

    Siva Padisetty, Chief Technology Officer - New Relic