Amazon S3 Tables

Optimize query performance and cost as your data lake scales

Store tabular data at scale in S3

Amazon S3 Tables make S3 the first cloud object store with built-in Apache Iceberg support and streamline storing tabular data at scale. S3 Tables are optimized for analytics workloads, delivering up to 3x faster query performance and up to 10x higher transactions per second than self-managed Iceberg tables stored in general purpose S3 buckets. Because S3 Tables support the Apache Iceberg standard, you can query your tabular data with popular AWS and third-party query engines, including Amazon Athena, Amazon Redshift, Amazon EMR, and Apache Spark. Use S3 Tables to store tabular data such as daily purchase transactions, streaming sensor data, or ad impressions as an Iceberg table in S3, and use automatic table maintenance to optimize performance and cost as your data evolves.

Benefits

Simplify data lakes at any scale, whether you’re just getting started or managing thousands of tables in your Iceberg environment.

Get up to 3x faster query performance and up to 10x higher transactions per second compared to storing Iceberg tables in general purpose S3 buckets.

Perform continual table maintenance tasks such as compaction, snapshot management, and unreferenced file removal to automatically optimize query efficiency and costs over time.

Access advanced Iceberg analytics capabilities and query data using familiar AWS services like Amazon Athena, Amazon Redshift, and Amazon EMR through the S3 Tables preview integration with AWS Glue Data Catalog. S3 Tables are also compatible with popular open source query engines such as Apache Spark.

Create tables as first-class AWS resources and apply permissions to easily govern access to them.
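Two of the maintenance tasks named above, snapshot management and unreferenced file removal, can be pictured with a small Python sketch. This is a conceptual illustration only, not how S3 Tables implements maintenance: the dictionary layout, the `keep_last` parameter, and both function names are hypothetical and were chosen for this example.

```python
def expire_snapshots(snapshots, keep_last=2):
    """Split snapshots (ordered oldest first) into (retained, expired).

    keep_last is a hypothetical retention setting for this sketch.
    """
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    return snapshots[-keep_last:], snapshots[:-keep_last]


def unreferenced_files(all_files, retained_snapshots):
    """Return files that no retained snapshot points to, sorted by name."""
    referenced = set()
    for snap in retained_snapshots:
        referenced.update(snap["files"])
    return sorted(set(all_files) - referenced)


# Hypothetical table history: each snapshot lists the data files it uses.
snapshots = [
    {"id": 1, "files": {"f1", "f2"}},
    {"id": 2, "files": {"f2", "f3"}},
    {"id": 3, "files": {"f3", "f4"}},
]
# Everything currently stored for the table, including a stray object.
all_files = {"f1", "f2", "f3", "f4", "orphan"}

retained, expired = expire_snapshots(snapshots, keep_last=2)
removable = unreferenced_files(all_files, retained)

print([s["id"] for s in expired])  # [1]
print(removable)                   # ['f1', 'orphan']
```

Once the oldest snapshot expires, any file that only it referenced becomes removable, along with objects that no snapshot ever referenced. Removing them reduces storage cost without affecting the table states that are still queryable.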

How it works

S3 Tables provide purpose-built S3 storage for structured data in the Apache Parquet format. Within a table bucket, you can create tables as first-class resources directly in S3. These tables can be secured with table-level permissions defined in either identity- or resource-based policies and are accessible by applications or tooling that supports the Apache Iceberg standard. When you create a table in your table bucket, the underlying data in S3 is stored as Parquet data. S3 then maintains the metadata necessary to make that Parquet data queryable by your applications. Table buckets include a client library that query engines use to navigate and update the Iceberg metadata of tables in your table bucket. This library, in conjunction with updated S3 APIs for table operations, allows multiple clients to safely read and write data to your tables. Over time, S3 automatically optimizes the underlying Parquet data by rewriting, or "compacting", your objects. Compaction optimizes your data on S3 to improve query performance and minimize costs.
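To make the idea of compaction concrete, here is a minimal Python sketch that plans how many small Parquet objects could be rewritten into fewer, larger ones. It is an illustration under stated assumptions, not the algorithm S3 uses: the `DataFile` type, the `plan_compaction` function, the 512 MB target, and the first-fit-decreasing heuristic are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for one Parquet object backing a table."""
    name: str
    size_mb: int


def plan_compaction(files, target_mb=512):
    """Group small files into batches of at most target_mb each.

    Files already at or above the target are left alone. Uses a
    first-fit-decreasing heuristic, chosen only for illustration.
    """
    small = sorted(
        (f for f in files if f.size_mb < target_mb),
        key=lambda f: f.size_mb,
        reverse=True,
    )
    batches = []  # each batch would be rewritten as a single object
    for f in small:
        for batch in batches:
            if sum(x.size_mb for x in batch) + f.size_mb <= target_mb:
                batch.append(f)
                break
        else:
            # No existing batch has room, so start a new one.
            batches.append([f])
    # A batch holding a single file gains nothing from a rewrite.
    return [b for b in batches if len(b) > 1]


files = [
    DataFile("a.parquet", 40),
    DataFile("b.parquet", 60),
    DataFile("c.parquet", 512),
    DataFile("d.parquet", 300),
    DataFile("e.parquet", 200),
]
plan = plan_compaction(files, target_mb=512)
for batch in plan:
    print([f.name for f in batch], sum(f.size_mb for f in batch), "MB")
# ['d.parquet', 'e.parquet'] 500 MB
# ['b.parquet', 'a.parquet'] 100 MB
```

In this example, four small objects become two, so a query engine opens fewer files to read the same rows. That reduction in per-object overhead is the general reason compaction tends to improve query performance.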

Customers

  • Genesys

    Genesys is a global cloud leader in AI-Powered Experience Orchestration. Through advanced AI, digital and workforce engagement management capabilities, Genesys helps more than 8,000 organizations in over 100 countries to provide personalized, empathetic customer and employee experiences while benefiting from improved business agility and outcomes.

    Amazon S3 Tables will be a transformative addition to our data architecture, especially with its managed Iceberg support, which effectively creates a materialized view layer for diverse data analysis needs. This offering has the potential to help Genesys simplify complex data workflows by eliminating extra layers of table management, with S3 handling key maintenance tasks like compaction, snapshot management, and unreferenced file cleanup automatically. The ability to read and write Iceberg Tables directly from S3 will help us boost performance and create new possibilities for integrating data seamlessly across our analytics ecosystem. This interoperability, combined with the performance enhancements, positions S3 Tables as a pivotal part of our future strategy to deliver fast, flexible, and reliable data insights.

    Glenn Nethercutt, Chief Technology Officer - Genesys

  • SnapLogic

    SnapLogic is a pioneer in AI-led integration. The SnapLogic Platform for Generative Integration accelerates digital transformation across the enterprise to design, deploy, and manage AI agents and integration that automate tasks, make real-time decisions, and integrate effortlessly into existing workflows.

    Amazon S3 Tables, with built-in Apache Iceberg support and AWS Analytics services integration, help companies optimize their data analytics costs while transforming how they use business data for analytics, compliance, and AI initiatives. By automating complex data management tasks and providing complete audit trails of data changes, teams can instantly analyze historical data, maintain regulatory compliance, and accelerate business insights while significantly reducing their technology costs.

    Dominic Wellington, Enterprise Architect - SnapLogic

  • Zus Health

    Zus is a shared health data platform designed to accelerate healthcare data interoperability by providing easy-to-use patient data via API, embedded components, and direct EHR integrations.

    As a healthcare company handling massive amounts of frequently changing patient data, we decided to invest in Apache Iceberg because it solves many pain points with Apache Hive around partitioning and automation, with the added benefit of wider interoperability. One of our biggest challenges with Iceberg has been understanding and managing table optimization. This is why we’re excited about S3 Tables and the managed optimization capabilities. Being able to offload the developer overhead of table maintenance will allow us to focus more on bringing high-quality data and valuable insights to our customers.

    Sonya Huang, Consulting Software Engineer - Zus Health