AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables

Posted on: Sep 12, 2024

AWS Glue Data Catalog now supports storage optimization of Apache Iceberg tables by automatically removing data files that are no longer needed. Each write to an Iceberg table creates a new snapshot, or version, of the table, and retaining old snapshots keeps their data files in storage. In addition, failed writes to Iceberg tables can leave behind data files that no snapshot references, known as “orphan” files, which further increase storage costs. The AWS Glue Data Catalog's new storage optimizations, together with automated compaction, help you reduce metadata overhead, control storage costs, and improve query performance.

With this launch, you can enable AWS Glue Data Catalog table optimization to include snapshot and orphan file management. You can optimize the Amazon S3 layout by providing configuration such as the default snapshot retention period and the number of days to keep orphan files. Once enabled, the Data Catalog periodically monitors tables, removes expired snapshots from table metadata, and deletes the Amazon S3 data files and orphan files that are no longer needed. You can view the history of the number of data files, manifests, manifest lists, and orphan files deleted on the table optimization tab in the AWS Glue Data Catalog console.

In addition to the AWS console, you can use the AWS CLI or AWS SDKs to enable table optimization for Apache Iceberg tables. Automatic optimization for Iceberg tables is available in 13 AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Europe (Ireland, London, Frankfurt, Stockholm), Asia Pacific (Tokyo, Seoul, Mumbai, Singapore, Sydney), and South America (São Paulo). To learn more, read the blog and visit the AWS Glue Data Catalog documentation.
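
As a rough illustration of the SDK route, the sketch below builds the two requests that would enable snapshot retention and orphan file deletion for one table, then sends them through a Glue client. It is not taken from the announcement: the field names reflect our reading of the Glue `CreateTableOptimizer` API as exposed by boto3 and should be checked against the current documentation. The helper names, the account ID, role ARN, database and table names, and the retention values of 5 and 3 days are hypothetical placeholders.

```python
from typing import Any, Dict, List


def build_optimizer_requests(
    catalog_id: str,
    database: str,
    table: str,
    role_arn: str,
    snapshot_retention_days: int = 5,  # placeholder value, not an AWS default
    orphan_retention_days: int = 3,    # placeholder value, not an AWS default
) -> List[Dict[str, Any]]:
    """Build one request per optimizer type (retention, orphan file deletion).

    Assumption: keys mirror the boto3 Glue create_table_optimizer parameters.
    """
    common = {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
    }
    retention = {
        **common,
        "Type": "retention",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,
            "enabled": True,
            "retentionConfiguration": {
                "icebergConfiguration": {
                    "snapshotRetentionPeriodInDays": snapshot_retention_days,
                    # Also delete data files that only expired snapshots reference.
                    "cleanExpiredFiles": True,
                }
            },
        },
    }
    orphan = {
        **common,
        "Type": "orphan_file_deletion",
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,
            "enabled": True,
            "orphanFileDeletionConfiguration": {
                "icebergConfiguration": {
                    "orphanFileRetentionPeriodInDays": orphan_retention_days,
                }
            },
        },
    }
    return [retention, orphan]


def enable_storage_optimizers(glue_client: Any, requests: List[Dict[str, Any]]) -> None:
    """Send each request through a client exposing create_table_optimizer."""
    for request in requests:
        glue_client.create_table_optimizer(**request)
```

With boto3 installed and credentials configured, usage would look like `enable_storage_optimizers(boto3.client("glue"), build_optimizer_requests("123456789012", "example_db", "example_table", "arn:aws:iam::123456789012:role/ExampleGlueOptimizerRole"))`. Keeping request construction separate from the API call makes it possible to inspect or log the payloads before anything is changed in the catalog.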
