AWS Big Data Blog

Category: AWS Lake Formation

Accelerate queries on Apache Iceberg tables through AWS Glue auto compaction

In this post, we explore new features of the AWS Glue Data Catalog, which now supports improved automatic compaction of Iceberg tables for streaming data, making it straightforward for you to keep your transactional data lakes consistently performant. Enabling automatic compaction on Iceberg tables reduces metadata overhead on your Iceberg tables and improves query performance

Enforce fine-grained access control on data lake tables using AWS Glue 5.0 integrated with AWS Lake Formation

AWS Glue 5.0 supports fine-grained access control (FGAC) based on your policies defined in AWS Lake Formation. FGAC enables you to granularly control access to your data lake resources at the table, column, and row levels. This post demonstrates how to enforce FGAC on AWS Glue 5.0 through Lake Formation permissions.

Read and write S3 Iceberg table using AWS Glue Iceberg Rest Catalog from Open Source Apache Spark

In this post, we will explore how to harness the power of Open source Apache Spark and configure a third-party engine to work with AWS Glue Iceberg REST Catalog. The post will include details on how to perform read/write data operations against Amazon S3 tables with AWS Lake Formation managing metadata and underlying data access using temporary credential vending.

How ANZ Institutional Division built a federated data platform to enable their domain teams to build data products to support business outcomes

ANZ Institutional Division has transformed its data management approach by implementing a federated data platform based on data mesh principles. This shift aims to unlock untapped data potential, improve operational efficiency, and increase agility. The new strategy empowers domain teams to create and manage their own data products, treating data as a valuable asset rather than a byproduct. This post explores how the shift to a data product mindset is being implemented, the challenges faced, and the early wins that are shaping the future of data management in the Institutional Division.

Integrate custom applications with AWS Lake Formation – Part 1

In this two-part series, we show how to integrate custom applications or data processing engines with Lake Formation using the third-party services integration feature. In this post, we dive deep into the required Lake Formation and AWS Glue APIs. We walk through the steps to enforce Lake Formation policies within custom data applications. As an example, we present a sample Lake Formation integrated application implemented using AWS Lambda.

Integrate custom applications with AWS Lake Formation – Part 2

In this two-part series, we show how to integrate custom applications or data processing engines with Lake Formation using the third-party services integration feature. In this post, we explore how to deploy a fully functional web client application, built with JavaScript/React through AWS Amplify (Gen 1), that uses the same Lambda function as the backend. The provisioned web application provides a user-friendly and intuitive way to view the Lake Formation policies that have been enforced.

Fine-grained access control in Amazon EMR Serverless with AWS Lake Formation

In this post, we discuss how to implement fine-grained access control in EMR Serverless using Lake Formation. With this integration, organizations can achieve better scalability, flexibility, and cost-efficiency in their data operations, ultimately driving more value from their data assets.

How BMW streamlined data access using AWS Lake Formation fine-grained access control

This post explores how BMW implemented AWS Lake Formation’s fine-grained access control (FGAC) in the Cloud Data Hub and how this saves them up to 25% on compute and storage costs. By using AWS Lake Formation fine-grained access control capabilities, BMW has transparently implemented finer data access management within the Cloud Data Hub. The integration of Lake Formation has enabled data stewards to scope and grant granular access to specific subsets of data, reducing costly data duplication.

Deprecation of Lake Formation’s Governed Tables Feature

After careful consideration, we have made the decision to end support for Governed Tables, effective December 31, 2024, to focus on open source transactional table formats such as Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake. This decision stems from customer preference for these open source solutions, which offer ACID-compliant transactions, compaction, time travel, and other features previously provided by Governed Tables.

Apply enterprise data governance and management using AWS Lake Formation and AWS IAM Identity Center

In this post, we explore a solution using AWS Lake Formation and AWS IAM Identity Center to address the complex challenges of managing and governing legacy data during digital transformation. We demonstrate how enterprises can effectively preserve historical data while enforcing compliance and maintaining user entitlements. This solution enables your organization to maintain robust audit trails, enforce governance controls, and provide secure, role-based access to data.