Amazon SageMaker Lakehouse features

General

Access and query your data in place with the Apache Iceberg–compatible tools and engines of your choice. Supported analytics and ML use cases range from Apache Spark ETL jobs and SQL dashboards to machine learning (ML) personalization models and generative AI applications.
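As a rough illustration of what "Iceberg-compatible engine" can mean in practice, the sketch below builds the Spark properties that Apache Iceberg's own Spark and AWS modules use to register a catalog backed by the AWS Glue Data Catalog. It is a minimal sketch under stated assumptions, not the documented SageMaker Lakehouse setup: the catalog name `lakehouse`, the bucket `s3://example-bucket/warehouse/`, and the helper `iceberg_glue_catalog_conf` are hypothetical, and your environment may call for different settings.

```python
def iceberg_glue_catalog_conf(catalog_name: str, warehouse_uri: str) -> dict:
    """Return Spark properties that register an Iceberg catalog backed by AWS Glue.

    The property keys and class names come from the Apache Iceberg Spark and
    AWS modules. The catalog name and warehouse URI are caller-supplied
    placeholders (hypothetical values in the example below).
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions such as MERGE INTO and CALL procedures.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Registers the named catalog as an Iceberg catalog in Spark.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Uses the AWS Glue Data Catalog as the catalog implementation.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Reads and writes table files through Iceberg's S3 file IO.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Default location for new table data (hypothetical bucket below).
        f"{prefix}.warehouse": warehouse_uri,
    }


# Hypothetical catalog name and bucket, for illustration only.
conf = iceberg_glue_catalog_conf("lakehouse", "s3://example-bucket/warehouse/")
for key, value in sorted(conf.items()):
    print(f"{key}={value}")
```

Each printed pair could be passed to a Spark session as a `--conf key=value` option or through `SparkSession.builder.config(key, value)`, assuming the Iceberg Spark runtime and AWS bundle JARs are on the classpath.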

Get the flexibility of a data lake and the performance of a data warehouse, without changing your existing data architecture. Access highly optimized Amazon Redshift storage and secondary data structures, such as materialized views, to speed up SQL analytics in your data lakes.

Run analytic tools and engines of your choice, such as SQL, Apache Spark, business intelligence (BI), and AI/ML tools, on a single copy of data, while storing data in a format best suited for your workloads.

With Apache Iceberg compatibility, transactions on all data in SageMaker Lakehouse are fully ACID (atomic, consistent, isolated, durable) compliant for high-performance SQL analytics.

Run federated queries across multiple third-party data sources to access and query your data in place.

Bring data from your operational databases, such as Amazon DynamoDB, Amazon Aurora MySQL, Amazon Aurora PostgreSQL, and Amazon RDS for MySQL, and from applications, including Salesforce, ServiceNow, and Zendesk, into SageMaker Lakehouse using zero-ETL integrations for near real-time analytics.

Secure your data in SageMaker Lakehouse with integrated access controls. Define permissions once, and they are enforced across all your data in all analytic tools and engines.