Amazon SageMaker Catalog

Discover, govern, and collaborate on data and AI securely

Overview

The next generation of Amazon SageMaker simplifies discovery, governance, and collaboration for data and AI across your lakehouse, AI models, and applications. With Amazon SageMaker Catalog, built on Amazon DataZone, users can securely discover and access approved data and models using semantic search with generative AI–created metadata, or ask Amazon Q Developer in natural language to find their data. Users can consistently define and enforce access policies using a single permission model with fine-grained access controls, managed centrally in Amazon SageMaker Unified Studio. Share and collaborate on data and AI assets through publishing and subscribing workflows. With SageMaker, you can safeguard your AI models using Amazon Bedrock Guardrails and implement responsible AI policies. Build trust throughout your organization with data quality monitoring and automation, sensitive data detection, and data and machine learning (ML) lineage.

Benefits

Discover your data and AI assets at scale with SageMaker Catalog, built on Amazon DataZone. Enhance data discovery with generative AI to automatically enrich your data and metadata with business context, making it easier for all users to find, understand, and use data. Share your data, AI models, prompts, and generative AI assets with filtering by table and column names or business glossary terms. Automatically recommend valuable columns and relevant analytical applications for each dataset, enabling the use of the right data to quickly build the right models. Support both centralized and decentralized governance models with seamless data and AI sharing through publishing and subscribing workflows in a single experience through Projects.

Gain trust through real-time visibility of data quality and data and ML lineage in SageMaker. Automate data profiling and data quality recommendations, monitor data quality rules, and receive alerts. Resolve hard-to-find data quality challenges by using rule-based and ML approaches to reconcile entities so you can deliver high-quality data to make confident business decisions. Drive transparency in data pipelines and AI projects with built-in model monitoring to detect bias or report on how features contribute to your model prediction.

Centralize data and AI security in SageMaker with fine-grained access controls, data classification, and guardrails to ensure data, analytics, and AI models are appropriately used. Define permissions once, and enforce them across data and models. With Amazon Bedrock natively integrated, customers can use Amazon Bedrock Guardrails in their generative AI applications to block harmful content, filter hallucinations, and enable customizable safeguards for privacy, safety, and accuracy. Automatically identify sensitive information within your pipelines using Amazon Comprehend.

Meet audit and regulatory compliance with data usage and model logging and monitoring. Support acceptable use of your analytics and AI assets across your enterprise with project-based isolation. Understand data and model usage across your lakehouse for enhanced security. Use Amazon SageMaker Clarify to monitor models for bias, accuracy, and robustness, aligning with your responsible AI standards. Align costs to business initiatives and get a clear view of your business investments.

Features

Curated data for context and findability

The SageMaker Catalog lets you enrich your technical metadata with business context. This makes data visible to all your users so they can quickly and easily find, understand, and trust it.

Automated metadata recommendations

Automate adding business descriptions and names to data, which helps you easily understand context and avoid dealing with cryptic technical names. This automation is powered by large language models (LLMs) to increase accuracy and consistency.

Bring a consistent level of AI safety across all your applications

Amazon Bedrock Guardrails helps evaluate user inputs and foundation model (FM) responses based on use case–specific policies, and provides an additional layer of safeguards regardless of the underlying FM.

Quickly audit and track models

Quickly audit and troubleshoot performance for all models, endpoints, and model-monitoring jobs through a unified view. Track deviations from expected model behavior and missing or inactive monitoring jobs with automated alerts.

Data quality

With data quality statistics, data consumers can see data quality metrics from AWS or third-party systems. Data consumers can trust the data sources they use for decisions and have data quality context as they search for assets. Data producers and IT teams can also use APIs to incorporate data quality statistics from third-party systems into a unified, out-of-console portal.
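
As a rough illustration of the kind of statistics a third-party system might hand over, the sketch below rolls individual rule results up into a small summary. It is plain Python, not the SageMaker Catalog API: the `RuleResult` type, the `summarize` function, and the field names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class RuleResult:
    """One data quality rule outcome (hypothetical shape, for illustration only)."""
    rule: str
    passed: bool


def summarize(results):
    """Roll individual rule outcomes up into summary statistics."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    return {
        "rules_evaluated": total,
        "rules_passed": passed,
        "rules_failed": total - passed,
        # Avoid dividing by zero when no rules were evaluated.
        "pass_rate": passed / total if total else None,
        "failed_rules": [r.rule for r in results if not r.passed],
    }
```

A summary like this is what a data consumer would want to see next to an asset while searching: how many rules ran, how many passed, and which ones failed.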

Data and ML lineage

Understand the movement of data and models over time. Lineage can raise trust and an organization’s data and AI literacy by helping data consumers understand where data came from, how it changed, and how it was consumed. You can reduce time spent mapping data and AI assets and their relationships, troubleshooting and developing pipelines, and asserting data and AI governance practices.
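
To make the "where did this come from" question concrete, here is a minimal sketch that treats lineage as a directed graph and walks it upstream. It is plain Python and does not call any SageMaker service. The asset names, the `LINEAGE` mapping, and both helper functions are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it was derived from.
LINEAGE = {
    "churn_model": ["training_features"],
    "training_features": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
}


def upstream_sources(asset, lineage):
    """Return every asset that `asset` depends on, directly or indirectly."""
    seen = set()
    queue = deque(lineage.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # Already visited; also guards against cycles.
        seen.add(current)
        queue.extend(lineage.get(current, []))
    return seen


def root_sources(asset, lineage):
    """Return only the original sources: upstream assets with no parents of their own."""
    return {a for a in upstream_sources(asset, lineage) if not lineage.get(a)}
```

With these sample edges, `root_sources("churn_model", LINEAGE)` traces the model back to the two raw tables it ultimately depends on.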

Customers

Cisco

"You want to discover, share, and govern your data. Whether you call it a data mesh or a data fabric, data exists across different teams in multiple silos, and you need a way to bring it together. Amazon SageMaker Catalog connects data producers and consumers, enabling producers to share data with built-in controls and data contracts while allowing consumers to access the data using the tools of their choice."

Shaja Arul Selvamani, Sr. Director AI/ML, Cisco

Natera, Inc.

"Our organization has been leveraging Amazon DataZone, Amazon SageMaker AI, Amazon Athena, and Amazon Redshift to manage and analyze our clinical and genomic data. We are excited to now have the unified governance of the Amazon SageMaker Catalog, which will streamline our data discovery and access, enabling our team to quickly analyze relevant data across our whole domain. This integration will help us create tailored datasets, potentially reducing our time-to-insight, and ultimately drive improved patient outcomes as we advance our goal of making personalized genetic testing a standard part of care."

Mirko Buholzer, VP of Software Engineering, Natera, Inc.

NatWest

"Our Data Platform Engineering team has been deploying multiple end-user tools for data engineering, ML, SQL, and gen AI tasks. As we look to simplify processes across the bank, we’ve been looking at streamlining user authentication and data access authorization. Amazon SageMaker delivers a ready-made user experience to help us deploy one single environment across the organization, reducing the time required for our data users to access new tools by around 50%."

Zachery Anderson, CDAO, NatWest Group
