Reduce the time spent manually entering data attributes in the data catalog, a process that also introduces potential errors. Generate business context and recommended analyses for datasets, which improves data discovery results. Understand where your data came from and which sources will be impacted by changes. More and richer data in the business data catalog also improves the search experience. Reduce the time you spend searching for and using data from weeks to days.
The Amazon DataZone business data catalog acts as a federated organizational registry where technical metadata can be published as assets, and you can add enriched business context. You can make data visible with business context for all your users to find, understand, and trust data quickly and easily.
Automated metadata recommendations
Automate adding business descriptions and names to data, which helps you easily understand context and helps you avoid dealing with cryptic technical names. This automation is powered by large language models (LLMs) to increase accuracy and consistency.
Faceted search
Faceted search works on top of the business data catalog to help data consumers and producers find data assets using familiar structural information, such as table and column names, as well as business terms.
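To make the idea concrete, the following is a minimal in-memory sketch of faceted search over catalog entries. It is an illustration only, not how Amazon DataZone implements search: the `ASSETS` records, the `search` and `facet_counts` helpers, and the `source` facet are all hypothetical names chosen for this example.

```python
from collections import Counter

# Hypothetical catalog entries: technical names plus business glossary terms.
ASSETS = [
    {"name": "orders", "columns": ["order_id", "customer_id", "amount"],
     "terms": ["Sales"], "source": "AWS Glue Data Catalog"},
    {"name": "customers", "columns": ["customer_id", "region"],
     "terms": ["Customer"], "source": "Amazon Redshift"},
    {"name": "revenue_daily", "columns": ["day", "amount"],
     "terms": ["Sales", "Finance"], "source": "Amazon Redshift"},
]


def search(assets, text="", **facets):
    """Return assets whose name, columns, or business terms contain `text`
    (case-insensitive) and whose fields equal every requested facet value."""
    text = text.lower()
    hits = []
    for asset in assets:
        haystack = [asset["name"], *asset["columns"], *asset["terms"]]
        if text and not any(text in item.lower() for item in haystack):
            continue
        if all(asset.get(key) == value for key, value in facets.items()):
            hits.append(asset)
    return hits


def facet_counts(assets, facet):
    """Count how many assets fall under each value of a facet."""
    return Counter(asset[facet] for asset in assets)


# A column-name match, then the same query narrowed by a facet.
by_column = [a["name"] for a in search(ASSETS, "amount")]
narrowed = [a["name"] for a in search(ASSETS, "amount", source="Amazon Redshift")]
# A business-term match finds the same assets without knowing column names.
by_term = [a["name"] for a in search(ASSETS, "sales")]
```

The facet counts for a result set are what a search page would typically show beside the results, so that a user can narrow by source without retyping the query.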
Recommendations for relevant columns and analytical applications
For each dataset, generate a list of the most valuable columns and the likely analytics uses.
Data quality
With data quality statistics in Amazon DataZone, data consumers can see data quality metrics from AWS Glue Data Quality or third-party systems. Data consumers can trust the data sources they use for decisions, and have data quality context as they search for assets. Producers and IT teams can also use APIs to incorporate the data quality statistics from third-party systems into a unified, out-of-console portal. Data producers can bring in AWS Glue Data Quality results on a schedule to make sure that the scores are current, even as the data continues to change.
Data lineage (preview)
Understand the movement of data over time. Data lineage can raise trust and an organization’s data literacy by helping data consumers understand where data came from, how it changed, and how it is consumed. You can reduce the time spent mapping a data asset and its relationships, troubleshooting and developing pipelines, and asserting data governance practices.
Data product
Group data assets into defined packages (data products) tailored for specific business use cases to streamline cataloging and enable data consumers to easily discover and subscribe to the data. Data producers can curate a collection of relevant assets, add business context, and publish it as a data product unit. This simplifies the process for data consumers to locate all necessary data assets for particular use cases. Consumers can subscribe to all assets within a data product through a single approval workflow. Data producers can manage the product's lifecycle, including editing the asset collection, unpublishing, deleting it, and maintaining subscriptions. Amazon DataZone also offers API support for data product workflows, facilitating integration and automation.
Use cases
Find the right trusted data
Reduce your time to insights by finding the right data, in the right context. Data can be trusted only when it is consistent, accurate, complete, timely, traceable, and has a transparent data quality score. With distributed ownership, each department or the analytics team maintains the fidelity of assets so that data consumers know that they are using the right data.
Build a business data catalog
Build a business data catalog by crawling your assets and bringing in the technical metadata (not the actual data) to enrich with business context. The business context can be enriched with standardized glossaries and terms. You can also customize additional metadata with metadata forms.
Understand the data context
Using the right data requires understanding the data context. Amazon DataZone helps build that context for all cataloged data with glossaries and metadata forms. The data owner can share as much information as possible to set the data context, so that the data consumer can find, understand, and then subscribe to data. The data quality score helps data consumers understand whether a data asset is fit for purpose.
Understand data origin, transformations, and consumption
Reduce the time spent mapping data assets and their relationships, troubleshooting and developing pipelines, and asserting data governance practices. Through a graphical experience, data consumers understand the asset’s origin. Data producers can assess the effect of changes on a table or column by understanding which systems or data consumers use the data (impact analysis). Data producers can also troubleshoot data issues by reviewing snapshots of a data asset’s lineage to spot the error source. Amazon DataZone visualizes data lineage captured from events in OpenLineage, an open standard for lineage collection, and can also capture custom lineage mappings. Data producers can include this lineage when they share data, which increases trust in the data sources.
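The impact analysis described above can be sketched as a graph walk over lineage events. The example below is a minimal illustration under stated assumptions, not Amazon DataZone's implementation: the events follow the general shape of an OpenLineage run event (job, run, inputs, outputs) but omit fields such as `schemaURL`, and the helper names, namespace, producer URL, and dataset names are all hypothetical.

```python
import uuid
from collections import defaultdict, deque
from datetime import datetime, timezone


def make_run_event(job_name, inputs, outputs, namespace="example-namespace"):
    """Build a simplified OpenLineage-style run event as a plain dict.
    The namespace and producer values are placeholders for illustration."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "producer": "https://example.com/hypothetical-producer",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
    }


def build_downstream_graph(events):
    """Map each input dataset to the datasets derived from it."""
    graph = defaultdict(set)
    for event in events:
        for src in event["inputs"]:
            for dst in event["outputs"]:
                graph[src["name"]].add(dst["name"])
    return graph


def impacted_datasets(graph, changed):
    """Breadth-first walk returning every dataset downstream of `changed`."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for nxt in graph.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(changed)  # a dataset does not count as its own dependent
    return seen


events = [
    make_run_event("clean_orders_job", ["raw_orders"], ["clean_orders"]),
    make_run_event("daily_revenue_job", ["clean_orders", "fx_rates"], ["revenue_report"]),
]
graph = build_downstream_graph(events)
impacted = sorted(impacted_datasets(graph, "raw_orders"))
```

In this sketch, a change to `raw_orders` reaches both `clean_orders` and `revenue_report`, while a change to `fx_rates` reaches only `revenue_report`. Walking the same edges in the opposite direction would answer the origin question for a data consumer.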
Videos
AWS re:Invent 2023 - How to build a business catalog with Amazon DataZone (21:37)
AWS re:Invent 2023 - Understand your data with business context (55:40)
FAQs
What kind of information is in the Amazon DataZone business data catalog?
In the Amazon DataZone business data catalog, business metadata provides information authored or used by business people and gives context to organizational data. This could include the following information:
Ownership: Modern data-centric organizations employ a distributed data stewardship process where lines of business (LOBs) are responsible for managing their own data. A catalog tracks that ownership so interested parties can find and request access to data as part of their business tasks.
Classification: Data discovery is a key task that business metadata can support. Data discovery uses centrally defined corporate ontologies and taxonomies to classify data sources and helps you find relevant data objects.
Relationships: You can use the Amazon DataZone business data catalog to add relationship information as metadata. As with a technical dataset schema, the business data catalog shows relationships between objects in the catalog, such as those between databases, datasets, and their columns.
Schema: AI recommendations can use the technical and business schema to generate recommended descriptions and usage for data.
Origin and consumption: Data lineage and impact analysis, as well as custom mappings from OpenLineage, are linked in the business data catalog.
What can I catalog with Amazon DataZone?
Amazon DataZone supports data assets published directly from the AWS Glue Data Catalog and Amazon Redshift. These two sources can be used to catalog data in the following locations:
Amazon Simple Storage Service (Amazon S3) data lakes