AWS Storage Blog
Adapting to change with data patterns on AWS: Aggregate, curate, and extend
Each year at AWS re:Invent, I give an Innovation Talk on the emerging data trends that shape the direction of cloud data strategies. Last year, I talked about Putting Your Data to Work with Generative AI, which covered not only how data is used with foundation models, but also how businesses should think about storing and classifying the new data sets that are AI-generated. This year, I focused on the data patterns I have been seeing AWS customers adopt for the largest data lakes in the cloud (Modern Data Patterns for Modern Data Strategies). I am going to do a series of posts on these observations, starting with this summary of the data patterns.
First, a bit of context about why this is so important for any company operating in the cloud. I have worked on Amazon S3 for over ten years now and we have seen how data drives innovation in any industry. Petabytes of data power the customer experiences behind NFL’s Next Gen Stats, AstraZeneca’s drug discovery, Pinterest’s visual search engine, and Netflix’s streaming business, among many other examples. You have heard the phrase “good data means good AI” a lot recently, but really the importance of good data is nothing new to any of us. Major technology shifts, such as the introduction of elastic cloud storage in Amazon S3 in 2006 and open-source innovations like Hadoop and open table formats (OTFs), have made it easier than ever to use vast amounts of data for new application experiences and business operations. What hasn’t changed is that every modern business is a data business, and AWS is here to help with your data.
Data patterns are important as a mental model because they let you use data in the right way at the right time for your business. And, because AWS lets you mix and match data patterns across different parts of your organization, you can easily adapt your data strategy to shifts in technology or business.
There are three data patterns that AWS customers use: Aggregate, Curate, and Extend
Almost every cloud journey starts with the first step of Aggregate, which means bringing your different data sources together in Amazon S3 so that your application owners can take advantage of the diversity and depth of data in your business. This is a big change from on-premises data architectures. It moves companies away from expensive, vertically integrated data and compute, and lets you scale all your data at a massive rate separately from the compute that uses the data. Application owners can write and read the data in a federated model, using it for everything from fraud analytics to inference-driven knowledge bases that let you integrate proprietary business information into any business application.
While many of our customers use an Aggregate data pattern on S3 very successfully and at very high scale throughout their organizations, other customers also use a Curate data pattern. Think of it as a layer on top of Aggregate. In a Curate data pattern, application developers no longer have access to all the aggregated data. Instead, a centralized data organization creates a subset of curated high-quality data sets offered through internal or external data marketplaces, and application engineers use those curated data sets for building new experiences.
Sometimes AWS customers build on that data product concept with the Extend data pattern, which adds a data API. The API adds semantic meaning and new ways to interact with the data. In Extend, application developers or, increasingly, agentic AI systems interact with the data using an API that operates on a single data model, provides connectors and hooks for application extensibility, and standardizes how to collect and use data for applications.
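To make the layering concrete, here is a minimal Python sketch of how the three patterns relate to each other. It is an illustration only, not an AWS API: the class names (AggregateStore, CuratedCatalog, DataApi), the customers_in method, and the sample data sets are all hypothetical, and an in-memory dictionary stands in for Amazon S3.

```python
from dataclasses import dataclass, field


@dataclass
class AggregateStore:
    """Aggregate: all data sets land in one shared store (a stand-in for Amazon S3)."""
    datasets: dict = field(default_factory=dict)

    def write(self, name, rows):
        self.datasets[name] = list(rows)

    def read(self, name):
        return self.datasets[name]


@dataclass
class CuratedCatalog:
    """Curate: a central team publishes an approved subset of the aggregated data."""
    store: AggregateStore
    approved: set = field(default_factory=set)

    def publish(self, name):
        if name not in self.store.datasets:
            raise KeyError(name)
        self.approved.add(name)

    def read(self, name):
        # Developers can only read data sets that have been curated.
        if name not in self.approved:
            raise PermissionError(f"{name} is not a curated data set")
        return self.store.read(name)


@dataclass
class DataApi:
    """Extend: an API over curated data with one data model and an audit hook."""
    catalog: CuratedCatalog
    audit_log: list = field(default_factory=list)

    def customers_in(self, caller, region):
        # Record who asked for what before returning any data.
        self.audit_log.append((caller, f"customers_in:{region}"))
        return [r for r in self.catalog.read("customers") if r["region"] == region]


# Hypothetical usage: aggregate two data sets, curate one, expose it through the API.
store = AggregateStore()
store.write("customers", [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}])
store.write("raw_clickstream", [{"id": 1, "url": "/home"}])

catalog = CuratedCatalog(store)
catalog.publish("customers")

api = DataApi(catalog)
eu_customers = api.customers_in("marketing", "EU")
```

In this sketch, a team on the Aggregate pattern would read from the store directly, a team on Curate would go through the catalog and see only published data sets, and a team on Extend would call the API, which leaves an audit trail of each request.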
Here’s a short, four-minute video that summarizes these three data patterns.
What is especially interesting about data patterns is the flexibility of picking and changing data patterns for your business. For example, I have seen more organizations that use shared data sets for AI and analytics shift from an Aggregate data pattern to a Curate data pattern for their AI researchers and data engineers. This trend has really picked up in the last year.
Because data patterns are basically evolvable data strategies and AWS is constantly innovating to make it easier to store and use your data, you as a leader have the flexibility to mix and match all three patterns based on the culture of your organization, the talent in your workforce, and your business needs. For example, you might use an Aggregate data pattern for your fraud team because they need access to lots of data sets. You might use Curate in your AI team because you want your AI researchers using pre-approved, clean data sets. And you might use Extend to give a data API to your marketing team so that you can audit their usage of customer data for compliance reasons.
The flexibility of data patterns is the exact opposite of what you have on premises, where you get locked into vertical solutions that require major architecture changes to evolve. Don MacAskill, the founder and CEO of SmugMug, joined me on stage at re:Invent to talk about how he has used these data patterns for over 18 years. He started using Amazon S3 on the day we launched on March 14, 2006. As pretty much the first customer of Amazon S3, he does a great job walking through how he has used data to transform the experience of photography all over the world and his lessons along the way. It is a great view on evolvable data patterns and strategies over almost two decades using AWS.
Stay tuned for coming posts on the three data patterns that AWS customers use: Aggregate, Curate, and Extend.
This post is part of a series on adapting to change with data patterns on AWS: