
MiQ Builds Cookieless Identity Graph Solution on AWS, Unifying 3–6 TB of Data Daily

2022

Identity graphs used to bring together consumer data from disparate sources into audience profiles are a critical workload in advertising that guides ad targeting, personalization, and measurement. Typically, third-party cookies are the key to mapping these profiles together, but programmatic media solutions company MiQ decided to build a more sustainable solution for its clients that was more appropriate for an evolving privacy-first world. The company created MiQ Identity Spine, a proprietary audience graph solution running on Amazon Web Services (AWS), to provide omnichannel insights that weren’t reliant on third-party cookies.

Analyzing the massive datasets in near real time requires an incredible amount of big data computing power, which is why MiQ turned to AWS to scale its operating environment. Using AWS, MiQ decreased its pipeline processing time and reduced associated costs. Now, MiQ can provide connected insights across multiple channels at the speed of change and is well positioned to adapt to evolving market regulations.


“The components of our data pipeline—including big data clusters, microservices, and databases—all use Amazon EC2 behind the scenes.”

Bikash Singh
Data Engineering Lead, MiQ

Unifying Data Views to Support Complex Insights

MiQ, a programmatic media partner for brands and agencies, uses identity signals collected across different marketing channels and applications to create a unified source of identity references for its clients. Its customers can then use these identifiers in the programmatic environment to match advertising opportunities with potential audience segments and analyze campaign performance. As demand grew for more complex insights about customers, MiQ’s analyst teams had to spend increasing amounts of time combining datasets across regional operating systems and platforms. After data privacy laws and cookie deprecation began to accelerate across the industry, the company decided to move toward a unified cloud computing system to manage the new complexity of future-proof identifiers.

To overcome these challenges, MiQ decided to merge its various computing platforms to create a unified space to analyze consumer profiles across multiple datasets. The company created MiQ Identity Spine, an identity graph that uses signals across multiple channels to create a unified source of reference that businesses can use for insights and activation. The MiQ Identity Spine schema maps together eight distinct data points for each profile, including identity values and types as well as geolocation data from over 150 unique datasets. “It reduces our reliance on any one identity dataset because we’re joining all our different identity signals together into one profile,” says Georgiana Haig, product lead for identity and future proofing at MiQ. “It also improves the accuracy of the mappings.”
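The idea of joining identity signals from several datasets into one profile can be illustrated with a small union-find structure. This is a minimal sketch of a common identity-graph technique, not MiQ's implementation: the case study does not publish the MiQ Identity Spine schema or algorithm, so the `IdentityGraph` class, the identifier types (`hashed_email`, `device_id`, `household_ip`), and the values are all hypothetical.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph: links identifiers that were observed together."""

    def __init__(self):
        self._parent = {}

    def _find(self, node):
        # Register unseen identifiers as their own root.
        self._parent.setdefault(node, node)
        # Walk up to the root, halving the path as we go.
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def link(self, a, b):
        """Record that identifiers a and b belong to the same profile."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def profiles(self):
        """Group all known identifiers by their shared root."""
        groups = defaultdict(set)
        for node in list(self._parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


# Hypothetical (id_type, id_value) pairs; the real schema is not published.
graph = IdentityGraph()
graph.link(("hashed_email", "e1"), ("device_id", "d1"))   # seen in dataset A
graph.link(("device_id", "d1"), ("household_ip", "h1"))   # seen in dataset B
graph.link(("hashed_email", "e2"), ("device_id", "d2"))   # unrelated profile

merged = graph.profiles()
```

Here the first two links share `device_id` `d1`, so three identifiers collapse into one profile while the third link forms a separate one. Because a profile can be assembled from any overlapping pair, no single dataset has to carry every identifier.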

MiQ Identity Spine ingests approximately 150–200 GB per day for each of its six core datasets. Altogether, the company processes 3–6 TB of data daily. Processing this increased amount of data, however, meant that both the cost and the implementation time increased substantially. MiQ realized that it had an opportunity to optimize its pipeline using AWS. “We have been using AWS in our organization for years, and many of our workloads were already backed up by AWS services,” says Bikash Singh, data engineering lead at MiQ.
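As a quick check on the ingest figures, the per-dataset range can be totaled across the six core datasets. This sketch only does that arithmetic; the case study does not break down how the 3–6 TB daily processing figure relates to the raw ingest total, and the function and variable names are illustrative.

```python
# Figures quoted in the case study.
CORE_DATASETS = 6
GB_PER_DATASET_PER_DAY = (150, 200)  # low and high estimates


def daily_ingest_gb(datasets, per_dataset_range):
    """Return the (low, high) combined daily ingest in GB."""
    low, high = per_dataset_range
    return datasets * low, datasets * high


ingest_low_gb, ingest_high_gb = daily_ingest_gb(CORE_DATASETS, GB_PER_DATASET_PER_DAY)
print(f"Combined ingest: {ingest_low_gb}-{ingest_high_gb} GB per day")
# prints "Combined ingest: 900-1200 GB per day"
```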

Optimizing Data Processing Pipelines on AWS

The number of profiles that MiQ works with varies week to week because MiQ Identity Spine is recalculated on a weekly basis. On average, the company manages 108 million individual profiles and 79 million household profiles. “The pipelines that power MiQ Identity Spine require huge data joins to be done behind the scenes,” says Singh. “In order to do this, we need huge processing capacity.” Because MiQ Identity Spine used the Apache Spark analytics engine, MiQ began using Amazon EMR, a cloud big data solution for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning applications using open-source analytics frameworks, such as Apache Spark, Apache Hive, and Presto. “Amazon EMR houses our advanced business intelligence infrastructure completely,” says Singh. MiQ upgraded to the latest version of Apache Spark on Amazon EMR and used adaptive query execution to achieve a 66 percent decrease in processing time.
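The adaptive query feature mentioned above corresponds to Spark SQL's adaptive query execution (AQE), which re-optimizes a query plan at runtime, for example by coalescing shuffle partitions and splitting skewed join partitions. The sketch below shows how such settings might be passed to a job. The configuration keys are real Spark SQL options, but the case study does not say which settings MiQ used, so the chosen values and the script name `identity_spine_job.py` are assumptions.

```python
# Real Spark SQL configuration keys for adaptive query execution (AQE).
# The values are illustrative; MiQ's actual settings are not published.
AQE_SETTINGS = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.coalescePartitions.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
}


def spark_submit_conf_args(settings):
    """Turn a settings dict into `--conf key=value` arguments for spark-submit."""
    args = []
    for key, value in settings.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


conf_args = spark_submit_conf_args(AQE_SETTINGS)
# "identity_spine_job.py" is a hypothetical script name.
command = ["spark-submit", *conf_args, "identity_spine_job.py"]
print(" ".join(command))
```

The same keys can also be set in application code through `SparkSession.builder.config(key, value)`. In Spark 3.2.0 and later, `spark.sql.adaptive.enabled` defaults to `true`, so on recent releases the explicit flag mainly documents intent.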

To provide compute power for the pipeline, MiQ selected Amazon Elastic Compute Cloud (Amazon EC2), which offers secure and resizable compute capacity for virtually any workload. “The components of our data pipeline—including big data clusters, microservices, and databases—all use Amazon EC2 behind the scenes,” says Singh. MiQ also uses Databricks, a big data processing platform and an AWS Partner, to run various data pipelines. “Databricks launches clusters that use Amazon EC2 instances, and all the communications and resources Databricks requires are powered by AWS,” says Singh. “Managing identity and access management roles, using Amazon EC2, using storage—all of those work with Databricks.”

Using Amazon EC2, MiQ was able to take advantage of AWS Graviton processors, which are designed by AWS to deliver the best price performance for cloud workloads running in Amazon EC2. “We had been stuck in a phase where the cost of running the MiQ Identity Spine pipeline was becoming very high,” says Singh. “Using AWS Graviton processors helped us to achieve the same compute power in a much more cost-efficient manner.” The company has been able to reduce the monthly data processing cost for the pipeline by approximately 40 percent. “Using AWS has tremendously helped us in lowering the cost of processing our data pipelines,” says Singh.

Using Databricks and AWS helps MiQ save an additional 50 percent in terms of data processing costs and runtime. The total processing time for the solution used to range from 3 to 4 hours, but MiQ has reduced that time to less than 1 hour. And using MiQ Identity Spine, MiQ has saved an estimated 1 year of analyst time per year. “With Databricks on AWS, we get proper deployment of our infrastructure,” says Singh. “And using AWS services, such as Amazon EMR and Amazon EC2, we get the resources needed to run the data pipeline cost effectively and with high performance. Using both together has proven to be a better, highly optimized infrastructure for our data pipelines.”
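The runtime figures above can be turned into a minimum percentage reduction. This is a small worked calculation that treats "less than 1 hour" as exactly 1 hour, so the results are lower bounds. The case study does not state how the separate 66 percent and 50 percent figures combine, and the sketch does not attempt to model that.

```python
def percent_reduction(before, after):
    """Percentage decrease going from `before` to `after`."""
    return (before - after) / before * 100


# Previous runtime range (hours) and the new upper bound from the case study.
old_runtimes_hours = (3, 4)
new_runtime_hours = 1  # "less than 1 hour", so 1 is the worst case

minimum_reductions = [
    round(percent_reduction(old, new_runtime_hours), 1)
    for old in old_runtimes_hours
]
print(minimum_reductions)  # prints [66.7, 75.0]
```

Going from 3–4 hours to under 1 hour is therefore a reduction of at least roughly two-thirds to three-quarters.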

Building Future-Proof Infrastructure for Identity Graphs

MiQ hopes to continue to find new ways to sustainably scale resources and looks forward to using AWS services as part of its modernization. “This has really been a learning experience for us,” Singh says. “It opened our thinking and opened our mindset to using other managed services from AWS.” MiQ’s goal is to connect more future-proof identifiers using MiQ Identity Spine. “Expanding beyond user-level data would involve more datasets and more joins,” says Haig. By hosting its pipeline on AWS, MiQ feels confident that it is well equipped to meet future challenges. “Much of what we do is built on AWS,” says Haig. “It’s fundamental to our infrastructure and to achieving our goals.”


About MiQ

MiQ is a programmatic media partner for marketers and agencies. The company’s goal is to maximize the value of client data to provide more actionable insights. MiQ has 18 offices located in North America, Europe, and APAC.

Benefits of AWS

  • Ingests 150–200 GB per day for each of six core datasets
  • Processes 3–6 TB of data daily
  • Reduced data processing costs by 40% using AWS Graviton processors
  • Reduced data processing costs and runtime by an additional 50%
  • Decreased processing time by 66% using Apache Spark on Amazon EMR
  • Reduced processing time from 3–4 hours to less than 1 hour

AWS Services Used

Amazon EMR

Amazon EMR is a cloud big data platform for running large-scale distributed data processing jobs, interactive SQL queries, and machine learning (ML) applications using open-source analytics frameworks such as Apache Spark, Apache Hive, and Presto.


Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) offers the broadest and deepest compute platform, with over 500 instances and choice of the latest processor, storage, networking, operating system, and purchase model to help you best match the needs of your workload.


AWS Graviton Processor

AWS Graviton processors are designed by AWS to deliver the best price performance for your cloud workloads running in Amazon EC2.


