Discover Accelerates Analytics and Time-to-Insights Using AWS

2020

Discover Financial Services provides banking and credit products to help customers achieve their financial goals, such as establishing good credit, paying for a college education, and consolidating debt. The company relies heavily on data and analytics both internally and externally to deliver on that promise and distinguish itself in an ultra-competitive industry. “We have a lot of customer data,” says Brandon Harris, director of data science technology at Discover Financial Services. “We need to use that data as a differentiator to continually provide customers with a better experience.”

Over the years, individual analytics practices sprang up within Discover’s teams and business units. In all, there were about 8–10 tool sets across 12 teams. Each practice required different skill sets and diverse tools. Discover’s leadership team believed bringing those practices and teams together could improve analytics and create consistent tools across the organization.

Discover’s technology team commonly builds new solutions internally, preferring to control the end-to-end technologies and to manage its own data centers. Harris and his team were tasked with creating a centralized platform, an internal data science workbench called Air9, that would allow the company’s data scientists to collaborate in a common environment.

Discover Financial Services Creates Environments Where Data Scientists Can Collaborate

“Amazon EFS fit the bill well as far as scalability and costs, and due to some great work from the Kubernetes community, there were already storage-class capabilities around the service.”

Brandon Harris
Director of Data Science Technology, Discover Financial Services

Building a Cloud-Native Data Science Platform

One of the first design principles Harris’s team agreed on for Air9 was strength in diversity. “Not only in the diversity of the teams and their experiences, but also varied approaches and tools,” says Harris. “We weren’t going to deliver a one-size-fits-all approach to data science for this well-established analytics community.”

Harris’s team determined that Kubernetes was a good fit to host Air9 because many of the data science tools the company already used naturally lent themselves to containerization. Having dedicated containers would allow for isolated workloads and enable users to install custom packages and make changes to their environments that would be difficult to manage in a multi-tenant environment. Because Discover is a long-time customer of Amazon Web Services (AWS) and user of Amazon Simple Storage Service (Amazon S3), the team also decided to deploy Amazon Elastic Compute Cloud (Amazon EC2) instances. Using this approach, some 883 data scientists across multiple countries can now choose their Amazon EC2 instance size, type, and quantity and have the application auto-mount that instance for their datasets.

Improving Scalability, Storage, and Cost with AWS

A shared storage capability with fully managed, cloud-native file storage was another critical component of Air9. “If you have all these different environments running, there needs to be a common way to save data and collaborate,” says Harris.

However, the project hit a snag when the Discover team began to design the storage layer. “Our analytics teams had some very large datasets in our cloud data warehouse, but we had to plan for them to have local storage for their own work, as well as a mechanism to share data among and across teams,” says Harris. “This storage layer also had to be very resilient and support significant growth over time.”

Harris and his team initially set out to use an open-source distributed storage solution as the data science platform’s storage layer, but running and managing it soon became expensive and time-intensive. “When we saw the monthly costs associated with running our own storage platform exceeding the compute costs, we knew something was wrong,” says Harris. “Ultimately, the excess cost was attributed to the replication factor for distributed storage, but the tradeoff for reducing cost (decreasing the replication factor) wasn’t one we were comfortable making.”
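The replication tradeoff Harris describes can be illustrated with a little arithmetic. The sketch below is only an illustration: the function names, dataset size, and price per terabyte are hypothetical and do not come from Discover. It shows why a self-managed cluster's storage bill scales with the replication factor, and why lowering that factor is the only lever for cutting it.

```python
def raw_storage_tb(logical_tb: float, replication_factor: int) -> float:
    """Raw capacity a self-managed cluster must provision for logical_tb of data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    # Every logical byte is stored replication_factor times.
    return logical_tb * replication_factor


def monthly_storage_cost(logical_tb: float, replication_factor: int,
                         price_per_raw_tb: float) -> float:
    """Monthly cost when billing is driven by raw (replicated) capacity."""
    return raw_storage_tb(logical_tb, replication_factor) * price_per_raw_tb


if __name__ == "__main__":
    # Hypothetical figures, chosen only to make the arithmetic easy to follow.
    logical_tb = 100.0
    price_per_raw_tb = 50.0
    for rf in (3, 2):
        cost = monthly_storage_cost(logical_tb, rf, price_per_raw_tb)
        print(f"replication factor {rf}: {raw_storage_tb(logical_tb, rf):.0f} TB raw, "
              f"{cost:.0f} per month")
```

Dropping the factor from 3 to 2 cuts the raw footprint by a third, but it also leaves one fewer copy to survive a failure, which is the tradeoff the team was not comfortable making.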

Because of the team’s success with Amazon EC2 on the compute side of the platform, it reviewed AWS managed services for storage and chose to deploy Amazon Elastic File System (Amazon EFS). Harris says, “Amazon EFS fit the bill well as far as scalability and costs, and due to some great work from the Kubernetes community, there were already storage-class capabilities around the service. AWS also enabled us to use different environments for different types of data, so we could better protect more sensitive types of data.”
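For readers unfamiliar with Kubernetes storage classes, the following sketch shows roughly what an Amazon EFS-backed storage class and a shared volume claim can look like. It is not Discover's configuration: it assumes the AWS EFS CSI driver (provisioner `efs.csi.aws.com`), which may differ from the community tooling available to the team at the time, and the file system ID, object names, and requested size are placeholders.

```python
import json


def efs_storage_class(name: str, file_system_id: str) -> dict:
    """Build a StorageClass manifest that provisions volumes on an EFS file system.

    Assumes the AWS EFS CSI driver is installed in the cluster.
    """
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": "efs.csi.aws.com",
        "parameters": {
            "provisioningMode": "efs-ap",    # one EFS access point per volume
            "fileSystemId": file_system_id,  # placeholder, not a real file system
            "directoryPerms": "700",
        },
    }


def shared_claim(name: str, storage_class: str, size: str = "5Gi") -> dict:
    """Build a PersistentVolumeClaim that many pods can mount read-write."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteMany"],  # shared access across pods
            "storageClassName": storage_class,
            # Kubernetes requires a size; EFS itself grows and shrinks elastically.
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    manifests = [
        efs_storage_class("efs-shared", "fs-PLACEHOLDER"),
        shared_claim("team-workspace", "efs-shared"),
    ]
    # kubectl accepts JSON as well as YAML, so this output could be piped to it.
    print(json.dumps(manifests, indent=2))
```

The `ReadWriteMany` access mode is what lets several isolated containers mount the same directory at once, which is the kind of shared access a collaboration layer needs.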

Previously, each team had a home directory and team directory. By taking advantage of Amazon EFS, the company could easily provide shared access across data science tools, projects, and datasets for more seamless collaboration. Long-term data archiving capabilities coupled with the low overhead costs of Amazon S3 also meant Discover could customize backup processes so it would have a second copy of data available for safekeeping.

“We use Amazon EFS as that collaboration layer, but we also have an archive and a historical layer for different datasets or for lifecycle management purposes,” says Harris. “We need to keep certain sets of data for a specified number of years. Amazon S3 and the Amazon S3 Glacier storage class have been helpful in making sure we can cost-effectively store all the data being created and used by our data scientists.”
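A retention policy like the one Harris describes is typically expressed as an Amazon S3 lifecycle rule. The sketch below builds such a rule as a plain dictionary in the shape that S3's lifecycle configuration API accepts (for example, through boto3's `put_bucket_lifecycle_configuration`). The prefix, the 90-day transition, and the seven-year retention period are hypothetical values, not Discover's actual settings.

```python
import json


def archive_rule(prefix: str, days_to_glacier: int, retention_years: int) -> dict:
    """Build one S3 lifecycle rule: move objects to Glacier, then expire them."""
    retention_days = retention_years * 365
    if retention_days <= days_to_glacier:
        raise ValueError("objects must expire after they transition to Glacier")
    rule_id = "archive-" + (prefix.strip("/").replace("/", "-") or "all")
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Move objects to the Glacier storage class once they are rarely accessed.
        "Transitions": [{"Days": days_to_glacier, "StorageClass": "GLACIER"}],
        # Delete objects once the required retention period has passed.
        "Expiration": {"Days": retention_days},
    }


# Hypothetical policy: archive after 90 days, keep for 7 years.
lifecycle = {"Rules": [archive_rule("team-a/archive/", 90, 7)]}

if __name__ == "__main__":
    print(json.dumps(lifecycle, indent=2))
```

Keeping the rule in code makes the retention period explicit and reviewable; applying it would require AWS credentials and a real bucket, which this sketch deliberately leaves out.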

Improving Collaboration and Time-to-Insights

Today, Air9 boosts the productivity and efficiency of Discover’s data scientists by enabling them to run analytics applications in one central location on AWS; to collaborate in a shared storage environment, leveraging structured and unstructured data sources; and to process and store data from multiple sources. This allows Discover’s data scientists to analyze data for insights more quickly and easily.

The previous data platform required weeks to upgrade, primarily due to storage constraints and the need to resize and grow the old storage clusters when additional storage was required. Because Amazon EFS does all that behind the scenes, the team can now update the data platform in hours. The platform also enables self-service, helping data scientists remain productive without impacting their colleagues’ experience. “With our previous on-premises environment there was no mechanism to facilitate these conversations and interactions between our data scientists,” says Harris.

Using the AWS solution, Harris estimates his team has reduced the amount of time it spends managing storage by 90 percent. And by relying on AWS to manage the service and provide the redundancy capability rather than having to architect and build it internally, Discover has reduced costs by 50–60 percent.

These changes are also helping advance Discover’s overall digital transformation efforts. “It used to take weeks to get users the tools they needed to do their jobs,” says Harris. “Now we can do it in hours so they can start gleaning insights and delivering value for our customers almost immediately.”

To learn more, visit aws.amazon.com/efs.


About Discover Financial Services

Discover Financial Services is a digital banking and payment services company. Founded in 1985 and headquartered north of Chicago, the company’s mission is to help people spend smarter, manage debt better, and save more.

Benefits of AWS

  • Cuts storage management time by 90% and costs by 50–60%
  • Scales compute and storage on demand
  • Enables data scientists to collaborate through shared storage
  • Customizes backup processes thanks to unlimited storage
  • Updates data platform in hours, not weeks
  • Frees data scientists to focus on insights instead of technology

AWS Services Used

Amazon Elastic File System

Amazon Elastic File System (Amazon EFS) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. It is built to scale on demand to petabytes without disrupting applications, growing and shrinking automatically as you add and remove files, eliminating the need to provision and manage capacity to accommodate growth.

Amazon S3

Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

Amazon S3 Glacier & S3 Glacier Deep Archive

Amazon S3 Glacier and S3 Glacier Deep Archive are secure, durable, and extremely low-cost Amazon S3 cloud storage classes for data archiving and long-term backup.

Amazon EC2

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. Amazon EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction.


Get Started

Organizations of all sizes across all industries are transforming and delivering on their missions every day using AWS. Contact our experts and start your own AWS Cloud journey today.