AWS Database Blog
How Scopely scaled “Stumble Guys” for millions of players around the globe with Amazon RDS for SQL Server
This is a guest post co-written with RJ Petroff, CEO and Founder of CloudBasix, and Ian Monge Perez, Principal Engineer at Scopely.
Scopely is a global games developer, operator, and publisher with operations across North America, Central America, EMEA, and Asia, and additional studio partners spanning four continents. Over the past year, Scopely has served more than 500 million players with major titles such as “MONOPOLY GO!,” “Stumble Guys,” “MARVEL Strike Force,” “Star Trek Fleet Command,” and “Scrabble GO.”
In this post, we show how Scopely used CloudBasix to migrate the high-volume backend transactional databases of “Stumble Guys” from Azure SQL Database to Amazon Relational Database Service (Amazon RDS) for SQL Server with minimal downtime. Millions of gamers from around the world continued to play their favorite game without realizing the scale of what took place on the day of the cutover from Azure to Amazon Web Services (AWS). We also briefly discuss the challenges we faced during the online migration of our gaming workload and how we addressed them through a collaborative effort with our partner, CloudBasix, and the AWS team.
Challenges
Although the hosted database model we initially used to power the game performed well past our initial player count estimates, we gradually began to encounter inadequate support and lost confidence in our ability to scale up seamlessly. We recognized that a ceiling was approaching where we could no longer effectively scale our resources to meet demand. After carefully evaluating our options, we made the strategic decision to migrate our gaming workload from Azure SQL Database to the AWS Cloud, using Amazon RDS for SQL Server.
Seeing millions of players enjoy your game daily is an incredibly rewarding experience. However, our gaming platform was generating billions of daily database changes, posing significant challenges for the cross-cloud migration from Azure SQL Database to RDS for SQL Server. Deploying a replication solution capable of handling this high-volume change load while facilitating a seamless transition was a key obstacle we had to address.
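To put that volume in perspective, a rough back-of-the-envelope calculation helps. The sketch below assumes a lower bound of one billion changes per day spread evenly over 24 hours. Both the figure and the even distribution are illustrative assumptions, not measurements from the migration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_changes_per_second(changes_per_day: int) -> float:
    """Average sustained change rate implied by a daily change count."""
    return changes_per_day / SECONDS_PER_DAY


# Assumption: 1 billion changes/day, used as a lower bound for "billions".
rate = average_changes_per_second(1_000_000_000)
print(f"{rate:,.0f} changes per second on average")
```

Even at this lower bound, the replication pipeline has to sustain more than 11,000 changes per second on average, and real traffic is likely to peak well above the average.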
During the initial assessment of migration options, the team also discovered the following conditions:
- Exporting the Azure SQL Database as a BACPAC file and restoring it to RDS for SQL Server wasn’t feasible because the database was 2 TB in size. Restoring a BACPAC of a database larger than 250 GB is not supported on RDS for SQL Server.
- Migrating with Azure SQL Database’s built-in change data capture (CDC) mechanism caused significant storage overhead and log bloat, making it a challenging fit for a high-volume transactional database that was seeing billions of changes per day.
Solution overview
The team of AWS experts assigned to oversee Scopely’s high-profile migration recommended a product in AWS Marketplace from technology partner CloudBasix, which has a proven record of allocating resources to work side by side with AWS teams. After evaluating multiple options, Scopely decided to go with CloudBasix InterCloud (SQL Server edition) in AWS Marketplace.
The following diagram shows the solution architecture.
Scopely deployed a CloudBasix InterCloud cluster within their Amazon Virtual Private Cloud (Amazon VPC). Continuous data replication was established between the Azure SQL databases and the new RDS for SQL Server instances in an encrypted manner, as documented in Encrypting Data In Transit. This allowed Scopely’s DevOps team to thoroughly validate data integrity and test the game’s applications in the AWS environment across mobile, desktop, console, and backend systems.
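One simple form of the data integrity validation mentioned above is to compare per-table row counts between the source and the target. The following is a minimal sketch of that idea, not the procedure Scopely or CloudBasix used. The table names and counts are hypothetical, and in practice the counts would come from queries against each database.

```python
from typing import Dict, List


def find_row_count_mismatches(
    source_counts: Dict[str, int], target_counts: Dict[str, int]
) -> List[str]:
    """Return tables whose row counts differ or that exist on only one side."""
    mismatched = []
    for table in sorted(set(source_counts) | set(target_counts)):
        # .get() returns None for a missing table, which never equals a count.
        if source_counts.get(table) != target_counts.get(table):
            mismatched.append(table)
    return mismatched


# Hypothetical counts gathered from the source and target databases.
source = {"dbo.Players": 1_000, "dbo.Matches": 5_000, "dbo.Purchases": 250}
target = {"dbo.Players": 1_000, "dbo.Matches": 4_998, "dbo.Purchases": 250}
print(find_row_count_mismatches(source, target))
```

Because replication is asynchronous, a small difference on a busy table may only reflect changes still in flight, so a check like this is most meaningful when writes are paused or when repeated over time.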
Migration process
The initial full load of data was performed through the fully automated initial seeding mechanism. As the following screenshot shows, a primary job first seeded the database tables with a lower volume of changes, and secondary jobs then seeded the tables with a high volume of changes. This partitioning allowed us to fine-tune the replication process by table group.
During preliminary testing in lower environments, CloudBasix InterCloud handled replication of entire Azure SQL databases in a single replication job. However, under the default settings auto-configured during the fully automated initial seeding step, we found that a subset of tables receiving a high volume of changes caused the initial seeding process to take a long time and led to increased overall latency during continuous change tracking. To address this issue, we partitioned the primary job into logical secondary jobs. Tables with a lower volume of changes remained in the primary job, and tables receiving a high volume of changes were either seeded directly in secondary jobs or moved to secondary jobs after initial seeding. As a result, we substantially shortened the initial full load of data and could adjust replication parameters per table group partition, which lowered overall latency.
The following image of the CloudBasix Detailed Dashboard shows that CloudBasix maintained continuous replication, keeping the source Azure SQL databases and the target RDS for SQL Server replicas in sync over time. This allowed Scopely’s DevOps team to validate the databases’ referential integrity and test the “Stumble Guys” mobile, desktop, console, and backend applications within the AWS environment.
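The split of one primary job into secondary jobs described in this section can be sketched as a simple grouping by change volume. This is an illustration of the idea only, not the CloudBasix configuration format or API. The function name, threshold, group size, table names, and volumes are all hypothetical.

```python
from typing import Dict, List


def partition_tables(
    changes_per_day: Dict[str, int],
    high_volume_threshold: int,
    tables_per_secondary_job: int,
) -> Dict[str, List[str]]:
    """Keep low-volume tables in the primary job; group the rest into secondary jobs."""
    if tables_per_secondary_job < 1:
        raise ValueError("tables_per_secondary_job must be at least 1")

    low = sorted(t for t, c in changes_per_day.items() if c < high_volume_threshold)
    # Busiest tables first, so the heaviest tables end up in the earliest jobs.
    high = sorted(
        (t for t, c in changes_per_day.items() if c >= high_volume_threshold),
        key=lambda t: changes_per_day[t],
        reverse=True,
    )

    jobs = {"primary": low}
    for start in range(0, len(high), tables_per_secondary_job):
        job_number = start // tables_per_secondary_job + 1
        jobs[f"secondary-{job_number}"] = high[start:start + tables_per_secondary_job]
    return jobs


# Hypothetical daily change volumes per table.
volumes = {
    "dbo.Settings": 1_000,
    "dbo.Players": 50_000_000,
    "dbo.MatchEvents": 900_000_000,
    "dbo.Inventory": 300_000_000,
    "dbo.Countries": 10,
}
jobs = partition_tables(volumes, high_volume_threshold=10_000_000, tables_per_secondary_job=2)
for name, tables in jobs.items():
    print(name, tables)
```

Grouping tables this way lets each job carry its own replication settings, which mirrors the per-table-group tuning described above.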
Final cutover
On the scheduled cutover day, web traffic was gradually redirected to AWS, with CloudBasix’s promote-to-primary operation seamlessly failing over to the AWS replicas with minimal disruption. Handling the massive scale of “Stumble Guys”—with millions of daily active players generating billions of database changes per day—was no easy feat.
On the day of the final production cutover, a promote-to-primary operation was executed on the CloudBasix console. This operation performs actions such as validating database referential integrity and enabling constraints, foreign keys, triggers, and other objects on the replica database.
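To make the kind of work involved more concrete, the sketch below generates standard T-SQL statements that re-enable and revalidate constraints and re-enable triggers on a list of tables. It is an assumption about what such a step could look like, not the statements CloudBasix actually runs during promote-to-primary. The helper names and the table list are hypothetical.

```python
from typing import List, Tuple


def quote_name(identifier: str) -> str:
    """Bracket-quote a SQL Server identifier, escaping any closing brackets."""
    return "[" + identifier.replace("]", "]]") + "]"


def post_cutover_statements(tables: List[Tuple[str, str]]) -> List[str]:
    """Build T-SQL that re-enables constraints and triggers for each (schema, table)."""
    statements = []
    for schema, table in tables:
        name = f"{quote_name(schema)}.{quote_name(table)}"
        # WITH CHECK revalidates existing rows against foreign key and CHECK
        # constraints as they are re-enabled.
        statements.append(f"ALTER TABLE {name} WITH CHECK CHECK CONSTRAINT ALL;")
        statements.append(f"ENABLE TRIGGER ALL ON {name};")
    return statements


# Hypothetical tables on the replica database.
for statement in post_cutover_statements([("dbo", "Players"), ("dbo", "Matches")]):
    print(statement)
```

Revalidating existing rows while re-enabling constraints doubles as a referential integrity check, because the statement fails if the replicated data violates a constraint.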
Target state architecture
After successfully migrating “Stumble Guys” to AWS, we used several key scaling options provided by Amazon RDS for SQL Server to address previous limitations and support the game’s massive player base. These capabilities allowed us to efficiently scale our database infrastructure to meet the demands of millions of daily active users generating billions of database changes. Let’s explore the primary scaling strategies we implemented:
- Compute scaling – We can now scale the compute resources of the RDS for SQL Server deployment, adding more CPU and memory as needed to handle the increasing workload. For more information, refer to DB instance class support for Microsoft SQL Server.
- Storage scaling – With io2 Block Express support, we can modify storage to meet our input/output operations per second (IOPS) and throughput requirements. Amazon RDS for SQL Server supports up to 64 TiB and 256,000 IOPS with io2 Block Express volumes.
- Horizontal scaling – After migrating to RDS for SQL Server, we used the read replica capabilities to distribute read queries. The RDS for SQL Server read replicas allowed us to scale to meet the demands of our rapidly growing user base, delivering a high-performance, reliable experience. To learn more, refer to Working with read replicas for Microsoft SQL Server in Amazon RDS.
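The compute, storage, and read replica options above map onto parameters of the Amazon RDS API. The following sketch builds request dictionaries using parameter names from the boto3 `modify_db_instance` and `create_db_instance_read_replica` calls and checks them against the limits quoted above. The instance identifiers, instance class, and sizes are hypothetical example values, not Scopely’s configuration.

```python
from typing import Any, Dict

MAX_STORAGE_GIB = 64 * 1024  # 64 TiB limit quoted above, expressed in GiB
MAX_IOPS = 256_000           # IOPS limit quoted above


def build_scale_request(
    instance_id: str, instance_class: str, storage_gib: int, iops: int
) -> Dict[str, Any]:
    """Build keyword arguments for an RDS modify_db_instance call."""
    if storage_gib > MAX_STORAGE_GIB:
        raise ValueError(f"storage exceeds {MAX_STORAGE_GIB} GiB")
    if iops > MAX_IOPS:
        raise ValueError(f"IOPS exceeds {MAX_IOPS}")
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,   # compute scaling
        "StorageType": "io2",                # storage scaling
        "AllocatedStorage": storage_gib,
        "Iops": iops,
        "ApplyImmediately": False,           # defer to the maintenance window
    }


def build_read_replica_request(source_id: str, replica_id: str) -> Dict[str, Any]:
    """Build keyword arguments for an RDS create_db_instance_read_replica call."""
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }


# Hypothetical identifiers and sizes, for illustration only.
request = build_scale_request("game-db-primary", "db.r6i.8xlarge", 4096, 64_000)
replica = build_read_replica_request("game-db-primary", "game-db-replica-1")
print(request)
print(replica)

# With boto3 installed and AWS credentials configured, these could be sent as:
#   import boto3
#   rds = boto3.client("rds")
#   rds.modify_db_instance(**request)
#   rds.create_db_instance_read_replica(**replica)
```

Building and validating the request separately from sending it makes the intended change easy to review before it touches a production instance.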
The following diagram shows the target state architecture.
By taking advantage of these scaling capabilities in RDS for SQL Server, we can overcome the scaling limitations we had experienced with Azure SQL Database.
Summary
CloudBasix’s solution met Scopely’s key requirements:
- High-volume cross-cloud replication from Azure SQL to Amazon RDS
- Asynchronous replication resilient to connectivity issues
- Automated initial data seeding to avoid backup and restore
- Flexible change data tracking mechanisms such as SQL Server change tracking
- Logical partitioning of replication jobs to optimize performance
- Secure deployment within Scopely’s Virtual Private Cloud (VPC)
The AWS team’s recommendation of CloudBasix, based on their expertise with similar large-scale migrations, proved invaluable. CloudBasix’s InterCloud solution enabled Scopely to successfully migrate the mission-critical “Stumble Guys” databases to Amazon RDS for SQL Server with virtually no disruption to the millions of active players worldwide. The combination of AWS expertise and CloudBasix’s robust cross-cloud data replication capabilities paved the way for a seamless cloud migration at massive scale. CloudBasix is available in AWS Marketplace with a free trial option that lets you evaluate it against your own requirements and use cases.
About the authors
Sudarshan Roy is a Senior Database Specialist Cloud Solution Architect with World Wide AWS Database Services Organization (WWSO). He has led large-scale database migration and modernization engagements for enterprise customers and is passionate about solving complex migration challenges while moving database workloads to the AWS Cloud.
Sudhir Amin is a Sr. Solutions Architect at Amazon Web Services. In his role based out of New York, he provides architectural guidance and technical assistance to enterprise customers across different industry verticals, accelerating their cloud adoption. He is a big fan of snooker, combat sports such as boxing and UFC, and loves traveling to countries with rich wildlife reserves where he gets to see the world’s most majestic animals up close.
RJ Petroff is the Product Manager, Founder and CEO of CloudBasix. He created CloudBasix from the ground up and evolved it into an enterprise cloud product company with international customers across some of the most demanding verticals, and he remains deeply involved in product design and management. Prior to founding CloudBasix, RJ was involved in fintech product development and disaster recovery management at leading Wall Street firms.
Ian Monge is a Principal Engineer at Scopely, with over a decade of experience in the video game industry. Based in Barcelona, he has been involved in the full lifecycle of several games, from initial development through soft-launch and managing live games with millions of DAUs. Ian is a versatile engineer, having worked across a wide range of areas including feature development, cloud infrastructure, FinOps, and databases.