Freecharge lowered their identity management system cost and improved scaling by switching to Amazon DynamoDB

Freecharge, a subsidiary of Axis Bank, is a payment app serving over 100 million users across India. Over the years, Freecharge has transformed to become one of the leading financial services and investment apps in the country. Freecharge has always been known for offering safe and seamless UPI payments, utility bill payments, mobile/DTH recharges, and much more.

Freecharge regularly encounters traffic spikes resulting from marketing campaigns, promotional offers, and live events. Choosing a database that can efficiently scale to support their unpredictable user growth is important to avoid costly resource overprovisioning.

In this post, we explain how Freecharge migrated their Identity Management System workload to Amazon DynamoDB, a fully managed, serverless, NoSQL database designed to run high-performance applications at any scale. DynamoDB helped effectively manage traffic spikes while lowering total cost of ownership (TCO).

Freecharge’s identity management system

Freecharge’s identity management system (IMS) is responsible for handling tokens for every incoming request across their payment service. This system also serves as a repository for all user data registered on Freecharge, functioning as both a token and user management system. IMS uses Aerospike as a persistent database for token management as well as a caching layer for user management.

At the time of writing this post, Freecharge’s IMS platform processes approximately 1 million requests per day. Authenticating users across Freecharge is a mission-critical function. Ensuring the availability and resilience of IMS is pivotal for the overall success of the business.

The Freecharge IMS architecture follows an asynchronous messaging approach using independent microservices to operate at scale and evolve independently and flexibly. The following diagram illustrates the legacy architecture.

The Freecharge IMS used self-managed Aerospike, a NoSQL database that requires a subscription or license to access its advanced capabilities. Aerospike has the following architecture components that need to be managed:

Node – Aerospike is designed as a distributed database, and each server instance running Aerospike is referred to as a node. Nodes collectively form a cluster, working together to manage and store data.
Namespace – Aerospike organizes data into namespaces, which act as logical containers for sets. Namespaces help separate different types of data or services within the same cluster.
Set – Within a namespace, data is further organized into sets. Sets are analogous to tables in relational databases. Sets are a way to group related records together, making it easier to manage and query data.
Storage engine – Aerospike uses a flash-optimized, memory-centric storage engine for efficient data storage and retrieval. The storage engine plays a crucial role in ensuring high-speed read and write operations.
Index – Aerospike maintains indexes for fast data access. Indexes are used to locate records based on specified criteria. The indexing mechanism contributes to the speed and efficiency of queries.
Replication – Aerospike provides replication to ensure data availability and fault tolerance. In a replicated mode, copies of data are stored on multiple nodes, allowing for recovery in case of node failures.
Sharding – Aerospike uses a shared-nothing architecture with data distribution across nodes through sharding. Sharding enables horizontal scaling, so the database can handle increased loads by adding more nodes.
Cross-data center replication – For multi-data center deployments, Aerospike provides cross-data center replication (XDR), allowing data to be replicated across different geographical locations.

Challenges with the original design

Freecharge initially used self-managed Aerospike to manage their user session workloads. Aerospike was chosen because its architecture is capable of scaling to large volumes of data with low latency read and write operations.

Over the years, the number of microservices and offerings by Freecharge grew. We faced several challenges while scaling and maintaining our Aerospike database:

Scaling – Aerospike licensing is based on storage usage for each cluster. Freecharge, with two clusters of six nodes each hosted on Amazon Elastic Compute Cloud (Amazon EC2), could not scale storage without incurring additional licensing costs. This constraint limited the scalability of the original design.
Operational ease – Operating the self-managed Aerospike enterprise version placed a substantial burden on Freecharge’s team. They undertook responsibilities, including patching, upgrades, monitoring of underlying instances, license management, storage monitoring, and backup management. These tasks collectively contributed to an elevated TCO.
Cost – The overall cost of using Aerospike proved significantly higher due to the necessity of paying for licenses, coupled with expenses associated with hosting on EC2 servers. This dual expenditure model contributed to a substantial financial burden.

The switch to DynamoDB

Freecharge evaluated alternative database solutions and determined that DynamoDB was the best match for our performance, scale, and durability requirements. Further, Freecharge had familiarity with using DynamoDB in multiple business-critical services. We found that DynamoDB helped our engineering teams become more efficient due to:

Consistent performance at scale – DynamoDB provides consistent performance as your application scales. Regardless of the database size or number of concurrent queries, read and write operations maintain reliable response times on the order of milliseconds.
Simplified data modeling – DynamoDB encourages keeping related data together, eliminating the need for a query planner to parse complex queries. This results in low-latency lookups and efficient queries as the scale grows. Developers benefit from not having to spend time on performance tuning, query plan debugging, or addressing performance issues at scale.
Auto scaling capacity – DynamoDB offers an auto scaling capacity feature, dynamically adjusting database resources (provisioned throughput capacity) based on user traffic. This enables seamless scalability for sudden spikes in traffic without over-provisioning for peak workloads, all while maintaining consistent latencies in a cost-effective manner.
Serverless management – DynamoDB eliminates the need for manual setup and management tasks. It operates in a serverless model, handling tasks such as hardware provisioning, configuration, replication, software patching, database backup, and cluster scaling automatically. This reduces operational overhead, maintaining efficiency.
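The auto scaling behavior described above can be illustrated with a small calculation. The following is a minimal sketch, not the DynamoDB implementation: it assumes a target-tracking style rule in which provisioned capacity is chosen so that consumed capacity sits at a target utilization, clamped between a minimum and a maximum. The function name, the 70% target, the capacity bounds, and the traffic figures are all hypothetical.

```python
import math


def desired_capacity(consumed_units: float, target_utilization: float,
                     min_units: int, max_units: int) -> int:
    """Return the provisioned capacity that would bring utilization to the target.

    Simplified illustration only: the managed service reacts to sustained
    utilization over time rather than to a single instantaneous reading.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    wanted = math.ceil(consumed_units / target_utilization)
    # Clamp to the configured bounds (hypothetical values in the demo below).
    return max(min_units, min(max_units, wanted))


# Hypothetical traffic: a quiet period, a campaign spike, then a return to normal.
for consumed in (200, 3500, 400):
    capacity = desired_capacity(consumed, 0.70, min_units=100, max_units=4000)
    print(f"consumed={consumed} -> provisioned={capacity}")
```

The point of the sketch is that capacity follows traffic in both directions, so the table does not need to stay provisioned for the peak after a spike has passed.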

Solution overview

This section highlights the key points that guided our approach to migrating to DynamoDB.

Single-table design

In the original database architecture, we stored each entity in its own table. There were seven separate tables, or sets, for maintaining user data.

Given what we had learned from using DynamoDB in multiple microservices, we decided to streamline this schema into a unified DynamoDB table. This effectively consolidated related data into a single table (see the following figure), enhancing performance.

This data modeling strategy also reduced costs, especially for read operations, because a single read operation now retrieves all the required data instead of multiple queries for different entities.
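To make the single-table idea concrete, the following is a minimal in-memory sketch rather than Freecharge's actual schema. It assumes a composite key made of a partition key (`pk`) and a sort key (`sk`). The entity names (`PROFILE`, `TOKEN#...`, `DEVICE#...`), the attribute names, and the `SingleTable` class are hypothetical and only stand in for a real DynamoDB table.

```python
from collections import defaultdict


class SingleTable:
    """In-memory stand-in for one table with a partition key (pk) and sort key (sk)."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put_item(self, pk: str, sk: str, **attributes) -> None:
        self._partitions[pk][sk] = {"pk": pk, "sk": sk, **attributes}

    def query(self, pk: str, sk_prefix: str = "") -> list:
        """Return the items of one partition, ordered by sort key."""
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]


table = SingleTable()
# Different entity types for the same user share one partition key.
table.put_item("USER#42", "PROFILE", name="Asha")
table.put_item("USER#42", "TOKEN#abc", expires_at=1700000000)
table.put_item("USER#42", "DEVICE#phone-1", platform="android")
table.put_item("USER#7", "PROFILE", name="Ravi")

everything = table.query("USER#42")                   # one request, all entities
tokens = table.query("USER#42", sk_prefix="TOKEN#")   # narrow to one entity type
print([item["sk"] for item in everything])
print([item["sk"] for item in tokens])
```

Because every entity for a user lives under the same partition key, one query can return the profile, tokens, and devices together, which is the read pattern the paragraph above describes.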

Migration approach

Our approach to transitioning from Aerospike to DynamoDB aimed to minimize downtime and keep operations uninterrupted. We modified the application layer to act as a switch that can route traffic to one or more databases. This made dual writes possible and allowed traffic to be diverted in either direction. The following diagram illustrates the details of each phase.

We employed a phased strategy:

Phase 1 – We implemented dual writes to both the source database (Aerospike) and the target database (DynamoDB), and read only from the source.
Phase 2 – We began writing to the target database, and read from both the source and target databases.
Phase 3 – While new data was being written to the target database, background processes copied and backfilled older data. In this phase, the application layer activated data validation: records read from both data layers were compared for accuracy and consistency, so that any discrepancies could be fixed.
Phase 4 – We initiated the cutover from the source with zero downtime, transitioning to writing and reading exclusively from the target. This was the “point of no return,” when new data was written to the target database alone. At this point, Aerospike was taken offline for eventual retirement.
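The phased routing described above can be sketched as a small application-layer switch. This is an illustrative sketch under stated assumptions, not Freecharge's code: plain dictionaries stand in for Aerospike and DynamoDB, the `Phase` and `MigrationRouter` names are hypothetical, writes in Phases 2 and 3 are assumed to go to the target only, and reads from both stores are assumed to prefer the target and fall back to the source for records that have not been backfilled yet.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE_READ_SOURCE = 1   # write to both stores, read from the source
    WRITE_TARGET_READ_BOTH = 2   # write to the target, read from both
    BACKFILL_AND_VALIDATE = 3    # as phase 2, plus compare records on read
    TARGET_ONLY = 4              # cutover: the source is no longer used


class MigrationRouter:
    """Hypothetical switch that routes reads and writes by migration phase."""

    def __init__(self, source: dict, target: dict, phase: Phase):
        self.source, self.target, self.phase = source, target, phase
        self.mismatches = []  # keys whose source and target copies differ

    def write(self, key, value):
        self.target[key] = value
        if self.phase is Phase.DUAL_WRITE_READ_SOURCE:
            self.source[key] = value

    def read(self, key):
        if self.phase is Phase.DUAL_WRITE_READ_SOURCE:
            return self.source.get(key)
        if self.phase is Phase.TARGET_ONLY:
            return self.target.get(key)
        from_source, from_target = self.source.get(key), self.target.get(key)
        if (self.phase is Phase.BACKFILL_AND_VALIDATE
                and None not in (from_source, from_target)
                and from_source != from_target):
            self.mismatches.append(key)  # recorded so the discrepancy can be fixed
        # Assumption: prefer the target, fall back to the source if not migrated.
        return from_target if from_target is not None else from_source

    def backfill(self) -> int:
        """Copy records that exist only in the source; return how many were copied."""
        missing = [k for k in self.source if k not in self.target]
        for key in missing:
            self.target[key] = self.source[key]
        return len(missing)


source, target = {"token:old": "user-1"}, {}
router = MigrationRouter(source, target, Phase.DUAL_WRITE_READ_SOURCE)
router.write("token:a", "user-2")             # lands in both stores
router.phase = Phase.WRITE_TARGET_READ_BOTH
router.write("token:b", "user-3")             # lands in the target only
read_before_backfill = router.read("token:old")  # served from the source
router.phase = Phase.BACKFILL_AND_VALIDATE
copied = router.backfill()
router.phase = Phase.TARGET_ONLY
print(copied, router.read("token:old"), router.mismatches)
```

Keeping the phase as a single piece of state in the application layer means each step can be reversed by switching the phase back, up to the final cutover.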

Because tokens were stored in Aerospike, we were concerned that the migration could log customers out. With the successful migration of records to DynamoDB, we delivered a positive customer experience: data remained continuously available throughout the migration, maintaining uninterrupted access to critical information.

Migration results

In this section, we discuss the operational ease, increased scalability, and cost-optimization realized with this solution.

Operational ease

With Aerospike being a self-managed database, there were a lot of manual tasks that we needed to perform, as explained earlier. DynamoDB stands out in operational ease compared to self-hosted Aerospike. DynamoDB offers a fully managed service, automating tasks like scaling, backups, and security, with single-digit millisecond response times. DynamoDB simplifies database management, providing a hassle-free experience and allowing users to focus on application development rather than infrastructure intricacies.

Scaling

DynamoDB offers seamless and efficient scaling, providing a flexible solution to match varying workloads. Its serverless architecture eliminates the need for manual intervention, allowing users to scale up or down effortlessly based on demand. Its auto scaling feature adjusts capacity to accommodate changes in traffic, delivering optimal performance and resource utilization. With the ability to handle both read and write throughput independently, DynamoDB offers granular control over scaling operations.

In our previous solution, scaling operations required additional effort, whereas DynamoDB handles scaling automatically, reducing the need for manual intervention. The following read transactions per second (TPS) graph shows how the load fluctuates; DynamoDB scales with it on demand.

The following graph shows the write TPS.

We wanted to achieve both read and write TPS scalability for this particular application without having any fixed operational expenditure and manual intervention, which wasn’t possible in the previous solution.

Cost

In the new solution with DynamoDB, we pay only for what we use, whereas Aerospike required us to over-provision the database for sudden traffic peaks. By moving our identity management system from Aerospike to DynamoDB, we lowered our costs by 60%.

Conclusion

In this post, we discussed how Freecharge reformed the IMS application’s session management layer by using Amazon DynamoDB, a fully managed, serverless, NoSQL database. Freecharge seamlessly migrated from Aerospike to eliminate the operational blockers arising from their self-managed, license-based model.
DynamoDB’s pay-as-you-go pricing lowered the cost of ownership for this workload by 60%, and its schema flexibility allowed for a unified table design, which further optimized queries and therefore performance. Overall, the migration was an ideal bargain for Freecharge.

To learn more about how to maximize performance and minimize throughput costs while using DynamoDB, see Best practices guide for designing and architecting with Amazon DynamoDB.

About the authors

Vikalp Singh is working as Senior Director of Infrastructure at Freecharge, a digital payments company wholly owned by Axis Bank. He is responsible for any new implementation, designing and developing robust and scalable solutions that can handle massive amounts of data efficiently.

Kapil Saneja is currently working as Associate Director at Freecharge. He is responsible for implementing automation throughout the software development lifecycle in Freecharge. He’s passionate about leveraging technology and automation to streamline infrastructure processes, address deployment challenges, and drive architectural innovations.

Shubham Sharan is a Technical Account Manager (TAM) at AWS in Delhi, India. He specializes in cloud architecture, infra-automation, and system engineering with a focus on cloud-native architecture. He collaborates with major FSIs, GSIs, and Media customers, offering expertise in designing migration and modernization strategies, along with providing consultative architectural and operational guidance.

Dharam Thakkar is an Enterprise Solutions Architect at AWS in Mumbai, India, specializing in helping BFSI clients navigate their digital transformation journey. With expertise in containers, Dharam crafts innovative solutions to optimize operations and drive business growth in the dynamic landscape of cloud computing.