How Delhivery saved 15% of cloud infrastructure cost in 50 days

Delhivery is India’s largest fully integrated logistics service provider. It works with AWS to enhance the efficiency, availability, and scalability of its systems.

Establishing a good cost allocation strategy

In Delhivery’s journey to optimize AWS infrastructure cost, the most important step was to retrieve the cost incurred by each resource. As Shashank Kumar, Sr. Director DevOps, says, “This will solve half of the problems, as many surprise costs incurred can be identified quickly.”

Delhivery started by enabling comprehensive data visualization and analysis with AWS Cost and Usage Reports (AWS CUR) and Amazon QuickSight. With this data, Delhivery was able to customize visualizations by multiple criteria, such as service, account, resource, and team. The image shows account-based costs. Notice the green (up) and red (down) color markings, which represent the cost trend against the previous day and deviations of +/- 5%.
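The color markings described above amount to a day-over-day comparison against a 5% threshold. The following is a minimal Python sketch of that logic, not Delhivery's actual QuickSight configuration, which the source does not show. The function name, account names, and cost figures are hypothetical.

```python
from typing import Dict


def flag_cost_trends(previous: Dict[str, float],
                     current: Dict[str, float],
                     threshold_pct: float = 5.0) -> Dict[str, str]:
    """Label each account 'up', 'down', or 'steady' versus the previous day."""
    flags = {}
    for account, today in current.items():
        yesterday = previous.get(account)
        if not yesterday:
            # No baseline (missing or zero), so no percentage can be computed.
            flags[account] = "new"
            continue
        change_pct = (today - yesterday) / yesterday * 100
        if change_pct > threshold_pct:
            flags[account] = "up"
        elif change_pct < -threshold_pct:
            flags[account] = "down"
        else:
            flags[account] = "steady"
    return flags


# Hypothetical account names and daily costs in USD.
previous_day = {"prod-core": 1000.0, "staging": 200.0, "analytics": 400.0}
current_day = {"prod-core": 1080.0, "staging": 180.0, "analytics": 410.0}
print(flag_cost_trends(previous_day, current_day))
```

In this example, prod-core rises 8% and is flagged "up", staging falls 10% and is flagged "down", and analytics moves 2.5%, which stays within the threshold.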

Implementing cloud cost governance using tags

One of the most important aspects of cloud governance is to identify the resource owners so that they can work on their respective projects for cost optimization using the above techniques. Tagging creates a consistent and programmatic approach to an organization’s activities based on operational practices, defined use cases, stakeholders involved in the process, etc. Delhivery established and implemented a tagging policy to help relate AWS usage and consumption to resource owners, which increased visibility and ownership of costs and usage against the overall organizational strategy and value.
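A tagging policy is only useful if resources that violate it are easy to find. The sketch below shows one way such a check could look in Python. The required tag keys (team, project, environment) and the resource IDs are assumptions for illustration; the source does not list the keys in Delhivery's policy.

```python
# Hypothetical policy: every resource must carry these tag keys.
REQUIRED_TAGS = {"team", "project", "environment"}


def missing_tags(resource_tags: dict) -> set:
    """Return the required keys that are absent or have an empty value."""
    present = {
        key.lower()
        for key, value in resource_tags.items()
        if str(value).strip()
    }
    return REQUIRED_TAGS - present


def untagged_resources(resources: dict) -> dict:
    """Map each non-compliant resource ID to its sorted list of missing keys."""
    report = {}
    for resource_id, tags in resources.items():
        gaps = missing_tags(tags)
        if gaps:
            report[resource_id] = sorted(gaps)
    return report


# Hypothetical resource IDs and tags.
inventory = {
    "i-0abc": {"Team": "routing", "Project": "tracking", "Environment": "prod"},
    "vol-0def": {"team": "billing", "project": ""},
}
print(untagged_resources(inventory))
```

The check compares keys case-insensitively as a design choice of this sketch. AWS tag keys are case sensitive, so a real policy would normally fix one spelling.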

From there, cost optimization became a daily, automated activity, which expanded awareness of, and accountability for, costs incurred by specific usage types and operations. Delhivery categorized resource costs either per service or by top daily/monthly contributors, and shared the results with teams for review and cost optimization. It also set up AWS Cost Anomaly Detection to identify spend anomalies and quickly notify teams.
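Ranking the top contributors is a simple aggregation over cost line items. The following Python sketch assumes line items have already been reduced to (service, cost) pairs; the service names and amounts are made up, and the source does not describe how Delhivery built its reports.

```python
from collections import defaultdict


def top_contributors(line_items, n=3):
    """Sum cost per service and return the n most expensive, highest first."""
    totals = defaultdict(float)
    for service, cost in line_items:
        totals[service] += cost
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical daily line items: (service, cost in USD).
daily_items = [
    ("AmazonEC2", 120.0),
    ("AmazonRDS", 80.0),
    ("AmazonEC2", 30.0),
    ("AWSLambda", 10.0),
    ("AmazonCloudWatch", 25.0),
]
print(top_contributors(daily_items, n=2))
```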

Using the right AWS pricing strategy

After a deep dive into service usage and resource utilization, Delhivery identified and purchased specific Savings Plans and Reserved Instances to further optimize costs.

Savings Plans

Delhivery checked its hourly on-demand instance, Lambda, and Fargate usage and bought a 3-year Savings Plan, which provided a 45-50% discount on on-demand usage.
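Checking hourly usage matters because a Savings Plan is an hourly spend commitment: the commitment is paid every hour whether or not it is used, and usage above it is billed at on-demand rates. The Python sketch below illustrates this with a deliberately simplified model that assumes a single uniform discount rate. Real Savings Plans rates vary by instance family, Region, and other factors, and the numbers here are illustrative only.

```python
def cost_with_savings_plan(hourly_on_demand, commitment, discount):
    """Total cost over the given hours under a simplified Savings Plan model.

    hourly_on_demand: usage per hour, priced at on-demand rates (USD).
    commitment: committed spend per hour at discounted rates (USD).
    discount: assumed uniform discount as a fraction, e.g. 0.5 for 50%.
    """
    # Each committed dollar covers 1 / (1 - discount) dollars of on-demand usage.
    covered = commitment / (1.0 - discount)
    return sum(commitment + max(0.0, usage - covered)
               for usage in hourly_on_demand)


# Hypothetical four-hour usage profile with one peak hour.
usage = [10.0, 10.0, 20.0, 10.0]
print(cost_with_savings_plan(usage, commitment=0.0, discount=0.5))   # no plan
print(cost_with_savings_plan(usage, commitment=5.0, discount=0.5))   # covers baseline
print(cost_with_savings_plan(usage, commitment=10.0, discount=0.5))  # covers peak
```

Under these assumptions, committing to the baseline costs 30.0 against 50.0 with no plan, while committing to the peak costs 40.0, because the extra commitment sits unused in three of the four hours.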

Reserved Instances

After confirming with the respective product owners that workloads would be running for at least a year, Delhivery purchased Reserved Instances for RDS, resulting in an immediate 30% average savings, and for OpenSearch and ElastiCache, delivering a 30-32% average savings.

Identify idle resources and optimization opportunities

Saving 50% on EC2 and Lambda through data-driven cost and usage analysis

Delhivery moved all EC2 and Lambda workloads that could be scheduled at any time of day, such as batch jobs, into an off-hours window so that they ran under existing Savings Plans rather than incurring additional on-demand usage. It also implemented mechanisms to shut down instances in its non-production environments over weekends and during off-hours.

By diving deeper into data analysis, Delhivery identified and executed optimization opportunities that saved more than 50% on costs related to AWS Lambda and Amazon EC2.
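The weekend and off-hours shutdown for non-production environments comes down to a schedule check. The following is a minimal Python sketch of such a check. The business-hours window (08:00 to 20:00) and the environment names are assumptions, since the source does not state the schedule Delhivery used, and the actual start/stop mechanism is left out.

```python
from datetime import datetime, time

# Hypothetical business-hours window for non-production environments.
BUSINESS_START = time(8, 0)
BUSINESS_END = time(20, 0)


def should_be_running(environment: str, now: datetime) -> bool:
    """Decide whether an instance in this environment should be up at 'now'."""
    if environment == "production":
        return True  # production is never stopped by the schedule
    if now.weekday() >= 5:  # Saturday is 5, Sunday is 6
        return False
    return BUSINESS_START <= now.time() < BUSINESS_END


# 3 January 2024 is a Wednesday; 6 January 2024 is a Saturday.
print(should_be_running("staging", datetime(2024, 1, 3, 10, 0)))
print(should_be_running("staging", datetime(2024, 1, 3, 22, 0)))
print(should_be_running("staging", datetime(2024, 1, 6, 12, 0)))
```

A scheduler could call a function like this periodically and stop or start instances whose state differs from the result.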

AWS Lambda

Using AWS Trusted Advisor, Delhivery found the top AWS Lambda functions with overprovisioned memory and high error rates. It fine-tuned the number of retries and the timeouts, which helped to reduce the cost of those Lambda functions.
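Memory size and retries both feed directly into Lambda's duration charge, which is billed in GB-seconds. The Python sketch below works through the arithmetic. The default price per GB-second is an assumed illustrative rate that should be checked against current AWS pricing, the workload numbers are hypothetical, and the model simplifies by assuming every failed invocation uses all of its retries at the same duration.

```python
def lambda_duration_cost(invocations, avg_duration_ms, memory_mb,
                         failed_fraction=0.0, retries=0,
                         price_per_gb_second=0.0000166667):
    """Estimate the duration charge for a Lambda function (USD).

    price_per_gb_second is an assumed rate, not an authoritative price.
    """
    # Simplification: each failed invocation is retried 'retries' times.
    billed_invocations = invocations * (1 + failed_fraction * retries)
    gb_seconds = (billed_invocations
                  * (avg_duration_ms / 1000.0)
                  * (memory_mb / 1024.0))
    return gb_seconds * price_per_gb_second


# Hypothetical function: 10M invocations, 500 ms average, 5% failures.
before = lambda_duration_cost(10_000_000, 500, 1024,
                              failed_fraction=0.05, retries=2)
after = lambda_duration_cost(10_000_000, 500, 512,
                             failed_fraction=0.05, retries=0)
print(f"before: ${before:.2f}  after: ${after:.2f}")
```

The sketch assumes that halving memory leaves duration unchanged, which is not always true, since Lambda allocates CPU in proportion to memory. Measuring duration after each change is the safer approach.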

Amazon EC2

Again with Trusted Advisor, Delhivery identified opportunities to terminate or downscale underutilized Amazon EC2 instances. It reviewed all of its workload requirements and assigned each workload to compute-optimized, general-purpose, or memory-optimized instances accordingly. All Spot Instance workloads were assessed to identify and act on unwanted running resources or instance optimization opportunities.

Delhivery also installed the CloudWatch agent on all EC2 instances to collect memory metrics, helping the DevOps team find underutilized EC2 instances through AWS Compute Optimizer using both CPU and memory metrics.
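Memory metrics matter because an instance with low CPU usage may still be memory-bound. The Python sketch below illustrates the idea with a simple threshold filter. It is not how Compute Optimizer works internally; the thresholds, metric names, and instance IDs are hypothetical.

```python
def underutilized(instances, cpu_max=20.0, mem_max=30.0):
    """Return IDs of instances whose CPU and memory usage are both low.

    instances maps an instance ID to its 95th-percentile CPU and memory
    utilization in percent. The thresholds are illustrative assumptions.
    """
    return sorted(
        instance_id
        for instance_id, metrics in instances.items()
        if metrics["cpu_p95"] < cpu_max and metrics["mem_p95"] < mem_max
    )


# Hypothetical fleet metrics.
fleet = {
    "i-idle": {"cpu_p95": 5.0, "mem_p95": 10.0},
    "i-memory-bound": {"cpu_p95": 5.0, "mem_p95": 85.0},
    "i-busy": {"cpu_p95": 70.0, "mem_p95": 20.0},
}
print(underutilized(fleet))
```

With CPU alone, i-memory-bound would look like a downsizing candidate. Adding the memory metric keeps it out of the list.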

The AWS Cost Optimization Workshop (COW), offered by AWS Enterprise Support, gave Delhivery consolidated reports on underutilized EC2 instances, aged EBS and RDS snapshots, ELB optimization opportunities, unused ELBs, and more. This improved engineers’ overall productivity, enabling them to focus on solving business problems instead of writing custom scripts to extract different reports.

AWS Graviton Processor

The release of AWS Graviton enabled Delhivery to reduce costs further without any performance trade-offs for its compute workloads. All workloads running partially or fully on on-demand EC2 instances were migrated to Graviton-based instances. The team also created Graviton-based nodes in Amazon OpenSearch Service and in new RDS instances to get the published price performance. Delhivery was able to reduce its compute cost after the adoption.

Modernize your data infrastructure with fully managed, purpose-built databases

Free your teams from time-consuming database tasks like server provisioning, patching, and backups. AWS fully managed database services provide continuous monitoring, self-healing storage, and automated scaling to help you focus on application development.

Amazon RDS

Amazon RDS cost is a significant portion of Delhivery’s overall AWS spend. As part of its cost optimization exercise, Delhivery implemented multiple mechanisms to bring down its RDS cost. It used Trusted Advisor to find idle RDS instances and deleted them from different accounts. Additionally, all non-production and non-critical RDS workloads were identified and moved to a Single-AZ setup. This exercise alone saved approximately 8% of total RDS cost.

Amazon DynamoDB

Usage of NoSQL databases like Amazon DynamoDB has increased over the years at Delhivery. DynamoDB is used in multiple applications, including some critical ones. Because DynamoDB is a serverless offering built for scale, not following best practices can end in bill shock. So Delhivery began reviewing the capacity mode of its DynamoDB tables and decided to remove provisioned capacity from all non-production workloads. It also removed all unused tables that were configured with provisioned capacity. For production environments, the team configured provisioned read/write capacity with auto scaling enabled for predictable workloads.

After a thorough review of the architecture and use cases, the team limited DynamoDB Accelerator (DAX) usage to production workloads and mandated setting a TTL on DynamoDB tables to keep the cost down. With these optimizations, Delhivery was able to save 15% on its DynamoDB cost.
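DynamoDB TTL works from a Number attribute on each item that holds an expiry time as a Unix epoch timestamp in seconds. The Python sketch below shows how an item might be stamped with such an attribute before it is written. The attribute name expires_at, the 30-day lifetime, and the item shape are assumptions; the source does not describe Delhivery's tables.

```python
import time

SECONDS_PER_DAY = 86_400


def with_ttl(item: dict, days: int, now_epoch=None,
             ttl_attribute: str = "expires_at") -> dict:
    """Return a copy of the item with an epoch-seconds expiry attribute.

    ttl_attribute is a hypothetical name; it must match the attribute
    configured as the TTL attribute on the table.
    """
    now = int(time.time()) if now_epoch is None else int(now_epoch)
    return {**item, ttl_attribute: now + days * SECONDS_PER_DAY}


# Hypothetical item, stamped to expire 30 days after a fixed reference time.
record = with_ttl({"pk": "shipment#1", "status": "delivered"},
                  days=30, now_epoch=1_700_000_000)
print(record)
```

The original item is left unmodified, so the same helper can be applied to items of any shape just before the write call.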

Business agility and governance control on AWS

In the past, organizations have had to choose between innovating faster and maintaining control over cost, compliance, and security. With AWS Management and Governance services, customers can use our services to assess their resource utilization and identify ways to reduce costs.

Amazon CloudWatch

Amazon CloudWatch is integrated with all AWS services used within Delhivery. To optimize cost, Delhivery reviewed its log retention policies and set the retention period in AWS Organizations to between two and four weeks. Exceptions were made for the metrics that needed longer retention.
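Enforcing a retention window starts with finding the log groups that fall outside it. The Python sketch below filters a list of log group records shaped like the entries returned by the CloudWatch Logs DescribeLogGroups API, where a missing retentionInDays means the logs never expire. The bounds of 14 and 30 days are an assumed reading of "two to four weeks", and the log group names and exemption list are hypothetical.

```python
def groups_outside_policy(log_groups, min_days=14, max_days=30,
                          exempt=frozenset()):
    """Return names of log groups whose retention violates the policy."""
    flagged = []
    for group in log_groups:
        name = group["logGroupName"]
        if name in exempt:
            continue  # approved exception with longer retention
        retention = group.get("retentionInDays")  # absent: never expires
        if retention is None or not (min_days <= retention <= max_days):
            flagged.append(name)
    return flagged


# Hypothetical log groups.
groups = [
    {"logGroupName": "/aws/lambda/orders", "retentionInDays": 14},
    {"logGroupName": "/aws/lambda/tracking"},
    {"logGroupName": "/aws/lambda/billing", "retentionInDays": 365},
    {"logGroupName": "/audit/trail", "retentionInDays": 365},
]
print(groups_outside_policy(groups, exempt={"/audit/trail"}))
```

Groups with no retention setting are flagged first in practice, since logs that never expire accumulate storage cost indefinitely.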

The team used a CUR query and found that Lambda logs were contributing approximately 60% of total CloudWatch costs. As a result, Delhivery mandated that development teams remove unnecessary logs. CloudWatch metric storage cost can also pile up over time when large amounts of redundant or outdated metrics are stored. Delhivery reviewed its Amazon Managed Streaming for Apache Kafka (Amazon MSK) cluster and removed unused topics, reducing CloudWatch costs for Kafka metric storage by approximately 50%.

These optimizations led to the reduction of overall CloudWatch cost by approximately 40%.

Fastest way to get answers from all your data to all your users

AWS analytics services are purpose-built to help you quickly extract data insights using the most appropriate tool for the job, and are optimized to give you the best performance, scale, and cost for your needs.

Amazon OpenSearch Service

Amazon OpenSearch Service adoption at Delhivery has increased over the years. Recognizing that the operating cost of OpenSearch is multi-dimensional, depending on compute, memory, and storage, Delhivery optimized its master and data node strategies and used the flexibility provided by Amazon OpenSearch Service to reduce storage by shrinking overprovisioned clusters. Ultimately, Delhivery was able to reduce its OpenSearch cost by 5%. It also ran an organization-wide campaign to identify applications that required consistent capacity. By selecting Reserved Instances, Delhivery was able to save an average of 30% on the OpenSearch node cost.

Run every workload on a secure and reliable global network

Get the broadest and deepest set of networking and content delivery services in the world with AWS. Run applications with the highest level of reliability, security, and performance in the cloud.

Elastic Load Balancer

For Elastic Load Balancing (ELB), Delhivery created a custom script to identify all of its idle load balancers. The results were surprising: the script identified more than 100 idle ELBs, each costing an average of $20 per month. The Delhivery team consolidated its load balancers and advised application teams to use one load balancer per application, or to use an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with host- or path-based routing.
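The source does not show Delhivery's custom script. As a rough Python sketch of the idea, the code below flags load balancers that served no requests over an observation period and estimates the monthly waste from the $20 average quoted above. The load balancer names, request counts, and the definition of idle as zero requests are assumptions.

```python
def idle_load_balancers(request_counts, min_requests=1):
    """Return names of load balancers with fewer than min_requests requests."""
    return sorted(
        name for name, count in request_counts.items()
        if count < min_requests
    )


def monthly_waste(idle_count, cost_per_lb=20.0):
    """Estimate monthly spend on idle load balancers (USD)."""
    return idle_count * cost_per_lb


# Hypothetical request totals over the observation period.
observed = {"lb-checkout": 1_250_000, "lb-legacy-api": 0, "lb-demo": 0}
idle = idle_load_balancers(observed)
print(idle, monthly_waste(len(idle)))

# At the figures quoted in the text: 100 idle ELBs at $20 each.
print(monthly_waste(100))
```

At 100 idle load balancers, the estimate comes to $2,000 per month, which is why consolidation was worth the effort.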

Amazon Virtual Private Cloud (Amazon VPC)

Delhivery worked with an AWS Solutions Architect to review its networking architecture. The review recommended using a NAT gateway in each Availability Zone to reduce inter-AZ data transfer cost. Additionally, the use of Amazon Virtual Private Cloud (Amazon VPC) endpoints for Amazon S3 and DynamoDB was suggested and later implemented to further reduce the overall cost.

Conclusion

Cloud cost optimization is an art that succeeds only for teams who know the science of identifying anomalous charges incurred by a particular resource. It is a continuous activity that delivers positive results only when you understand your data and act on it at a cadence (daily, weekly, monthly, etc.) that aligns with your organization’s needs.

By establishing a cost allocation strategy, cutting idle resources, pulling resource and cost optimization levers, and creating a strong governance mechanism, Delhivery was able to achieve more than 15% daily savings within 50 days of teamwork, and identified a further 15-25% in savings within the next couple of quarters.

About Delhivery

Delhivery is an e-commerce logistics service provider, and one of the largest and fastest growing companies, by revenue, in India. Its mission is to help customers operate flexible, reliable, and resilient supply chains at the lowest cost. Delhivery aims to build the operating system for commerce, through a combination of world-class infrastructure, the highest quality logistics operations, and cutting-edge engineering and technology capabilities.

Benefits of AWS

  • Achieved 15% cost savings and identified an additional 15-25% in savings through data analysis and resource optimization
  • Built automated cost monitoring and tracking capabilities, enhancing a culture of transparency and accountability
  • Established good governance and best practices across the Delhivery organization