Rapid7 Reduced Operational Latency by 40% by Adopting Amazon DynamoDB
Learn how Rapid7 reduced engineering overhead and built a fast, reliable infrastructure for its real-time data processing pipeline by migrating from self-managed Cassandra to Amazon DynamoDB.
40%
reduction in application latency
60
engineering hours per week saved from managing operational burden
up to 99.999%
availability attained for their customers
Near zero
downtime migrating to DynamoDB
Overview
Rapid7 is a leading provider of security data solutions that enable organizations to implement a dynamic, analytics-driven approach to cybersecurity. Their software-as-a-service tool, InsightIDR, provides visibility into an organization’s vulnerabilities, allowing it to rapidly detect compromises while avoiding false alerts, respond to breaches, and correct the underlying causes of attacks. InsightIDR collects data from customers’ existing network security tools, authentication logs, and endpoint devices, and combines endpoint forensics, log search, and sophisticated dashboards into a single solution. Increased customer demand for InsightIDR meant that Rapid7 needed to scale up their self-managed Cassandra clusters. Rapid7’s technical team was looking for a managed solution that was instantly scalable and highly performant.
Opportunity | Finding a Highly Performant and Scalable Managed Solution for a Growing Application
Built on self-managed Apache Cassandra in 2013, Rapid7’s InsightIDR infrastructure comprised six clusters. The largest cluster, “Process Registry,” had grown to more than 280 nodes over nine years. Their critical application ran on an 80-node Cassandra cluster that was due for a database upgrade. The cluster was multi-tenant and also hosted multiple applications, so any patching, version upgrade, or maintenance had an amplified impact. Guru Bandari, Senior Software Engineer at Rapid7, said, “Scaling the Cassandra clusters and performing maintenance tasks such as patching had become an operational nightmare especially with the growing customer base. Scheduling downtime while increasing tenants had become even more challenging than before.”
Rapid7’s use case also required deleting data from the InsightIDR application 60 days after its creation. Previously, they relied on Cassandra's TTL (Time to Live) feature to remove the data, but this approach consumed resources and adversely affected performance.
As the popularity of the InsightIDR product increased, Rapid7 looked for a maintenance-free, fully managed, and highly available database solution.
“Scaling our clusters might have taken 60 minutes in the past to get servers configured and deployed. Now, we can provision in a matter of minutes, enabling our cloud and data engineers to focus on product innovation.”
Andrew Keely
Senior Manager, Software Engineering, Rapid7
Solution | Seamless Migration to DynamoDB with Near Zero Downtime
In June 2022, Rapid7 modernized their Cassandra workload by migrating to Amazon DynamoDB. Rapid7’s technical team saw value in DynamoDB primarily because it is a fully managed database service with up to 99.999% availability. They redesigned their application to adopt DynamoDB, a process that included endurance testing with peak workloads. They benefited from DynamoDB’s seamless scaling and low single-digit millisecond latency for their read and write workloads.
For a critical, multi-tenant application such as InsightIDR, Rapid7 adopted the safest migration approach: enabling dual writes at the application layer using custom scripts. Because traffic at that layer was unpredictable, they used DynamoDB's on-demand mode, which let them absorb unexpected surges in traffic and analyze traffic patterns. Rapid7 continued to run reports and test their read workloads on both environments until they confirmed data consistency.
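The story does not show Rapid7's custom scripts, so the following is only a minimal sketch of what application-layer dual writes with a consistency check can look like. The class names (`InMemoryStore`, `DualWriter`) and the in-memory stand-ins for the Cassandra and DynamoDB clients are hypothetical and exist only to make the example runnable.

```python
from typing import Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a real database client."""

    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, value: dict) -> None:
        self._rows[key] = dict(value)

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class DualWriter:
    """Writes to the legacy store first, then mirrors the write to the new store."""

    def __init__(self, legacy: InMemoryStore, target: InMemoryStore) -> None:
        self.legacy = legacy
        self.target = target
        self.failed_keys: List[str] = []

    def put(self, key: str, value: dict) -> None:
        # The legacy store stays the source of truth during the migration.
        self.legacy.put(key, value)
        try:
            self.target.put(key, value)
        except Exception:
            # Record the key so the write can be replayed later (assumed policy).
            self.failed_keys.append(key)

    def find_mismatches(self, keys: List[str]) -> List[str]:
        """Return the keys whose values differ between the two stores."""
        return [k for k in keys if self.legacy.get(k) != self.target.get(k)]


legacy, target = InMemoryStore(), InMemoryStore()
writer = DualWriter(legacy, target)
writer.put("event-1", {"tenant": "acme", "status": "open"})
print(writer.find_mismatches(["event-1"]))
```

Keeping the legacy store authoritative and treating the new store as a mirror is one common way to make the cutover reversible; whether Rapid7 ordered their writes this way is not stated in the story.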
They also used Amazon CloudWatch Contributor Insights, which was crucial for identifying the hot partition issues that emerged as a bottleneck during endurance testing. After running their application workload in parallel across both environments for two months, they gained the confidence to move their entire workload to DynamoDB, which helped them achieve a near-zero-downtime migration.
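CloudWatch Contributor Insights performs this analysis as a managed service, so no custom code is needed in practice. To illustrate the underlying idea of a hot partition, here is a small local sketch that flags partition keys receiving a disproportionate share of requests. The function name, the threshold, and the tenant-style keys are assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def find_hot_keys(
    partition_keys: Iterable[str], share_threshold: float = 0.2
) -> List[Tuple[str, float]]:
    """Return (key, share) pairs for keys at or above the given share of traffic."""
    counts = Counter(partition_keys)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (key, count / total)
        for key, count in counts.most_common()
        if count / total >= share_threshold
    ]


# Hypothetical access log: one partition key per request.
requests = ["tenant-a"] * 8 + ["tenant-b", "tenant-c"]
print(find_hot_keys(requests, share_threshold=0.5))
```

A key that dominates traffic like this is a candidate for a redesigned key schema, for example by adding a suffix that spreads one tenant's items across several partitions.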
Rapid7 was also drawn to Amazon DynamoDB for its integrated security features and native support for backup and restore. Additionally, the point-in-time recovery (PITR) feature of Amazon DynamoDB proved highly beneficial because it allows Rapid7’s customers to continuously back up their table data with per-second granularity. This feature gave Rapid7 the ability to restore their data to any point in time within the past 35 days.
DynamoDB's TTL feature provided automated deletes that reduce stored data volumes without any additional cost or performance impact. Andrew Keely, Senior Manager, Software Engineering, shared, “Rapid7 experienced improvements in their annual infrastructure management costs and reduced an average of 60 weekly hours by modernizing with Amazon DynamoDB. Moreover, this migration enabled them to achieve consistent performance with single-digit milliseconds latency and ensured operational reliability with 99.999% availability for their customers.”
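DynamoDB TTL works from a per-item Number attribute that holds the expiry time as a Unix epoch timestamp in seconds. The sketch below shows how a 60-day retention period could be stamped onto an item before it is written. The attribute name `expires_at` and the item fields are hypothetical; the story does not describe Rapid7's actual schema.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 60           # retention period mentioned in the story
TTL_ATTRIBUTE = "expires_at"  # hypothetical attribute name


def with_expiry(item: dict, created_at: datetime) -> dict:
    """Return a copy of the item with a TTL attribute in epoch seconds."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    expires = created_at + timedelta(days=RETENTION_DAYS)
    return {**item, TTL_ATTRIBUTE: int(expires.timestamp())}


created = datetime(2022, 6, 1, tzinfo=timezone.utc)
item = with_expiry({"tenant": "acme", "event_id": "e-1"}, created)
print(item[TTL_ATTRIBUTE] - int(created.timestamp()))
```

Expired items are removed by a background process rather than at the exact expiry instant, so readers that must never see expired data typically also filter on the TTL attribute.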
Outcome | Attaining Swift Scaling, Notable Savings, and Elevated Performance Simultaneously
In just 8 weeks, Rapid7 migrated to Amazon DynamoDB, a fully managed, serverless, key-value database designed to run high-performance applications at any scale. With DynamoDB as their new platform, Rapid7 reduced the time needed to scale up for increased workloads from weeks to minutes. Not only did they save engineering hours, they also increased data processing speeds by 50% for their write workloads and 33% for their read workloads, while observing consistent performance.
Keely mentioned, “Throughput in Cassandra was not well-defined, and an overloaded cluster slowed down requests rather than throttling, which was detrimental and not acceptable for us. DynamoDB however, has very consistent performance when operating within configured throughput.”
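The difference Keely describes, rejecting excess requests immediately instead of letting every request slow down, can be illustrated with a generic token bucket. This is a textbook rate-limiting sketch, not a description of how DynamoDB enforces throughput internally; the class name and parameters are assumptions for the example.

```python
class TokenBucket:
    """Admits requests while tokens remain; rejects the rest immediately."""

    def __init__(self, rate_per_sec: float, capacity: float) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def try_consume(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # throttled: the caller can back off and retry


bucket = TokenBucket(rate_per_sec=2, capacity=2)
print([bucket.try_consume(now=0.0) for _ in range(3)])
```

Requests admitted within the configured rate are unaffected by the excess, which is what keeps their latency predictable; the rejected caller gets a fast, explicit signal instead of a slow response.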
Amazon DynamoDB was a strong fit for their needs because it is a fully managed, serverless database offered by AWS. As Rapid7’s customer base continues to grow, they plan to use DynamoDB Accelerator (DAX), the caching service for DynamoDB, to further reduce read latency. Bandari explained, “We are close to sunsetting our remaining Cassandra footprint as we finish our modernization project.”
About Rapid7
Rapid7 is a cybersecurity company based in Boston, USA, founded in 2000. It specializes in helping organizations manage cybersecurity risks, detect and respond to threats, and improve overall security. They offer tools for vulnerability management, incident detection and response, application security, security analytics, and cloud security.
AWS Services Used
Amazon DynamoDB
Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-Region replication, in-memory caching, and data import and export tools.