AWS Cloud Financial Management

Faster anomaly resolution with enhanced root cause analysis in AWS Cost Anomaly Detection

Today, AWS enhanced Cost Anomaly Detection with the ability to provide multiple root causes for cost anomalies. This new capability empowers FinOps professionals and cloud financial managers to quickly identify and resolve the underlying root causes driving unexpected cost increases. For FinOps teams striving to optimize cloud spend and maintain financial accountability, this enhancement offers deeper insights and faster resolution times. This post explains how this improvement can help you resolve anomalies more efficiently and answers some frequently asked questions.

Understanding Cost Anomaly Detection

AWS Cost Anomaly Detection helps you identify and resolve unexpected spikes in your AWS spending across your organization. It allows you to create monitors for AWS services, member accounts, cost allocation tags, and cost categories. The service uses machine learning to analyze historical data to calculate expected daily spend and compares it to actual spend. When actual spend exceeds the expected amount beyond a certain threshold, it identifies this as an anomaly and performs a root cause analysis.

Enhancing root cause analysis in Cost Anomaly Detection

The improved root cause analysis provides deeper insights into unexpected spending increases. The previous version highlighted the top two root causes; the new capability identifies up to ten root causes for every anomaly above $1, giving a more detailed view of which cost dimensions are driving an anomaly. The feature analyzes every possible combination of the service, account, region, and usage type dimensions to pinpoint the most significant root causes of each anomaly, and estimates a dollar amount for each. For example, for a $1,000 anomaly in AWS Lambda costs, you might see:

  • Lambda in the Production account (us-east-1) for Lambda-GB-Second usage: $195.32
  • Lambda in the Analytics account (us-west-2) for Lambda-GB-Second usage: $160.56
  • Lambda in the Development account (eu-west-1) for Lambda-Provisioned-Concurrency-GB-Second usage: $120.83
  • Additional root causes…

The estimated dollar attributions of these root causes may sum to more or less than the estimated value of the detected anomaly, because they are approximations intended to help you identify areas for investigation.
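
To make the arithmetic concrete, here is a minimal Python sketch that ranks the three Lambda root causes listed above and expresses each as a share of the $1,000 anomaly. The RootCause class and summarize function are hypothetical names introduced for illustration, not part of any AWS SDK, and the sketch assumes the anomaly total is positive.

```python
from dataclasses import dataclass


@dataclass
class RootCause:
    label: str           # human-readable dimension combination (illustrative)
    contribution: float  # estimated dollar attribution


def summarize(anomaly_total: float, causes: list) -> list:
    """Return (label, dollars, percent_of_anomaly) tuples, largest first."""
    ranked = sorted(causes, key=lambda c: c.contribution, reverse=True)
    return [
        (c.label, c.contribution, round(100 * c.contribution / anomaly_total, 1))
        for c in ranked
    ]


# Figures taken from the Lambda example above.
causes = [
    RootCause("Production / us-east-1 / Lambda-GB-Second", 195.32),
    RootCause("Analytics / us-west-2 / Lambda-GB-Second", 160.56),
    RootCause("Development / eu-west-1 / Lambda-Provisioned-Concurrency-GB-Second", 120.83),
]

rows = summarize(1000.00, causes)
attributed = round(sum(c.contribution for c in causes), 2)

for label, dollars, share in rows:
    print(f"{label}: ${dollars:.2f} ({share}% of anomaly)")
print(f"Attributed by the listed causes: ${attributed:.2f} of $1000.00")
```

The three listed causes cover only part of the anomaly, so the attributed total here is well below $1,000. Even with all root causes included, the total need not match the anomaly value, because the attributions are estimates.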

Each root cause includes a direct link to AWS Cost Explorer, enabling you to conduct a more thorough investigation. This advancement empowers you to make informed decisions and take targeted actions to manage cloud costs effectively. Whether you are dealing with a few large cost drivers or multiple smaller root causes, you now have a clearer picture of what’s driving cost anomalies across your AWS environment, even for relatively small cost increases.

Best practices for using the root cause analysis

To maximize the benefits of the enhanced root cause analysis, we recommend following these best practices. These guidelines will help you efficiently navigate complex cost anomalies, prioritize your efforts, and take targeted actions to optimize your AWS spend.

  • Prioritize by Impact: Begin with the largest cost root cause.
  • Assess Planned vs. Unplanned Spend: Determine if increases were expected, or part of some planned work.
  • Investigate Unplanned Spend: Contact account/application owners or explore in Cost Explorer.
  • Look for Patterns: Identify recurring issues across multiple anomalies. For example, if your Development account frequently appears as a top contributor to cost anomalies, it might indicate a need for better cost controls in your development environment.

For more detailed information on implementing these practices, please refer to our documentation.
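
The "Prioritize by Impact" and "Look for Patterns" practices can be sketched in a few lines of Python. This is an illustrative sketch, not an official tool: it assumes anomalies are dictionaries shaped like the API response shown later in this post, and the top_cause and recurring_accounts helpers and the sample data are hypothetical.

```python
from collections import Counter
from typing import Optional


def top_cause(anomaly: dict) -> Optional[dict]:
    """Largest root cause of one anomaly by estimated contribution, or None."""
    causes = anomaly.get("RootCauses", [])
    return max(causes, key=lambda rc: rc["Impact"]["Contribution"], default=None)


def recurring_accounts(anomalies: list, min_count: int = 2) -> list:
    """Accounts that are the top root cause in at least min_count anomalies."""
    counts = Counter()
    for anomaly in anomalies:
        cause = top_cause(anomaly)
        if cause is not None:
            counts[cause["LinkedAccountName"]] += 1
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


def _rc(account: str, dollars: float) -> dict:
    # Helper that builds a minimal root cause record for the sample data.
    return {"LinkedAccountName": account, "Impact": {"Contribution": dollars}}


# Made-up sample: Development is the top contributor in two of three anomalies.
sample = [
    {"RootCauses": [_rc("Development", 300.0), _rc("Production", 120.0)]},
    {"RootCauses": [_rc("Production", 500.0), _rc("Analytics", 90.0)]},
    {"RootCauses": [_rc("Analytics", 40.0), _rc("Development", 75.0)]},
]

print(recurring_accounts(sample))
```

An account that keeps appearing in this list is a candidate for tighter cost controls, as in the Development account example above.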

New, improved root cause analysis in practice: a real-world scenario

Imagine you are a FinOps analyst at a large e-commerce company. On a Monday morning, you receive an AWS Cost Anomaly Detection alert for an unexpected $5,000 increase in CloudWatch costs. In the alert you see that the service has identified 5 potential root causes, so you click on the link to the console to find out more. In the console you see the following:

Figure 1. Sample screenshot of enhanced root cause analysis

You quickly investigate by contacting the relevant teams for the top three root causes, which account for over 90% of the anomaly. The Morpheus_Recommendations team confirms a planned feature release but with higher than expected log processing. The Atlas_Analytics team explains increased log storage due to new regulatory requirements. The Phoenix_CustomerPortal team identifies unexpected growth in vended log ingestion from a third-party service in the Mumbai region. This detailed breakdown allows you to quickly identify significant cost increases, reach out to the right teams, and gather context – all within an hour of receiving the alert. You can now provide valuable insights to stakeholders, highlight areas for potential optimization, and identify where budget adjustments may be necessary. This scenario demonstrates how the enhanced root cause analysis feature enables you to take swift, targeted action to manage costs across complex cloud environments, focusing on the most impactful areas first.

Accessing the enhanced root cause analysis programmatically

For customers who want to integrate Cost Anomaly Detection into their existing workflows or build custom applications, we have enhanced the GetAnomalies operation of the AWS Cost Explorer API to support the new root cause capabilities. The API response now provides up to 10 root causes, each with its contribution to the identified anomaly. This new information helps you understand not just what caused the anomaly, but also how significant each cause was. Here’s a simplified example of what this might look like in the API response:

{
  "Anomalies": [
    {
      "AnomalyId": "a1b2c3d4-5678-90ef-ghij-klmnopqrstuv",
      "AnomalyStartDate": "2024-06-10",
      "AnomalyEndDate": "2024-06-10",
      "DimensionValue": "Amazon Simple Storage Service",
      "Impact": {
        "TotalImpact": 5000.00,
        "TotalImpactPercentage": 150.0
      },
      "RootCauses": [
        {
          "Service": "Amazon Simple Storage Service",
          "LinkedAccount": "1234***",
          "LinkedAccountName": "Production Account",
          "Region": "us-east-1",
          "UsageType": "TimedStorage-ByteHrs",
          "Impact": {
            "Contribution": 2250.00
          }
        },
        {
          "Service": "Amazon Simple Storage Service",
          "LinkedAccount": "2109****",
          "LinkedAccountName": "Analytics Account",
          "Region": "us-west-2",
          "UsageType": "TimedStorage-ByteHrs",
          "Impact": {
            "Contribution": 1250.00
          }
        },
        [...]
      ]
    }
  ]
}
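
As a rough sketch of how you might consume this response from Python, the following uses the boto3 Cost Explorer client ("ce") and its get_anomalies call, following NextPageToken to page through results. The helper names fetch_anomalies and ranked_root_causes, the 30-day default window, and the sample data are assumptions for illustration. The sketch also assumes AWS credentials and a default region are already configured.

```python
from datetime import date, timedelta


def ranked_root_causes(anomaly: dict) -> list:
    """Return (description, contribution) pairs for one anomaly, largest first."""
    rows = []
    for rc in anomaly.get("RootCauses", []):
        description = " / ".join(
            rc.get(key, "n/a") for key in ("LinkedAccountName", "Region", "UsageType")
        )
        rows.append((description, rc["Impact"]["Contribution"]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def fetch_anomalies(days: int = 30, client=None) -> list:
    """Collect anomalies from the last `days` days, following pagination."""
    if client is None:
        import boto3  # imported lazily so the parsing helper works offline
        client = boto3.client("ce")
    end = date.today()
    start = end - timedelta(days=days)
    kwargs = {
        "DateInterval": {"StartDate": start.isoformat(), "EndDate": end.isoformat()}
    }
    anomalies = []
    while True:
        page = client.get_anomalies(**kwargs)
        anomalies.extend(page.get("Anomalies", []))
        token = page.get("NextPageToken")
        if not token:
            return anomalies
        kwargs["NextPageToken"] = token


# Offline demonstration using the two root causes from the response above.
sample_anomaly = {
    "RootCauses": [
        {"LinkedAccountName": "Analytics Account", "Region": "us-west-2",
         "UsageType": "TimedStorage-ByteHrs", "Impact": {"Contribution": 1250.00}},
        {"LinkedAccountName": "Production Account", "Region": "us-east-1",
         "UsageType": "TimedStorage-ByteHrs", "Impact": {"Contribution": 2250.00}},
    ]
}
for description, dollars in ranked_root_causes(sample_anomaly):
    print(f"{description}: ${dollars:,.2f}")

# With credentials configured, a live call might look like:
# for anomaly in fetch_anomalies(days=7):
#     print(anomaly["AnomalyId"], ranked_root_causes(anomaly))
```

Passing the client in as a parameter keeps the paging logic testable with a stub. In production you would typically omit the argument and let the function create the client.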

Getting started and next steps

Cost Anomaly Detection users will automatically benefit from the enhanced root cause analysis without any additional action. The feature applies to all cost monitor types, alerts, the Billing and Cost Management Console, and APIs. It applies to all anomalies detected from the launch date onward, while historical anomalies retain their original root cause information. To access the detailed analysis, log in to your AWS account and navigate to the Cost Anomaly Detection service under AWS Billing and Cost Management. For programmatic access, use the GetAnomalies API. The expanded root cause information is visible when you view anomaly details.

This enhancement is available at no extra cost across all supported AWS Regions. Start using it today to strengthen your cloud cost management strategy. Consult the documentation to learn more about its capabilities and best practices. By integrating these insights into your existing tools and workflows, you can streamline your ability to detect, analyze, and reduce unplanned spend across your AWS environment.

Fredrik Tunvall

Fredrik is a Senior Technical Product Manager in AWS Billing and Cost Management. He leads Cost Anomaly Detection and drives other key initiatives that help customers monitor, understand, and optimize their AWS cloud cost and usage.