AWS Cloud Operations Blog

Analyzing your custom metrics spend contributors in Amazon CloudWatch

With an ever-growing volume of custom metrics in Amazon CloudWatch, customers often find it difficult to understand and manage their spend on this service. One of the most common questions they have is how to identify which metrics contribute the most to their spend in CloudWatch. This blog post introduces a solution that lets you break down and analyze your custom metrics spend in CloudWatch. You will be able to pinpoint the top contributors, quickly estimate costs, and dive into the details you need to make informed decisions and optimize your CloudWatch usage.

We will start with a walkthrough of how to use the solution, and cover deployment in the second part of this blog post.

Exploring custom metric top contributors

Visualizing your metrics

Our solution leverages a custom dashboard with a treemap visualization that automatically sorts your data to identify top contributors at a glance.

When you load the dashboard, you see a map of your custom metrics namespaces. The area of each block is proportional to its contribution to the total number of custom metrics. This visual representation allows you to quickly identify the largest areas, which correspond to the biggest contributors to your custom metrics spend. With this insight, you can focus your optimization efforts where they’ll have the most significant impact on reducing CloudWatch costs.

Figure 1: Custom widget displaying the treemap of metrics namespaces
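
As a rough illustration of where these numbers come from, here is a minimal Python sketch (using boto3, and not the code deployed by the solution) that tallies custom metrics per namespace; the namespaces with the largest counts map to the largest blocks in the treemap.

```python
# A minimal sketch, not part of the deployed solution, of how custom metric
# counts per namespace could be tallied with boto3. Each unique combination of
# metric name and dimensions returned by ListMetrics counts as one metric.
import boto3
from collections import Counter

cloudwatch = boto3.client("cloudwatch")
counts = Counter()

paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate():
    for metric in page["Metrics"]:
        # Namespaces starting with "AWS/" hold vended metrics, not custom ones.
        if not metric["Namespace"].startswith("AWS/"):
            counts[metric["Namespace"]] += 1

# The largest counts correspond to the largest blocks in the treemap.
for namespace, count in counts.most_common(10):
    print(f"{namespace}: {count} metrics")
```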

Interacting with the map

The dashboard is designed for intuitive interaction. Click on any block in the treemap to zoom in and explore the underlying metrics. From the initial namespace view, clicking on a namespace shows the list of metric names within that namespace. Unlike the traditional approach of browsing metrics through the “All metrics” menu, our dashboard starts from the metric name itself.

This approach instantly shows how many metrics exist with the same name across all levels of pre-aggregation and across all dimension values, eliminating manual calculation. The number of distinct dimension values for a given metric name is called cardinality, and publishing too many distinct dimension values is a common cause of unwanted spikes in custom metrics spend.

Imagine a scenario where you measure the response time of your endpoints with the Amazon CloudWatch Synthetics service. This creates Duration metrics in the CloudWatchSynthetics namespace with various levels of granularity, such as “By Canary and Step,” “By Canary,” and “Across All Canaries.” If you had three canaries with two steps each, you would have 3 canaries x 2 steps = 6 metrics at the finest level of granularity, plus 3 metrics at the “By Canary” granularity level, and 1 metric across all canaries, which makes a total of 10 metrics. Our solution displays this total upfront, helping you identify at a glance metrics with the highest cardinality that may need attention.
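
If you want to sanity-check the cardinality of a single metric name yourself, a minimal boto3 sketch like the one below (an illustration, not the solution’s code) counts the entries; for the Synthetics example above it would report 10.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
paginator = cloudwatch.get_paginator("list_metrics")

# Every pre-aggregation level and dimension combination comes back as a
# separate entry, so the total reflects the metric name's cardinality.
total = 0
for page in paginator.paginate(
    Namespace="CloudWatchSynthetics", MetricName="Duration"
):
    total += len(page["Metrics"])

print(f"Duration is published as {total} distinct metrics")
```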

To refine your view further, click on a specific metric name to see which dimensions contribute most to it. You can zoom out at any time by clicking the breadcrumbs path above the map.

Figure 2: Treemap displaying the totals for the SuccessPercent metric

Estimating costs

The dashboard includes a variable that allows you to toggle on or off an estimation of your custom metrics spend. While this estimation trades off accuracy for reactivity, it can provide a low-fidelity benchmark to identify groups of metrics that may require further investigation or optimization. This feature helps you make informed decisions about which areas of your custom metrics landscape deserve deeper analysis.

Here are the key limitations that may affect the estimate’s accuracy, most of which will overestimate your spend (a simplified calculation sketch follows the list):

  • Point-in-time view: the dashboard provides an instant snapshot and cannot differentiate between active and inactive metrics (a metric remains visible for 14 days after you stop publishing data to it). The estimate assumes that all metrics are active 24/7 for a full month, whereas metrics incur costs only while you actively send them new data.
  • Single account: your usage may be consolidated with other accounts. The dashboard ignores your organization’s structure, and assumes your account is charged in isolation, without the full benefit of discounts that higher usage tiers grant you.
  • Regional pricing: the dashboard uses a hardcoded price based on the us-east-1 region price, which may differ from the price in your region.
  • Partial opt-in metrics support: the dashboard will ignore opt-in detailed monitoring metrics that get added to a namespace whose name starts with “AWS/”.
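
To make the tier effect mentioned above concrete, here is a minimal Python sketch of how such a rough estimate can be computed from a metric count. It is not the dashboard’s actual formula, and the tier boundaries and prices are assumptions modeled on published us-east-1 custom metric pricing; verify them against the current CloudWatch pricing page.

```python
# Illustrative only: tier sizes and per-metric prices patterned on us-east-1
# custom metric pricing at the time of writing; check the CloudWatch pricing
# page for current figures. Like the dashboard, this assumes every metric is
# active around the clock for a full month.
PRICING_TIERS = [
    (10_000, 0.30),        # first 10,000 metrics per month
    (240_000, 0.10),       # next 240,000 metrics
    (750_000, 0.05),       # next 750,000 metrics
    (float("inf"), 0.02),  # anything beyond 1,000,000 metrics
]

def estimate_monthly_cost(metric_count: int) -> float:
    """Apply tiered per-metric prices to a raw count of custom metrics."""
    cost, remaining = 0.0, metric_count
    for tier_size, price in PRICING_TIERS:
        in_tier = min(remaining, tier_size)
        cost += in_tier * price
        remaining -= in_tier
        if remaining <= 0:
            break
    return cost

print(f"~${estimate_monthly_cost(12_500):,.2f}/month")  # 10,000*0.30 + 2,500*0.10
```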

Figure 3: Treemap and breadcrumbs display the estimated USD spend when the Estimate cost toggle is on

Deploying the solution

This solution and associated resources are available for you to deploy into your own AWS account as an AWS CloudFormation template.

Prerequisites:

  • An AWS account
  • Permissions to deploy a CloudFormation stack

The CloudFormation template will deploy the following resources into the AWS account:

  • A CloudWatch dashboard (by default named “custom-metrics-cost-examiner-dashboard”)
  • An AWS Lambda function, with its execution role, that backs the dashboard’s custom treemap widget

Steps to deploy the stack:

  1. Download the YAML file.
  2. In CloudFormation, create a stack using the downloaded file. For detailed instructions, refer to the CloudFormation documentation or to the downloaded README file. (A scripted alternative using boto3 is sketched after these steps.)
  3. In the CloudWatch console, choose Dashboards from the left-side panel. You should now see the newly created dashboard (by default “custom-metrics-cost-examiner-dashboard”).
  4. The first time you open the dashboard, you must click “invoke Lambda function” to grant the dashboard permission to invoke the function on your behalf.
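
If you prefer to script step 2 rather than use the console, a minimal boto3 sketch could look like the following. The local file name and stack name are assumptions; substitute the template you downloaded and whatever stack name you choose.

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Assumed local file name for the downloaded template.
with open("custom-metrics-cost-examiner-dashboard.yaml") as f:
    template_body = f.read()

cloudformation.create_stack(
    StackName="custom-metrics-cost-examiner-dashboard",  # assumed stack name
    TemplateBody=template_body,
    # Needed if the template creates IAM resources, such as the Lambda
    # function's execution role.
    Capabilities=["CAPABILITY_IAM"],
)

# Block until the dashboard and its supporting resources are ready.
cloudformation.get_waiter("stack_create_complete").wait(
    StackName="custom-metrics-cost-examiner-dashboard"
)
```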

Cleanup

If you decide that you no longer want to keep the solution and associated resources, you can navigate to CloudFormation in the AWS Console, choose the stack (by default “custom-metrics-cost-examiner-dashboard” or the name used when you deployed it), and choose Delete. All the resources will be deleted.
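
If you scripted the deployment, the cleanup can be scripted too; this minimal sketch assumes the same stack name as above.

```python
import boto3

cloudformation = boto3.client("cloudformation")

# Deleting the stack removes the dashboard and the Lambda function behind it.
cloudformation.delete_stack(StackName="custom-metrics-cost-examiner-dashboard")
cloudformation.get_waiter("stack_delete_complete").wait(
    StackName="custom-metrics-cost-examiner-dashboard"
)
```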

Conclusion

In this blog post, we introduced a solution that allows you to easily analyze and optimize your spend on Amazon CloudWatch custom metrics. By leveraging a custom CloudWatch dashboard with an intuitive treemap visualization, you can quickly identify the top contributors to your custom metrics costs and dive into the details.

The dashboard provides several key capabilities:

  • Visualize your custom metrics namespaces and identify the largest cost contributors at a glance
  • Seamlessly zoom in to explore the underlying metric names and dimension values, understanding the cardinality of your metrics
  • Estimate your overall custom metrics spend to get a high-level sense of optimization opportunities
  • Customize the color scheme for improved accessibility and readability

We also walked through the steps to deploy this solution in your own AWS account using CloudFormation. With this tool at your disposal, you can take control of your CloudWatch custom metrics spend, make informed optimization decisions, and ensure you’re only paying for the metrics that provide the most value to your business.

By proactively managing your custom metrics landscape, you can avoid unexpected cost spikes and keep your CloudWatch costs aligned with your operational needs.

To learn more about CloudWatch and observability, visit the AWS Observability Workshops page.

About the authors:

Roy Gershvitz

Roy Gershvitz is a Partner Technical Account Manager with AWS Enterprise Support. He works with AWS System Integrator (SI) partners, providing ongoing support and technical guidance and proactively enabling them to plan and build solutions using best practices.

Jean Schwerer

Jean Schwerer is a Senior Product Manager for CloudWatch. A technology enthusiast and former engineer, he has accrued more than 10 years of experience in product management of technical products and enjoys diving deep into customers’ use cases and needs.