AWS Database Blog
Set alarms on Performance Insights metrics using Amazon CloudWatch
Amazon RDS Performance Insights recently released a feature that sends key performance metrics from Performance Insights to Amazon CloudWatch. Using this feature, you can set alerts on these metrics.
When Performance Insights is enabled, it automatically sends the following three metrics to CloudWatch:
- DBLoad
- DBLoadCPU
- DBLoadNonCPU
I describe these metrics in the sections that follow.
DBLoad
The first metric, DBLoad, is the core metric in Performance Insights. DBLoad is a measure of how many database connections are concurrently active. An active connection, also called a session in this post, is a connection that has submitted a query to the database for which the database hasn’t yet returned the results. Between the time a query is submitted and the time its results are returned, the query can be in one of several states. It is either running on the CPU, waiting for the CPU, waiting for a resource such as a lock or I/O to finish, or waiting for access to other database resources.
While the query is processing, it typically switches among these states. Ideally, the query spends all of its time running on the CPU doing useful work instead of waiting. By seeing how many connections are concurrently active and what states they are in, we get a quick, powerful view into the load on the database. DBLoad is measured in average active sessions (AAS), which is the average number of database connections that are concurrently active. By default, each point in the DBLoad chart in the Performance Insights dashboard is the average number of active sessions over 1 minute.
DBLoad has two parts:
- DBLoadCPU, which represents how much of the time the connections were running on CPU or ready to run on CPU
- DBLoadNonCPU, which represents how much of the time the connections were waiting for a database resource such as I/O, or a lock, or a database buffer
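To make the relationship between the three metrics concrete, here is a minimal sketch of how average active sessions could be computed from periodic samples of session state. The sampling model, the `average_active_sessions` function, and the two state labels are assumptions made for illustration; Performance Insights reports engine-specific wait events and its internal implementation is not described in this post.

```python
from typing import Dict, List

# Hypothetical state labels for this sketch only.
CPU = "cpu"    # session is running on, or ready to run on, a CPU
WAIT = "wait"  # session is waiting on I/O, a lock, a buffer, and so on


def average_active_sessions(samples: List[List[str]]) -> Dict[str, float]:
    """Average the number of active sessions per sample, split by state."""
    if not samples:
        return {"DBLoad": 0.0, "DBLoadCPU": 0.0, "DBLoadNonCPU": 0.0}
    n = len(samples)
    total = sum(len(sample) for sample in samples) / n
    cpu = sum(state == CPU for sample in samples for state in sample) / n
    return {"DBLoad": total, "DBLoadCPU": cpu, "DBLoadNonCPU": total - cpu}


# Four samples taken over an interval. Each inner list holds the state of
# every session that was active at that instant.
samples = [
    [CPU, CPU, WAIT],
    [CPU],
    [],
    [CPU, WAIT, WAIT, WAIT],
]
load = average_active_sessions(samples)
print(load)
```

In this sketch the two parts always add up to the total, so DBLoad is the sum of DBLoadCPU and DBLoadNonCPU for the same interval.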
I focus on DBLoadCPU in the next section.
DBLoadCPU
The clearest of the three metrics to set an alert on is DBLoadCPU. DBLoadCPU represents how many database connections are active and runnable on the CPU. If the host has enough CPU resources, all these connections are running on the CPU.
For every vCPU, there can be a connection running on that CPU. If there are more connections active and in a ready to run state than there are vCPUs on the server, then some of those connections are waiting for CPU resources. If the number of connections stays consistently higher than the number of vCPUs, then we know we have contention for CPU. In this case, we can benefit from either increasing the CPU capacity of the host (by scaling up the server) or tuning the code that accesses the database to be more efficient. Thus we can set an alarm for when DBLoadCPU exceeds vCPUs on the host to alert us when we have saturated our CPU resources on the host.
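The saturation rule described above can be sketched as a small check over recent DBLoadCPU datapoints. The function name `cpu_saturated` and the choice of five consecutive datapoints as the meaning of "consistently" are assumptions for illustration, not values taken from Performance Insights or CloudWatch.

```python
from typing import Sequence


def cpu_saturated(dbload_cpu: Sequence[float], vcpus: int, min_points: int = 5) -> bool:
    """Return True if the last `min_points` datapoints all exceed `vcpus`."""
    if min_points <= 0 or len(dbload_cpu) < min_points:
        # Not enough history to call the load "consistently" high.
        return False
    return all(value > vcpus for value in dbload_cpu[-min_points:])


# One-minute DBLoadCPU averages for a host with 8 vCPUs (made-up values).
recent = [9.1, 8.5, 10.0, 9.0, 8.2]
print(cpu_saturated(recent, vcpus=8))
```

Requiring several consecutive datapoints avoids reacting to a single short spike, which is the same idea as evaluating an alarm over more than one period.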
To set an alert on DBLoadCPU for an instance with Performance Insights enabled, navigate to the CloudWatch console and choose Alarms.
On the alarm page, choose Create Alarm.
Then type DBLoadCPU.
From there, choose the DB instances to set the alarm on.
Then choose Next.
Now name the alert and set a threshold. The big question is what value to use for the threshold.
The alert notifies us when there is more demand for CPU than CPU available. The CPU available is represented by the max vCPU line in the Performance Insights dashboard.
In my case, there are 8 vCPUs on the host. So if I set my alert to 9, I am alerted when CPU demand exceeds capacity.
Under Actions, I set up a notification to my email. Now I’m alerted when there is CPU contention. When I get an alarm, I can either look into optimizing the code to use less CPU or consider migrating to a larger instance size.
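The same alarm can also be created programmatically. The following is a minimal sketch using the boto3 CloudWatch client's `put_metric_alarm` call, and it makes several assumptions: that the metric is published in the `AWS/RDS` namespace under the `DBInstanceIdentifier` dimension, that a greater-than-or-equal comparison evaluated over five 1-minute periods is what you want, and that an Amazon SNS topic with an email subscription already exists. The instance name, topic ARN, and helper function names are hypothetical.

```python
from typing import Any, Dict


def dbload_cpu_alarm_params(db_instance_id: str, threshold: float,
                            sns_topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"{db_instance_id}-DBLoadCPU-high",
        "Namespace": "AWS/RDS",          # assumed namespace for the metric
        "MetricName": "DBLoadCPU",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id},
        ],
        "Statistic": "Average",
        "Period": 60,                    # seconds; assumed 1-minute datapoints
        "EvaluationPeriods": 5,          # assumed; tune to taste
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # assumed
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported here so the builder above runs without AWS access
    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical instance and topic; the threshold of 9 matches the 8-vCPU
# example in this post.
params = dbload_cpu_alarm_params(
    "my-db-instance", 9, "arn:aws:sns:us-east-1:123456789012:db-alerts"
)
# create_alarm(params)  # uncomment to create the alarm in your account
```

Keeping the parameter builder separate from the API call makes it easy to review or test the alarm definition before anything is created in the account.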
Summary
Now that you can send these key performance metrics from Performance Insights to CloudWatch, it is much easier to alert on high load conditions or bottlenecks on your Amazon RDS database servers.
About the Author
Kyle Hailey is a product manager for Performance Insights at Amazon Web Services.