AWS DevOps & Developer Productivity Blog
Using Amazon CloudWatch and Amazon SNS to Notify when AWS X-Ray Detects Elevated Levels of Latency, Errors, and Faults in Your Application
AWS X-Ray helps developers analyze and debug production applications built using microservices or serverless architectures and quantify customer impact. With X-Ray, you can understand how your application and its underlying services are performing and identify and troubleshoot the root cause of performance issues and errors. You can use these insights to identify issues and opportunities for optimization.
In this blog post, I will show you how you can use Amazon CloudWatch and Amazon SNS to get notified when X-Ray detects high latency, errors, and faults in your application. Specifically, I will show you how to use this sample app to get notified through an email or SMS message when your end users observe high latencies or server-side errors when they use your application. You can customize the alarms and events by updating the sample app code.
Sample App Overview
The sample app uses the X-Ray GetServiceGraph API to get the following information:
- Aggregated response time.
- Requests that failed with 4xx status code (errors).
- 429 status code (throttle).
- 5xx status code (faults).
Getting started
The sample app uses AWS CloudFormation to deploy the required resources.
To install the sample app:
- Run git clone to get the sample app.
- Update the JSON file in the Setup folder with threshold limits and notification details.
- Run the install.py script to install the sample app.
For more information about the installation steps, see the readme file on GitHub.
You can update the app configuration to include your phone number or email to get notified when your application in X-Ray breaches the latency, error, and fault limits you set in the configuration. If you prefer to not provide your phone number and email, then you can use the CloudWatch alarm deployed by the sample app to monitor your application in X-Ray.
The sample app deploys resources with the sample app namespace you provided during setup. This enables you to have multiple sample apps in the same region.
CloudWatch rules
The sample app uses two CloudWatch rules:
- SCHEDULEDLAMBDAFOR-sample_app_name to trigger at regular intervals the AWS Lambda function that queries the GetServiceGraph API.
- XRAYALERTSFOR-sample_app_name to look for published CloudWatch events that match the pattern defined in this rule.
CloudWatch alarms
If you did not provide your phone number or email in the JSON file, the sample app uses a CloudWatch alarm named XRayCloudWatchAlarm-sample_app_name in combination with the CloudWatch event that you can use for monitoring.
Amazon SNS messages
The sample app creates two SNS topics:
- sample_app_name-cloudwatcheventsnstopic to send out an SMS message when the CloudWatch event matches a pattern published from the Lambda function.
- sample_app_name-cloudwatchalarmsnstopic to send out an email message when the CloudWatch alarm goes into an ALARM state.
Getting notifications
The CloudWatch event looks for the following matching pattern:
{
"detail-type": [
"XCW Notification for Alerts"
],
"source": [
"<sample_app_name>-xcw.alerts"
]
}
The event then invokes an SNS topic that sends out an SMS message.
The CloudWatch alarm looks for the TriggeredRules metric that is published whenever the CloudWatch event matches the event pattern. It goes into the ALARM state whenever TriggeredRules > 0 for the specified evaluation period and invokes an SNS topic that sends an email message.
Stopping notifications
If you provided your phone number or email address, but would like to stop getting notified, change the SUBSCRIBE_TO_EMAIL_SMS environment variable in the Lambda function to No. Then, go to the Amazon SNS console and delete the subscriptions. You can still monitor your application for elevated levels of latency, errors, and faults by using the CloudWatch console.
Uninstalling the sample app
To uninstall the sample app, run the uninstall.py script in the Setup folder.
Extending the sample app
The sample app notifes you when when X-Ray detects high latency, errors, and faults in your application. You can extend it to provide more value for your use cases (for example, to perform an action on a resource when the state of a CloudWatch alarm changes).
To summarize, after this set up you will be able to get notified through Amazon SNS when X-Ray detects high latency, errors and faults in your application.
I hope you found this information about setting up alarms and alerts for your application in AWS X-Ray helpful. Feel free to leave questions or other feedback in the comments. Feel free to learn more about AWS X-Ray, Amazon SNS and Amazon CloudWatch
About the Author
Bharath Kumar is a Sr.Product Manager with AWS X-Ray. He has developed and launched mobile games, web applications on microservices and serverless architecture.