AWS Cloud Operations Blog
Using the unified CloudWatch Agent to send traces to AWS X-Ray
Today, applications are more distributed than ever before and they no longer run in isolation. This is especially the case when utilizing Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). A distributed workload or system is one that encompasses multiple small independent components, all working together to complete a task or job. This helps to ensure that even if one system or component fails, the availability of the service is not affected. Distributed tracing is a method of observing client requests as they propagate through these components, which helps to isolate latency, faults, and other system errors.
End-to-end distributed tracing platforms begin collecting data the moment a request is initiated, such as when a user submits a form on a website, and continue through any downstream calls that result from the initial client request. As the need for distributed systems grows, so does the need for tracing at every level. Tracing gives us the ability to understand service relationships, measure specific user actions, maintain SLAs, and much more!
In this post, we will explore the basics of a trace, introduce AWS X-Ray, instrument a sample application to generate traces, and use the Amazon CloudWatch Agent as our collector and exporter.
What is a trace?
A trace represents the entire journey of a request as it travels through the various components of a service or system. Traces are an essential pillar of observability, as they provide the finer details of the flow of a request as it enters and leaves our system.
Unlike logs or metrics, traces are composed of events from more than one system component or service. They provide context about the connection between services such as response latency, service faults, request parameters and metadata (which provides further context into the data that is being collected).
AWS X-Ray is a service that collects data in the form of these traces. It receives data in units known as segments, each of which contains details about the work carried out by a component or service, and a trace can be made up of multiple segments. Segments can break down the work carried out into subsegments, which provide more granular timings and details regarding any downstream calls that were made to fulfill the original request, e.g. external API calls or SQL queries. X-Ray groups the segments that share a common request into traces.
Note: While OpenTelemetry (OTel) uses the concept of spans, AWS X-Ray uses the concept of segments. The two terms can be used interchangeably when we discuss tracing.
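To make the relationship between segments, subsegments, and traces more concrete, the following Python sketch groups a few hand-written segment records by their trace ID, which is roughly what happens when segments are assembled into a trace. The field names (name, trace_id, subsegments) follow the X-Ray segment document format, but the service names, IDs, and the group_into_traces helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical segments emitted by two services while serving two client requests.
segments = [
    {"name": "frontend", "trace_id": "1-5759e988-bd862e3fe1be46a994272793",
     "subsegments": [{"name": "GET /orders"}]},
    {"name": "orders-api", "trace_id": "1-5759e988-bd862e3fe1be46a994272793",
     "subsegments": [{"name": "SQL query"}, {"name": "external API call"}]},
    {"name": "frontend", "trace_id": "1-5759e99d-0123456789abcdef01234567",
     "subsegments": []},
]


def group_into_traces(segments):
    """Group segments that share a trace ID into one trace each."""
    traces = defaultdict(list)
    for segment in segments:
        traces[segment["trace_id"]].append(segment)
    return dict(traces)


traces = group_into_traces(segments)
for trace_id, members in traces.items():
    # Each trace lists the services that did work for that request.
    print(trace_id, "->", [s["name"] for s in members])
```

The first request touched two services, so its trace contains two segments; the second request was served by the frontend alone.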
AWS X-Ray provides the ability to view, filter and gain insights into these requests. Instrumenting an application involves sending trace data for incoming and outbound requests and other events within an application, along with metadata about each request. There are several ways to instrument an application to send traces:
- Auto Instrumentation – instrument your application with zero code changes, usually via configuration changes or an auto-instrumentation agent
- Library Instrumentation – make minimal application code changes to add pre-built instrumentation targeting specific libraries or frameworks, such as the AWS SDK, Apache HTTP clients, or SQL clients
- Manual instrumentation – add instrumentation code to your application at each location where you want to send trace information e.g. on each try/catch
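As a rough illustration of what manual instrumentation involves, the sketch below wraps a block of code in a context manager that records its name, timing, and any exception. It uses only the Python standard library; the traced helper and the recorded list are hypothetical stand-ins for what the X-Ray or OpenTelemetry SDKs provide, not their actual APIs.

```python
import time
from contextlib import contextmanager

recorded = []  # stand-in for an exporter; a real SDK would send these records out


@contextmanager
def traced(name):
    """Record the duration and outcome of the wrapped block of code."""
    record = {"name": name, "start": time.time(), "error": None}
    try:
        yield record
    except Exception as exc:
        # Keep a description of the failure, then let it propagate as usual.
        record["error"] = repr(exc)
        raise
    finally:
        record["end"] = time.time()
        recorded.append(record)


with traced("load-user"):
    time.sleep(0.01)  # simulated work

try:
    with traced("list-buckets"):
        raise PermissionError("access denied")  # simulated failing call
except PermissionError:
    pass

for r in recorded:
    print(r["name"], "error" if r["error"] else "ok")
```

Library and auto instrumentation apply the same idea for you, wrapping calls made through supported libraries so that this kind of code does not have to be written by hand.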
CloudWatch Agent and tracing
When trace data is generated, collectors are used to collect, process, and export this data. Collectors are typically made up of three components:
- Receivers – This component receives the data either via a Push or Pull Model
- Processors – Used to perform data aggregation, filtering, sampling and other collector processing logic
- Exporters – Used to define the intended destination(s) for the data e.g. AWS X-Ray
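The three stages can be sketched as plain Python functions chained together. This is a toy model only: the function names, the duration-based filter, and the in-memory backend list are assumptions made for illustration and do not reflect how the CloudWatch Agent is implemented.

```python
def receive(raw_spans):
    """Receiver: accept spans pushed by instrumented applications."""
    return list(raw_spans)


def process(spans, min_duration_ms=0):
    """Processor: drop spans shorter than a threshold (a simple filter)."""
    return [s for s in spans if s["duration_ms"] >= min_duration_ms]


def export(spans, destination):
    """Exporter: hand the processed spans to a destination (here, a list)."""
    destination.extend(spans)
    return len(spans)


backend = []  # stand-in for a tracing backend such as AWS X-Ray
incoming = [
    {"name": "GET /", "duration_ms": 12},
    {"name": "health-check", "duration_ms": 1},
    {"name": "GET /outgoing-http-call", "duration_ms": 240},
]

exported = export(process(receive(incoming), min_duration_ms=5), backend)
print(f"exported {exported} of {len(incoming)} spans")
```

Keeping the stages separate is what lets a collector accept several input formats and send to several destinations without the application having to change.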
Such collectors gather application traces from instrumented services, that is, services whose code has been modified to emit traces using either the X-Ray SDK or the OTel language-specific SDKs (Python, Node.js, Java, Ruby, .NET, and more). These SDKs let developers use the OpenTelemetry API to generate telemetry data in the language of their choice.
Note: Collectors are not only used for traces. They are also used to collect, process, and export metrics and logs to various backend destinations.
The Amazon CloudWatch Agent has now added support for the collection of AWS X-Ray and OpenTelemetry traces. Previously, AWS customers were required to use the X-Ray Daemon to collect trace data. Customers now only need to provision a single agent to capture metrics, logs, and traces.
Overview of solution
The architecture diagram below shows the request flow when the CloudWatch Agent acts as the collector. Although X-Ray specific APIs and SDKs can be used to emit traces, this blog post focuses on OTel.
Figure 1 : Request flow
Walkthrough
To start emitting traces to X-Ray, carry out the following:
- Create an IAM Role to allow the Agent to send trace data to X-Ray
- Install, configure and start the unified CloudWatch Agent
- Install and utilize a Python application to emit traces via OTLP
Prerequisites
To start, the following prerequisites are required:
- An AWS account
- An EC2 Instance running Amazon Linux 2023 with access to the Internet via a NAT Gateway or an Internet Gateway
- For a step-by-step guide, refer to the following Get started with Amazon EC2 Linux instances documentation
Creating the IAM role
In order to send traces to X-Ray, the EC2 Instance requires permissions to call the X-Ray APIs that are included in the AWSXRayDaemonWriteAccess AWS Managed Policy. We will also attach the AmazonSSMManagedInstanceCore policy to allow access to the EC2 Instance using AWS Session Manager.
See the below steps to create the required IAM role.
To do this
- Log in to the IAM console.
- On the left pane, select Roles and then click Create Role.
- For the Trusted entity type, select AWS service. For the Use case, select EC2, then click Next.
- To add the required permission policies, search for AWSXRayDaemonWriteAccess and select it. Then search for and select AmazonSSMManagedInstanceCore and click Next.
- Give your role a name such as CWAgentTracingRole and click Create Role.
- Finally, attach the newly created role to the EC2 Instance. For a step-by-step guide on this, please refer to the Attaching an IAM role to an instance guide
Installing the CloudWatch Agent
The CloudWatch Agent can be installed on Linux, Windows, and other supported operating systems by downloading the agent package from Amazon Simple Storage Service (Amazon S3), using AWS Systems Manager, AWS CloudFormation, or by installing it manually using the command line.
Because the agent is available in the Amazon Linux package repository, the agent package can be installed on Amazon Linux hosts in a single step using the yum package manager.
To do this
- Navigate to the EC2 Console and select the Instance.
- Next click Connect and select Session Manager -> Click Connect
- Once the session has started, run the following command to download and install the CloudWatch Agent.
Configuring the CloudWatch Agent
The CloudWatch Agent configuration file is a JSON file with four sections: agent, metrics, logs, and traces. Each of these performs a certain function, but for the purpose of this demo, the focus will be on the following:
- The agent section, which includes fields for the overall configuration of the agent.
- The traces section, which specifies the sources for traces that are collected and sent to AWS X-Ray.
In order to send traces to X-Ray, the agent needs to be configured appropriately. The agent configuration can be generated either manually or by using the agent wizard.
For the purpose of this post, a manual configuration will be carried out. On a Linux machine, we recommend giving the configuration file the following name and location, which makes troubleshooting easier:
/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json
On Windows, use the following name and location:
$Env:ProgramData\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent.json
To do this
1. Using the same Session Manager Session, run the following command:
2. Copy and paste the following JSON to the CloudWatch Agent configuration file
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "traces": {
    "traces_collected": {
      "xray": {},
      "otlp": {}
    }
  }
}
3. Save the config and exit the editor using the keyboard shortcuts ‘CTRL + O’ & ‘CTRL + X’
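A stray comma or brace is easy to introduce when pasting JSON, so it can be worth sanity-checking the file before starting the agent. The Python sketch below parses the same configuration and lists which trace receivers are enabled. For simplicity it embeds the JSON as a string rather than reading it from disk, and the enabled_trace_receivers helper is a hypothetical name, not part of the agent.

```python
import json

# The same configuration as above, embedded as a string for this sketch.
CONFIG_TEXT = """
{
  "agent": {
    "metrics_collection_interval": 60,
    "run_as_user": "root"
  },
  "traces": {
    "traces_collected": {
      "xray": {},
      "otlp": {}
    }
  }
}
"""


def enabled_trace_receivers(config_text):
    """Parse the config and return the names under traces.traces_collected."""
    config = json.loads(config_text)  # raises json.JSONDecodeError if malformed
    collected = config.get("traces", {}).get("traces_collected", {})
    return sorted(collected)


print(enabled_trace_receivers(CONFIG_TEXT))
```

An empty object for xray or otlp means the receiver is enabled with its default settings, which is why the default ports discussed later apply.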
To start the agent with the above configuration file, the agent must be started with the -a fetch-config option, which causes the agent to load the latest version of the CloudWatch agent configuration file, and the -s option, which starts the agent. It also requires a reference path to the JSON file created in the above section, in the form file:<configuration-file-path>
Run the following to perform this translation and to start the agent using this configuration:
This will validate the config file provided and will start the agent. To confirm this, run the following:
Figure 2 : Terminal showing the agent status
We can now also confirm that our Instance is listening on the default ports for incoming traces, either via the X-Ray SDK or via OTLP.
Run the following command to confirm this:
Figure 3 : Linux Terminal showing current Listening Ports
- For OTLP
  - Calls made via gRPC will be sent to port 4317
  - Calls made via HTTP will be sent to port 4318
- For the X-Ray SDKs
  - Calls via X-Ray SDKs will be made to port 2000
Trace data is sent to the agent on these ports. The agent collects the raw segment/span data, processes it, and exports the resulting traces to AWS X-Ray.
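As an alternative way to check the OTLP ports from the instance itself, the following Python sketch attempts a TCP connection to each port on localhost. The is_tcp_port_open helper is a hypothetical name. Port 2000 is left out on the assumption that the X-Ray SDKs send segments over UDP, where a connect-style check is not meaningful.

```python
import socket


def is_tcp_port_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


for label, port in [("OTLP gRPC", 4317), ("OTLP HTTP", 4318)]:
    state = "open" if is_tcp_port_open(port) else "closed"
    print(f"{label} port {port}: {state}")
```

If either port is reported as closed, a reasonable first step is to re-check the agent status and confirm that the otlp entry is present in the traces section of the configuration file.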
Installing the sample Python application
In order to generate OpenTelemetry traces, we need a sample application that is instrumented to emit traces and send them to the CloudWatch Agent acting as our collector. We will use the Python sample app in the aws-observability repo, which also contains many more language-specific sample applications for testing.
To do this
1. On your EC2 Instance, create a local file named python-app.sh
2. Paste in the following bash script and save it
#!/bin/bash
echo -e 'Installing Git... \n'
sudo yum install git -y
echo -e 'Installing Pip... \n'
sudo yum install pip -y
echo -e 'Cloning the GitHub Repo... \n'
git clone https://github.com/aws-observability/aws-otel-community.git
echo -e 'Creating a virtual environment... \n'
python3 -m venv ./
echo -e 'Activating the virtual environment... \n'
source bin/activate
echo -e 'Changing the directory... \n'
cd aws-otel-community/sample-apps/python-manual-instrumentation-sample-app
echo -e 'Installing the requirements... \n'
pip install --no-cache-dir -r requirements.txt
echo -e 'Starting the Python Application... \n'
python app.py
The above script installs git and pip, clones the required GitHub repo, creates and activates a Python virtual environment, installs the required dependencies, and finally starts the Python application.
3. Run sudo chmod u+x python-app.sh
4. Then to start the application, run sudo ./python-app.sh
Once the installation is complete, output similar to the below should be displayed in the terminal:
Figure 4 : Linux Terminal showing the running application
Now that the sample application is running, we can tell the application to generate and emit traces to X-Ray.
Sending OpenTelemetry Traces to X-Ray
OpenTelemetry, also referred to as OTel, is an open-source observability framework. It is a collection of APIs, SDKs and tools used to instrument, generate, collect, and export telemetry data, which can help to analyze your systems’ behavior and performance. It is important because it standardizes the way telemetry data is collected and transmitted, offering a consistent experience and streamlined observability.
The agent is responsible for receiving data via gRPC or HTTP using the OpenTelemetry protocol (OTLP). Once received, OpenTelemetry spans are converted to X-Ray Segments and are passed to X-Ray using the PutTraceSegments API.
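One small, visible part of that conversion is the trace ID. An OpenTelemetry trace ID is 32 hexadecimal characters, while an X-Ray trace ID is written as a version digit, eight hex characters, and 24 hex characters, separated by hyphens. The Python sketch below shows that reformatting only. It illustrates the ID layout, is not the agent's actual code, and the function name is hypothetical.

```python
import re


def otel_trace_id_to_xray(trace_id_hex):
    """Rewrite a 32-character hex trace ID in X-Ray's 1-<8 hex>-<24 hex> layout."""
    if not re.fullmatch(r"[0-9a-f]{32}", trace_id_hex):
        raise ValueError("expected 32 lowercase hex characters")
    # Version "1", then the first 8 hex characters, then the remaining 24.
    return f"1-{trace_id_hex[:8]}-{trace_id_hex[8:]}"


print(otel_trace_id_to_xray("5759e988bd862e3fe1be46a994272793"))
# prints 1-5759e988-bd862e3fe1be46a994272793
```

In the X-Ray format, the eight-character group encodes the request start time in epoch seconds, which is why OpenTelemetry SDKs that send to X-Ray are commonly configured with an X-Ray-compatible ID generator.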
To do this, navigate back to the EC2 Console and, using the same steps as before, create a new Session Manager Session to the EC2 Instance. Run the following three commands in order to make HTTP requests to the local HTTP endpoint:
curl http://127.0.0.1:8080/
- This ensures the application is running
curl http://127.0.0.1:8080/outgoing-http-call
- This makes an HTTP request to aws.amazon.com (http://aws.amazon.com/)
curl http://127.0.0.1:8080/outgoing-sampleapp
- Finally, this makes a call to all other sample app ports configured at <host>:<port>/outgoing-sampleapp. If none are available, it makes an HTTP request to www.amazon.com (http://www.amazon.com/)
The responses confirm that the requests were successful and include a traceId for each request. Note: Each traceId is unique and connects all segments and subsegments originating from a single client request, so each of the requests above generates its own traceId. See the X-Ray Segment Documentation for more information.
To view these traces, copy the traceId from the Session Manager output and navigate to the X-Ray Trace Map via the AWS Console. The generated traces will be present, provided that the correct region is selected in the console.
Note: You can also adjust the time range for the Trace Map by specifying the Absolute or Relative Timeframe.
The X-Ray trace map is a visual representation of the trace data that’s generated by instrumented applications. The map shows service nodes that served requests, upstream client nodes that represent the origin of the requests and the downstream service nodes which represent web services and resources that are used by an application for further processing.
Figure 5 : AWS X-Ray Trace Map showing the Service Nodes
Navigate to the traces section and click on the trace that was generated.
Figure 6 : AWS X-Ray Trace Console showing traced requests
From here, details on the trace id, timestamp of the call, the response code, response time, duration, HTTP method and the URL address will be present for further observability.
Figure 7 : AWS X-Ray Trace Console showing traced requests
Although these calls were successful, AWS X-Ray is extremely useful for investigating ongoing issues. So how does this look when viewed on the trace map?
Using the same SSM Session that was used to call the HTTP endpoint, run the following command, which makes a call to Amazon S3 to list the buckets for the associated account.
An exception similar to the below should be present:
From the response of the above call, it’s clear that the application has encountered an issue while calling the S3 Service. Navigate to the Trace Map and locate the node that made the call (highlighted by the red outline on the node) which indicates that an error has occurred.
Figure 8 : AWS X-Ray Trace Map showing traced requests with the failed request
Next, navigate to the Traces section and search for the traceId which generated the error. It will be annotated with a Fault (5XX) for the trace status. Once the trace is located and selected, additional details on the HTTP error code generated in response to calling the S3 ListBuckets API will be provided.
Figure 9 : AWS X-Ray Trace Console showing the failed request
Within the trace, an HTTP 403 code can be observed. An HTTP 403 is a common HTTP error code that indicates a lack of permissions. Recall that at the start of this blog post, we provided permissions to our EC2 Instance using an instance profile IAM Role to which we attached only the AWSXRayDaemonWriteAccess and AmazonSSMManagedInstanceCore AWS Managed Policies.
In order to resolve this, S3 permissions are required to call the API. Navigate back to the IAM Console, attach the AmazonS3ReadOnlyAccess policy to the IAM Role created previously, and re-run the command.
After making the necessary IAM Changes to allow the Instance to call the S3 API, there will be confirmation that the call has been successful as per the Trace Map.
Figure 10 : AWS X-Ray Trace Console showing the successful traced request
Cleaning up
In order to avoid any unexpected charges, the instance utilized should be terminated if it was created strictly for testing purposes. To do so, terminate the EC2 Instance created before or during this demo via the Console or via the Command Line.
Conclusion
In this post, we observed how traces were emitted and exported to X-Ray from a sample application using a Python SDK, with the CloudWatch Agent acting as the collector. Developers can also utilize auto-instrumentation, which dynamically injects bytecode to capture telemetry data from many popular libraries and frameworks, producing observability signals without the need for code changes. Automatic instrumentation using OpenTelemetry is supported for various programming languages such as C++, .NET, Java, JS, Python etc. This can also be achieved in container environments such as Amazon EKS and Amazon ECS. X-Ray natively integrates with many AWS services such as AWS Lambda, Amazon API Gateway, Elastic Load Balancing and many more. See our documentation for more information.
See the Amazon CloudWatch Features page to learn more, and for more hands-on experience, check out the One Observability Workshop and our GitHub Repo for more sample apps to help you get started.