AWS Open Source Blog
Auto-instrumenting a Python application with an AWS Distro for OpenTelemetry Lambda layer
Customers want better insight into the behavior of their systems, but not all customers can afford to make significant code changes in their existing pipelines to add more observability.
In this walkthrough, we explain how to get telemetry data from AWS Lambda Python functions, without having to change a line of code. Find the latest steps for this guide in the AWS Distro for OpenTelemetry Lambda documentation.
Walkthrough
Our example is simple, but simulates a real-life situation: A Lambda function is invoked and makes requests to downstream Amazon Web Services (AWS) using the AWS SDK for Python (Boto3).
The following system architecture diagram shows this scenario. The user invoking the Lambda function might be different—such as Amazon API Gateway or Amazon Simple Queue Service (Amazon SQS)—and the scenario much more complex, but as long as it goes through Lambda, we can trace its requests.
Lambda accepts the trigger, makes the requests, and finally shuts down.
How do we know whether it worked?
We could look at the response code, but that only tells us Lambda finished successfully. We could add logging to our function, but that is extra code to manage and not the easiest to read or maintain. Herein lies the great benefit of tracing with the AWS Distro for OpenTelemetry Python Lambda Layer. This layer adds both OpenTelemetry Python and a minimal version of the AWS Distro for OpenTelemetry Collector. Let’s update the diagram with this layer. The result is the solution we were looking for—auto instrumentation.
As the preceding image shows, adding a Lambda layer is all that is needed to get in-depth insights into the end-to-end operations of your system. This Lambda layer makes code available to a Lambda function without editing the Lambda handler.
With a few additional updates to the Lambda configuration, invoking the Lambda function results in a trace sent to AWS X-Ray. The trace includes a service map that provides the information that we were lacking before, including:
- Nodes represent the different services involved in completing the request.
- Each node contains metadata describing the environment at the time of the request.
- Time spent in each service is easily viewable.
- Information about failures is recorded.
- Traces are grouped together to produce stats based on response code.
- And much more.
With minimal configuration, the AWS Distro for OpenTelemetry Lambda layers can give us valuable telemetry data.
Creating a Lambda function
Let’s try this for ourselves and build this exact system. In our case, we’ll create a Lambda function that runs Python 3.8; however, as of our September GA announcement, we also have solutions for OpenTelemetry tracing with the following languages:
- Python
- Java
- JavaScript
- .NET (currently requires modification to the Lambda function)
The required setup varies slightly per language, but the result is the same. Keep current with the latest languages and features supported on the upstream opentelemetry-lambda repository.
Prerequisites
- An AWS account with permissions to create AWS Lambda functions, access AWS X-Ray, and create AWS Identity and Access Management (IAM) policies.
Steps
In the AWS Management Console, navigate to the Lambda console and select Create function.
Enter the function name OTel-Tracing-Demo, set the runtime as Python 3.8, and select the x86_64 architecture.
Because we want our Lambda function to call downstream services, expand the permissions section and choose to use an existing role. We already have a role called lambda-calls-aws-services-with-otel ready for this purpose.
You can create this role from the AWS Identity and Access Management (IAM) console. This role has only a single custom-made policy allowing the Lambda function to make calls to ec2:DescribeInstances and s3:ListAllMyBuckets.
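If you need to create this policy yourself, a sketch of the custom policy described above could look like the following (the Sid is illustrative; both actions are read-only list/describe calls):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDemoDownstreamCalls",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeInstances",
        "s3:ListAllMyBuckets"
      ],
      "Resource": "*"
    }
  ]
}
```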
In the Lambda window, choose Create function to complete the creation. When the Lambda code editor opens, you can replace the starter code with the following Lambda handler code. Make sure to select Deploy after any changes.
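The handler can be anything that makes downstream AWS SDK calls; as a sketch, one that exercises the permissions above (calling Amazon S3 and Amazon EC2 through Boto3) could look like this:

```python
import boto3

# Boto3 clients for the downstream services the IAM role allows us to call.
s3 = boto3.client("s3")
ec2 = boto3.client("ec2")


def lambda_handler(event, context):
    # s3:ListAllMyBuckets - list the buckets in this account.
    buckets = s3.list_buckets().get("Buckets", [])

    # ec2:DescribeInstances - describe the instances in this Region.
    reservations = ec2.describe_instances().get("Reservations", [])

    return {
        "statusCode": 200,
        "body": f"Found {len(buckets)} buckets and {len(reservations)} EC2 reservations",
    }
```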
Because this code represents your own Lambda function, notice that this is the only time we must modify it. We won’t need to modify it at all to get telemetry data.
We now have all pieces of the system ready, prior to telemetry being added. This system currently works as intended, but doesn’t give us the actionable insight that could help us understand and identify areas of improvement for our Lambda function.
Adding the AWS Distro for OpenTelemetry Lambda layer to the Lambda function
Next, let’s add the telemetry. This guide follows the steps in the documentation for getting started with OpenTelemetry for Python Lambda.
According to the documentation, we must first add the AWS Distro for OpenTelemetry Lambda layer by using its Amazon Resource Name (ARN). Because Lambda Layers are Regional resources, the layer must exist in the same Region as your Lambda function. The table on this documentation page will show whether a Lambda layer exists in your Lambda function’s Region.
The Lambda layers on this page were packaged and published from the open source aws-otel-python GitHub repository, which you can use to learn how to package a Lambda layer in another Region, if needed.
Our Lambda function was built in us-east-1, so we use this value to construct the ARN of the AWS Distro for OpenTelemetry Python Lambda layer we will use. Refer to the documentation to find the latest ARN for your Region.
To add this to the Lambda function, select Add a layer, which opens a new window.
Enter the ARN for the AWS Distro for OpenTelemetry Lambda layer in the same Region as your Lambda function. Notice that several AWS layers are suggested, but we already know the ARN of our layer, so we can enter it directly using the Specify an ARN option.
Select Add to add the layer to the Lambda function.
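If you prefer to script this step instead of using the console, a sketch using Boto3 could attach the layer as well. The layer ARN below is a placeholder; use the real ARN for your Region from the documentation, and note that the Layers parameter replaces the function's existing layer list.

```python
import boto3

lambda_client = boto3.client("lambda")

# Placeholder - look up the real AWS Distro for OpenTelemetry Python layer ARN
# for your Region in the documentation.
LAYER_ARN = "arn:aws:lambda:us-east-1:<account>:layer:<adot-python-layer>:<version>"

# Layers replaces the full list of layers attached to the function.
lambda_client.update_function_configuration(
    FunctionName="OTel-Tracing-Demo",
    Layers=[LAYER_ARN],
)
```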
The next step in the documentation is to enable Active tracing in Lambda Configuration, Monitoring and operations tools.
Active tracing means that Lambda is actively sending telemetry data (traces) to X-Ray instead of just passing along instrumentation information to downstream services. This is what allows us to see nodes in the service map representing the Lambda service and our Lambda function.
Having those nodes gives us a more complete view of each service the request hits from end-to-end, rather than only getting the traces generated by OpenTelemetry (the downstream calls to Amazon Simple Storage Service [Amazon S3] and Amazon Elastic Compute Cloud [Amazon EC2]).
In AWS X-Ray, turn on Active tracing. You may notice a tip saying that your role is missing the permissions it needs to send traces to AWS X-Ray directly. Select Save and continue, and Lambda will automatically try to add the missing permissions to the lambda-calls-aws-services-with-otel role created previously.
If we check on the lambda-calls-aws-services-with-otel role, we will notice that a new policy has been added with xray:PutTraceSegments and xray:PutTelemetryRecords privileges.
The AWS Distro for OpenTelemetry Lambda Layer is added, and active tracing is enabled.
Now we have the configuration and OpenTelemetry packages needed to start tracing. To start getting traces, we could import those packages directly into our Lambda handler and configure them. However, we’re looking for a solution that doesn’t need any code modification at all. For this purpose, we added a script directly into each layer that handles the configuration and that we can enable by setting environment variables.
Following the final step in the documentation, in the Lambda settings set the AWS_LAMBDA_EXEC_WRAPPER environment variable to point to the script we mentioned previously, /opt/otel-instrument.
Add this value and select Save.
Select Edit to increase the timeout on the Lambda function to accommodate the multiple downstream requests we are making.
Nine seconds should be enough. After changing the Timeout option, select Save.
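These configuration changes (active tracing, the wrapper environment variable, and the longer timeout) can also be applied in a single Boto3 call, as a sketch using the function name from this walkthrough:

```python
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_function_configuration(
    FunctionName="OTel-Tracing-Demo",
    # Equivalent to turning on Active tracing in the console.
    TracingConfig={"Mode": "Active"},
    # Point the runtime at the wrapper script shipped in the layer.
    # Note: Environment replaces the function's existing variables.
    Environment={"Variables": {"AWS_LAMBDA_EXEC_WRAPPER": "/opt/otel-instrument"}},
    # Allow time for the multiple downstream requests.
    Timeout=9,
)
```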
Now the Lambda function is set up with auto instrumentation. Any invocation of this function now results in automatically traced calls to numerous popular Python packages, complete with the metadata from making those calls. Because the OpenTelemetry Collector is configured with the right exporters, these traces become available to view on the interactive AWS X-Ray backend.
Because OpenTelemetry is an open source project, its list of supported libraries continuously grows and improves. Refer to the full list of packages supported on the OpenTelemetry package registry or on the upstream OpenTelemetry Python repository directly.
Let’s test this instrumented Lambda function.
Viewing the result on AWS X-Ray
In the Code source window, we can choose Test, Configure test event to invoke this Lambda function.
Choose Configure test event, Create new test event. We won’t use the contents of the event in this example, so leave the default contents and use the event name MyTestEvent.
We’re not using the Lambda event in this demo, but the OpenTelemetry specifications for tracing on AWS Lambda give scenarios in which the Lambda event can be used to connect traces from services that invoked the Lambda function with traces subsequently created during the Lambda function execution.
Make sure to check out the upstream OpenTelemetry Lambda repository for the latest features included in each AWS Distro for OpenTelemetry Lambda Layer.
Finally, let’s invoke the Python Lambda function instrumented with OpenTelemetry Python. Choose the Test button.
We confirm it succeeded by reviewing the log output.
But now we have something better than logs! In the X-Ray console in the same Region, there is a trace matching this recent call to our Lambda function.
Choose this trace, which opens the AWS X-Ray trace detail view. Here we get the features of AWS X-Ray we expected at the start of this blog post.
Selecting the service map nodes highlights the corresponding traces in the timeline, and choosing the rows in the timeline gives us information about the metadata attached to each segment.
From this window, you can view how long requests took, what operation was requested using the AWS Python SDK (listBuckets, describeInstances), error messages, and more.
The service map and timeline provide the full view of the request without requiring us to read any code or design documents. As the following image shows, a client (us) triggered a request, the AWS Lambda service received the request, AWS Lambda routed that request to our Lambda function, our Lambda function’s operations were intercepted by the OTel-Tracing-Demo service (our Lambda function running the OpenTelemetry Python SDK), and OpenTelemetry Python traced the Lambda function’s requests to s3:listBuckets and ec2:describeInstances. AWS Lambda sent traces for the first three nodes to AWS X-Ray, and the AWS Distro for OpenTelemetry Collector sent traces for the remaining three nodes.
As more requests are made, traces can be grouped by their URL and response status code in the X-Ray window that shows all traces. Traces can also be searched and filtered against a condition to sift through extensive telemetry data quickly. Learn more about using filter expressions to search for traces in the AWS documentation.
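For example, a filter expression along these lines (using the service name from this demo) narrows the list to slower traces that passed through our instrumented function:

```
service("OTel-Tracing-Demo") AND duration >= 5
```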
Customizing the OpenTelemetry Collector in the Lambda layer
The last step of this demo involves an optional modification to the configuration used by the AWS Distro for OpenTelemetry Collector in the Lambda layers. As the documentation mentions, although we supply a default configuration that tells the AWS Distro for OpenTelemetry Collector to export traces to AWS X-Ray, we can provide our own custom configuration for debugging purposes or to send the traces to a different backend entirely.
First, we will create a collector.yaml file in the Lambda function code editor view alongside the lambda_function.py file.
In this example, we will remove the awsxray exporter and replace it with the logging exporter, using the debug log level. Use the following code to fill the contents of collector.yaml, and make sure to select Deploy after making the change.
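A minimal sketch of such a configuration, assuming the layer's default OTLP receiver setup, looks like this:

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  # Print spans to the Lambda function's console output instead of
  # sending them to AWS X-Ray.
  logging:
    loglevel: debug

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging]
```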
This is useful to debug whether you are getting the traces you expect in the Lambda function console output. Viewing traces confirms that OpenTelemetry Python is patching the Lambda function calls and that it has a successful connection to the AWS Distro for OpenTelemetry Collector. Once that is confirmed, any subsequent missing trace data in AWS X-Ray can be narrowed down to an issue with the awsxray exporter being unable to send data to the AWS X-Ray service (possibly because of permissions issues, a mismatched Region, or something else).
Next, add one environment variable to tell the AWS Distro for OpenTelemetry Collector that we have our own configuration and where to find it. That is, we set OPENTELEMETRY_COLLECTOR_CONFIG_FILE to /var/task/collector.yaml.
With this environment variable set, we should once again manually invoke the Lambda function with the test event.
After that, we can review the console output, which shows the same traces produced by OpenTelemetry Python that we previously saw in AWS X-Ray.
AWS Lambda still sends its two traces (for the AWS Lambda service and the AWS Lambda function) to X-Ray directly because of active tracing, but the OpenTelemetry Python traces are formatted and printed to the console output. This is the same data that AWS X-Ray receives, except that AWS X-Ray makes the data searchable, groups traces together, and presents everything in a user-friendly format.
After you have confirmed the traces you expect to see, you can decide whether the default AWS Distro for OpenTelemetry Collector is good enough for your system, or if this custom configuration would serve your system better.
What’s next?
The AWS Distro for OpenTelemetry team has enjoyed working with the open source upstream to help bring OpenTelemetry to its GA milestone in September 2021. In this demo, we walked through how to add OpenTelemetry features to an AWS Lambda function. We also mentioned the other languages and setups we are committed to supporting, as well as future plans. Many solutions require no changes to existing systems, thanks to environment variables and scripts we maintain, and for those that do require small changes, we are working toward automatic instrumentation solutions right now.
For all the latest on how to use OpenTelemetry with AWS, refer to the AWS Distro for OpenTelemetry documentation.
We are available on the CNCF Slack Workspace and regularly monitor discussions on our public AWS OpenTelemetry Community GitHub repository.