AWS Cloud Operations Blog

Enable cloud operations workflows with generative AI using Agents for Amazon Bedrock and Amazon CloudWatch Logs

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Agents for Amazon Bedrock helps you accelerate generative artificial intelligence (AI) application development by automatically orchestrating multistep tasks. Bedrock Agents extend FMs in Bedrock to run complex business tasks—from booking travel and processing insurance claims to creating ad campaigns and managing inventory—all without writing any code.

Amazon CloudWatch Logs enables you to centralize the logs from all of your systems, applications, and AWS services that you use, in a single, highly scalable service. Use CloudWatch Logs to monitor applications and systems using log data or search them for specific error codes or patterns, filter them based on specific fields, or archive them securely for future analysis.
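To make the kind of log search involved here concrete, the sketch below filters a log group for events containing a given HTTP status code. It is a minimal illustration, not part of the solution's code: the log group name is hypothetical, and the function expects a boto3 CloudWatch Logs client (created with `boto3.client("logs")`) to be passed in.

```python
def find_http_errors(logs_client, log_group, status_code="500"):
    """Return messages from a CloudWatch log group whose events
    contain the given HTTP status code, using a simple term filter."""
    events = []
    paginator = logs_client.get_paginator("filter_log_events")
    for page in paginator.paginate(
        logGroupName=log_group,
        filterPattern=f'"{status_code}"',  # match events containing the term
    ):
        events.extend(e["message"] for e in page.get("events", []))
    return events

# Typical usage (requires AWS credentials; log group name is hypothetical):
#   import boto3
#   errors = find_http_errors(boto3.client("logs"), "/custom-app/app-log")
```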

In this blog post, we demonstrate how to use generative AI with Agents for Amazon Bedrock and FMs in Bedrock in a cloud operations scenario on AWS: triaging and then resolving issues based on errors observed in application log files.

In our solution, the Amazon Bedrock agent uses the reasoning capability of foundation models (FMs) to break down a user instruction, which requests resolution of an error found in application logs published to CloudWatch Logs, into multiple steps. The agent uses the natural language instruction provided by the developer or analyst to create an orchestration plan. It then carries out the plan by invoking the relevant APIs and querying an Amazon Bedrock knowledge base, which draws information from a vector data store (Amazon OpenSearch Serverless) to augment the responses generated by large language models (LLMs).

We also show the trace that demonstrates the agent's chain of thought: the Bedrock agent automatically creates a plan and reasons through the execution steps needed to fulfill the natural language question posed by a support analyst tasked with resolving an application error.

Prerequisites

  1. Install AWS SAM.
  2. Clone the repository for this solution:
    sudo yum install -y unzip
    git clone https://github.com/aws-samples/genai-bedrock-serverless.git
    cd cloudops
  3. From the cloudops folder, deploy the SAM template for the solution:
    sam build -t template.yaml
    sam deploy --resolve-s3 --stack-name <anyname> --capabilities CAPABILITY_NAMED_IAM
  4. The template creates two Amazon S3 buckets. Navigate to the Outputs section of the deployed SAM template on the AWS CloudFormation console to obtain the names of these two buckets (ProductDocsBucket and CloudOpsSupportBucket) so that you can locate them on the S3 console.
    1. Upload the ProductErrorCodes.xlsx file from the data folder in our solution to the ProductDocsBucket bucket in the S3 console.
    2. Upload the cloudopsupport.json and the applogs.csv files from the data folder in our solution to the CloudOpsSupportBucket bucket in the S3 console.
  5. Create an Amazon Bedrock knowledge base. Follow the steps here to create a knowledge base, accepting all the defaults, including the Quick create a new vector store option in Step 7, which creates an Amazon OpenSearch Serverless vector search collection for your knowledge base. Configure the following areas specific to our use case in the solution:
    1. In Step 4a, provide an optional description for your knowledge base, such as “Provides error resolution based on the error description”.
    2. In Step 5c, where you provide the S3 URI of the object containing the files for the knowledge base's data source, select the S3 URI of the ProductDocsBucket.

Solution Overview

In our solution, we start with a custom application running on an Amazon EC2 instance or on an instance outside AWS (on premises or in a hybrid cloud), where an Amazon CloudWatch agent installed on the instance streams the application’s log file to Amazon CloudWatch Logs. Detailed documentation is available on installing the unified CloudWatch agent on your Windows or Linux systems. You can also use AWS Systems Manager to install and update the CloudWatch agent on your EC2 or hybrid instances. Once your application logs are streaming into CloudWatch, you can export the log file to Amazon S3 by following the steps documented here. Our solution assumes that you have completed the CloudWatch setup for your custom application; we have provided a sample application log file (in .csv format) that you uploaded to S3 in the prerequisites section.
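As a sketch of the export step, the helper below starts a CloudWatch Logs export task with boto3. The log group, bucket, and prefix names are hypothetical; the destination bucket must have a policy that allows the CloudWatch Logs service principal to write to it, as described in the export documentation.

```python
def start_log_export(logs_client, log_group, bucket, prefix, start_ms, end_ms):
    """Kick off an export of a CloudWatch log group to an S3 bucket.
    Timestamps are epoch milliseconds; returns the export task ID."""
    resp = logs_client.create_export_task(
        logGroupName=log_group,
        fromTime=start_ms,        # start of the export window (ms)
        to=end_ms,                # end of the export window (ms)
        destination=bucket,       # S3 bucket that allows CloudWatch Logs writes
        destinationPrefix=prefix,
    )
    return resp["taskId"]

# Typical usage (requires AWS credentials; names are hypothetical):
#   import boto3, time
#   now_ms = int(time.time() * 1000)
#   task = start_log_export(boto3.client("logs"), "/custom-app/app-log",
#                           "my-log-export-bucket", "exports",
#                           now_ms - 86_400_000, now_ms)
```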

In our scenario, the support analyst wants to resolve an error based on an HTTP error code and the timestamp of the error reported by the application. For the Bedrock agent to fulfill the user request, we configure the agent with (1) an action group that defines an API schema for actions, along with an AWS Lambda function that implements the actions the agent can perform, and (2) a knowledge base backed by an AWS managed vector database (Amazon OpenSearch Serverless in our case) that provides a repository of information the agent can query to answer customer questions and improve its generated responses. The agent creates prompts and determines the right sequence of tasks based on your instruction to the agent, the API schema, and the knowledge base provided to it.
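To make the action-group contract concrete, here is a minimal sketch of the kind of Lambda handler the agent invokes. The event and response envelope follow the Lambda input/output format for Agents for Amazon Bedrock; the lookup logic and the parameter names (errorCode, timestamp) are hypothetical stand-ins for what the solution's actual API schema defines, and the real function would search the exported log file in S3 rather than an in-memory table.

```python
def lambda_handler(event, context):
    """Handle a Bedrock agent action-group invocation: return an error
    description for a given HTTP error code and timestamp."""
    # Agent parameters arrive as a list of {name, type, value} dicts.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    code = params.get("errorCode", "")
    ts = params.get("timestamp", "")

    # Hypothetical lookup table standing in for a search of the
    # exported application log file in S3.
    descriptions = {"500": "Internal server error in the checkout service"}
    body = descriptions.get(code, f"No log entry found for {code} at {ts}")

    # Response envelope expected by Agents for Amazon Bedrock.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "apiPath": event["apiPath"],
            "httpMethod": event["httpMethod"],
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {"body": body},
            },
        },
    }
```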

The high-level architecture diagram in Figure 1 illustrates how the various components of our solution work together. A CloudWatch Logs file is exported to Amazon S3, and the agent is provided with an API schema and a Lambda function that implements the methods in the schema. The agent is also associated with a Bedrock knowledge base, as shown in the architecture diagram, and then executes the flow shown in Figure 2.

Figure 1: Solution Architecture depicting end to end flow

Figure 2: Flowchart describing end to end flow

Setup

Create an Amazon Bedrock agent. Follow the steps here from the Bedrock Agents console to create your agent. Accept all the defaults except in the following areas, which are specific to the configuration for our solution. In the Configure your agent section:

  1. In Step 2c, to Select a model, select the Anthropic Claude 3 Sonnet model. In Step 2d, under Instructions for the agent, provide the following instruction: “You are an agent that provides error resolution and affected application component information based on the HTTP error code and timestamp of the error”.
    Figure 3: Create Bedrock Agent
  2. In Step 2g, in the IAM permissions section for the Agent resource role, select Use an existing service role and select the ‘AmazonBedrockExecutionRoleForAgents_CloudOps’ IAM service role that has been provisioned for you by the solution’s SAM template.
  3. In Step 3, to add an action group to the Bedrock agent, follow the steps here to add an action group from the console. Provide an optional description for the action group, such as “Provide error description for this error based on HTTP error code and timestamp of the error”, as shown in Figure 4.
  4. In Step 6, within the Action group type section, select Define with API schemas.
    Figure 4: Create Action Group
  5. In Step 7, within the Action group invocation section, choose Select an existing Lambda function and select the Lambda function prefixed with <stackname>-CloudOpsSupportLambda that is already provisioned for you.
    Figure 5: Associate Lambda function to Action Group
  6. In Step 8, under the Action group schema section, choose Select an existing API schema, select the Browse S3 button, and select the cloudopsupport.json file from the CloudOpsSupportBucket S3 bucket provisioned in your account.
    Figure 6: Associate API schema to Action Group
  7. In Step 4, in the Knowledge base section, select Add to associate the knowledge base that you created in the prerequisites section with your agent. Provide a Knowledge base instruction for the agent, such as “Provide error resolution and affected application component based on the error description for this error”.
    Figure 7: Add knowledge base to agent
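For reference, an action-group schema of this kind is an OpenAPI 3 document. The sketch below (expressed as a Python dict for readability; `json.dumps` yields the JSON you would upload to S3) shows the general shape only: the path, operationId, and parameter names are hypothetical and are not the contents of the solution's actual schema file.

```python
import json

# Minimal OpenAPI 3 schema of the shape an action group expects.
# Path and parameter names are hypothetical illustrations.
ACTION_GROUP_SCHEMA = {
    "openapi": "3.0.0",
    "info": {"title": "CloudOps support API", "version": "1.0.0"},
    "paths": {
        "/errorResolution": {
            "get": {
                "description": "Get error description by HTTP code and timestamp",
                "operationId": "getErrorResolution",
                "parameters": [
                    {"name": "errorCode", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "timestamp", "in": "query", "required": True,
                     "schema": {"type": "string"}},
                ],
                "responses": {
                    "200": {
                        "description": "Error description",
                        "content": {"application/json":
                                    {"schema": {"type": "string"}}},
                    },
                },
            },
        },
    },
}

# Serialize to the JSON document that would be stored in S3.
schema_json = json.dumps(ACTION_GROUP_SCHEMA, indent=2)
```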

Test and Validate

The Amazon Bedrock console provides a UI to test your agent. Follow the steps here to test and prepare your agent so that it is ready to deploy. Then follow the steps here to deploy your agent by creating an alias for the agent and an agent version associated with that alias, as shown below:

Figure 8: Create alias and version for the Bedrock agent

You are now ready to test your agent. Provide a sample prompt in the Bedrock agent UI where, as a support analyst, you want to resolve an error based on an HTTP error code and the timestamp of the error reported by the application. Based on the sample error codes and associated timestamps in the sample application log file provided with our solution, you can use a prompt such as “Please provide error resolution for error with http error code of 500 and timestamp of 202404219:00”, as shown in Figure 9 below. The Bedrock agent should respond with a detailed error resolution, indicating where it obtained the information to resolve the error, as shown in Figure 9.
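Beyond the console UI, the agent can be invoked programmatically. The InvokeAgent API in the Bedrock agent runtime returns the completion as an event stream of text chunks; the helper below concatenates them. The agent ID and alias ID in the usage comment are hypothetical placeholders for the values from your deployment.

```python
def collect_completion(completion_stream):
    """Concatenate the text chunks from an InvokeAgent response stream,
    skipping non-chunk events such as traces."""
    parts = []
    for event in completion_stream:
        chunk = event.get("chunk")
        if chunk and "bytes" in chunk:
            parts.append(chunk["bytes"].decode("utf-8"))
    return "".join(parts)

# Typical usage (requires AWS credentials; agent and alias IDs are
# hypothetical placeholders):
#   import boto3, uuid
#   client = boto3.client("bedrock-agent-runtime")
#   resp = client.invoke_agent(
#       agentId="AGENT_ID", agentAliasId="ALIAS_ID",
#       sessionId=str(uuid.uuid4()),
#       inputText="Please provide error resolution for error with "
#                 "http error code of 500 and timestamp of 202404219:00",
#   )
#   print(collect_completion(resp["completion"]))
```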

Figure 9: Provide prompt and obtain final response from the Bedrock Agent

By selecting Show trace for each of the responses, a dialog box shows the reasoning technique used by the agent and the final response generated by the FM.

Figure 10: View chain of thought and reasoning from the Bedrock agent

Cleanup

To avoid recurring charges, and to clean up your account after trying the solution outlined in this post, perform the following steps:

  1. From the cloudops folder, delete the SAM stack for the solution:
    sam delete --stack-name <yourstackname>
  2. Delete the Amazon Bedrock agent. From the Amazon Bedrock console, select the agent you created in this solution, select Delete, and follow the steps to delete the agent.
  3. Delete the Amazon Bedrock knowledge base. From the Amazon Bedrock console, select the knowledge base you created in this solution, select Delete, and follow the steps to delete the knowledge base.

Conclusion

In this blog post, we demonstrated the use of generative AI with Amazon Bedrock agents, FMs in Bedrock, and Amazon CloudWatch Logs in a cloud operations scenario on AWS. You can customize and extend this solution to your own scenarios for triaging and then resolving issues based on errors observed in application log files: work with multiple log sources, incorporate additional logic in the Lambda function for the action groups, or add relevant information repositories to the knowledge base.

About the authors

Kanishk Mahajan is Principal, Solutions Architecture at AWS. He leads cloud transformation and solution architecture for ISV customers and partners at AWS. Kanishk specializes in containers, cloud operations, migrations and modernization, AI/ML, resilience, and security and compliance. He is a Technical Field Community (TFC) member in each of those domains at AWS.

Praveen Gudipudi is a Technical Account Manager at Amazon Web Services (AWS) with a strong passion for technology and machine learning. He excels at solving complex challenges and ensuring seamless cloud operations for AWS customers. Outside of work, Praveen enjoys diving into books and is an avid traveler, always eager to explore new destinations.