AWS Cloud Operations Blog
Use AWS Systems Manager Automation runbooks to resolve operational tasks
AWS Systems Manager OpsCenter provides a central location where operations engineers and IT professionals can view, investigate, and resolve operational work items (OpsItems) related to AWS resources.
AWS Systems Manager Automation simplifies common maintenance and deployment tasks for Amazon Elastic Compute Cloud (Amazon EC2) instances and other AWS resources. You can use this capability to build automations to configure and manage instances and AWS resources. You can also create custom runbooks or use predefined runbooks maintained by AWS.
AWS Systems Manager Explorer is a customizable operations dashboard that reports information about your AWS resources. Explorer displays an aggregated view of operations data (OpsData) across your AWS accounts and Regions. Explorer provides context about how operational issues are distributed, how they trend over time, and how they vary by category.
In the first post in this series, Aggregate operational tasks with AWS Systems Manager Explorer and OpsCenter, we showed you how to:
- Set up Explorer and OpsCenter with Systems Manager Quick Setup.
- Create OpsItems from an Amazon CloudWatch alarm.
- Create OpsItems manually through OpsCenter.
In this blog post, we show you how to use Systems Manager Automation documents (runbooks) to resolve your operational tasks from OpsItems.
The following diagram shows the architecture of the solution.
Figure 1: Automate operational work items with AWS Systems Manager Explorer
Solution overview
In this post, we’ll show you how to perform the following steps:
- Set up a service role in AWS Identity and Access Management (IAM) that Automation runbooks can use to remediate your OpsItems.
- Configure Automation runbooks to remediate and resolve OpsItems.
Prerequisites
Complete the steps in the first blog post, Aggregate operational tasks with AWS Systems Manager Explorer and OpsCenter.
After you complete those steps, you will have three OpsItems, as shown in Figure 2. Two were created manually, and one was created automatically from a CloudWatch alarm.
Figure 2: Open OpsItems in OpsCenter dashboard
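The same list of open OpsItems is also available through the Systems Manager API. The following is a minimal Python sketch, assuming boto3 is installed and credentials are configured; the helper names are hypothetical. It calls `describe_ops_items` with a `Status` filter and follows `NextToken` until every page has been read.

```python
from typing import Any, Dict, List, Optional


def open_ops_items_filter() -> List[Dict[str, Any]]:
    """OpsItemFilters value that matches only OpsItems whose status is Open."""
    return [{"Key": "Status", "Values": ["Open"], "Operator": "Equal"}]


def list_open_ops_items(region: Optional[str] = None) -> List[Dict[str, Any]]:
    """Return summaries of all open OpsItems, following NextToken pagination."""
    import boto3  # imported lazily so the filter helper works without boto3

    ssm = boto3.client("ssm", region_name=region)
    kwargs: Dict[str, Any] = {"OpsItemFilters": open_ops_items_filter()}
    summaries: List[Dict[str, Any]] = []
    while True:
        response = ssm.describe_ops_items(**kwargs)
        summaries.extend(response.get("OpsItemSummaries", []))
        token = response.get("NextToken")
        if not token:
            return summaries
        kwargs["NextToken"] = token
```

Each returned summary includes fields such as `OpsItemId` and `Title`, so you could print those to match what the OpsCenter dashboard shows.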
Set up Automation
AWS provides a library of Automation documents that you can choose for a variety of operational tasks. You can build, run, and share automations with others on your team or inside your organization.
Figure 3 shows the Automation document categories for the automation of your operational tasks.
Figure 3: Automation documents
If your IAM user account, group, or role is assigned administrator permissions, then you have access to Systems Manager Automation. If you don’t have administrator permissions, then an administrator must give you permission by assigning the AmazonSSMFullAccess managed policy, or a policy that provides comparable permissions, to your IAM account, group, or role. The AmazonSSMFullAccess policy grants permissions to Systems Manager actions, but some runbooks require permissions to other services. For example, the AWS-ReleaseElasticIP runbook requires IAM permissions for ec2:ReleaseAddress. Review the actions taken in a runbook to ensure your IAM user account, group, or role is assigned the permissions required to perform those actions.
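To make "review the actions taken in a runbook" concrete, here is a minimal Python sketch that builds an IAM policy document allowing only the actions a runbook needs. The helper name is hypothetical, and the action list for AWS-ReleaseElasticIP contains only the `ec2:ReleaseAddress` action mentioned above; check the runbook's steps for any other calls it makes, and narrow `Resource` from `*` where the action supports it.

```python
import json
from typing import Any, Dict, Iterable


def least_privilege_policy(actions: Iterable[str], resource: str = "*") -> Dict[str, Any]:
    """Build an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicated, stable order
                "Resource": resource,
            }
        ],
    }


# Extra permission called out above for the AWS-ReleaseElasticIP runbook.
release_eip_policy = least_privilege_policy(["ec2:ReleaseAddress"])
print(json.dumps(release_eip_policy, indent=2))
```

You could attach the resulting JSON as an inline policy (for example, with the IAM `put_role_policy` API) instead of granting full access to a whole service.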
Automations can be initiated under the context of a service role (or assume role). This allows the service to perform actions on your behalf. If you do not specify an assume role, Automation uses the context of the user who invoked the automation. For information about creating a service role, see Use AWS CloudFormation to configure a service role for Automation or Use IAM to configure roles for Automation in the AWS Systems Manager User Guide.
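If you configure the role with IAM directly rather than with CloudFormation, the key piece is a trust policy that lets Systems Manager assume the role. This is a minimal Python sketch, assuming boto3; the function names are hypothetical, and attaching the AWS managed policy `AmazonSSMAutomationRole` is one common starting point, not necessarily what the template used in this post does.

```python
import json
from typing import Any, Dict

AUTOMATION_MANAGED_POLICY = "arn:aws:iam::aws:policy/service-role/AmazonSSMAutomationRole"


def automation_trust_policy() -> Dict[str, Any]:
    """Trust policy that allows the Systems Manager service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ssm.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_automation_role(role_name: str = "AutomationServiceRole") -> str:
    """Create the service role, attach the managed Automation policy, return its ARN."""
    import boto3  # imported lazily; only needed when actually calling IAM

    iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(automation_trust_policy()),
    )
    iam.attach_role_policy(RoleName=role_name, PolicyArn=AUTOMATION_MANAGED_POLICY)
    return response["Role"]["Arn"]
```

Whoever starts an automation with this role as `AutomationAssumeRole` also needs `iam:PassRole` permission on the role.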
In this post, we use an AWS CloudFormation template to set up an Automation service role.
- Download the AWS-SystemsManager-AutomationServiceRole.zip file. This folder includes the AWS-SystemsManager-AutomationServiceRole.yaml CloudFormation template file.
- Sign in to the AWS CloudFormation console and choose Create Stack.
- In Specify template, choose Upload a template file.
- Choose AWS-SystemsManager-AutomationServiceRole.yaml, and then choose Next.
Figure 4: Creating the Automation service role
- For the stack name, enter automation-role.
- In Configure stack options, leave the defaults, and then choose Next.
- On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then create the CloudFormation stack.
Figure 5: Automation service role
- To get the ARN for the Automation service role, choose AutomationServiceRole. Your ARN will be similar to arn:aws:iam::<AccountID>:role/AutomationServiceRole, where <AccountID> is your AWS account ID.
- Some runbooks require permissions to other services. In Figure 6, we add inline IAM policies that grant full access to EC2, S3, and DynamoDB to the Automation service role. If only administrators perform operational tasks with Automation, you can keep these full access policies, but always review the actions in each Automation document and grant the Automation service role only the permissions it requires.
Figure 6: Automation service role inline IAM policies
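The console steps above can also be scripted. This minimal Python sketch, assuming boto3 and the downloaded template at a local path, creates the automation-role stack with the `CAPABILITY_NAMED_IAM` capability (the API counterpart of the acknowledgement for IAM resources with custom names), waits for creation to finish, and builds the role ARN in the format shown above. The helper names are hypothetical, and the sketch assumes the template names the role AutomationServiceRole.

```python
from pathlib import Path


def automation_role_arn(account_id: str, role_name: str = "AutomationServiceRole") -> str:
    """Build a role ARN in the arn:aws:iam::<AccountID>:role/<name> format."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def deploy_automation_role(template_path: str, stack_name: str = "automation-role") -> str:
    """Create the CloudFormation stack, wait for it, and return the role ARN."""
    import boto3  # imported lazily; only needed when actually deploying

    cfn = boto3.client("cloudformation")
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=Path(template_path).read_text(),
        # API equivalent of "I acknowledge that AWS CloudFormation might
        # create IAM resources with custom names" in the console.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    account_id = boto3.client("sts").get_caller_identity()["Account"]
    return automation_role_arn(account_id)
```

For example, `deploy_automation_role("AWS-SystemsManager-AutomationServiceRole.yaml")` would deploy the template from the current directory.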
You can now use the Automation service role ARN in your runbooks. For information about creating your own Automation runbooks, see the New Automation Features in AWS Systems Manager blog post.
Depending on your use case, you can run automations under different security models or with target and rate controls. For example, you can run an automation that requires approvers, or you can run an automation manually, one step at a time.
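Both of these options map onto the Automation API. In this minimal Python sketch (boto3 assumed; helper names hypothetical), `Mode="Interactive"` starts an execution that waits for you to start each step, and `send_automation_signal` either starts a named step or approves an execution that is paused at an `aws:approve` step in its runbook.

```python
from typing import Any, Dict, Optional


def start_request(document: str, parameters: Dict[str, str], step_by_step: bool = False) -> Dict[str, Any]:
    """Build start_automation_execution arguments; parameter values must be lists."""
    return {
        "DocumentName": document,
        "Parameters": {name: [value] for name, value in parameters.items()},
        "Mode": "Interactive" if step_by_step else "Auto",
    }


def signal_request(execution_id: str, step_name: Optional[str] = None, comment: str = "Approved") -> Dict[str, Any]:
    """Build send_automation_signal arguments: StartStep if a step is named, else Approve."""
    if step_name is not None:
        return {
            "AutomationExecutionId": execution_id,
            "SignalType": "StartStep",
            "Payload": {"StepName": [step_name]},
        }
    return {
        "AutomationExecutionId": execution_id,
        "SignalType": "Approve",
        "Payload": {"Comment": [comment]},
    }


def send_signal(request: Dict[str, Any], region: Optional[str] = None) -> None:
    """Send a signal built by signal_request to a running automation."""
    import boto3  # imported lazily; only needed when calling the API

    boto3.client("ssm", region_name=region).send_automation_signal(**request)
```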
In this post, we will show you how to remediate OpsItems through Automation runbooks.
Remediate OpsItems with Automation documents
- From the left navigation pane in the AWS Systems Manager console, choose OpsCenter.
Figure 7: Open OpsItems
- Choose the OpsItem ID for the CloudWatch alarm to initiate the remediation through Automation runbooks.
- On the details page for the alarm OpsItem, choose the AWS-ResizeInstance runbook, and then choose Execute.
Figure 8: CloudWatch alarm OpsItem details page
- To resolve the high CPU issue, change the instance type from t2.micro to t2.large. You can choose another size, as appropriate for your workload.
- In Run automation: AWS-ResizeInstance, enter the following values, and then choose Execute.
- For InstanceId, enter your web-1 EC2 instance ID.
- For InstanceType, enter t2.large.
- For AutomationAssumeRole, choose AutomationServiceRole from the dropdown. This is the role you created as part of Automation setup.
Figure 9: Run automation: AWS-ResizeInstance
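The same AWS-ResizeInstance run can be started from code. This is a minimal Python sketch, assuming boto3; the function names are hypothetical, and you would substitute your own instance ID and the role ARN from the setup section. Note that Automation parameter values are passed as lists of strings.

```python
from typing import Any, Dict, Optional


def resize_request(instance_id: str, instance_type: str, role_arn: str) -> Dict[str, Any]:
    """Arguments for start_automation_execution with the AWS-ResizeInstance runbook."""
    return {
        "DocumentName": "AWS-ResizeInstance",
        "Parameters": {
            "InstanceId": [instance_id],
            "InstanceType": [instance_type],
            "AutomationAssumeRole": [role_arn],
        },
    }


def start_resize(instance_id: str, instance_type: str, role_arn: str, region: Optional[str] = None) -> str:
    """Start the resize automation and return its execution ID."""
    import boto3  # imported lazily; only needed when calling the API

    ssm = boto3.client("ssm", region_name=region)
    response = ssm.start_automation_execution(**resize_request(instance_id, instance_type, role_arn))
    return response["AutomationExecutionId"]
```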
- On the Automation executions page, you can confirm the runbook execution.
Figure 10: Automation executions page
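To check an execution from a script instead of the Automation executions page, you can poll `get_automation_execution`. This is a minimal Python sketch with hypothetical helper names. Treating only Success, Failed, TimedOut, and Cancelled as final is a simplification: an execution that is waiting for approval or for a step to be started keeps polling until the timeout.

```python
import time
from typing import Callable, Optional

# Simplified set of final states; anything else (Pending, InProgress, Waiting, ...) keeps polling.
FINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def wait_for_execution(
    get_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until a final status is returned or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in FINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"execution still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)


def automation_status_getter(execution_id: str, region: Optional[str] = None) -> Callable[[], str]:
    """Return a callable that reads the current status of one automation execution."""
    import boto3  # imported lazily; only needed when calling the API

    ssm = boto3.client("ssm", region_name=region)

    def get_status() -> str:
        response = ssm.get_automation_execution(AutomationExecutionId=execution_id)
        return response["AutomationExecution"]["AutomationExecutionStatus"]

    return get_status
```

With a real execution ID, `wait_for_execution(automation_status_getter(execution_id))` blocks until the run finishes; passing the status reader as a callable keeps the polling logic testable without AWS.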
- To remediate the RDS OpsItem, in the OpsItems list (Figure 7), choose the OpsItem ID for Create RDS snapshot for mysql database. On the details page, choose AWS-CreateRdsSnapshot, and then choose Execute.
Figure 11: RDS OpsItem details page
- In Run automation: AWS-CreateRdsSnapshot, enter the following values, and then choose Execute.
- For DBInstanceIdentifier, enter your RDS instance ID.
- For AutomationAssumeRole, choose AutomationServiceRole from the dropdown. This is the role you created as part of Automation setup.
Figure 12: Run automation: AWS-CreateRdsSnapshot
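Scripting the snapshot follows the same pattern with a different runbook. This minimal Python sketch (boto3 assumed; helper names hypothetical) also passes a `ClientToken`, a UUID-formatted idempotency token that `start_automation_execution` accepts, intended to guard against starting duplicate runs when a request is retried.

```python
import uuid
from typing import Any, Dict, Optional


def rds_snapshot_request(db_instance_id: str, role_arn: str, client_token: Optional[str] = None) -> Dict[str, Any]:
    """Arguments for start_automation_execution with the AWS-CreateRdsSnapshot runbook."""
    return {
        "DocumentName": "AWS-CreateRdsSnapshot",
        "Parameters": {
            "DBInstanceIdentifier": [db_instance_id],
            "AutomationAssumeRole": [role_arn],
        },
        # Idempotency token; the API expects UUID format.
        "ClientToken": client_token or str(uuid.uuid4()),
    }


def start_rds_snapshot(
    db_instance_id: str, role_arn: str, client_token: Optional[str] = None, region: Optional[str] = None
) -> str:
    """Start the snapshot automation and return its execution ID."""
    import boto3  # imported lazily; only needed when calling the API

    request = rds_snapshot_request(db_instance_id, role_arn, client_token)
    response = boto3.client("ssm", region_name=region).start_automation_execution(**request)
    return response["AutomationExecutionId"]
```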
- To remediate the EC2 OpsItem, in the OpsItems list (Figure 7), choose the OpsItem ID for Create an image for web-1 EC2 instance. On the details page, choose AWS-CreateImage, and then choose Execute.
Figure 13: EC2 OpsItem details page
- In Run automation: AWS-CreateImage, enter the following values, and then choose Execute.
- For InstanceId, enter your EC2 instance ID.
- For AutomationAssumeRole, choose AutomationServiceRole from the dropdown. This is the role you created as part of Automation setup.
Figure 14: Run automation: AWS-CreateImage
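For AWS-CreateImage, this minimal Python sketch (boto3 assumed; helper names hypothetical) also tags the automation execution with the ID of the OpsItem it remediates, using the `Tags` parameter of `start_automation_execution`. The `OpsItemId` tag key is an illustrative choice, not a Systems Manager convention.

```python
from typing import Any, Dict, Optional


def create_image_request(instance_id: str, role_arn: str, ops_item_id: str) -> Dict[str, Any]:
    """Arguments for start_automation_execution with the AWS-CreateImage runbook."""
    return {
        "DocumentName": "AWS-CreateImage",
        "Parameters": {
            "InstanceId": [instance_id],
            "AutomationAssumeRole": [role_arn],
        },
        # Hypothetical tag key that records which OpsItem triggered this run.
        "Tags": [{"Key": "OpsItemId", "Value": ops_item_id}],
    }


def start_create_image(instance_id: str, role_arn: str, ops_item_id: str, region: Optional[str] = None) -> str:
    """Start the image automation and return its execution ID."""
    import boto3  # imported lazily; only needed when calling the API

    request = create_image_request(instance_id, role_arn, ops_item_id)
    response = boto3.client("ssm", region_name=region).start_automation_execution(**request)
    return response["AutomationExecutionId"]
```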
You have now successfully remediated three OpsItems without writing any code.
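If you later script this workflow, you may also want to mark each OpsItem as resolved once its execution has succeeded; the console walkthrough above doesn't cover that step. A minimal Python sketch with boto3 (helper names hypothetical) using `update_ops_item`:

```python
from typing import Any, Dict, Optional


def resolve_request(ops_item_id: str) -> Dict[str, Any]:
    """Arguments for update_ops_item that set an OpsItem's status to Resolved."""
    return {"OpsItemId": ops_item_id, "Status": "Resolved"}


def resolve_ops_item(ops_item_id: str, region: Optional[str] = None) -> None:
    """Mark one OpsItem as Resolved."""
    import boto3  # imported lazily; only needed when calling the API

    boto3.client("ssm", region_name=region).update_ops_item(**resolve_request(ops_item_id))
```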
Conclusion
In this blog post, we’ve shown you how to use Systems Manager Automation runbooks to resolve and remediate your OpsItems in the console. With the information in this post, you can create your own OpsItems and remediate your operational tasks. For more information about AWS Systems Manager features, see the AWS Systems Manager User Guide.
For information about how to remediate noncompliant AWS Config rules, see the Remediate noncompliant AWS Config rules with AWS Systems Manager Automation runbooks blog post.