AWS Cloud Operations Blog

Automate remediation actions for Amazon EC2 notifications and beyond using EC2 Systems Manager Automation and AWS Health

You can use EC2 Systems Manager Automation to take remediation actions in response to events that may impact your AWS resources. To illustrate this concept, this post guides you through setting up automated remediation actions when an Amazon EBS-backed Amazon EC2 instance is scheduled for retirement.

An instance is scheduled to be retired when AWS detects irreparable failure of the underlying hardware hosting the instance. If your instance root device is an Amazon EBS volume, you can stop and start the instance at any convenient time before the retirement date; stopping and starting the instance moves it to new underlying hardware.

Amazon EC2 Systems Manager (SSM) Automation is an AWS-hosted service that simplifies common instance and system maintenance and deployment tasks at no additional cost.

AWS Health provides ongoing visibility into the state of your AWS resources, services, and accounts. The service gives you awareness and remediation guidance for resource performance or availability issues that may affect your applications that run on AWS.

Both services are integrated with Amazon CloudWatch Events, allowing AWS Health events to trigger SSM Automation documents.

SSM Automation also offers an Approval action, which temporarily pauses an Automation execution until your designated principals (for example, an IAM user) either approve or reject the action. More information about SSM Automation actions is available in Systems Manager Automation Actions.

 

Figure 1: AWS Services feed events into AWS Health which triggers EC2 Systems Manager

 

This post walks through the four steps to set up Stop and Start of EC2 instances using SSM Automation in response to EC2 retirement events from AWS Health. The solution can also be launched in the us-east-1 region using AWS CloudFormation; change the region as required. We recommend reviewing the manual steps below before deploying the CloudFormation stack so that you understand the solution.

Step 1: Set up required AWS IAM role
Step 2: Set up the Amazon SNS Topic if you don’t have one already
Step 3: Set up the Amazon CloudWatch Events rule with the Automation document
Step 4: Test it out and approve the Automation

Setup Instructions

Step 1: Set up required IAM role

First, set up the IAM permissions that CloudWatch Events needs by creating an IAM policy and attaching it to an IAM role. For this example, we will call the IAM role AutomationCWRole. Here is an example of an IAM policy that could be used for this purpose:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:DescribeInstanceStatus"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:*"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "sns:Publish"
            ],
            "Resource": [
                "arn:aws:sns:*:*:Automation*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:PassRole"
            ],
            "Resource": "arn:aws:iam::<AccountId>:role/AutomationCWRole"
        }
    ]
}

Make sure to update the role ARN with your account ID and role name. You also need to ensure that the role has events.amazonaws.com and ssm.amazonaws.com configured as trusted entities, as shown here:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "ssm.amazonaws.com",
          "events.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

For more information about CloudWatch and IAM, see Authentication and Access Control for Amazon CloudWatch. For more information about Systems Manager and IAM, see Configuring Access Using Systems Manager Managed Policies.
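As a minimal sketch, the two policy documents from this step could be generated from a script so that the account ID and role name are filled in consistently. The function names and the account ID `123456789012` are hypothetical placeholders, not part of the original solution; the statements themselves mirror the JSON shown above.

```python
import json


def build_automation_policy(account_id, role_name="AutomationCWRole"):
    """Return the permissions policy from Step 1 with the PassRole ARN filled in."""

    def allow(actions, resources):
        # Small helper so every statement has the same shape.
        return {"Effect": "Allow", "Action": actions, "Resource": resources}

    return {
        "Version": "2012-10-17",
        "Statement": [
            allow(
                ["ec2:StartInstances", "ec2:StopInstances", "ec2:DescribeInstanceStatus"],
                ["*"],
            ),
            allow(["ssm:*"], ["*"]),
            # Publishing is limited to topics whose name starts with "Automation".
            allow(["sns:Publish"], ["arn:aws:sns:*:*:Automation*"]),
            allow(["iam:PassRole"], f"arn:aws:iam::{account_id}:role/{role_name}"),
        ],
    }


def build_trust_policy():
    """Return the trust policy that lets SSM and CloudWatch Events assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["ssm.amazonaws.com", "events.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Hypothetical account ID, used only for illustration.
print(json.dumps(build_automation_policy("123456789012"), indent=4))
print(json.dumps(build_trust_policy(), indent=2))
```

The printed JSON can be saved to files and supplied when creating the policy and role in the IAM console or with the AWS CLI.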

Step 2: Set up the Amazon SNS Topic if you don’t have one already

If you choose to use Automation Approval actions, you also need an SNS topic to which the approval notification is published. You can create a new topic or use an existing one, and you need to subscribe the approvers to that topic. More information on how to set this up is available in the AWS documentation.

We will use the SNS topic name AutomationStopStart for this example. Note that the SNS topic name must start with the prefix Automation, because the IAM policy in Step 1 only allows sns:Publish on topics matching Automation*.

Step 3: Set up the Amazon CloudWatch Events rule with the Automation document

First, create an SSM Automation document named StopStartEC2InstancewithApproval by saving the following JSON in a file named “StopStartEC2InstancewithApproval.json” using your preferred editor:

{
   "description":"Stop and Start EC2 instance(s) with Approval",
   "schemaVersion":"0.3",
   "assumeRole":"{{ AutomationAssumeRole }}",
   "parameters":{
      "AutomationAssumeRole":{
         "type":"String",
         "description":"The ARN of the role that allows Automation to perform the actions on your behalf.",
         "default":"arn:aws:iam::{{global:ACCOUNT_ID}}:role/AutomationServiceRole"
      },
      "InstanceIds":{
         "type":"String",
         "description":"EC2 Instance(s) to Stop and Start"
      },
      "Approvers":{
         "type":"StringList",
         "description":"IAM user names or user ARNs of the approvers for the automation action"
      },
      "SNSTopicArn":{
         "type":"String",
         "description":"The SNS topic ARN that you are using to get notifications about EC2 retirements. The SNS topic name must start with Automation."
      }
   },
   "mainSteps":[
      {
         "name":"approve",
         "action":"aws:approve",
         "timeoutSeconds":999999,
         "onFailure":"Abort",
         "inputs":{
            "NotificationArn":"{{ SNSTopicArn }}",
            "Message": "Your approval is required to proceed with the stop and start of an EC2 instance that is scheduled for retirement, using the EC2 Systems Manager Automation document.",
            "MinRequiredApprovals":1,
            "Approvers":[
               "{{Approvers}}"
            ]
         }
      },
      {
         "name":"stopInstance",
         "action":"aws:changeInstanceState",
         "maxAttempts":2,
         "timeoutSeconds":120,
         "onFailure":"Continue",
         "inputs":{
            "InstanceIds":[
               "{{ InstanceIds }}"
            ],
            "DesiredState":"stopped"
         }
      },
      {
         "name":"forceStopInstance",
         "action":"aws:changeInstanceState",
         "maxAttempts":1,
         "timeoutSeconds":60,
         "onFailure":"Continue",
         "inputs":{
            "InstanceIds":[
               "{{ InstanceIds }}"
            ],
            "Force":true,
            "DesiredState":"stopped"
         }
      },
      {
         "name":"startInstance",
         "action":"aws:changeInstanceState",
         "maxAttempts":3,
         "timeoutSeconds":120,
         "onFailure":"Continue",
         "inputs":{
            "InstanceIds":[
               "{{ InstanceIds }}"
            ],
            "DesiredState":"running"
         }
      }
   ]
}

Then use the AWS CLI to create the SSM Automation document using the JSON file above:

aws ssm create-document --content file://StopStartEC2InstancewithApproval.json --name "StopStartEC2InstancewithApproval" --document-type "Automation"

More information about creating SSM documents can be found at Creating Systems Manager Documents.
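Before uploading a document, it can help to confirm that every `{{ ... }}` placeholder used by the steps refers to a declared parameter, since a misspelled name is easy to miss in a long JSON file. The following is a rough sketch of such a check, not an official validator; the helper name `undeclared_parameters` and the abbreviated sample document are hypothetical.

```python
import json
import re

# Matches {{ Name }} and {{Name}} and captures the name without padding.
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def undeclared_parameters(document):
    """Return placeholder names used in assumeRole/mainSteps but not declared."""
    declared = set(document.get("parameters", {}))
    # Serialize the parts that may reference parameters, then scan the text.
    text = json.dumps([document.get("assumeRole", ""), document.get("mainSteps", [])])
    used = set(PLACEHOLDER.findall(text))
    # Assumption: names containing ":" or "." are system variables or step
    # outputs (for example global:ACCOUNT_ID), so they are not parameters.
    used = {name for name in used if ":" not in name and "." not in name}
    return used - declared


# Abbreviated, hypothetical version of the document with a deliberate typo.
SAMPLE_DOCUMENT = {
    "schemaVersion": "0.3",
    "assumeRole": "{{ AutomationAssumeRole }}",
    "parameters": {
        "AutomationAssumeRole": {"type": "String"},
        "InstanceIds": {"type": "String"},
    },
    "mainSteps": [
        {
            "name": "stopInstance",
            "action": "aws:changeInstanceState",
            "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "stopped"},
        }
    ],
}

print(sorted(undeclared_parameters(SAMPLE_DOCUMENT)))
```

To check the real file, load it with `json.load` and pass the resulting dictionary to the same function; an empty result means every placeholder matched a declared parameter.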

You can then create the CloudWatch Events rule that will trigger the Automation document each time an EC2 retirement notification occurs. As an example you can use the following command using the AWS CLI:

aws events put-rule --name "EC2RetirementNotification" --event-pattern "{\"source\":[\"aws.health\"],\"detail-type\":[\"AWS Health Event\"],\"detail\":{\"service\":[\"EC2\"],\"eventTypeCategory\":[\"scheduledChange\"],\"eventTypeCode\":[\"AWS_EC2_INSTANCE_RETIREMENT_SCHEDULED\"]}}"
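To make the event pattern easier to reason about, here is a simplified sketch of how such a pattern selects events. It only handles the exact-value matching used in this rule (each pattern field lists allowed values, and nested objects are matched recursively); the real CloudWatch Events matcher supports more than this. The `matches` helper and the sample event, including its instance ID, are hypothetical illustrations.

```python
import json

EVENT_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["EC2"],
        "eventTypeCategory": ["scheduledChange"],
        "eventTypeCode": ["AWS_EC2_INSTANCE_RETIREMENT_SCHEDULED"],
    },
}


def matches(pattern, event):
    """Simplified exact-value matcher: every pattern field must be satisfied."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: the event field must be an object that matches too.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True


# Hypothetical AWS Health event, trimmed to the fields the pattern inspects.
SAMPLE_EVENT = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "resources": ["i-0123456789abcdef0"],
    "detail": {
        "service": "EC2",
        "eventTypeCategory": "scheduledChange",
        "eventTypeCode": "AWS_EC2_INSTANCE_RETIREMENT_SCHEDULED",
    },
}

print(matches(EVENT_PATTERN, SAMPLE_EVENT))
# Compact JSON form of the pattern, suitable for saving to a file.
print(json.dumps(EVENT_PATTERN, separators=(",", ":")))
```

Generating the pattern with `json.dumps` also avoids hand-escaping quotes, which is where mistakes in the `--event-pattern` argument tend to creep in.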

To attach the Automation document to the rule as its target, create a JSON file named targets.json using your preferred editor and then use that file to create the CloudWatch Events target:

[
   {
      "Id":"1",
      "Arn":"arn:aws:ssm:<region>:<accountId>:automation-definition/StopStartEC2InstancewithApproval",
      "RoleArn":"arn:aws:iam::<accountId>:role/AutomationCWRole",
      "InputTransformer":{
         "InputPathsMap":{
            "Instances": "$.resources"
         },
         "InputTemplate": "{ \"AutomationAssumeRole\":[\"arn:aws:iam::<accountId>:role/AutomationCWRole\"],\"Approvers\":[\"<IAMusername>\"],\"SNSTopicArn\":[\"arn:aws:sns:<region>:<accountId>:AutomationStopStart\"],\"InstanceIds\": <Instances> }"
      }
   }
]

Please update the region, accountId, SNS topic ARN, IAM role ARN and IAM username in the JSON file above per your requirements. The target in this case is the Automation document StopStartEC2InstancewithApproval, which stops and starts the instance(s) provided.
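The input transformer in the target is what turns the AWS Health event into Automation parameters: `InputPathsMap` pulls the `resources` array out of the event, and `InputTemplate` drops it in where `<Instances>` appears. The sketch below imitates that substitution for simple dot-notation paths only, as a way to preview the resulting parameters; it is an approximation, not the service's implementation. The account ID, user name, and instance ID are hypothetical values.

```python
import json
import re


def resolve_path(event, path):
    """Resolve a simple JSONPath of the form $.a.b (dot notation only)."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    value = event
    for key in path[2:].split("."):
        value = value[key]
    return value


def render(input_paths_map, input_template, event):
    """Replace each <Name> in the template with the JSON form of its value."""
    values = {name: resolve_path(event, path) for name, path in input_paths_map.items()}

    def substitute(match):
        return json.dumps(values[match.group(1)])

    return re.sub(r"<(\w+)>", substitute, input_template)


INPUT_PATHS_MAP = {"Instances": "$.resources"}

# Template with hypothetical account ID and user name already filled in.
INPUT_TEMPLATE = (
    '{"AutomationAssumeRole":["arn:aws:iam::123456789012:role/AutomationCWRole"],'
    '"Approvers":["example-user"],'
    '"SNSTopicArn":["arn:aws:sns:us-east-1:123456789012:AutomationStopStart"],'
    '"InstanceIds": <Instances>}'
)

# Hypothetical event fragment; only "resources" is read by the paths map.
EVENT = {"source": "aws.health", "resources": ["i-0123456789abcdef0"]}

rendered = render(INPUT_PATHS_MAP, INPUT_TEMPLATE, EVENT)
print(json.dumps(json.loads(rendered), indent=2))
```

If the rendered text fails to parse as JSON, the template most likely has a quoting problem that would also break the real target.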

Then use the AWS CLI to create the target, specifying the JSON file you created:

aws events put-targets --rule EC2RetirementNotification --targets file://targets.json

Step 4: Test it out and approve the Automation

You can also test the document directly by starting an execution with explicit inputs:

aws ssm start-automation-execution --document-name StopStartEC2InstancewithApproval --parameters AutomationAssumeRole="arn:aws:iam::<AccountId>:role/AutomationCWRole",Approvers=<IAMusername>,SNSTopicArn="arn:aws:sns:us-east-1:<AccountId>:AutomationStopStart",InstanceIds=<InstanceId>

You can get the execution status using the AutomationExecutionId returned from the command above:

aws ssm get-automation-execution --automation-execution-id <value>
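If you would rather wait for the outcome than re-run the status command by hand, a small polling loop is one option. The sketch below assumes a client object shaped like boto3's SSM client, whose `get_automation_execution` call returns the status under `AutomationExecution.AutomationExecutionStatus`. The function name, the polling defaults, and the choice of which statuses count as final are assumptions made for illustration.

```python
import time

# Assumption: these are treated as final states; other values (for example
# "Pending", "InProgress", or "Waiting" while approval is outstanding) mean
# the execution is still running.
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def wait_for_execution(ssm_client, execution_id, poll_seconds=15.0,
                       max_polls=40, sleep=time.sleep):
    """Poll an Automation execution and return its last observed status."""
    status = None
    for _ in range(max_polls):
        response = ssm_client.get_automation_execution(
            AutomationExecutionId=execution_id
        )
        status = response["AutomationExecution"]["AutomationExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return status
        # Not finished yet; pause before asking again.
        sleep(poll_seconds)
    # Gave up after max_polls attempts; report whatever was seen last.
    return status
```

With boto3 installed and credentials configured, this could be called as `wait_for_execution(boto3.client("ssm"), "<automation-execution-id>")`. Because the approval step pauses the execution, expect a non-final status until someone approves or rejects the request.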

Once you get the approval message published to your SNS topic’s subscribers, you can choose to approve or reject the action:

aws ssm send-automation-signal --automation-execution-id <automation-execution-id> --signal-type Approve --payload Comment=Replace_This_With_Approve_Comment

The automation can also be approved from the EC2 console in the Automation section.

Please note that the approval will trigger the stop and start of the EC2 Instance, regardless of the comments provided.

You can also skip the approval step and instead use the AmazonEC2InstanceStopStart SSM Automation document. Please note that in very rare situations EC2 instances might not stop even after a force stop; you should contact AWS support if that happens.

Conclusion

You can use EC2 Systems Manager Automation to take remediation actions on your AWS resources in response to events that may impact them. You can take this example and apply it to other EC2 scheduled changes (for example, system reboot maintenance) or to any event on any AWS resource that suits your needs. You can also use the document provided to stop and start EC2 instances in an automated way. We recommend tailoring and testing it for your use case before deploying it in a production environment.

About the Author

Tipu Qureshi is a principal engineer in the AWS support organization. He works with customers to implement automation, solve problems, and set up new workloads on the AWS platform. He has created various architectures and certifications for cost-optimization and agility through DevOps.