AWS Cloud Operations Blog
Implementing an alarm to automatically detect drift in AWS CloudFormation stacks
AWS CloudFormation is a service that helps you model and implement your Infrastructure as Code (IaC). It provisions and configures cloud resources as described in template files that are written in JSON or YAML. After resources have been created with CloudFormation, users can still alter them through the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs, which causes the resources and their stacks to drift.
A resource is considered to have drifted if any of its actual property values differ from the expected property values, as defined in the stack template. A stack is considered to have drifted if one or more of its resources have drifted. Drift can complicate stack update and deletion operations, in addition to carrying the risks associated with unmanaged configuration changes. Resolving drift helps to make sure that your configurations remain consistent and that future stack operations succeed.
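The two definitions above can be expressed as a small Python sketch. This is only an illustration of the rule, not how CloudFormation implements drift detection; the resource names and property values are hypothetical.

```python
from typing import Any, Mapping, Tuple

Properties = Mapping[str, Any]


def resource_has_drifted(expected: Properties, actual: Properties) -> bool:
    """A resource has drifted if any expected property differs from the actual one."""
    return any(actual.get(name) != value for name, value in expected.items())


def stack_has_drifted(resources: Mapping[str, Tuple[Properties, Properties]]) -> bool:
    """A stack has drifted if one or more of its resources have drifted."""
    return any(
        resource_has_drifted(expected, actual)
        for expected, actual in resources.values()
    )


# Hypothetical stack: logical ID -> (expected properties, actual properties).
EXAMPLE_STACK = {
    "LogsBucket": ({"VersioningStatus": "Enabled"}, {"VersioningStatus": "Suspended"}),
    "AlertsTopic": ({"DisplayName": "alerts"}, {"DisplayName": "alerts"}),
}
```

In this example, `LogsBucket` differs from its template definition, so the whole stack counts as drifted even though `AlertsTopic` is unchanged.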
CloudFormation offers a drift detection feature to detect unmanaged configuration changes to stacks and resources. This lets you take corrective action to put the stack resources back in sync with their definitions in the stack template. To return a resource to compliance, you can revert the changes made to the resource directly. Alternatively, you can retain the changes by importing the existing resources into a new stack, or you can destroy the existing stack and recreate it with new resources.
In this post, I’ll show you how to deploy a solution in your AWS account that will provide a fully automated alarm to detect drift in CloudFormation stacks using AWS Config, Amazon EventBridge, and Amazon Simple Notification Service (Amazon SNS). Implementing the drift detection alarm gives you timely alerts about stack drift, so that you don’t have to manually trigger drift detection on each stack to check whether its actual configuration has drifted from its expected configuration.
Solution overview
The solution provided in this post is applicable for implementing a drift detection alarm in both standalone and multi-account models.
- In a standalone account, all of the components (as seen on the right side of the solution architecture diagram) are deployed, thus making it a self-sufficient architecture.
- In a multi-account model, you can either deploy this solution as a standalone implementation in each account (if required), or relay the events from tenant accounts into the Management account for central notification (as seen on the left side of the solution architecture diagram). Note that the Management account referred to here is the primary or delegated admin account from which you want the alerts to be processed and delivered. This solution assumes that the Management account is in the same Region as the tenant accounts. The policies must be amended to support the cross-Region use case.
This solution uses EventBridge, a serverless event bus that makes it easier to build event-driven architectures. EventBridge integrates with multiple targets, so the solution also covers use cases where account-specific actions on events are required in addition to central notification management. For example, Tenant Account-1 may prefer to route the events to an Amazon SQS queue or an AWS Lambda function, while Tenant Account-2 may prefer to route them to an AWS Systems Manager Automation to take the necessary action in response to the drift detection alert.
Based on your account setup, this solution will deploy some or all of the following resources into your AWS account(s) as a CloudFormation stack:
- AWS Config – An AWS Managed Config rule (cloudformation-stack-drift-detection-check) to evaluate drift in CloudFormation stacks. The rule and the stack are COMPLIANT when the stack drift status is IN_SYNC. The rule and the stack are NON_COMPLIANT when the stack drift status is DRIFTED or when CloudFormation failed to detect drift (default).
- Amazon EventBridge – 1. An EventBridge rule to match compliance change events from the AWS Config rule, transform the received input, and then fan out the alert to the specified target(s). 2. An EventBusPolicy to allow for the relay of events from the tenant account into the management account’s EventBridge.
- Amazon SNS – 1. A Topic and Policy to allow EventBridge to publish the alert. 2. A Subscription to the topic to get notified.
- AWS Key Management Service (AWS KMS) – A customer-managed key and policy for server-side encryption of the SNS topic. Optionally, you can use any existing key in your account that grants the “events.amazonaws.com” service principal permission to decrypt and to generate data keys.
- AWS Identity and Access Management (IAM) – Roles and Policies to 1. Allow AWS Config to access CloudFormation drift detection. 2. Enable cross-account event bus access in the tenant account.
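As a rough illustration of the AWS Config component in the list above, the following Python sketch builds a rule definition for the managed rule. It is an assumption about what an equivalent definition could look like, not the content of the template itself; the role ARN is a placeholder.

```python
import json

# Frequencies accepted by AWS Config for periodic rule evaluation.
ALLOWED_FREQUENCIES = {
    "One_Hour", "Three_Hours", "Six_Hours", "Twelve_Hours", "TwentyFour_Hours",
}


def build_drift_config_rule(role_arn: str, frequency: str = "TwentyFour_Hours") -> dict:
    """Build a definition for the managed CloudFormation drift detection rule."""
    if frequency not in ALLOWED_FREQUENCIES:
        raise ValueError(f"unsupported MaximumExecutionFrequency: {frequency}")
    return {
        "ConfigRuleName": "cloudformation-stack-drift-detection-check",
        # Evaluate CloudFormation stacks only, as in this solution.
        "Scope": {"ComplianceResourceTypes": ["AWS::CloudFormation::Stack"]},
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "CLOUDFORMATION_STACK_DRIFT_DETECTION_CHECK",
        },
        # The managed rule needs a role that is allowed to run drift detection.
        "InputParameters": json.dumps({"cloudformationRoleArn": role_arn}),
        "MaximumExecutionFrequency": frequency,
    }


# Placeholder ARN; replace it with a role from your own account.
EXAMPLE_RULE = build_drift_config_rule(
    "arn:aws:iam::111122223333:role/example-config-drift-role", "Six_Hours"
)
```

If you manage the rule outside CloudFormation, a dictionary of this shape could be passed to boto3 as `boto3.client("config").put_config_rule(ConfigRule=EXAMPLE_RULE)`.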
The solution works as follows:
- AWS Config triggers the evaluation when any resource that matches the rule’s scope (currently set to “AWS::CloudFormation::Stack”) changes in configuration and at the frequency (“MaximumExecutionFrequency” parameter) that you specify at the time of this solution deployment.
- EventBridge receives the events from AWS Config, applies the EventBridge rule to match the compliance change event, and transforms the input (customize the text) as defined in the “InputTransformer” template.
- The chosen customer-managed KMS Key is then accessed to encrypt the notification.
- The encrypted notification is published to the target SNS topic.
- The endpoints subscribed to this topic start receiving the published messages.
Note that in a Standalone or Management account, the matching events are transformed within EventBridge and published to Amazon SNS in the same account for delivery. In tenant accounts, however, the events are relayed to the Management account for central management of alerts.
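The matching and transformation steps described above can be simulated locally. The Python sketch below is a simplified stand-in for EventBridge's pattern matching and input transformer, not their actual implementation; the pattern, the input paths, the message text, and the sample event values are assumptions made for illustration.

```python
import re
from typing import Any, Mapping

# Assumed pattern: NON_COMPLIANT results from the drift detection rule.
EVENT_PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {
        "configRuleName": ["cloudformation-stack-drift-detection-check"],
        "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
    },
}

# Assumed transformer settings: named paths plus a text template.
INPUT_PATHS = {"stack": "$.detail.resourceId", "account": "$.account", "region": "$.region"}
INPUT_TEMPLATE = "Drift detected in stack <stack> (account <account>, Region <region>)."


def matches(pattern: Mapping[str, Any], event: Mapping[str, Any]) -> bool:
    """Simplified matcher: dicts recurse, lists hold the allowed exact values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, Mapping):
            if not isinstance(actual, Mapping) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def resolve(path: str, event: Mapping[str, Any]) -> Any:
    """Resolve a simple '$.a.b' path against the event."""
    value: Any = event
    for part in path[2:].split("."):
        value = value[part]
    return value


def render(event: Mapping[str, Any]) -> str:
    """Replace each <name> placeholder with the value found at its input path."""
    return re.sub(
        r"<(\w+)>",
        lambda m: str(resolve(INPUT_PATHS[m.group(1)], event)),
        INPUT_TEMPLATE,
    )


# Trimmed, hypothetical compliance change event.
SAMPLE_EVENT = {
    "source": "aws.config",
    "detail-type": "Config Rules Compliance Change",
    "account": "111122223333",
    "region": "eu-west-1",
    "detail": {
        "configRuleName": "cloudformation-stack-drift-detection-check",
        "resourceType": "AWS::CloudFormation::Stack",
        "resourceId": "example-stack",
        "newEvaluationResult": {"complianceType": "NON_COMPLIANT"},
    },
}

if matches(EVENT_PATTERN, SAMPLE_EVENT):
    print(render(SAMPLE_EVENT))
```

Only events that pass `matches` are rendered, which mirrors the order in which the rule filters an event before the transformed text is handed to the target.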
Prerequisites
To build the solution outlined in this post, you’ll need the following:
- AWS account(s).
- AWS Config enabled in all of the accounts and Regions in which you intend to deploy this solution.
- AWS Organizations set up, if you use multiple accounts in a Management-Tenant model.
- Access to CloudFormation and permissions to deploy a CloudFormation stack with the ability to create the following resources: AWS Config – ConfigRule, Amazon EventBridge – EventBusPolicy and Rule, Amazon SNS – Topic, TopicPolicy and Subscription, AWS KMS – Key and KeyPolicy, and AWS IAM – Role and Policy.
Code
Deployment Steps
Now, let’s deploy the solution and see how it works:
Step 1: Download the CloudFormation template (cfn_dda.yaml).
Step 2: In CloudFormation, choose “Create stack with new resources” in your preferred Region and upload the template file.
Step 3: Under “Specify stack details”, enter the stack name and the parameters that apply to your account model, and then create the stack.
The parameters that you must provide vary depending on how you intend to configure this account (and region). It can be one of the following:
- A Standalone account – deployed with the key components described earlier, which makes it a self-sufficient solution to detect drift and alert on it. The “Principal Org ID” and “Management Account Id” parameters are not required. Leave them blank.
- A Management account – in addition to the components of a Standalone account, an EventBusPolicy is deployed to allow events from the tenant account into the management account’s EventBridge. Provide all of the parameters.
- A Tenant account – only AWS Config and EventBridge rules with associated IAM Roles and Policies to detect drift and relay it to the management account are deployed. Fill in the “Generic Parameters” section only.
If you want to use an existing key in your account, make sure that its KMS KeyPolicy grants the “events.amazonaws.com” ServicePrincipal permission to decrypt and to generate data keys.
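For an existing key, the permission described above might look like the statement built in this Python sketch. The `Sid` and the helper names are hypothetical, and the sketch only edits a policy document in memory; review the resulting policy against your own security requirements before applying it.

```python
import json


def eventbridge_kms_statement() -> dict:
    """Key policy statement that lets EventBridge use the key for the SNS topic."""
    return {
        "Sid": "AllowEventBridgeToUseKey",  # hypothetical identifier
        "Effect": "Allow",
        "Principal": {"Service": "events.amazonaws.com"},
        "Action": ["kms:Decrypt", "kms:GenerateDataKey"],
        "Resource": "*",  # in a key policy, "*" refers to the key itself
    }


def with_statement(policy: dict, statement: dict) -> dict:
    """Return a copy of the policy with the statement added or replaced by Sid."""
    kept = [s for s in policy.get("Statement", []) if s.get("Sid") != statement["Sid"]]
    return {**policy, "Statement": kept + [statement]}


# Minimal placeholder for a key policy that already exists in your account.
EXISTING_POLICY = {"Version": "2012-10-17", "Statement": []}
UPDATED_POLICY = with_statement(EXISTING_POLICY, eventbridge_kms_statement())
print(json.dumps(UPDATED_POLICY, indent=2))
```

Because `with_statement` replaces any statement with the same `Sid`, running it twice does not duplicate the permission.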
Note that in this example, I deploy the solution to a Standalone account and subscribe to receive email alerts. You can also subscribe using the other protocols and endpoints that Amazon SNS supports.
Step 4: Wait for the status of the stack to change to “CREATE_COMPLETE”.
Step 5: Confirm the subscription (when required). For example, for email, you must confirm the subscription by following the link sent to the email address that you provided.
This solution can also be deployed from the AWS CLI or with your preferred CloudFormation deployment method. Alternatively, if you intend to deploy this solution across multiple AWS accounts and/or AWS Regions, use CloudFormation StackSets to provision resources across multiple AWS accounts and Regions.
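As one example of a scripted deployment, the Python sketch below wraps the CloudFormation `create_stack` call and waits for completion. It assumes a boto3 CloudFormation client is passed in, and it assumes that the template needs the `CAPABILITY_NAMED_IAM` capability because it creates IAM roles. The parameter key in the usage note is hypothetical; use the keys defined in the template.

```python
from typing import Any, Dict, List, Mapping


def to_cfn_parameters(values: Mapping[str, Any]) -> List[Dict[str, str]]:
    """Convert a plain mapping into the parameter list that create_stack expects."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


def deploy(cfn: Any, stack_name: str, template_body: str, values: Mapping[str, Any]) -> str:
    """Create the stack, block until creation completes, and return the stack ID."""
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=to_cfn_parameters(values),
        # Assumption: required because the template creates IAM roles and policies.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # The waiter raises an error if the stack does not reach CREATE_COMPLETE.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    return response["StackId"]
```

A call could then look like `deploy(boto3.client("cloudformation"), "cfn-dda", open("cfn_dda.yaml").read(), {"NotificationEmail": "ops@example.com"})`, where `NotificationEmail` stands in for whatever parameter names the template actually uses.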
Sample output
Perform the following actions to test the CloudFormation drift detection alarm:
- Choose an existing stack in the region in which you have deployed this solution.
- If you don’t have an existing stack, create one using any available CloudFormation template that you may have. Alternatively, you can use one of these sample templates.
- Intentionally alter the configuration on any resource in this stack outside CloudFormation (say, from the AWS Console) to introduce a drift.
- Wait for the evaluation of drift in your stack and the notification to be sent out.
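While you wait for the AWS Config evaluation, you may want to confirm independently that the change really registers as drift. The Python sketch below does this by calling CloudFormation's drift detection directly, which is separate from the alarm itself. It assumes a boto3 CloudFormation client is passed in and, for brevity, reads only the first page of drifted resources.

```python
import time
from typing import Any, Callable, List, Tuple


def detect_drift(
    cfn: Any,
    stack_name: str,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, List[str]]:
    """Run drift detection and return the stack status and drifted logical IDs."""
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Poll until CloudFormation has finished evaluating the stack.
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        sleep(poll_seconds)

    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))

    # List only the resources that were modified or deleted outside CloudFormation.
    drifts = cfn.describe_stack_resource_drifts(
        StackName=stack_name,
        StackResourceDriftStatusFilters=["MODIFIED", "DELETED"],
    )
    drifted_ids = [d["LogicalResourceId"] for d in drifts["StackResourceDrifts"]]
    return status["StackDriftStatus"], drifted_ids
```

For example, `detect_drift(boto3.client("cloudformation"), "my-test-stack")` would return the stack drift status together with the logical IDs of the drifted resources; the stack name here is a placeholder.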
I have manually edited the configuration of one of the resources in my stack from the AWS Console and the following is an email/alert received as a result of drift:
Note that currently the alert is sent whenever the status of the AWS Config rule evaluation changes to NON_COMPLIANT. To receive alerts only when the status changes from COMPLIANT to NON_COMPLIANT, amend the EventBridge rule to include “complianceType” of the “oldEvaluationResult” as COMPLIANT.
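The amended rule could use an event pattern like the one built in this Python sketch. The surrounding fields (`source`, `detail-type`, and the rule name) are assumptions about how the existing pattern is written; only the `oldEvaluationResult` condition is the change described above.

```python
import json


def compliant_to_noncompliant_pattern(rule_name: str) -> dict:
    """Event pattern that matches only COMPLIANT -> NON_COMPLIANT transitions."""
    return {
        "source": ["aws.config"],
        "detail-type": ["Config Rules Compliance Change"],
        "detail": {
            "configRuleName": [rule_name],
            "newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]},
            # Added condition: the previous evaluation must have been COMPLIANT.
            "oldEvaluationResult": {"complianceType": ["COMPLIANT"]},
        },
    }


PATTERN = compliant_to_noncompliant_pattern("cloudformation-stack-drift-detection-check")
print(json.dumps(PATTERN, indent=2))
```

With this condition in place, a stack whose first evaluation is already NON_COMPLIANT has no previous COMPLIANT result to match, so it would not raise an alert.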
Enhancements
Listed here are some of the enhancements that you could add to this solution for your specific use case:
- Adjust the scope of drift detection – In this solution, drift is evaluated only on the resource type “AWS::CloudFormation::Stack”. However, you can adjust the scope to constrain the resources that you want the ConfigRule to trigger by including one or more resource types, a combination of a tag key and value, or a combination of one resource type and one resource ID.
- Add auto-remediation – AWS Config lets you remediate noncompliant resources using AWS Systems Manager Automation documents that define a set of actions to be performed. AWS Config provides a set of managed automation documents with remediation actions. You can also create and associate custom automation documents.
- Add targets to EventBridge – EventBridge opens up a vast array of integrations to seamlessly connect to multiple targets in parallel. You can add further targets to perform additional actions on events/alerts to cater to your needs.
- Extend SNS subscription – In this solution, the SNS Topic has just one subscription, but any interested party that wishes to receive the notification can subscribe to it.
- Set up Aggregator – Set up an organization-wide aggregator in AWS Config using a delegated administrator account and exclude EventBridge (and its associated components) in the tenant accounts – if you require consistent action to be taken across all your accounts.
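To illustrate the first enhancement, the Python sketch below builds the three kinds of scope that a Config rule accepts. The helper names, the tag, and the resource ID are hypothetical, and whether a narrower scope suits the drift detection rule depends on your use case.

```python
from typing import Optional


def scope_by_types(*resource_types: str) -> dict:
    """Scope the rule to one or more resource types."""
    if not resource_types:
        raise ValueError("at least one resource type is required")
    return {"ComplianceResourceTypes": list(resource_types)}


def scope_by_tag(key: str, value: Optional[str] = None) -> dict:
    """Scope the rule to resources that carry a tag key and, optionally, a value."""
    scope = {"TagKey": key}
    if value is not None:
        scope["TagValue"] = value
    return scope


def scope_by_resource(resource_type: str, resource_id: str) -> dict:
    """Scope the rule to a single resource of a single type."""
    return {
        "ComplianceResourceTypes": [resource_type],
        "ComplianceResourceId": resource_id,
    }


# The scope used by this solution, followed by two hypothetical alternatives.
DEFAULT_SCOPE = scope_by_types("AWS::CloudFormation::Stack")
TAGGED_SCOPE = scope_by_tag("environment", "production")
SINGLE_STACK_SCOPE = scope_by_resource("AWS::CloudFormation::Stack", "example-stack-id")
```

Each dictionary corresponds to the `Scope` section of a Config rule definition, so switching scope means replacing that one section and leaving the rest of the rule unchanged.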
Limitations
Listed here are some of the limitations with CloudFormation drift detection:
- CloudFormation detects drift only on those AWS resources that support drift detection. Resources that don’t support drift detection are assigned the drift status of “NOT_CHECKED”.
- CloudFormation only determines drift for property values that are explicitly set, either through the stack template or by specifying template parameters. This doesn’t include default values for resource properties. To have CloudFormation track a resource property for the purposes of determining drift, explicitly set the property value, even if you’re setting it to the default value.
- In certain edge cases, CloudFormation may not be able to return accurate drift results, and the results then require manual interpretation. I recommend checking out this troubleshooting guide to resolve drift detection errors.
Cleanup
If you no longer wish to use this solution, delete the CloudFormation stacks that you created as part of it to avoid additional infrastructure costs.
Conclusion
Stack drift in CloudFormation has become a common occurrence and it can result in unmanaged configuration of your resources as well as thwart attempts to update or delete the stacks. In this post, I demonstrated how you can deploy a drift detection alarm to automate the detection and notification of drift in CloudFormation stacks to help you take timely corrective actions.