Fine-grained Continuous Delivery With CodePipeline and AWS Step Functions

Automating your software release process is an important step in adopting DevOps best practices. AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline was modeled after the way that the retail website Amazon.com automated software releases, and many early decisions for CodePipeline were based on the lessons learned from operating a web application at that scale.

However, while cross-cutting best practices apply to most releases, there are also business-specific requirements driven by domain or regulatory needs. CodePipeline attempts to strike a balance between enforcing best practices out of the box and offering enough flexibility to cover as many use cases as possible.

To support use cases that require fine-grained customization, today we are launching a new AWS CodePipeline action type for starting an AWS Step Functions state machine execution. Previously, accomplishing such a workflow required you to create custom integrations that marshaled data between CodePipeline and Step Functions. You can now start either a Standard or an Express Step Functions state machine during the execution of a pipeline.

With this integration, you can do the following:

· Conditionally run an Amazon SageMaker hyperparameter tuning job

· Write and read values from Amazon DynamoDB, as an atomic transaction, to use in later stages of the pipeline

· Run an Amazon Elastic Container Service (Amazon ECS) task until some arbitrary condition is satisfied, such as performing integration or load testing

Example Application Overview

In the following use case, you’re working on a machine learning application. This application contains both a machine learning model that your research team maintains and an inference engine that your engineering team maintains. When a new version of either the model or the engine is released, you want to release it as quickly as possible if the latency is reduced and the accuracy improves. If the latency becomes too high, you want the engineering team to review the results and decide on the approval status. If the accuracy drops below some threshold, you want the research team to review the results and decide on the approval status.

This example assumes that a pipeline already exists in CodePipeline, that it uses an AWS CodeCommit repository as its source, and that it runs an AWS CodeBuild project in its build stage.

The following diagram illustrates the components built in this post and how they connect to existing infrastructure.

Architecture diagram for the CodePipeline and Step Functions integration

First, create a Lambda function that uses Amazon Simple Email Service (Amazon SES) to email either the research or the engineering team with the results and links for recording their review decision. See the following code:

import json
import os
import boto3
import base64

def lambda_handler(event, context):
    email_contents = """
    <html>
    <body>
    <p><a href="{url_base}/{token}/success">PASS</a></p>
    <p><a href="{url_base}/{token}/fail">FAIL</a></p>
    </body>
    </html>
"""
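    # Base URL of the API Gateway stage, injected by the CloudFormation template.
    callback_base = os.environ['URL']
    # Base64-encode the task token so it can travel as a single URL path segment;
    # the API Gateway mapping template decodes it again.
    token = base64.b64encode(bytes(event["token"], "utf-8")).decode("utf-8")

    formatted_email = email_contents.format(url_base=callback_base, token=token)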
    ses_client = boto3.client('ses')
    ses_client.send_email(
        Source='no-reply@example.com',
        Destination={
            'ToAddresses': [event["team_alias"]]
        },
        Message={
            'Subject': {
                'Data': 'PLEASE REVIEW',
                'Charset': 'UTF-8'
            },
            'Body': {
                'Text': {
                    'Data': formatted_email,
                    'Charset': 'UTF-8'
                },
                'Html': {
                    'Data': formatted_email,
                    'Charset': 'UTF-8'
                }
            }
        },
        ReplyToAddresses=[
            'no-reply+delivery@example.com',
        ]
    )
    return {}
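
The link-building step in the handler can be exercised on its own, without calling Amazon SES. The following is a minimal sketch rather than part of the original function: build_review_links and decode_token are hypothetical helper names, and the URL and token values you pass in are placeholders.

```python
import base64


def build_review_links(url_base: str, task_token: str) -> dict:
    """Return the PASS and FAIL callback URLs for a Step Functions task token."""
    # Encode the token the same way the Lambda handler does.
    encoded = base64.b64encode(task_token.encode("utf-8")).decode("utf-8")
    return {
        "pass": f"{url_base}/{encoded}/success",
        "fail": f"{url_base}/{encoded}/fail",
    }


def decode_token(encoded: str) -> str:
    """Reverse the encoding, as the API Gateway mapping template does."""
    return base64.b64decode(encoded).decode("utf-8")
```

Round-tripping a token through both helpers is a quick way to confirm that the value the email carries is the value Step Functions issued.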

To set up the Step Functions state machine that orchestrates the approval, use AWS CloudFormation with the following template. The template expects the Lambda function you just created to be stored in email_sender/app.py, which matches its CodeUri and Handler settings. See the following code:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  NotifierFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: email_sender/
      Handler: app.lambda_handler
      Runtime: python3.7
      Timeout: 30
      Environment:
        Variables:
          URL: !Sub "https://${TaskTokenApi}.execute-api.${AWS::Region}.amazonaws.com/Prod"
      Policies:
      - Statement:
        - Sid: SendEmail
          Effect: Allow
          Action:
          - ses:SendEmail
          Resource: '*'

  MyStepFunctionsStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      RoleArn: !GetAtt SFnRole.Arn
      DefinitionString: !Sub |
        {
          "Comment": "Routes load-test results to the research or engineering team for approval when a threshold is missed",
          "StartAt": "ChoiceState",
          "States": {
            "ChoiceState": {
              "Type": "Choice",
              "Choices": [
                {
                  "Variable": "$.accuracypct",
                  "NumericLessThan": 96,
                  "Next": "ResearchApproval"
                },
                {
                  "Variable": "$.latencyMs",
                  "NumericGreaterThan": 80,
                  "Next": "EngineeringApproval"
                }
              ],
              "Default": "SuccessState"
            },
            "EngineeringApproval": {
                 "Type":"Task",
                 "Resource":"arn:aws:states:::lambda:invoke.waitForTaskToken",
                 "Parameters":{  
                    "FunctionName":"${NotifierFunction.Arn}",
                    "Payload":{
                      "latency.$":"$.latencyMs",
                      "team_alias":"engineering@example.com",
                      "token.$":"$$.Task.Token"
                    }
                 },
                 "Catch": [ {
                    "ErrorEquals": ["HandledError"],
                    "Next": "FailState"
                 } ],
              "Next": "SuccessState"
            },
            "ResearchApproval": {
                 "Type":"Task",
                 "Resource":"arn:aws:states:::lambda:invoke.waitForTaskToken",
                 "Parameters":{  
                    "FunctionName":"${NotifierFunction.Arn}",
                    "Payload":{  
                       "accuracy.$":"$.accuracypct",
                       "team_alias":"research@example.com",
                       "token.$":"$$.Task.Token"
                    }
                 },
                 "Catch": [ {
                    "ErrorEquals": ["HandledError"],
                    "Next": "FailState"
                 } ],
              "Next": "SuccessState"
            },
            "FailState": {
              "Type": "Fail",
              "Cause": "Invalid response.",
              "Error": "Failed Approval"
            },
            "SuccessState": {
              "Type": "Succeed"
            }
          }
        }

  TaskTokenApi:
    Type: AWS::ApiGateway::RestApi
    Properties: 
      Description: Receives review responses and returns task tokens to Step Functions
      Name: TokenHandler
  SuccessResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId: !Ref TokenResource
      PathPart: "success"
      RestApiId: !Ref TaskTokenApi
  FailResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId: !Ref TokenResource
      PathPart: "fail"
      RestApiId: !Ref TaskTokenApi
  TokenResource:
    Type: AWS::ApiGateway::Resource
    Properties:
      ParentId: !GetAtt TaskTokenApi.RootResourceId
      PathPart: "{token}"
      RestApiId: !Ref TaskTokenApi
  SuccessMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      HttpMethod: GET
      ResourceId: !Ref SuccessResource
      RestApiId: !Ref TaskTokenApi
      AuthorizationType: NONE
      MethodResponses:
        - ResponseParameters:
            method.response.header.Access-Control-Allow-Origin: true
          StatusCode: 200
      Integration:
        IntegrationHttpMethod: POST
        Type: AWS
        Credentials: !GetAtt APIGWRole.Arn
        Uri: !Sub "arn:aws:apigateway:${AWS::Region}:states:action/SendTaskSuccess"
        IntegrationResponses:
          - StatusCode: 200
            ResponseTemplates:
              application/json: |
                {}
          - StatusCode: 400
            ResponseTemplates:
              application/json: |
                {"uhoh": "Spaghetti O's"}
        RequestTemplates:
          application/json: |
              #set($token=$input.params('token'))
              {
                "taskToken": "$util.base64Decode($token)",
                "output": "{}"
              }
        PassthroughBehavior: NEVER
      OperationName: "TokenResponseSuccess"
  FailMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      HttpMethod: GET
      ResourceId: !Ref FailResource
      RestApiId: !Ref TaskTokenApi
      AuthorizationType: NONE
      MethodResponses:
        - ResponseParameters:
            method.response.header.Access-Control-Allow-Origin: true
          StatusCode: 200
      Integration:
        IntegrationHttpMethod: POST
        Type: AWS
        Credentials: !GetAtt APIGWRole.Arn
        Uri: !Sub "arn:aws:apigateway:${AWS::Region}:states:action/SendTaskFailure"
        IntegrationResponses:
          - StatusCode: 200
            ResponseTemplates:
              application/json: |
                {}
          - StatusCode: 400
            ResponseTemplates:
              application/json: |
                {"uhoh": "Spaghetti O's"}
        RequestTemplates:
          application/json: |
              #set($token=$input.params('token'))
              {
                 "cause": "Failed Manual Approval",
                 "error": "HandledError",
                 "taskToken": "$util.base64Decode($token)"
              }
        PassthroughBehavior: NEVER
      OperationName: "TokenResponseFail"

  APIDeployment:
    Type: AWS::ApiGateway::Deployment
    DependsOn:
      - FailMethod
      - SuccessMethod
    Properties:
      Description: "Prod Stage"
      RestApiId:
        Ref: TaskTokenApi
      StageName: Prod

  APIGWRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "apigateway.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 
                 - 'states:SendTaskSuccess'
                 - 'states:SendTaskFailure'
                Resource: '*'
  SFnRole:
    Type: "AWS::IAM::Role"
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "states.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: 
                 - 'lambda:InvokeFunction'
                Resource: !GetAtt NotifierFunction.Arn

After you create the CloudFormation stack, you have a state machine, an Amazon API Gateway REST API, a Lambda function, and the roles each resource needs.

Your pipeline invokes the state machine with the load test results, which contain the accuracy and latency statistics. The state machine decides which team, if either, to notify. If the results are positive, it returns a success status without notifying anyone. If a team needs to be notified, Step Functions invokes the Lambda function with the relevant metric, the team’s email address, and a task token, and then pauses until that token is returned. The Lambda function renders an email with Pass and Fail links so the team can respond to the review. The REST API captures the response and sends it to Step Functions, which then continues the state machine execution.
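
The routing decision made by the Choice state can be mirrored in a few lines of Python. This is an illustrative sketch only: the function name next_state and the constant names are assumptions, while the thresholds and state names are taken from the template above.

```python
ACCURACY_THRESHOLD_PCT = 96  # below this, the research team reviews
LATENCY_THRESHOLD_MS = 80    # above this, the engineering team reviews


def next_state(results: dict) -> str:
    """Return the state the Choice state would transition to."""
    # Choice rules are evaluated in order and the first match wins,
    # so the accuracy check takes precedence over the latency check.
    if results["accuracypct"] < ACCURACY_THRESHOLD_PCT:
        return "ResearchApproval"
    if results["latencyMs"] > LATENCY_THRESHOLD_MS:
        return "EngineeringApproval"
    return "SuccessState"
```

One consequence of the rule ordering is that a release that misses both thresholds is routed only to the research team.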

The following diagram illustrates the visual workflow of the approval process within the Step Functions state machine.

Step Functions state machine for approving code changes

After you create your state machine, Lambda function, and REST API, return to the CodePipeline console and add the Step Functions integration to your existing release pipeline. Complete the following steps:

On the CodePipeline console, choose Pipelines.
Choose your release pipeline.
Choose Edit.
Under the Edit:Build section, choose Add stage.
Name your stage Release-Approval.
Choose Save.
You return to the edit view and can see the new stage at the end of your pipeline.
In the Edit:Release-Approval section, choose Add action group.
Add the Step Functions state machine invocation action to the action group. Use the following settings:
1. For Action name, enter CheckForRequiredApprovals.
2. For Action provider, choose AWS Step Functions.
3. For Region, choose the Region where your state machine is located (this post uses US West (Oregon)).
4. For Input artifacts, enter BuildOutput (the name you gave the output artifacts in the build stage).
5. For State machine ARN, choose the state machine you just created.
6. For Input type, choose File path. (This parameter tells CodePipeline to take the contents of a file and use it as the input for the state machine execution.)
7. For Input, enter results.json (where you store the results of your load test in the build stage of the pipeline).
8. For Variable namespace, enter StepFunctions. (This parameter tells CodePipeline to store the state machine ARN and execution ARN for this event in a variable namespace named StepFunctions.)
9. For Output artifacts, enter ApprovalArtifacts. (This parameter tells CodePipeline to store the results of this execution in an artifact called ApprovalArtifacts.)
Choose Done.
You return to the edit view of the pipeline.
Choose Save.
Choose Release change.
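
If you manage the pipeline definition as data instead of through the console, the settings above correspond roughly to the action declaration sketched below. This is a sketch under assumptions: the field names follow the CodePipeline pipeline structure as best understood here and should be checked against the action reference, step_functions_action is a hypothetical helper, and the state machine ARN is supplied by you.

```python
def step_functions_action(state_machine_arn: str) -> dict:
    """Build an action declaration that mirrors the console settings above."""
    return {
        "name": "CheckForRequiredApprovals",
        "actionTypeId": {
            "category": "Invoke",
            "owner": "AWS",
            "provider": "StepFunctions",
            "version": "1",
        },
        "region": "us-west-2",  # US West (Oregon), as used in this post
        "namespace": "StepFunctions",
        "inputArtifacts": [{"name": "BuildOutput"}],
        "outputArtifacts": [{"name": "ApprovalArtifacts"}],
        "configuration": {
            "StateMachineArn": state_machine_arn,
            # FilePath reads the execution input from a file in the input artifact.
            "InputType": "FilePath",
            "Input": "results.json",
        },
        "runOrder": 1,
    }
```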

When the pipeline execution reaches the approval stage, it invokes the Step Functions state machine with the results emitted from your build stage. This post hard-codes the load-test results to force an engineering approval by setting the latency (latencyMs) above the 80 ms threshold defined in the CloudFormation template. See the following results.json content:

{
  "accuracypct": 100,
  "latencyMs": 225
}
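
In a real pipeline, the build stage would generate this file rather than hard-code it. The following is a minimal sketch of such a step, assuming a hypothetical write_results helper that your load-test script calls once it has measured accuracy and latency. The key names match the ones the state machine reads.

```python
import json
from pathlib import Path


def write_results(path: str, accuracy_pct: float, latency_ms: float) -> dict:
    """Write load-test metrics in the shape the state machine expects."""
    results = {"accuracypct": accuracy_pct, "latencyMs": latency_ms}
    # The file must end up in the build output artifact so that the
    # Step Functions action can read it as the execution input.
    Path(path).write_text(json.dumps(results, indent=2))
    return results
```

For example, write_results("results.json", 100, 225) produces a file with the same two values as the listing above.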

When the state machine checks the latency and sees that it’s above 80 milliseconds, it invokes the Lambda function with the engineering email address. The engineering team receives a review request email similar to the following screenshot.

review email

If you choose PASS, you send a request to the API Gateway REST API with the Step Functions task token for the current execution. The API passes the token to Step Functions with the SendTaskSuccess API action. When you return to your pipeline, you can see that the approval was processed and your change is ready for production.

Approved code change with the Step Functions integration
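
The calls that the REST API makes on the reviewer's behalf can also be issued directly, which is useful when testing the state machine without the email round trip. The sketch below assumes a hypothetical respond_to_review helper. It takes the client as a parameter so it can be tested with a stand-in; in practice you would likely pass boto3.client("stepfunctions").

```python
import json


def respond_to_review(sfn_client, task_token: str, approved: bool) -> None:
    """Resume a paused execution with the reviewer's decision."""
    if approved:
        # Mirrors the success mapping template: an empty JSON object as output.
        sfn_client.send_task_success(taskToken=task_token, output=json.dumps({}))
    else:
        # The error name matches the Catch clause in the state machine,
        # which routes the execution to FailState.
        sfn_client.send_task_failure(
            taskToken=task_token,
            error="HandledError",
            cause="Failed Manual Approval",
        )
```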

Cleaning Up

When the engineering and research teams devise a solution that no longer mixes performance information from both teams into a single application, you can remove this integration by deleting the CloudFormation stack that you created and deleting the new CodePipeline stage that you added.

Conclusion

For more information about CodePipeline Actions and the Step Functions integration, see Working with Actions in CodePipeline.

AWS DevOps & Developer Productivity Blog
