Testing and creating CI/CD pipelines for AWS Step Functions

AWS Step Functions allow users to easily create workflows that are highly available, serverless, and intuitive. Step Functions natively integrate with a variety of AWS services including, but not limited to, AWS Lambda, AWS Batch, AWS Fargate, and Amazon SageMaker. It offers the ability to natively add error handling, retry logic, and complex branching, all through an easy-to-use JSON-based language known as the Amazon States Language.

AWS CodePipeline is a fully managed Continuous Delivery System that allows for easy and highly configurable methods for automating release pipelines. CodePipeline allows the end-user the ability to build, test, and deploy their most critical applications and infrastructure in a reliable and repeatable manner.

AWS CodeCommit is a fully managed and secure source control repository service. It eliminates the need to support and scale infrastructure to support highly available and critical code repository systems.

This blog post demonstrates how to create a CI/CD pipeline to comprehensively test an AWS Step Function state machine from start to finish using CodeCommit, AWS CodeBuild, CodePipeline, and Python.

CI/CD pipeline steps

The pipeline contains the following steps, as shown in the following diagram.

CI/CD pipeline steps

Pull the source code from source control.
Lint any configuration files.
Run unit tests against the AWS Lambda functions in codebase.
Deploy the test pipeline.
Run end-to-end tests against the test pipeline.
Clean up test state machine and test infrastructure.
Send approval to approvers.
Deploy to Production.

Prerequisites

In order to get started building this CI/CD pipeline there are a few prerequisites that must be met:

Create or use an existing AWS account (instructions on creating an account can be found here).
Define or use the example AWS Step Function states language definition (found below).
Write the appropriate unit tests for your Lambda functions.
Determine end-to-end tests to be run against AWS Step Function state machine.

The CodePipeline project

The following screenshot depicts what the CodePipeline project looks like, including the set of stages run in order to securely, reliably, and confidently deploy the AWS Step Function state machine to Production.

CodePipeline project

Creating a CodeCommit repository

To begin, navigate to the AWS console to create a new CodeCommit repository for your state machine.

CodeCommit repository

In this example, the repository is named CalculationStateMachine, as it contains the contents of the state machine definition, Python tests, and CodeBuild configurations.

CodeCommit structure

Breakdown of repository structure

In the CodeCommit repository above we have the following folder structure:

config – this is where all of the Buildspec files will live for our AWS CodeBuild jobs.
lambdas – this is where we will store all of our AWS Lambda functions.
tests – this is the top-level folder for unit and end-to-end tests. It contains two sub-folders (unit and e2e).
cloudformation – this is where we will add any extra CloudFormation templates.

Defining the state machine

Inside of the CodeCommit repository, create a State Machine Definition file called sm_def.json that defines the state machine in Amazon States Language.

This example creates a state machine that invokes a collection of Lambda functions to perform calculations on the given input values. Take note that it also performs a check against a specific value and, through the use of a Choice state, either continues the pipeline or exits it.

sm_def.json file:

{
  "Comment": "CalulationStateMachine",
  "StartAt": "CleanInput",
  "States": {
    "CleanInput": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "CleanInput",
        "Payload": {
          "input.$": "$"
        }
      },
      "Next": "Multiply"
    },
    "Multiply": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "Multiply",
        "Payload": {
          "input.$": "$.Payload"
        }
      },
      "Next": "Choice"
    },
    "Choice": {
      "Type": "Choice",
      "Choices": [
        {
          "Variable": "$.Payload.result",
          "NumericGreaterThanEquals": 20,
          "Next": "Subtract"
        }
      ],
      "Default": "Notify"
    },
    "Subtract": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "Subtract",
        "Payload": {
          "input.$": "$.Payload"
        }
      },
      "Next": "Add"
    },
    "Notify": {
      "Type": "Task",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:us-east-1:657860672583:CalculateNotify",
        "Message.$": "$$",
        "Subject": "Failed Test"
      },
      "End": true
    },
    "Add": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "Add",
        "Payload": {
          "input.$": "$.Payload"
        }
      },
      "Next": "Divide"
    },
    "Divide": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "Divide",
        "Payload": {
          "input.$": "$.Payload"
        }
      },
      "End": true
    }
  }
}

This will yield the following AWS Step Function state machine after the pipeline completes:

State machine

CodeBuild Spec files

The CI/CD pipeline uses a collection of CodeBuild BuildSpec files chained together through CodePipeline. The following sections demonstrate what these BuildSpec files look like and how they can be used to chain together and build a full CI/CD pipeline.

AWS States Language linter

In order to determine whether or not the State Machine Definition is valid, include a stage in your CodePipeline configuration to evaluate it. Through the use of a Ruby Gem called statelint, you can verify the validity of your state machine definition as follows:

lint_buildspec.yaml file:

version: 0.2
env:
  git-credential-helper: yes
phases:
  install:
    runtime-versions:
      ruby: 2.6
    commands:
      - yum -y install rubygems
      - gem install statelint

  build:
    commands:
      - statelint sm_def.json

If your configuration is valid, you do not see any output messages. If the configuration is invalid, you receive a message telling you that the definition is invalid and the pipeline terminates.

Lambda unit testing

In order to test your Lambda function code, you need to evaluate whether or not it passes a set of tests. You can test each individual Lambda function deployed and used inside of the state machine. You can feed various inputs into your Lambda functions and assert that the output is what you expect it to be. In this case, you use Python pytest to kick-off tests and validate results.

unit_test_buildspec.yaml file:

version: 0.2
env:
  git-credential-helper: yes
phases:
  install:
    runtime-versions:
      python: 3.8
    commands:
      - pip3 install -r tests/requirements.txt

  build:
    commands:
      - pytest -s -vvv tests/unit/ --junitxml=reports/unit.xml

reports:
  StateMachineUnitTestReports:
    files:
      - "**/*"
    base-directory: "reports"

Take note that in the CodeCommit repository includes a directory called tests/unit, which includes a collection of unit tests that are run and validated against your Lambda function code. Another very important part of this BuildSpec file is the reports section, which generates reports and metrics about the results, trends, and overall success of your tests.

CodeBuild test reports

After running the unit tests, you are able to see reports about the results of the run. Take note of the reports section of the BuildSpec file, along with the –junitxml=reports/unit.xml command run along with the pytest command. This generates a set of reports that can be visualized in CodeBuild.

Navigate to the specific CodeBuild project you want to examine and click on the specific execution of interest. There is a tab called Reports, as seen in the following screenshot:

Test reports

Select the specific report of interest to see a breakdown of the tests that have run, as shown in the following screenshot:

Test visualization

With Report Groups, you can also view an aggregated list of tests that have run over time. This report includes various features such as the number of average test cases that have run, average duration, and the overall pass rate, as shown in the following screenshot:

Report groups

The AWS CloudFormation template step

The following BuildSpec file is used to generate an AWS CloudFormation template that inject the State Machine Definition into AWS CloudFormation.

template_sm_buildspec.yaml file:

version: 0.2
env:
  git-credential-helper: yes
phases:
  install:
    runtime-versions:
      python: 3.8

  build:
    commands:
      - python template_statemachine_cf.py

The Python script that templates AWS CloudFormation to deploy the State Machine Definition given the sm_def.json file in your repository follows:

template_statemachine_cf.py file:

import sys
import json

def read_sm_def (
    sm_def_file: str
) -> dict:
    """
    Reads state machine definition from a file and returns it as a dictionary.

    Parameters:
        sm_def_file (str) = the name of the state machine definition file.

    Returns:
        sm_def_dict (dict) = the state machine definition as a dictionary.
    """

    try:
        with open(f"{sm_def_file}", "r") as f:
            return f.read()
    except IOError as e:
        print("Path does not exist!")
        print(e)
        sys.exit(1)

def template_state_machine(
    sm_def: dict
) -> dict:
    """
    Templates out the CloudFormation for creating a state machine.

    Parameters:
        sm_def (dict) = a dictionary definition of the aws states language state machine.

    Returns:
        templated_cf (dict) = a dictionary definition of the state machine.
    """
    
    templated_cf = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Creates the Step Function State Machine and associated IAM roles and policies",
        "Parameters": {
            "StateMachineName": {
                "Description": "The name of the State Machine",
                "Type": "String"
            }
        },
        "Resources": {
            "StateMachineLambdaRole": {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Principal": {
                                    "Service": "states.amazonaws.com"
                                },
                                "Action": "sts:AssumeRole"
                            }
                        ]
                    },
                    "Policies": [
                        {
                            "PolicyName": {
                                "Fn::Sub": "States-Lambda-Execution-${AWS::StackName}-Policy"
                            },
                            "PolicyDocument": {
                                "Version": "2012-10-17",
                                "Statement": [
                                    {
                                        "Effect": "Allow",
                                        "Action": [
                                            "logs:CreateLogStream",
                                            "logs:CreateLogGroup",
                                            "logs:PutLogEvents",
                                            "sns:*"             
                                        ],
                                        "Resource": "*"
                                    },
                                    {
                                        "Effect": "Allow",
                                        "Action": [
                                            "lambda:InvokeFunction"
                                        ],
                                        "Resource": "*"
                                    }
                                ]
                            }
                        }
                    ]
                }
            },
            "StateMachine": {
                "Type": "AWS::StepFunctions::StateMachine",
                "Properties": {
                    "DefinitionString": sm_def,
                    "RoleArn": {
                        "Fn::GetAtt": [
                            "StateMachineLambdaRole",
                            "Arn"
                        ]
                    },
                    "StateMachineName": {
                        "Ref": "StateMachineName"
                    }
                }
            }
        }
    }

    return templated_cf


sm_def_dict = read_sm_def(
    sm_def_file='sm_def.json'
)

print(sm_def_dict)

cfm_sm_def = template_state_machine(
    sm_def=sm_def_dict
)

with open("sm_cfm.json", "w") as f:
    f.write(json.dumps(cfm_sm_def))

Deploying the test pipeline

In order to verify the full functionality of an entire state machine, you should stand it up so that it can be tested appropriately. This is an exact replica of what you will deploy to Production: a completely separate stack from the actual production stack that is deployed after passing appropriate end-to-end tests and approvals. You can take advantage of the AWS CloudFormation target supported by CodePipeline. Please take note of the configuration in the following screenshot, which shows how to configure this step in the AWS console:

Deploy test pipeline

End-to-end testing

In order to validate that the entire state machine works and executes without issues given any specific changes, feed it some sample inputs and make assertions on specific output values. If the specific assertions pass and you get the output that you expect to receive, you can proceed to the manual approval phase.

e2e_tests_buildspec.yaml file:

version: 0.2
env:
  git-credential-helper: yes
phases:
  install:
    runtime-versions:
      python: 3.8
    commands:
      - pip3 install -r tests/requirements.txt

  build:
    commands:
      - pytest -s -vvv tests/e2e/ --junitxml=reports/e2e.xml

reports:
  StateMachineReports:
    files:
      - "**/*"
    base-directory: "reports"

Manual approval (SNS topic notification)

In order to proceed forward in the CI/CD pipeline, there should be a formal approval phase before moving forward with a deployment to Production. Using the Manual Approval stage in AWS CodePipeline, you can configure the pipeline to halt and send a message to an Amazon SNS topic before moving on further. The SNS topic can have a variety of subscribers, but in this case, subscribe an approver email address to the topic so that they can be notified whenever an approval is requested. Once the approver approves the pipeline to move to Production, the pipeline will proceed with deploying the production version of the Step Function state machine.

This Manual Approval stage can be configured in the AWS console using a configuration similar to the following:

Manual approval

Deploying to Production

After the linting, unit testing, end-to-end testing, and the Manual Approval phases have passed, you can move on to deploying the Step Function state machine to Production. This phase is similar to the Deploy Test Stage phase, except the name of your AWS CloudFormation stack is different. In this case, you also take advantage of the AWS CloudFormation target for CodeDeploy:

Deploy to production

After this stage completes successfully, your pipeline execution is complete.

Cleanup

After validating that the test state machine and Lambda functions work, include a CloudFormation step that will tear-down the existing test infrastructure (as it is no longer needed). This can be configured as a new CodePipeline step similar to the below configuration:

CloudFormation Template for cleaning up resources

Conclusion

You have linted and validated your AWS States Language definition, unit tested your Lambda function code, deployed a test AWS state machine, run end-to-end tests, received Manual Approval to deploy to Production, and deployed to Production. This gives you and your team confidence that any changes made to your state machine and surrounding Lambda function code perform correctly in Production.

About the Author

Matt Noyce is a Cloud Application Architect in Professional Services at Amazon Web Services.
He works with customers to architect, design, automate, and build solutions on AWS
for their business needs.

AWS DevOps & Developer Productivity Blog

Testing and creating CI/CD pipelines for AWS Step Functions

CI/CD pipeline steps

Prerequisites

The CodePipeline project

Creating a CodeCommit repository

Breakdown of repository structure

Defining the state machine

CodeBuild Spec files

AWS States Language linter

Lambda unit testing

CodeBuild test reports

The AWS CloudFormation template step

Deploying the test pipeline

End-to-end testing

Manual approval (SNS topic notification)

Deploying to Production

Cleanup

Conclusion

About the Author

Resources

Follow