Automate Amazon EKS upgrades with infrastructure as code

In this post, we explain how to use managed node groups to upgrade Amazon Elastic Kubernetes Service (Amazon EKS) cluster nodes in parallel from Kubernetes version 1.19 to 1.20. You can use AWS Service Catalog to support an automated workflow with granular controls, which provides the option to upgrade both the control plane and the nodes. A list of Amazon EKS features, by Kubernetes version, can be found in the publicly available, open source change log document.

The solution

Multiple components, configurations, and applications run in Kubernetes clusters. Operations teams and developers want to use new features that help optimize the environment. Upgrading a Kubernetes cluster can be a manual and time-consuming effort if developers must perform common validation and procedural steps by hand.

You can upgrade clusters through the AWS Management Console or through the command line interface with the eksctl or kubectl tools. Both options are manual, and the upgrade steps are not logged for tracking and auditing. In addition, upon Amazon EKS cluster enablement, there is no auditing in AWS CloudTrail.

The solution described in this article automates the Amazon EKS upgrade process using managed node groups, a launch template, infrastructure as code, and container images with Python code. The three container images issue eksctl and kubectl commands along with status checks. The cluster upgrade is encapsulated into simple parameters. In the background, the commands and configuration are captured in logs.

The AWS Lambda container image types are:

Control plane upgrade image: Upgrades the Amazon EKS control plane to the specified version.
DaemonSet upgrade image: Upgrades the aws-node, coredns, and kube-proxy DaemonSets to match the control plane version.
Node group upgrade image: Upgrades the node group launch template version to match the Amazon EKS cluster version.

This diagram depicts the Amazon EKS upgrade process on the control and data planes.

The control plane workflow is shown in the black-numbered circles in the preceding diagram:

AWS CloudFormation creates an AWS Lambda container image that sends a control plane upgrade request and stores the update ID.
The AWS Lambda container request is received and starts a control plane upgrade.
The AWS Lambda container takes the update ID and checks the status of the control plane upgrade progress.
If the control plane upgrade is successful, an AWS CloudFormation template deploys a managed node group with Kubernetes version 1.20. If the Amazon EKS cluster is unhealthy, the control plane upgrade fails.
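
The request-then-poll pattern in the steps above can be sketched in Python. This is a minimal sketch, not the code from the repository: the images in this article issue eksctl and kubectl commands, whereas the sketch assumes a client that behaves like boto3.client("eks") is passed in. The function name upgrade_control_plane and the polling parameters are hypothetical.

```python
import time

# Terminal states reported by the EKS DescribeUpdate API.
TERMINAL_STATUSES = {"Successful", "Failed", "Cancelled"}


def upgrade_control_plane(eks_client, cluster_name, target_version,
                          poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Request a control plane upgrade, then poll the update until it finishes.

    eks_client is assumed to behave like boto3.client("eks").
    Returns the final update status string.
    """
    # Send the upgrade request and keep the update ID.
    response = eks_client.update_cluster_version(
        name=cluster_name, version=target_version
    )
    update_id = response["update"]["id"]

    # Check progress with the stored update ID.
    for _ in range(max_polls):
        update = eks_client.describe_update(
            name=cluster_name, updateId=update_id
        )["update"]
        if update["status"] in TERMINAL_STATUSES:
            return update["status"]
        sleep(poll_seconds)

    raise TimeoutError(f"update {update_id} did not finish in time")
```

Passing the client and the sleep function as arguments keeps the logic testable without AWS access. Inside a Lambda function, a polling loop like this would also have to fit within the function timeout, which is one reason to split the request and the status check into separate invocations.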

The data plane workflow is shown in the yellow-numbered circles:

The control plane upgrade was successful and an AWS CloudFormation template deployed a managed node group with Kubernetes version 1.20.
The AWS Lambda container request is received and starts a data plane upgrade on the different types of DaemonSets one at a time.
If all upgrades on the data plane are successful, an AWS CloudFormation template deploys a managed node group with Kubernetes version 1.20. If the Amazon EKS cluster is in an unhealthy status, the upgrade fails.
The AWS Lambda container request is received and starts a data plane upgrade on the managed node groups.
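
The "one at a time" handling of the DaemonSets can be sketched as a loop that stops at the first failure. This is an illustration only: upgrade_daemonsets and the upgrade_one callback are hypothetical names, the order shown is an assumption based on the order the components are listed in this article, and the repository's images perform the actual upgrade with kubectl.

```python
# Components named in this article; the order is an assumption.
DAEMONSETS = ("aws-node", "coredns", "kube-proxy")


def upgrade_daemonsets(upgrade_one, names=DAEMONSETS):
    """Upgrade each component in turn and stop at the first failure.

    upgrade_one(name) is a caller-supplied function that returns True on
    success. Returns (succeeded_names, failed_name_or_None).
    """
    succeeded = []
    for name in names:
        if not upgrade_one(name):
            # Leave the remaining components untouched after a failure.
            return succeeded, name
        succeeded.append(name)
    return succeeded, None
```

Returning the name of the failed component, rather than only a Boolean, makes it clear where a retry should resume.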

Prerequisites

To follow the steps to provision the pipeline deployment, you must have the following:

An AWS account with local credentials properly configured (typically under ~/.aws/credentials).
The latest version of the AWS CLI. For more information, refer to the documentation for installing, updating, and uninstalling the AWS CLI.
An eksctl client to interact with the cluster’s API.
A Docker client to build container images.
A git client to clone the source code provided.

Note: View the code within the GitHub repository. AWS best practices recommend reducing the AWS Identity and Access Management (IAM) policy to meet your company’s requirements. These permissions are for demonstration only and are not production ready.

Walkthrough

Clone the source code repository found in the following location:

git clone https://github.com/aws-samples/automation-eks-upgrades

Create an Amazon EKS cluster. The following command creates a new Amazon EKS cluster, but you can modify the steps for your use case if you have an existing cluster:

eksctl create cluster \
--name demo-eks-cluster \
--version 1.19 \
--nodegroup-name demo-managed-node-group \
--node-type t3.medium \
--nodes 2 \
--region <AWS_REGION> \
--enable-ssm

The output should look like the following:

2050-08-12 00:00:00 [✔]  EKS cluster "demo-eks-cluster" in "us-east-1" region is ready

Authenticate to the Amazon Elastic Container Registry (Amazon ECR) registry in the Region where the Amazon EKS cluster is running:

aws ecr get-login-password \
--region <AWS_REGION> | docker login \
--username AWS \
--password-stdin <AWS_ACCOUNT_ID>.dkr.ecr.<AWS_REGION>.amazonaws.com

The output should look like the following:

Login Succeeded

Build and push container images to the Amazon ECR repository with a bash script:

images/ecr_push_image.sh <AWS_ACCOUNT_ID> <AWS_REGION>

Note: The ecr_push_image.sh script can generate an error due to Docker-related limits.

Deploying the CloudFormation templates

To deploy your templates, complete the following steps.

Deploy the IAM role configuration template:

aws cloudformation create-stack \
--stack-name IAM-Stack \
--template-body file://templates/iam.yml \
--capabilities CAPABILITY_NAMED_IAM \
--region <AWS_REGION>

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-west-2:0123456789012:stack/IAM-Stack/e7aac640-fba0-11eb-bf85-XXXXXX"
}

Set local environment variables for the IAM identity mapping:

export CLUSTER_NAME=<CLUSTER_NAME>
export LAMBDA_EXECUTION_ROLE=$(aws cloudformation describe-stacks --stack-name IAM-Stack --query "Stacks[*].Outputs[0].OutputValue" --output text --region <AWS_REGION>)

Check that the local environment variables were properly exported:

printenv | awk '/CLUSTER_NAME/||/LAMBDA_EXECUTION_ROLE/{print $0}'

The output should look like the following:

CLUSTER_NAME=demo-eks-cluster
LAMBDA_EXECUTION_ROLE=arn:aws:iam::0123456789012:role/lambda-execution-role

Update IAM identity mapping to associate the AWS Lambda execution role to the Amazon EKS cluster RBAC:

eksctl create iamidentitymapping \
--cluster $CLUSTER_NAME \
--arn $LAMBDA_EXECUTION_ROLE \
--group system:masters \
--username admin \
--region <AWS_REGION>

The output should look like the following:

2050-08-13 12:00:00 [ℹ] eksctl version 0.55.0
2050-08-13 12:00:00 [ℹ] using region us-west-2
2050-08-13 12:00:00 [ℹ] adding identity "arn:aws:iam::0123456789012:role/lambda-execution-role" to auth ConfigMap
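
For reference, the identity mapping that eksctl writes is an entry in the mapRoles list of the aws-auth ConfigMap. The Python sketch below only builds the equivalent data structure to show its fields; build_role_mapping is a hypothetical helper and does not modify the cluster.

```python
def build_role_mapping(role_arn, username, groups):
    """Return one aws-auth mapRoles entry as a plain dictionary."""
    return {
        "rolearn": role_arn,
        "username": username,
        "groups": list(groups),
    }


# Example values copied from the command and output above.
entry = build_role_mapping(
    "arn:aws:iam::0123456789012:role/lambda-execution-role",
    "admin",
    ["system:masters"],
)
print(entry)
```

Because system:masters grants full administrative access to the cluster, a narrower group is worth considering outside of a demonstration.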

Open the parameters/controlplane-cluster.json file and provide the EksClusterName, EksUpdateClusterVersion, and IAMStackName.
Deploy the configuration template to start the cluster control plane upgrade. This step can take up to 60 minutes to complete:

aws cloudformation create-stack \
--stack-name aws-eks-cluster-upgrade \
--template-body file://templates/controlplane-cluster.yml \
--parameters file://parameters/controlplane-cluster.json \
--capabilities CAPABILITY_AUTO_EXPAND \
--region <AWS_REGION>
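
The AWS CLI expects a --parameters file to be a JSON list of ParameterKey and ParameterValue pairs. As a sketch, the following Python generates such a list for the three parameter names mentioned above. The values are the example names used in this walkthrough, build_parameters is a hypothetical helper, and the file in the repository may contain additional keys.

```python
import json


def build_parameters(values):
    """Convert a {name: value} mapping into CloudFormation CLI parameter JSON."""
    return json.dumps(
        [{"ParameterKey": key, "ParameterValue": str(value)}
         for key, value in values.items()],
        indent=2,
    )


# Example values taken from this walkthrough; adjust them for your cluster.
parameters_json = build_parameters({
    "EksClusterName": "demo-eks-cluster",
    "EksUpdateClusterVersion": "1.20",
    "IAMStackName": "IAM-Stack",
})
print(parameters_json)
```

The version is passed as a string on purpose: as a number, 1.20 would be written out as 1.2.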

Check the current version of the cluster:

aws eks describe-cluster \
--name demo-eks-cluster \
--region <AWS_REGION> \
--query cluster.version \
--output text

The output should look like the following:

1.20

Open the parameters/dataplane-daemonset.json file and provide the EksClusterName and IAMStackName.
Deploy the configuration template to start the cluster data plane DaemonSet upgrade:

aws cloudformation create-stack \
--stack-name aws-eks-daemonset-upgrade \
--template-body file://templates/dataplane-daemonset.yml \
--parameters file://parameters/dataplane-daemonset.json \
--capabilities CAPABILITY_AUTO_EXPAND \
--region <AWS_REGION>

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-west-2:0123456789012:stack/aws-eks-daemonset-upgrade/59f93f30-fc67-11eb-8242-XXXXXX"
}

Check Amazon CloudWatch Logs for the following output for each DaemonSet type:

{
"Status": "SUCCESS",
"Reason": "See the details in CloudWatch Log Stream: 2050/08/13/[$LATEST]6755fff2fef147e6b94XXXXXXXX",
"PhysicalResourceId": "2050/08/13/[$LATEST]6755fff2fef147e6b94XXXXXXXX",
"StackId": "arn:aws:cloudformation:us-west-2:0123456789012:stack/aws-eks-daemonset-upgrade/fbb9d6e0-fc6c-11eb-a6f5-0aXXXXX",
"RequestId": "8a5ccd07-30a5-49d4-ba50-5XXXXXXX",
"LogicalResourceId": "AwsNodeDaemonsetUpgradeFunction",
"NoEcho": false,
"Data": null
}
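
When there are several log entries to review, the check can be scripted. The sketch below assumes each response has already been extracted from the log as a JSON string with the fields shown above; failed_resources is a hypothetical helper, not part of the repository.

```python
import json


def failed_resources(response_documents):
    """Return the LogicalResourceId of every response whose Status is not SUCCESS."""
    failed = []
    for document in response_documents:
        response = json.loads(document)
        if response.get("Status") != "SUCCESS":
            failed.append(response.get("LogicalResourceId"))
    return failed
```

An empty result means every response that was checked reported SUCCESS.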

Deploy the configuration template to start the cluster data plane node groups upgrade:

aws cloudformation create-stack \
--stack-name aws-eks-nodegroup-upgrade \
--template-body file://templates/dataplane-nodegroup.yml \
--parameters file://parameters/dataplane-nodegroup.json \
--capabilities CAPABILITY_AUTO_EXPAND \
--region <AWS_REGION>

The output should look like the following:

{
    "StackId": "arn:aws:cloudformation:us-west-2:0123456789012:stack/aws-eks-nodegroup-upgrade/df59be10-01bd-11ec-8a4e-XXXXXX"
}

Clean up

Delete the node group stack:

aws cloudformation delete-stack \
--stack-name aws-eks-nodegroup-upgrade

Delete the DaemonSet stack:

aws cloudformation delete-stack \
--stack-name aws-eks-daemonset-upgrade

Delete the cluster stack:

aws cloudformation delete-stack \
--stack-name aws-eks-cluster-upgrade

Delete the IAM stack:

aws cloudformation delete-stack \
--stack-name IAM-Stack

Delete container images and repository:

images=("controlplane/upgrade" "controlplane/status" "dataplane/daemonset" "dataplane/nodegroup" "dataplane/status")

for app in "${images[@]}"; do aws ecr delete-repository --repository-name "$app" --force || true ; done

Conclusion

We have explained how to upgrade a managed Kubernetes cluster using Amazon EKS in a repeatable pattern with configuration files, templates, and code. The activities during the upgrade process are logged in Amazon CloudWatch Logs, and that information can be used for monitoring, alerting, and audits. The time that you would otherwise spend deploying these changes can be spent on other priorities.

AWS Open Source Blog
