Deploying Karpenter Nodes with Multus on Amazon EKS
Container-based Telco workloads use Multus CNI primarily for traffic or network segmentation. Amazon Elastic Kubernetes Service (Amazon EKS) supports Multus CNI, which enables users to attach multiple network interfaces and apply advanced network configuration and segmentation to Kubernetes-based applications running on AWS. One of the many benefits of running applications on AWS is resource elasticity (scaling out and scaling in). Node elasticity can be provided by a cluster autoscaler such as Karpenter. Karpenter automatically launches the right compute resources to handle application demand. It is designed to take advantage of the cloud with fast and simple compute provisioning for Kubernetes clusters. In addition, nodes provisioned by Karpenter are group-less: they are provisioned outside of an Amazon EKS node group.
Purpose
This post demonstrates a deployment model of an EKS cluster with Karpenter-provisioned nodepools that have Multus interfaces. The deployment model is recommended for, but not limited to, Telco workloads on AWS. It uses Karpenter both as a group-less nodepool and as an autoscaling solution. The group-less workers (nodepools) run application pods that need Multus CNI networking, while the Amazon EKS managed nodegroup workers run pods that do not need Multus CNI, such as plugins, add-ons, and Karpenter itself. This post also illustrates how Karpenter manages just-in-time scaling of workers with multiple elastic network interfaces (ENIs), thus meeting stringent requirements for scalability and network separation.
Why this type of deployment?
The main use case of Karpenter is to provision worker nodes in a Kubernetes cluster during deployment and scale-out events. In this post we introduce an additional use case where Karpenter provisions nodepools hosting applications that use Multus CNI. This deployment model decouples your application worker nodes from the Amazon EKS managed nodegroups. This approach provides benefits such as 1/ your application workers are not limited to a specific instance type and size, and you have a wide selection of instance types and families to choose from; 2/ your application scale-out capabilities are not tied to an Amazon Elastic Compute Cloud (Amazon EC2) Auto Scaling group, so you can scale with more flexibility than with an Amazon EKS nodegroup. Furthermore, because it does not rely on an Amazon EC2 Auto Scaling group, Karpenter provisions worker nodes quickly by calling the Amazon EC2 Fleet APIs directly, which can be critical for application scale-out scenarios. Standard Karpenter-based node provisioning does not support Multus CNI, as it creates nodes attached to a single VPC subnet. This post provides a solution for Multus with Karpenter by introducing userdata-based ENI management through EC2NodeClass.
Deployment
The steps detailed in this section use the GitHub repository.
This deployment showcases the flexibility of Karpenter as a means to run application pods with Multus CNI. It applies the approach of creating and attaching ENIs for Multus through Karpenter and at the same time demonstrates the scaling capabilities of Karpenter.
Prerequisites
The following prerequisites are necessary to continue with this post:
Environment setup
In this setup we create a VPC, an EKS cluster, an EKS managed node group, security groups, and associated VPC components.
1. We use the CloudShell environment to configure the EKS cluster and deploy a sample application. Go to your CloudShell console, download this GitHub repository, and start by executing the following script to install the necessary tools (awscli, kubectl, eksctl, helm, etc.).
2. Create an AWS CloudFormation stack using the template vpc-infra-mng.yaml. Select two Availability Zones (e.g., us-west-2a and us-west-2b). Name your stack karpenterwithmultus (you need this stack name later in your CloudShell environment). You can keep the default values for the other parameters. You can use the following AWS Command Line Interface (AWS CLI) command or the CloudFormation menu of the AWS Management Console.
Before executing the next steps, make sure the creation of the first CloudFormation stack is complete. Note that the stack creation may take about 15 minutes, as it builds the EKS cluster and worker nodes.
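As a rough sketch of the stack creation and the wait described above, the two helper functions below wrap the AWS CLI. The stack and template names come from this post. The CAPABILITY_NAMED_IAM capability is an assumption (it is only needed if the template creates named IAM resources), and template parameters such as the Availability Zones are not set here; pass them with --parameters if the template requires them.

```shell
# Stack and template names as used in this post.
VPC_STACK_NAME="${VPC_STACK_NAME:-karpenterwithmultus}"
TEMPLATE_FILE="${TEMPLATE_FILE:-vpc-infra-mng.yaml}"

# Start the stack creation. CAPABILITY_NAMED_IAM is an assumption; drop it
# if the template does not create named IAM resources.
create_vpc_stack() {
  aws cloudformation create-stack \
    --stack-name "$VPC_STACK_NAME" \
    --template-body "file://${TEMPLATE_FILE}" \
    --capabilities CAPABILITY_NAMED_IAM
}

# Block until CloudFormation reports that the stack creation is complete.
wait_for_vpc_stack() {
  aws cloudformation wait stack-create-complete --stack-name "$VPC_STACK_NAME"
}
```

Calling create_vpc_stack followed by wait_for_vpc_stack returns only once the stack is ready, which avoids starting the next steps too early.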
The resulting architecture looks like the following image:
3. In CloudShell, create environment variables to define the Karpenter version, EKS cluster name, and AWS Region. Make sure you change the parameter values according to your environment. Then, go to the CloudShell console and execute the following commands:
NOTE:
VPC_STACK_NAME is the name you gave to the CloudFormation stack created from the vpc-infra-mng.yaml template.
Pay special attention to the default Region value. In later steps, nodepool.yaml is configured with Availability Zones (AZs), which must belong to this Region.
This example uses CloudShell as your admin node. Environment variables are lost when CloudShell times out. Feel free to use your own admin node.
Update your kubeconfig and test Amazon EKS control plane access.
If you don’t get an error, then you have access to the Kubernetes cluster.
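Because CloudShell loses environment variables when the session times out, it can help to keep them in a file. The following is a minimal sketch, not the exact commands from the repository: the variable values, the Karpenter version, and the ENV_FILE path are placeholders chosen for illustration.

```shell
# Placeholder values; change them to match your environment.
export KARPENTER_VERSION="${KARPENTER_VERSION:-1.0.0}"   # hypothetical version
export CLUSTER_NAME="${CLUSTER_NAME:-my-eks-cluster}"     # hypothetical name
export AWS_DEFAULT_REGION="${AWS_DEFAULT_REGION:-us-west-2}"
export VPC_STACK_NAME="${VPC_STACK_NAME:-karpenterwithmultus}"

# Hypothetical file used to survive CloudShell session timeouts.
ENV_FILE="${ENV_FILE:-$HOME/.karpenter-multus.env}"

# Write the current values to ENV_FILE as export statements.
save_env() {
  {
    echo "export KARPENTER_VERSION='${KARPENTER_VERSION}'"
    echo "export CLUSTER_NAME='${CLUSTER_NAME}'"
    echo "export AWS_DEFAULT_REGION='${AWS_DEFAULT_REGION}'"
    echo "export VPC_STACK_NAME='${VPC_STACK_NAME}'"
  } > "$ENV_FILE"
}

# Reload the saved values in a new session.
load_env() {
  . "$ENV_FILE"
}

# Refresh kubeconfig, then make a simple API call to confirm access.
check_cluster_access() {
  aws eks update-kubeconfig --name "$CLUSTER_NAME" --region "$AWS_DEFAULT_REGION" &&
    kubectl get svc
}
```

Run save_env once after setting the variables, and load_env after a timeout before continuing with the remaining steps.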
Plugin setup
4. Install the Multus CNI. Multus CNI is a container network interface plugin for Kubernetes that enables you to attach multiple network interfaces to pods. If you want to understand Multus CNI and VPC CNI and how they work together on Amazon EKS, then refer to this Amazon Container post.
5. We need CIDR reservations on the Multus subnets, since we dedicate a portion of each subnet's IP addresses to Multus pod IPs. An explicit CIDR reservation tells the VPC not to assign addresses from the reserved block automatically when creating VPC resources such as ENIs. Execute the following CIDR reservation commands on the Multus subnets. Each reservation is a /27 block.
NOTE: If you are getting an error – subnet ID doesn’t exist – then check if you have defined the correct AWS_DEFAULT_REGION environment variable.
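To make the reservation step concrete, here is a small sketch built on the AWS CLI create-subnet-cidr-reservation command. The helper names are made up for this example, and the subnet ID and CIDR block are supplied by the caller.

```shell
# Number of IPv4 addresses in a block with the given prefix length.
cidr_size() {
  echo $(( 1 << (32 - $1) ))
}

# Create an explicit reservation so that the VPC does not hand out these
# addresses automatically. Subnet ID and CIDR are supplied by the caller.
reserve_multus_cidr() {
  local subnet_id="$1" cidr="$2"
  aws ec2 create-subnet-cidr-reservation \
    --subnet-id "$subnet_id" \
    --cidr "$cidr" \
    --reservation-type explicit
}

echo "A /27 reservation covers $(cidr_size 27) addresses"
# prints "A /27 reservation covers 32 addresses"
```

For example, reserve_multus_cidr subnet-0123456789abcdef0 10.0.4.32/27 (both values hypothetical) would set aside 10.0.4.32 through 10.0.4.63 for Multus pod IPs.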
6. Install the Whereabouts plugin. Whereabouts is an IPAM CNI plugin that we use to manage the Multus pod IP addresses that we set aside in the previous step. The Multus pod IP ranges used by Whereabouts are defined through the NetworkAttachmentDefinition (NAD) Custom Resource Definition (CRD).
7. Apply the NetworkAttachmentDefinitions on the cluster. They configure the Multus interfaces on the pods when we create the application pods later. If you inspect the file, then you notice that the range we defined matches the CIDR reservation prefixes we set aside in the previous step.
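To illustrate how a NetworkAttachmentDefinition ties a Multus interface to a reserved address range, the following sketch prints a manifest for a given interface and range. It is not the file from the repository: the ipvlan type, the l2 mode, and the render_nad helper name are assumptions, and all names and addresses are passed in by the caller.

```shell
# Print a NetworkAttachmentDefinition that hands out addresses from a
# Whereabouts range. The ipvlan type and l2 mode are assumptions for this
# sketch; all other values are supplied by the caller.
render_nad() {
  local name="$1" master="$2" range="$3" range_start="$4" range_end="$5"
  cat <<EOF
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: ${name}
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "ipvlan",
      "master": "${master}",
      "mode": "l2",
      "ipam": {
        "type": "whereabouts",
        "range": "${range}",
        "range_start": "${range_start}",
        "range_end": "${range_end}"
      }
    }'
EOF
}
```

With hypothetical values, render_nad multus-net-1 eth1 10.0.4.0/24 10.0.4.32 10.0.4.63 | kubectl apply -f - would create a NAD whose addresses fall inside a 10.0.4.32/27 reservation.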
IAM role setup
8. Create the Karpenter IAM role and other prerequisites needed for Karpenter. You should see “Successfully created/updated stack – <Karpenter-${CLUSTER_NAME}>” at the end of this step.
9. Create an AWS Identity and Access Management (IAM) policy for the additional actions needed for Multus, and attach it to the Karpenter node role. The userdata section of EC2NodeClass contains a script that creates, attaches, and configures ENIs on Karpenter-provisioned nodes. For this script to work, the Karpenter node role needs the right policies attached.
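As a sketch of what such a policy could look like, the helper below prints a policy document and a second helper creates the policy and attaches it to a role. The list of actions is an assumption about what an ENI-managing userdata script typically needs, not the policy from the repository, and the role and policy names are supplied by the caller.

```shell
# Print an IAM policy document covering ENI creation, attachment, and
# inspection. The action list is an assumption; trim or extend it to match
# what the userdata script actually calls.
multus_eni_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:AttachNetworkInterface",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeSubnets",
        "ec2:DescribeSecurityGroups",
        "ec2:CreateTags"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

# Create a managed policy from the document and attach it to a role.
attach_multus_policy() {
  local role_name="$1" policy_name="$2" policy_arn
  policy_arn=$(aws iam create-policy \
    --policy-name "$policy_name" \
    --policy-document "$(multus_eni_policy)" \
    --query 'Policy.Arn' --output text) || return 1
  aws iam attach-role-policy --role-name "$role_name" --policy-arn "$policy_arn"
}
```

A call such as attach_multus_policy "KarpenterNodeRole-${CLUSTER_NAME}" karpenter-multus-eni assumes the node role follows that naming pattern; check the role name created in the previous step before running it.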
10. (Optional) Run the following command to create a service-linked role that allows the use of Spot Instances. If the role has already been created, then you see the following error, which you can ignore: An error occurred (InvalidInput) when calling the CreateServiceLinkedRole operation: Service role name AWSServiceRoleForEC2Spot has been taken in this account, please try a different suffix.
aws iam create-service-linked-role --aws-service-name spot.amazonaws.com || true
NOTE: This step is optional and needed only if you want to use Spot Instances in your Karpenter nodepool.
Installation of Karpenter
11. Install Karpenter.
12. Execute the following command to check if the Karpenter pods are in the Running state.
13. Update the nodepool.yaml file with the Multus subnet tag name, AZs, and security group tag names. For further reading, more details of Karpenter nodepool configuration can be found in this Karpenter concepts post.
Change the AZs in nodepool.yaml by running the following command with the AZ names that you used in your deployment. In this example we are using us-west-2a and us-west-2b.
Update the EKS cluster name using the following commands:
Apply the Karpenter nodepool configuration.
NOTE: If you inspect the config/nodepool.yaml file, then you notice a customized userdata section in the EC2NodeClass. The script inside the userdata creates, attaches, and configures the Multus ENIs when Karpenter launches the Amazon EC2 nodes. This is where we configure the lifecycle management (LCM) of the Multus ENIs on the nodepools. If you need additional node tuning, then you can also do so in the userdata section.
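The kind of ENI handling that such a userdata script performs can be sketched with three AWS CLI calls. This is an illustration under stated assumptions rather than the script from config/nodepool.yaml: the attach_multus_eni helper is hypothetical, and the instance ID, subnet ID, security group ID, and device index are provided by the caller.

```shell
# Create an ENI in a Multus subnet, attach it to the instance at the given
# device index, and mark it for deletion when the instance terminates.
# All four arguments are supplied by the caller.
attach_multus_eni() {
  local instance_id="$1" subnet_id="$2" sg_id="$3" device_index="$4"
  local eni_id attachment_id

  eni_id=$(aws ec2 create-network-interface \
    --subnet-id "$subnet_id" \
    --groups "$sg_id" \
    --description "Multus ENI for ${instance_id}" \
    --query 'NetworkInterface.NetworkInterfaceId' --output text) || return 1

  attachment_id=$(aws ec2 attach-network-interface \
    --network-interface-id "$eni_id" \
    --instance-id "$instance_id" \
    --device-index "$device_index" \
    --query 'AttachmentId' --output text) || return 1

  # Tie the ENI lifetime to the node so that scale-in does not leak ENIs.
  aws ec2 modify-network-interface-attribute \
    --network-interface-id "$eni_id" \
    --attachment "AttachmentId=${attachment_id},DeleteOnTermination=true" || return 1

  echo "$eni_id"
}
```

Setting DeleteOnTermination is one simple way to cover the deletion side of the ENI lifecycle: when Karpenter terminates the node, Amazon EC2 removes the attached ENI with it.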
You can check the Karpenter controller logs to see whether your nodepool configuration has errors.
14. To address a race condition that occurs between Multus daemonset pods and application pods both being scheduled at the same time on nodepool nodes, Karpenter needs to be configured with StartupTaints for the nodepool. The StartupTaint prevents the application pods from being scheduled on the new nodes until Multus is ready and the taint is removed. To automate the removal of taint on the nodes, a DaemonSet based solution is used here.
First let’s create a namespace for the daemonset that clears the taint.
We have to grant RBAC permissions to daemonset-clear-taints so that it can clear the taint on the node. The following command creates a service account limited to the cleartaints namespace, a role, and a rolebinding that ties the service account to the role it is permitted to use.
Create and apply the daemonset called daemonset-clear-taints under the namespace called cleartaints.
Check the daemonset pods. At this point you won’t see any running daemonset pods, since cleartaints only runs on the Karpenter nodepool nodes, and none have been created yet.
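The logic that such a taint-clearing daemonset runs can be sketched in a few lines of shell. This is a simplified illustration, not the daemonset from the repository: the namespace, label selector, and taint key are all passed in by the caller, and the function names are made up for this example.

```shell
# Succeeds when every container of the Multus pod on the node reports
# ready. Namespace and label selector are supplied by the caller; use the
# ones from your Multus daemonset.
multus_ready_on_node() {
  local node="$1" namespace="$2" selector="$3" ready
  ready=$(kubectl get pods -n "$namespace" -l "$selector" \
    --field-selector "spec.nodeName=${node}" \
    -o jsonpath='{.items[*].status.containerStatuses[*].ready}')
  [ -n "$ready" ] && ! echo "$ready" | grep -q false
}

# Remove every taint with the given key from the node (trailing "-").
clear_startup_taint() {
  local node="$1" taint_key="$2"
  kubectl taint nodes "$node" "${taint_key}-"
}

# Poll until Multus is ready on the node, then lift the taint.
clear_taint_when_ready() {
  local node="$1" namespace="$2" selector="$3" taint_key="$4"
  until multus_ready_on_node "$node" "$namespace" "$selector"; do
    sleep 5
  done
  clear_startup_taint "$node" "$taint_key"
}
```

Because application pods do not tolerate the startup taint, they stay Pending until clear_taint_when_ready removes it, which is what closes the race with the Multus daemonset pods.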
Deployment of Node-Latency-For-K8s
15. (Optional) To collect data about the scale-out speed of our nodepools, we can deploy the Node-Latency-For-K8s solution. Node-Latency-For-K8s is an open source tool used to analyze node launch latency. The tool measures the different phases a node goes through during instantiation, up to running the application pod. These include the containerd start/finish time, kubelet timing, and node readiness time. In the later steps of this post, you retrieve some examples.
Deployment of a sample application
16. Deploy a sample application. The following commands install a deployment with one app replica per AZ. This also triggers Karpenter to create nodes in the nodepools so that the application can be scheduled.
Update the AZ with the correct AZ name in each of the following files: multitool-deployment-az1.yaml and multitool-deployment-az2.yaml.
Each of the deployments has the affinity key “karpenter-node”, thus these application pods are scheduled only on the worker nodes with the label “karpenter-node”. Because the Karpenter nodepool configuration assigns this label to the nodes it provisions, Karpenter launches a node to run the pods of these deployments when they are created.
Watch for Karpenter scaling of a new Amazon EKS worker node using the following command:
Check Karpenter Logs.
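One way to read the controller logs is sketched below. The namespace, label selector, and container name follow common Karpenter Helm chart defaults, and the JSON log format with a "level" field is also an assumption; adjust all of them to match your installation.

```shell
# Namespace is an assumption; set it to where Karpenter is installed.
KARPENTER_NAMESPACE="${KARPENTER_NAMESPACE:-kube-system}"

# Print the most recent controller log lines.
karpenter_logs() {
  kubectl logs -n "$KARPENTER_NAMESPACE" \
    -l app.kubernetes.io/name=karpenter \
    -c controller --tail=200
}

# Keep only lines logged at ERROR level (assumes JSON-formatted logs).
# Returns a non-zero status when no error lines are found.
karpenter_errors() {
  karpenter_logs | grep '"level":"ERROR"'
}
```

Running karpenter_errors after applying a nodepool change, or during a scale-out, narrows the output to the lines that usually explain why a node was not launched.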
Once the new EC2 instances launched by Karpenter join the EKS cluster in the Ready state, pods that were in the Pending state go to the Running state.
The resulting architecture now looks like the following image, with additional Karpenter workers with Multus interfaces.
17. Validate that the application pods are in the Running state using the following command.
You can inspect the pods using kubectl describe pod or kubectl exec to verify the Multus interfaces and addresses.
Choose one of the pods and run the following exec command to see the Multus pod IPs. You should see the net1 and net2 interfaces as your Multus interfaces. These interfaces belong to the same subnets as eth1 and eth2, respectively, which are the Multus ENIs on the Amazon EC2 worker.
NOTE: Observe the Multus pod IPs. These IPs belong to the range we defined in the NetworkAttachmentDefinitions file (Step 7).
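A small helper can make this check repeatable. The sketch below assumes the container image ships the ip tool and that the Multus interfaces are named net1, net2, and so on; the multus_ips function name is made up for this example.

```shell
# List the Multus interfaces (net1, net2, ...) of a pod with their IPv4
# addresses. Assumes the container image provides the `ip` tool.
multus_ips() {
  local pod="$1"
  kubectl exec "$pod" -- ip -o -4 addr show |
    awk '$2 ~ /^net[0-9]+/ { sub(/@.*/, "", $2); print $2, $4 }'
}
```

Calling multus_ips with a pod name prints one line per Multus interface, which you can compare against the range in your NetworkAttachmentDefinitions.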
(Optional) You can examine the time it took to provision the Karpenter nodes by examining the Node-Latency-For-K8s logs on the newly created nodes. Retrieve the logs of the node-latency-for-k8s-node-latency-for-k8s-chart-xxxxx pods running on the nodepool workers. Note that the logs might take about five minutes to populate.
NOTE: The execution of the userdata script (approx. 20-25s) during bootup contributes to node ready time. Otherwise Karpenter node ready time would be around 30s.
(Optional) Automatic assignments of Multus Pod IPs
For Multus connectivity to the pods, it is essential to assign the pod Multus IPs as secondary IPs on their corresponding Multus ENIs on the worker node. You can automatically assign the Multus IP as secondary IPs on Multus ENIs by following this GitHub link and using either InitContainer based IP management Solution, or Sidecar based IP management Solution within the application pod.
Scaling action
18. Use the following command to perform the scale out to test the additional scaling of nodes using Karpenter, and monitor the nodes scaling out.
19. (Optional) Let’s collect one more data point on the Karpenter scale-out speed: retrieve the logs of node-latency-for-k8s-node-latency-for-k8s-chart-xxxxx pods running on the newly created nodepool workers.
20. Karpenter is flexible in the sense that it automatically provisions the nodes needed for the number of pending pods. You can scale in and then scale out again, this time with a higher number of replicas. Observe the type and size of the instances that Karpenter provisions.
Scale in:
Wait for Karpenter to terminate the existing nodepool nodes. Once they are terminated, scale out again. In this example you can only scale your pods up to the number of IP addresses available in your NetworkAttachmentDefinition.
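The address limit mentioned above can be checked before scaling. The sketch below computes how many addresses a start/end range holds and refuses to scale past it. It assumes that only one deployment draws addresses from the range, and the helper names and the deployment name are hypothetical.

```shell
# Convert a dotted IPv4 address to an integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Number of addresses between range_start and range_end, inclusive.
range_capacity() {
  echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# Scale a deployment only if the replica count fits in the address range.
scale_within_range() {
  local deployment="$1" replicas="$2" capacity="$3"
  if [ "$replicas" -gt "$capacity" ]; then
    echo "refusing: ${replicas} replicas exceed ${capacity} available addresses" >&2
    return 1
  fi
  kubectl scale deployment "$deployment" --replicas="$replicas"
}
```

For a range of 10.0.4.32 to 10.0.4.63 (hypothetical values matching a /27 block), range_capacity returns 32, so scale_within_range my-app 40 32 stops before pods get stuck waiting for an IP address.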
Cleaning up
Delete all CloudFormation stacks in the reverse order of their creation.
Conclusion
In this post, we showed how Karpenter can be used in conjunction with Multus CNI and how it handles the lifecycle management of the ENIs used by Multus. The workers in the Karpenter-provisioned nodepool have Multus-capable interfaces. We also demonstrated the benefits of using Karpenter as an autoscaling solution. Karpenter improves the efficiency and cost of running workloads on Kubernetes clusters by:
1/ Watching for pods that the Kubernetes scheduler has marked as unschedulable;
2/ Evaluating scheduling constraints (resource requests, nodeselectors, affinities, tolerations, and topology spread constraints) requested by the pods;
3/ Provisioning nodes that meet the requirements of the pods; and
4/ Removing the nodes when the nodes are no longer needed.
Karpenter is a flexible tool to address the scaling requirements of your application.
You can read more about Karpenter and best practices in this AWS GitHub link.