Ensuring fair bandwidth allocation for Amazon EKS Workloads
Independent Software Vendor (ISV) users often offer their end-user solutions hosted on a multi-tenant architecture to reduce cost and operational overhead. However, this approach can expose Kubernetes clusters to resource exhaustion or network starvation issues that impact neighboring workloads. By default, Kubernetes provides capabilities to enforce the availability of resources such as CPU and memory to prevent compute starvation. However, workloads are rapidly evolving to consume other resources, such as network bandwidth, to improve performance. For example, a pod can download terabytes of traffic at a very high rate to improve response time, which leads to bandwidth exhaustion and affects neighboring pods.
In this post, we look at how to solve this Kubernetes challenge with the Amazon Virtual Private Cloud (Amazon VPC) CNI plugin. We demonstrate how the Amazon VPC CNI plugin can be used to restrict pods' ingress and egress network bandwidth, preventing network starvation and ensuring network stability and quality of service (QoS).
What is Amazon VPC CNI?
Although there are multiple CNI plugins available, the Amazon VPC CNI plugin, developed by AWS for Amazon Elastic Kubernetes Service (Amazon EKS), allows container networking to natively use Amazon VPC networking and security features. The CNI plugin manages Elastic Network Interfaces (ENIs) on a node, for both Amazon Elastic Compute Cloud (Amazon EC2) and AWS Fargate.
When you provision a node, the plugin automatically allocates a pool of slots (IPs or prefixes) from the subnet of the node's primary ENI. It enables Kubernetes pod networking and connectivity for applications deployed on Amazon EKS, and it integrates Amazon VPC networking functionality directly into Kubernetes pods. For example, pods are assigned their own private IP addresses from the VPC, and security groups can be applied directly to pods.
When the bandwidth limit capability is enabled, the Amazon VPC CNI plugin relies on the bandwidth plugin to control ingress and egress bandwidth limits for individual containers or pods, using Linux traffic control utilities such as `tc` (Traffic Control).
Walkthrough
Let’s look at how you can use the CNI plugin to enable ingress and egress traffic shaping.
Prerequisites
For this walkthrough, you should have the following prerequisites:
- Amazon EKS cluster v1.24 and above
- Amazon VPC CNI v1.15.0 and above
- kubectl v1.24 and above
- eksctl v0.175.0 and above
Step 0: (Optional) Create an EKS cluster using eksctl
This configuration is used with `eksctl` to provision an EKS cluster v1.28. You can skip this step if you have your own EKS cluster already.
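A minimal `ClusterConfig` along these lines provisions a v1.28 cluster; the cluster name, region, and node group sizing below are placeholder assumptions, not values from the post:

```yaml
# cluster.yaml -- illustrative eksctl configuration
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: bandwidth-demo   # placeholder name
  region: us-west-2      # placeholder region
  version: "1.28"
managedNodeGroups:
  - name: ng-1
    instanceType: m5.large
    desiredCapacity: 2
```

You would then create the cluster with `eksctl create cluster -f cluster.yaml`.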
Note that if you provision this EKS cluster v1.28, then you should make sure that your `kubectl` is also v1.28 or within one minor version difference of your cluster. For example, a v1.28 client can communicate with v1.27, v1.28, and v1.29 control planes.
Step 1: Enable CNI bandwidth plugin of an EC2 instance
Before setting bandwidth limits for pods, you need to enable the bandwidth capability for the CNI plugin by connecting to the underlying EC2 instance. We recommend using AWS Systems Manager Session Manager to connect to Amazon EC2. Session Manager provides secure and auditable node management without the need to open inbound ports, maintain bastion hosts, or manage Secure Shell (SSH) keys. Therefore, you can tighten security and reduce your attack surface.
When you provision an EKS cluster with the `eksctl` configuration above, it uses Amazon EKS optimized Amazon Linux Amazon Machine Images (AMIs) by default. With this AMI, an EC2 instance is configured with the necessary requirements so you can connect to the instance using Session Manager without further setup.
To connect to a Linux instance using Session Manager with the Amazon EC2 console:
- Open the Amazon EC2 console.
- In the navigation pane, choose Instances.
- Select the instance and choose Connect.
- Choose Session Manager.
- Choose Connect.
For more information and instructions on how to set up Session Manager, see Setting up Session Manager. After you’re connected to the instance, run the following commands:
The following JSON object is added under the `plugins` key.
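A sketch of that edit, assuming the EKS optimized AMI layout (the CNI config lives at `/etc/cni/net.d/10-aws.conflist`) and that `jq` is available. To keep the sketch self-contained it operates on a sample conflist in a temporary directory; on a real node you would run the same `jq` edit against the actual file with `sudo`:

```shell
# Sketch only: on the node the file is /etc/cni/net.d/10-aws.conflist
# and editing it requires sudo. The sample conflist below is illustrative.
workdir=$(mktemp -d)
cat > "$workdir/10-aws.conflist" <<'EOF'
{
  "cniVersion": "0.4.0",
  "name": "aws-cni",
  "plugins": [
    { "type": "aws-cni" },
    { "type": "portmap", "capabilities": { "portBindings": true } }
  ]
}
EOF
# Back up the config, then append the bandwidth plugin object
# to the "plugins" array.
cp "$workdir/10-aws.conflist" "$workdir/10-aws.conflist.bak"
jq '.plugins += [{"type": "bandwidth", "capabilities": {"bandwidth": true}}]' \
  "$workdir/10-aws.conflist.bak" > "$workdir/10-aws.conflist"
# Show the plugin types now present (aws-cni, portmap, bandwidth)
jq -r '.plugins[].type' "$workdir/10-aws.conflist"
```

The object appended here, `{"type": "bandwidth", "capabilities": {"bandwidth": true}}`, is the JSON that ends up under the `plugins` key.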
Step 2: Install the iperf and tc CLIs on your EC2 instances
In this step, we install the necessary CLI tools that are used to check and test the bandwidth limitation, namely `iperf` and `tc`.
- `iperf` is a widely used command-line tool for measuring network performance. It can be used to measure the bandwidth between two endpoints, such as between a client and a server or between two servers.
- `tc` (Traffic control) is a user-space utility command in Linux that allows you to configure and manage the traffic control settings of network interfaces. It provides a powerful set of tools for shaping, scheduling, policing, and prioritizing network traffic.
Amazon Linux 2023 does not support Extra Packages for Enterprise Linux (EPEL), and `iperf` is not available to install through `yum`. However, you can download and install `iperf` for AL2023 manually by following the instructions in this AWS Knowledge Center post.
Step 3: Deploy pods without bandwidth restriction
We will first deploy a standard application, namely `nginx`, on our EKS cluster. Bandwidth restrictions are not configured for now to facilitate later comparison between the two setups.
Create a new file called `nginx-deployment.yaml` with the following definition:
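A minimal `nginx` Deployment along these lines fits the walkthrough; the resource name, labels, replica count, and image tag are illustrative assumptions:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
```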
Run this command to deploy:
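For example, assuming the manifest is saved as `nginx-deployment.yaml`:

```shell
kubectl apply -f nginx-deployment.yaml
```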
Run this command to check an IP of a pod and a node in which the pod is residing:
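One way to get both at once, assuming the pods carry an `app=nginx` label (an assumption about the manifest):

```shell
# -o wide adds the pod IP and the node name to the listing
kubectl get pods -l app=nginx -o wide
```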
Step 4: Test pod on egress/ingress limits
After we deploy the application without specifying bandwidth limits, we use the `tc` command to check the current `qdisc` in the EC2 instance.
- `qdisc` (Queuing discipline) is an algorithm that manages the way packets are queued and scheduled for transmission on a network interface. It determines the order in which packets are sent out from the kernel’s packet queues to the network interface card.
Run this command to check `qdisc`:
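For example, on the node (connected via Session Manager):

```shell
# List the queuing disciplines attached to every network interface;
# pod veth devices show the default qdisc while no limits are set
tc qdisc show
```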
Output:
The `qdisc pfifo_fast` uses a simple First-In, First-Out (FIFO) queue and doesn’t perform traffic shaping or prioritization.
Next, we use `iperf` to perform the bandwidth measurement. Run this command to measure the maximum achievable bandwidth:
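A sketch of the measurement, assuming an `iperf` server (`iperf -s`) is already listening at the pod's IP, which is an assumption about the test setup:

```shell
# Run a 10-second bandwidth test against the pod
iperf -c {POD_IP} -t 10
```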
Replace {POD_IP} with the pod IP that you got from Step 3.
Output:
Step 5: Re-deploy the pods with bandwidth restriction
After testing the bandwidth of the pod without egress and ingress limits, we add the following annotations to the deployment's pod template to specify the egress and ingress bandwidth limits:
- `kubernetes.io/ingress-bandwidth` – To control ingress bandwidth
- `kubernetes.io/egress-bandwidth` – To control egress bandwidth
Update the manifest in `nginx-deployment.yaml` and re-deploy with the same command:
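The annotations belong on the pod template metadata, not on the Deployment's own metadata; the `1G` values below are example limits, not values prescribed by the post:

```yaml
spec:
  template:
    metadata:
      annotations:
        kubernetes.io/ingress-bandwidth: "1G"
        kubernetes.io/egress-bandwidth: "1G"
```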
Re-deploy the application again:
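For example, assuming the manifest file from Step 3 and a Deployment named `nginx` (an assumption about the manifest):

```shell
kubectl apply -f nginx-deployment.yaml
# Wait for the annotated pods to roll out before re-testing
kubectl rollout status deployment/nginx
```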
Step 6: Test pod on egress/ingress limits
After we re-deploy the updated manifest with ingress and egress bandwidth limits, we repeat the same procedure as in Step 4 to verify that the new configuration is effective.
Run this command to check `qdisc`:
Output:
As you can see from the output, the `qdisc` now shows `tbf` (Token Bucket Filter), a classless queueing discipline used for traffic shaping.
Run this command to measure the maximum achievable bandwidth:
Output:
Before and After:
The following visualization shows the bandwidth in Gbits/sec. The orange line represents “Before” we added the bandwidth annotation to the deployment. The blue line represents “After” we set the bandwidth annotation.
Cleaning up
Delete the deployment:
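For example, using the manifest file from Step 3:

```shell
kubectl delete -f nginx-deployment.yaml
```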
To delete your EKS cluster provisioned in Step 0:
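For example, replacing the placeholders with the name and region you used when creating the cluster:

```shell
eksctl delete cluster --name <cluster-name> --region <region>
```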
Consideration
The bandwidth plugin is not compatible with the Amazon VPC CNI-based network policy feature at the time of writing this post. The network policy agent uses the Traffic Control (tc) system to enforce configured network policies for pods. Policy enforcement fails when the bandwidth plugin is enabled, due to a conflict between the tc configuration of the bandwidth plugin and that of the network policy agent. We're exploring options to support the bandwidth plugin alongside the network policy feature, and the issue is tracked through this AWS GitHub issue.
Conclusion
In this post, we showed you how to use the Amazon VPC CNI plugin and its capabilities to limit ingress and egress bandwidth for applications running as pods on Amazon EKS. With this capability, users can restrict their pods' network usage and prevent network starvation caused by heavy network consumption from neighboring pods in a Kubernetes cluster.