Enhancing Kubernetes workload isolation and security using Kata Containers
Containers have become the dominant method for deploying and managing applications in recent years. Their widespread adoption is attributed to numerous advantages, such as isolation, efficient hardware use, scalability, and portability. In situations where resource isolation is critical for system security, many users are forced to rely on virtual machines (VMs) to mitigate the impact of a compromised container on the host or other containers sharing the host.
In a recent user engagement, we encountered a use case where the team needed to guarantee the tamper-proof nature of their containers. Specifically, they needed to compile their code and cryptographically sign it using a highly secure key. It was imperative to prevent unauthorized access to this key during the build process, making sure that other containers running on the same node could not compromise or extract it. This stringent security requirement prevented them from using containers to perform the build tasks in Kubernetes.
Kata Containers
Kata Containers is an open-source project that provides a secure container runtime combining the lightweight nature of containers with the security benefits of VMs. It offers stronger workload isolation by using hardware virtualization technology as a second layer of defense. In Kata Containers, each container is effectively booted with its own guest operating system, as opposed to traditional containers, where the Linux kernel is shared among workloads and container isolation is achieved using namespaces and control groups (cgroups). Although traditional containers are a good fit for many workloads, they fall short when stronger isolation and security are needed.
Kata Containers run containers in a stripped-down OCI-compliant VM to provide strict isolation between containers sharing a host machine.
Kata Containers supports major architectures such as AMD64 and ARM. It also supports multiple hypervisors, including Cloud Hypervisor and Firecracker – an AWS-built hypervisor used by AWS Lambda that integrates with the containerd project – among others.
Kata Containers abstracts away the complexity of orchestrating workloads by using the Kubernetes orchestration system to provide a well-known interface to end users, while providing a custom runtime that runs hypervisor software using the Linux Kernel-based Virtual Machine (KVM) to deliver strong workload isolation and security.
Kata Containers allows you to run containers using industry-standard tools such as the OCI container format and the Kubernetes CRI interface. It deploys your containers using a hypervisor of choice, which creates a VM to host the Kata Containers agent (kata-agent) and your workload inside the container environment. Each VM hosts a single kata-agent that acts as the supervisor for managing the containers and the workload running within them. The VMs run a separate guest kernel, based on the latest Linux Long Term Support (LTS) kernel version, that is highly optimized for boot time and minimal memory footprint, providing only those services required by a container workload. You can find detailed information about the Kata Containers architecture in their documentation pages.
By adopting Kata Containers, users can orchestrate their build jobs using Kubernetes with minimal configuration changes to their Continuous Integration (CI) System. The implementation made sure that their Pods were isolated, providing robust protection against container breakouts while maintaining the agility and efficiency of containerized workloads.
Running Kata Containers on AWS
In the next section of this post we demonstrate how to set up and run Kata Containers on AWS using Amazon Elastic Kubernetes Service (Amazon EKS). Before starting, note that it’s advised to run the following instructions from a Bastion Host deployed in the same Amazon Virtual Private Cloud (VPC) as your EKS cluster.
We use Amazon EKS to run a fully functional Kubernetes cluster that uses Amazon Elastic Compute Cloud (Amazon EC2) bare metal instances as worker nodes, so that KVM virtual machines can be spawned. Standard EC2 instances don’t support nested virtualization, hence the need for bare metal.
Prerequisites
The following prerequisites are required to continue with this post:
- An Amazon VPC in which to run your EKS cluster
- A Bastion Host to use for remote access to your Amazon VPC
- An AWS Identity and Access Management (IAM) Role, with the minimum policies described in this document, to be associated with the Bastion Host
- Note: review the policies described in the preceding document to make sure they grant permissions to create your cluster and node groups
- AWS Command Line Interface (AWS CLI) v2 – installation guide
- eksctl – installation guide
- kubectl – installation guide (v1.29.0)
Configure EKS cluster
Once the environment is ready and available to use, you need to Secure Shell (SSH) into your Bastion Host to perform the following commands (we recommend using AWS Systems Manager to start a new session).
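The following is a minimal sketch of the cluster creation command, assuming a cluster named kata-demo in the us-west-2 Region with placeholder subnet IDs; replace these values with your own:

```bash
# Create the EKS control plane only; the bare metal node group is added in a later step.
# Cluster name, Region, and subnet IDs are placeholders.
eksctl create cluster \
  --name kata-demo \
  --version 1.29 \
  --region us-west-2 \
  --vpc-private-subnets subnet-0abc1234,subnet-0def5678 \
  --without-nodegroup
```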
You must update this command to change the subnets or the AWS Region to which you’d like to deploy the cluster. For the sake of this post we are using Kubernetes version 1.29; you can update that to match the version you’d like to deploy.
This command creates an AWS CloudFormation stack and deploys a new EKS cluster. It takes a few minutes before the control plane is active and can be used for additional configuration.
Configure EKS nodes
After the cluster has been created successfully, we can proceed by adding a new node group containing the instances needed to run our workloads. For this exercise we use an i3.metal instance, but you can use any metal instance that fits your use case. Create the metal-node-group.yaml file with the following content:
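The following sketch assumes the cluster name and Region used earlier; the node group name and subnet ID are placeholders to replace with your own values:

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: kata-demo        # must match your cluster name
  region: us-west-2      # must match your cluster Region
managedNodeGroups:
  - name: metal-nodes    # placeholder node group name
    instanceType: i3.metal
    desiredCapacity: 1
    privateNetworking: true
    subnets:
      - subnet-0abc1234  # replace with a subnet in your VPC
```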
Note that you must update the subnets to use the ones in your VPC before creating this file. Then use the following command to create the node group:
```bash
eksctl create nodegroup -f metal-node-group.yaml
```
Similar to the create cluster command, the create nodegroup command creates a new CloudFormation stack that deploys your node group with the desired capacity in a few minutes.
Deploy Kata Containers
Kata Deploy is the fastest way to deploy Kata Containers in your Kubernetes cluster. Although it’s the suggested deployment method for most cases, note that it provides a Dockerfile containing the binaries and artifacts needed to run Kata Containers (including the hypervisor binary files). If you need custom versions of your hypervisor of choice or of the guest kernel image, we suggest following the developer guide to build your own binaries and base images.
From the Bastion Host, we can now run the deployment of the Kata Containers into our cluster using Kata Deploy:
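The following commands reference the manifests in the upstream Kata Containers repository (paths current at the time of writing; consider pinning a release tag instead of main) to install the required RBAC objects and the kata-deploy DaemonSet:

```bash
kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-rbac/base/kata-rbac.yaml
kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/kata-deploy/base/kata-deploy.yaml
```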
Then wait for the deployment to complete:
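```bash
kubectl -n kube-system wait --timeout=10m --for=condition=Ready -l name=kata-deploy pod
```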
Run the following kubectl command to apply the Kata Runtime classes:
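```bash
kubectl apply -f https://raw.githubusercontent.com/kata-containers/kata-containers/main/tools/packaging/kata-deploy/runtimeclasses/kata-runtimeclasses.yaml
```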
The runtime classes allow you to quickly create Pods that run using a specific hypervisor. These are preconfigured classes whose scheduling constraints make sure our Pods land on the cluster nodes that support the Kata Containers runtime. Kata Containers provides multiple runtime classes to support the hypervisors deployed by Kata Deploy.
The following is an example of Runtime Class defined for Firecracker:
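This definition is modeled on the runtime classes shipped by Kata Deploy; the overhead values shown are indicative and may differ between releases:

```yaml
kind: RuntimeClass
apiVersion: node.k8s.io/v1
metadata:
  name: kata-fc
handler: kata-fc
overhead:
  podFixed:
    memory: "130Mi"
    cpu: "250m"
scheduling:
  nodeSelector:
    katacontainers.io/kata-runtime: "true"
```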
The deployment process also automatically updates the containerd configuration, adding the runtime classes provided by Kata, each configured to run with a custom runtime shim.
Configure Firecracker
Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services. Firecracker enables you to deploy workloads in lightweight VMs, called microVMs, which provide enhanced security and workload isolation over traditional VMs, while enabling the speed and resource efficiency of containers. Firecracker was developed at AWS to improve the user experience of services such as AWS Lambda.
Since Firecracker’s Virtual Machine Monitor (VMM) does not enable filesystem-level sharing between the microVM and the host, you must configure a snapshotter that creates snapshots as filesystem images, which can be exposed to Firecracker microVMs as devices. containerd uses the snapshotter for storing image and container data.
In this section we see how to configure a devmapper snapshotter for Firecracker. You need to log in to the node that has been provisioned in the previous steps (Systems Manager is the recommended method).
Verify that the devmapper plugin isn’t configured yet:
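```bash
sudo ctr plugins ls | grep devmapper
```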
If the output of this command reports the plugin in an error state, similar to the following:
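```
io.containerd.snapshotter.v1    devmapper    linux/amd64    error
```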
then you have to create your devmapper snapshotter. Copy the following content into a create.sh script file:
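The script below is adapted from the containerd devmapper documentation; the pool name (devpool) and the data and metadata sizes are illustrative defaults you can adjust:

```bash
#!/bin/bash
set -ex

DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool

mkdir -p ${DATA_DIR}

# Create sparse files to back the thin-pool data and metadata.
sudo touch "${DATA_DIR}/data"
sudo truncate -s 100G "${DATA_DIR}/data"
sudo touch "${DATA_DIR}/meta"
sudo truncate -s 10G "${DATA_DIR}/meta"

# Attach the files to loop devices.
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")

# Compute thin-pool parameters.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768

# Create the thin-pool device.
sudo dmsetup create "${POOL_NAME}" \
    --table "0 ${LENGTH_IN_SECTORS} thin-pool ${DATA_DEV} ${META_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
```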
Make it executable and run the script:
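```bash
chmod +x create.sh
./create.sh
```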
Verify that it has been created successfully:
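```bash
sudo dmsetup ls
# The new thin-pool should be listed, for example: devpool (253:0)
```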
Next, update the containerd configuration in /etc/containerd/config.toml with your preferred editor to add the following section at the end of the file:
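Assuming the pool name and data directory used by the script above:

```toml
[plugins."io.containerd.snapshotter.v1.devmapper"]
  pool_name = "devpool"
  root_path = "/var/lib/containerd/devmapper"
  base_image_size = "10GB"
  discard_blocks = true
```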
You should also update the kata-fc runtime section to add the devmapper snapshotter to the configuration file:
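The exact section written by Kata Deploy may differ slightly between versions; the key addition is the snapshotter line:

```toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.kata-fc]
  runtime_type = "io.containerd.kata-fc.v2"
  snapshotter = "devmapper"
```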
Once you complete these two actions, restart the daemon using sudo systemctl restart containerd.
Now you can verify that the plugin is running correctly:
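```bash
sudo ctr plugins ls | grep devmapper
# io.containerd.snapshotter.v1    devmapper    linux/amd64    ok
```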
The preceding script needs to be run only once, when setting up the devmapper snapshotter for containerd for the first time. Subsequently, make sure that on each reboot the thin-pool is initialized from the same data directory. Here is a simple script (reload.sh) that can be used for that purpose:
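This version, also adapted from the containerd documentation, assumes the same pool name and data directory as create.sh:

```bash
#!/bin/bash
set -ex

DATA_DIR=/var/lib/containerd/devmapper
POOL_NAME=devpool

# Re-attach the existing data and metadata files to loop devices.
DATA_DEV=$(sudo losetup --find --show "${DATA_DIR}/data")
META_DEV=$(sudo losetup --find --show "${DATA_DIR}/meta")

# Thin-pool parameters must match the values used at creation time.
SECTOR_SIZE=512
DATA_SIZE="$(sudo blockdev --getsize64 -q ${DATA_DEV})"
LENGTH_IN_SECTORS=$(bc <<< "${DATA_SIZE}/${SECTOR_SIZE}")
DATA_BLOCK_SIZE=128
LOW_WATER_MARK=32768

# Re-create the thin-pool from the existing files.
sudo dmsetup create "${POOL_NAME}" \
    --table "0 ${LENGTH_IN_SECTORS} thin-pool ${DATA_DEV} ${META_DEV} ${DATA_BLOCK_SIZE} ${LOW_WATER_MARK}"
```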