Run GenAI inference across environments with Amazon EKS Hybrid Nodes
This blog post was authored by Robert Northard, Principal Container Specialist SA, Eric Chapman, Senior Product Manager EKS, and Elamaran Shanmugam, Senior Specialist Partner SA.
Introduction
Amazon Elastic Kubernetes Service (Amazon EKS) Hybrid Nodes transform how you run generative AI inference workloads across cloud and on-premises environments. Extending your EKS cluster to on-premises infrastructure allows you to deploy AI applications with consistent management and reduced operational complexity. Amazon EKS provides a managed Kubernetes control plane, and EKS Hybrid Nodes enables you to join on-premises infrastructure to the Amazon EKS control plane as worker nodes, eliminating the need to manage the Kubernetes control plane for on-premises deployments. EKS Hybrid Nodes also allows you to run cloud and on-premises capacity together in a single EKS cluster.
EKS Hybrid Nodes enable various AI/machine learning (ML) use cases and architectures, such as the following:
- Run services closer to users to support latency-sensitive workloads, including real-time inference at the edge.
- Train models with data that must stay on-premises due to data residency requirements.
- Run inference workloads closer to source data, such as RAG applications using your knowledge base.
- Use elasticity of AWS Cloud for more compute resources during peak demand.
- Use existing on-premises hardware.
This post describes a proof of concept for using a single EKS cluster to run AI inference on-premises with EKS Hybrid Nodes and in the AWS Cloud with Amazon EKS Auto Mode. EKS Auto Mode fully automates Kubernetes cluster management for compute, storage, and networking. Learn more about EKS Auto Mode in the Amazon EKS user guide.
Solution overview
For our example inference workload, we deploy a model through NVIDIA NIM. NVIDIA NIMs are microservices optimized by NVIDIA for running AI models on GPUs. We create an EKS cluster enabled for both EKS Hybrid Nodes and EKS Auto Mode, then join our on-premises machines to the cluster as hybrid nodes. For our on-premises deployment, we install the NVIDIA drivers and NVIDIA device plugin for Kubernetes before deploying the model to EKS Hybrid Nodes. Finally, we deploy the model to EKS Auto Mode nodes, which come preconfigured with the drivers needed for NVIDIA GPU-based AWS instances. This walkthrough doesn't include steps for establishing the hybrid networking and authentication prerequisites for running EKS Hybrid Nodes, which can be found in the Amazon EKS user guide.

Figure 1: A diagram providing a high-level overview of an EKS cluster with both EKS Hybrid Nodes and EKS nodes in-Region.
The preceding figure presents a high-level diagram of the architecture we use in our walkthrough. The Amazon Virtual Private Cloud (VPC) has two public subnets and two private subnets that host the EKS Auto Mode worker nodes. Communication between the control plane and EKS Hybrid Nodes routes through the VPC, out of/into a Transit Gateway or Virtual Private Gateway, and across a private network connection. EKS Hybrid Nodes need reliable network connectivity between the on-premises environment and the AWS Region, which can be established with AWS Site-to-Site VPN, AWS Direct Connect, or a user-managed VPN solution. Routing tables, security groups, and firewall rules must be configured to allow for bidirectional communication between the environments.
Prerequisites
The following prerequisites are necessary to complete this solution:
- Amazon VPC with two private and two public subnets, with a route to the internet.
- AWS Site-to-Site VPN connection between on-premises network and Amazon VPC.
- For the on-premises nodes, a CIDR block from the IPv4 RFC 1918 ranges that doesn't overlap with the VPC CIDR range or the Kubernetes service IPv4 CIDR.
- Hybrid nodes networking requirements for firewall rules, routing tables, and security groups are detailed in the Amazon EKS user guide.
- On-premises machines running a hybrid nodes-compatible operating system with NVIDIA drivers and NVIDIA Container Toolkit included in the machine image.
- NVIDIA NGC account and API key for accessing NIMs, see the NVIDIA documentation.
- The following tools installed locally: eksctl, kubectl, helm, and curl.
Walkthrough
The following steps walk you through this solution.
Creating an EKS Hybrid Nodes and EKS Auto Mode enabled cluster
We use eksctl, a CLI tool for creating and managing clusters on Amazon EKS, to create an EKS cluster enabled for EKS Hybrid Nodes and EKS Auto Mode.
- Create a ClusterConfig file, cluster-configuration.yaml. This file includes the autoModeConfig section that enables EKS Auto Mode and the remoteNetworkConfig section that enables EKS Hybrid Nodes. For more information about valid remoteNetworkConfig values, see Create cluster in the EKS Hybrid Nodes documentation.
- After creating the ClusterConfig file, create the EKS cluster by running the following command:
- Wait for the cluster state to become Active.
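The ClusterConfig and create command might look like the following sketch. The cluster name, Region, Kubernetes version, and CIDR ranges below are placeholders for illustration; substitute values that match your environment.

```shell
# Write the ClusterConfig (name, Region, version, and CIDRs are placeholders).
cat > cluster-configuration.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: hybrid-inference    # placeholder cluster name
  region: us-west-2         # placeholder Region
  version: "1.31"
autoModeConfig:
  enabled: true
remoteNetworkConfig:
  remoteNodeNetworks:
    - cidrs: ["10.80.0.0/16"]   # placeholder on-premises node CIDR
  remotePodNetworks:
    - cidrs: ["10.85.0.0/16"]   # placeholder on-premises pod CIDR
EOF

# Create the cluster; eksctl waits until the control plane is ready.
eksctl create cluster -f cluster-configuration.yaml
```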
Preparing hybrid nodes
1. EKS Hybrid Nodes need kube-proxy and CoreDNS. Install the add-ons by running the following eksctl commands. EKS Hybrid Nodes automatically receive the label eks.amazonaws.com/compute-type: hybrid. This label can be used to target workloads at or away from hybrid nodes. To learn more about deploying Amazon EKS add-ons with EKS Hybrid Nodes, see Configure add-ons for hybrid nodes.
If you run at least one replica of CoreDNS in the AWS Cloud, then you must allow DNS traffic to the VPC and nodes where CoreDNS is running. Furthermore, your on-premises remote Pod CIDR must be routable from your nodes in Amazon VPC. See the EKS Hybrid Nodes user guide for guidance on running mixed mode clusters.
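The add-ons can be installed with commands along these lines; the cluster name is a placeholder matching the earlier example:

```shell
eksctl create addon --cluster hybrid-inference --name kube-proxy
eksctl create addon --cluster hybrid-inference --name coredns
```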
2. You can join your on-premises nodes to the Amazon EKS control plane as EKS Hybrid Nodes. To do so, install nodeadm, the EKS Hybrid Nodes CLI, which installs and configures the components needed to transform your machines into EKS worker nodes. These components include the kubelet, containerd, and the aws-iam-authenticator. To install nodeadm on your machines and join your nodes to the cluster, follow the steps in the EKS Hybrid Nodes documentation at Connect hybrid nodes. Before running workloads on hybrid nodes, install a compatible Container Network Interface (CNI) plugin. Follow Configure a CNI for hybrid nodes for steps to set up a CNI with EKS Hybrid Nodes.
When registering nodes, you can modify the kubelet configuration to add node labels or taints, for example topology.kubernetes.io/zone, to specify which zone the hybrid nodes are in. You can also add labels representing the capabilities of the attached GPUs to influence workload scheduling. For EKS Hybrid Nodes capacity with a mix of GPU and non-GPU machines, it's recommended that you add a --register-with-taints=nvidia.com/gpu=Exists:NoSchedule taint to GPU nodes so that non-GPU workloads (such as CoreDNS) aren't scheduled on them. Review the Hybrid Nodes documentation for how to modify the kubelet configuration when using nodeadm.
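As an illustration, a nodeadm NodeConfig applying a zone label and the GPU taint could look like the following fragment. The cluster name, Region, and zone value are placeholders, and your NodeConfig also needs the credential provider settings for your environment:

```yaml
apiVersion: node.eks.aws/v1alpha1
kind: NodeConfig
spec:
  cluster:
    name: hybrid-inference   # placeholder cluster name
    region: us-west-2        # placeholder Region
  kubelet:
    flags:
      - --node-labels=topology.kubernetes.io/zone=onprem-dc1   # placeholder zone
      - --register-with-taints=nvidia.com/gpu=Exists:NoSchedule
```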
3. Validate that your nodes are connected and in a Ready state by running the following kubectl command. You must install a CNI for hybrid nodes to become Ready.
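For example, you can list only the hybrid nodes by filtering on the compute-type label:

```shell
kubectl get nodes -l eks.amazonaws.com/compute-type=hybrid
```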
Installing NVIDIA device plugin for Kubernetes
This section assumes that your on-premises EKS Hybrid Nodes have the necessary NVIDIA drivers and NVIDIA Container toolkit configured. Kubernetes device plugins can be used to advertise system hardware such as GPUs to the kubelet. As this walkthrough uses NVIDIA GPUs, we must install the NVIDIA Device plugin for Kubernetes to expose GPU devices to the Kubernetes scheduler. If the NVIDIA drivers and NVIDIA Container toolkit aren’t included in your machine images and configured so that containerd can use the NVIDIA Container runtime, then you can instead deploy the NVIDIA GPU Operator, which installs these components, along with the NVIDIA Device plugin at runtime.
1. To install the NVIDIA device plugin using kubectl, first download the deployment manifest:
Review the NVIDIA Device plugin GitHub repository for the latest versions.
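A download command along these lines could be used; v0.17.0 is shown as an example version, so check the repository for the current release:

```shell
curl -fLO https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.0/deployments/static/nvidia-device-plugin.yml
```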
2. You don't need to install the NVIDIA device plugin on EKS Auto Mode nodes; the device plugin DaemonSet should only run on hybrid nodes that have GPUs. Update the NVIDIA device plugin to target hybrid nodes by using the label eks.amazonaws.com/compute-type: hybrid in the .spec.template.spec.nodeSelector, and add any additional labels if you have a mix of GPU and non-GPU worker nodes:
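The relevant excerpt of the DaemonSet manifest would then look like the following; the second label is a hypothetical example for clusters where only some hybrid nodes have GPUs:

```yaml
spec:
  template:
    spec:
      nodeSelector:
        eks.amazonaws.com/compute-type: hybrid
        # Hypothetical extra label if only some hybrid nodes have GPUs:
        # node.example.com/gpu: "true"
```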
3. Install the NVIDIA Device plugin by applying the manifest:
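Assuming the manifest file name from the download step:

```shell
kubectl apply -f nvidia-device-plugin.yml
```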
4. Use the following command to validate that the NVIDIA device plugin pods are running:
You should expect the following output when listing Pods in kube-system for the NVIDIA device plugin, and the DaemonSet should only be scheduled on nodes with a GPU:
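The device plugin Pods can be listed with the label selector used by the upstream DaemonSet; one Pod per GPU node should be in the Running state:

```shell
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds -o wide
```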
5. You can check whether the GPU is exposed to the kubelet by validating that the GPU appears in the node's allocatable resources:
The following shows the allocatable resources you would expect to see when listing nodes with a GPU attached:
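One way to surface the allocatable GPU count per node is a custom-columns query (the backslash escapes the dots in the resource name):

```shell
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu"
```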
Deploying NVIDIA NIM for inference on EKS Hybrid Nodes
1. Before deploying NVIDIA NIM, configure your container registry secret and NVIDIA API keys, which are a prerequisite, replacing NGC_API_KEY with your API key:
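A sketch of the required secrets follows; the namespace and secret names (nim, registry-secret, ngc-api) are example values that the later Helm overrides would need to match:

```shell
kubectl create namespace nim   # example namespace

# Pull secret for the NVIDIA container registry.
kubectl create secret -n nim docker-registry registry-secret \
  --docker-server=nvcr.io \
  --docker-username='$oauthtoken' \
  --docker-password=$NGC_API_KEY

# Secret holding the NGC API key for the NIM runtime.
kubectl create secret -n nim generic ngc-api \
  --from-literal=NGC_API_KEY=$NGC_API_KEY
```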
2. Clone the NIM helm chart by running the following command:
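The chart can be fetched from NGC with a command along these lines; the chart version shown is an example, so check NGC for the current nim-llm release:

```shell
helm fetch https://helm.ngc.nvidia.com/nim/charts/nim-llm-1.3.0.tgz \
  --username='$oauthtoken' --password=$NGC_API_KEY
```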
3. Create the Helm chart overrides file. Set the nodeSelector to target your hybrid nodes.
You can modify the image repository in the values.yaml file to deploy a different model.
This deployment doesn't use a model cache. Consider using a model cache to speed up application initialization during scaling events; implementing one requires the appropriate CSI drivers and storage infrastructure.
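An illustrative overrides file and install command follow. The model image, tag, secret names, and chart file name are assumptions carried over from the earlier example steps; adjust them to the model and chart version you actually pulled:

```shell
cat > nim-hybrid-values.yaml <<'EOF'
# Illustrative overrides; model image and tag are examples.
image:
  repository: nvcr.io/nim/meta/llama-3.1-8b-instruct
  tag: "1.3"
imagePullSecrets:
  - name: registry-secret
model:
  ngcAPISecret: ngc-api
nodeSelector:
  eks.amazonaws.com/compute-type: hybrid
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
resources:
  limits:
    nvidia.com/gpu: 1
persistence:
  enabled: false   # no model cache in this walkthrough
EOF

helm install nim nim-llm-1.3.0.tgz -n nim -f nim-hybrid-values.yaml
```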
Testing NIM with example prompts
1. To test the NIM microservice, create a Kubernetes port-forward to the NIM service:
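The service name below assumes a Helm release named nim with the nim-llm chart; check kubectl get svc -n nim for the actual name in your cluster:

```shell
kubectl port-forward -n nim service/nim-nim-llm 8000:8000
```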
2. Run the following curl command and observe the output:
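A request along these lines exercises the OpenAI-compatible chat completions endpoint; the model name must match the NIM you deployed (the value below assumes the example model from the overrides file):

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "meta/llama-3.1-8b-instruct",
        "messages": [{"role": "user", "content": "What is Amazon EKS?"}],
        "max_tokens": 64
      }'
```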
The response is an OpenAI-compatible JSON chat completion from the model.
You have successfully deployed the model to EKS Hybrid Nodes. Now you deploy the model with the EKS Auto Mode nodes running in the same EKS cluster.
Deploying to EKS Auto Mode
You can deploy workloads that don't need to run on EKS Hybrid Nodes in-Region. EKS Auto Mode's built-in NodePools don't include GPU-based instances, so you must define a NodePool with GPUs. EKS Auto Mode provides out-of-the-box integration with NVIDIA GPUs and AWS Neuron devices, so you don't need to install drivers or device plugins.
1. Create a NodePool with the g6 instance family by running the following command:
If your workload has specific network bandwidth or instance GPU requirements, then consider also setting other well-known labels supported by EKS Auto Mode.
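A NodePool along these lines could be applied; it references the default EKS Auto Mode NodeClass and taints GPU nodes so only tolerating workloads land on them (the NodePool name is a placeholder):

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu   # placeholder name
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
        - key: eks.amazonaws.com/instance-family
          operator: In
          values: ["g6"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      taints:
        - key: nvidia.com/gpu
          effect: NoSchedule
EOF
```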
2. Update NVIDIA NIM values for deployment on EKS Auto Mode by creating the following file:
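An overrides file for the Auto Mode deployment could look like the following sketch; it swaps the nodeSelector to target EKS Auto Mode nodes (which carry the label eks.amazonaws.com/compute-type: auto) and keeps the GPU toleration:

```shell
# Write illustrative Auto Mode overrides for the NIM Helm release.
cat > nim-auto-values.yaml <<'EOF'
nodeSelector:
  eks.amazonaws.com/compute-type: auto
tolerations:
  - key: nvidia.com/gpu
    operator: Exists
    effect: NoSchedule
EOF
```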
3. Run the following command to upgrade the NIM Helm release to a new version:
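Assuming the example release, chart, and overrides file names used earlier, the upgrade could be run as:

```shell
helm upgrade nim nim-llm-1.3.0.tgz -n nim \
  -f nim-hybrid-values.yaml -f nim-auto-values.yaml
```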
4. List NodeClaims to see that EKS Auto Mode has launched a g6.xlarge instance in the Region to serve the NVIDIA NIM.
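NodeClaims can be listed with kubectl:

```shell
kubectl get nodeclaims
```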
To test, repeat the preceding steps, Testing NIM with example prompts.
Cleaning up
To avoid incurring ongoing costs, clean up the AWS resources created as part of this post by running the following commands:
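Assuming the example resource names used in this walkthrough, cleanup could look like the following; deleting the cluster also removes the EKS Auto Mode nodes it launched:

```shell
helm uninstall nim -n nim
kubectl delete nodepool gpu
eksctl delete cluster -f cluster-configuration.yaml
```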
Clean up any other resources you created as part of the prerequisites if they are no longer needed.
Conclusion
This post provides an example of how Amazon EKS Hybrid Nodes can power AI workloads. Hybrid nodes unify your Kubernetes footprint onto Amazon EKS, eliminating the need to manage the Kubernetes control plane and reducing operational overhead.
To learn more and get started with EKS Hybrid Nodes, see the EKS Hybrid Nodes user guide and explore the re:Invent 2024 session (KUB205), which explains how hybrid nodes work, their features, and best practices. For more guidance on running AI/ML workloads on Amazon EKS, check out the Data on EKS project.