AWS HPC Blog
Use Terraform to deploy a complete AWS Batch environment on Amazon EKS
When we announced AWS Batch support for Amazon Elastic Kubernetes Service (Amazon EKS), we laid out our thinking. Compute-intensive, high-scale batch workloads have materially different operational challenges from microservices. The operational overhead of these spiky and transient workloads is undifferentiated heavy lifting. Since that announcement, we improved the performance of pod placement on the cluster, enabled private endpoints for tighter security, added support for multiple containers per pod, and added the ability to gang-schedule jobs across nodes.
Because AWS Batch relies on customers to provide their own clusters to scale nodes and run pods, there are a number of different types of resources to coordinate. These include not only the EKS cluster, the Kubernetes roles and permissions on that cluster, and IAM roles for service accounts, but also AWS Batch resources such as job queues, compute environments, and job definitions. The sketch below shows how some of these pieces connect.
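As an illustration of how these resources relate, here is a minimal Terraform sketch of a managed AWS Batch compute environment that targets an existing EKS cluster. The resource names, namespace, roles, and variables are hypothetical placeholders rather than code taken from the blueprint; the blueprint itself creates and wires up all of these pieces for you.

```hcl
# Minimal sketch (not the blueprint's code): a managed AWS Batch compute
# environment that places pods on an existing EKS cluster. All names, roles,
# subnets, and security groups below are illustrative placeholders.
resource "aws_batch_compute_environment" "eks_on_demand" {
  compute_environment_name = "eks-batch-on-demand"
  type                     = "MANAGED"
  service_role             = aws_iam_role.batch_service.arn # hypothetical IAM service role

  # Tie the compute environment to the EKS cluster and a namespace that
  # AWS Batch is allowed to manage through Kubernetes RBAC.
  eks_configuration {
    eks_cluster_arn      = aws_eks_cluster.this.arn
    kubernetes_namespace = "batch-jobs"
  }

  compute_resources {
    type                = "EC2"
    allocation_strategy = "BEST_FIT_PROGRESSIVE"
    min_vcpus           = 0
    max_vcpus           = 256
    instance_type       = ["m5", "c5"]
    instance_role       = aws_iam_instance_profile.batch_nodes.arn # hypothetical node instance profile
    subnets             = var.private_subnet_ids
    security_group_ids  = [aws_security_group.batch_nodes.id]
  }
}
```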
While the AWS Batch User Guide and documentation show how to set up all of these resources, they (by necessity) describe the process using manual steps. However, manual steps are error-prone. They are fine for a small proof of concept or tutorial, but they are not what you want to rely on for production deployments. For that, you want to leverage infrastructure as code.
We are happy to announce that a new blueprint for AWS Batch on Amazon EKS is available within the Data on Amazon EKS (DoEKS) open source project. DoEKS provides blueprints, tutorials, and best practices for running different types of data-centric tooling and workloads on Amazon EKS.
The new AWS Batch blueprint provides a complete example that uses HashiCorp Terraform to create all of the AWS resources and Kubernetes permissions needed for a robust batch processing environment, including separate job queues for On-Demand and Spot Instances (see the sketch after this paragraph).
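To make the On-Demand/Spot split concrete, here is a hedged sketch of what two such job queues can look like in Terraform. It assumes a second, Spot-based compute environment (here called `aws_batch_compute_environment.eks_spot`) defined analogously to the On-Demand one above; the blueprint's actual resource names and settings may differ.

```hcl
# Illustrative only: two job queues, one mapped to an On-Demand compute
# environment and one mapped to a Spot compute environment.
resource "aws_batch_job_queue" "on_demand" {
  name                 = "eks-batch-on-demand-queue"
  state                = "ENABLED"
  priority             = 10
  compute_environments = [aws_batch_compute_environment.eks_on_demand.arn]
}

resource "aws_batch_job_queue" "spot" {
  name                 = "eks-batch-spot-queue"
  state                = "ENABLED"
  priority             = 10
  compute_environments = [aws_batch_compute_environment.eks_spot.arn] # hypothetical Spot compute environment
}
```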
To deploy the AWS Batch on EKS blueprint resources, refer to the tutorial page. Note that because the Terraform AWS provider is not automatically generated from the AWS SDK, it can lag behind AWS Batch feature releases. For example, the provider does not yet support pod specifications with multiple containers or gang scheduling. We are working with AWS Partner River Point Technology to make upstream contributions to support these AWS Batch features.
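For completeness, a single-container EKS job definition, which the provider does support today, looks roughly like the following. This is a minimal sketch assuming the `aws_batch_job_definition` resource's `eks_properties` block; the exact schema can vary across provider versions, so treat it as illustrative rather than authoritative.

```hcl
# Minimal sketch of a single-container EKS job definition (illustrative;
# verify the eks_properties schema against your provider version).
resource "aws_batch_job_definition" "hello" {
  name = "eks-batch-hello"
  type = "container"

  eks_properties {
    pod_properties {
      containers {
        image   = "public.ecr.aws/amazonlinux/amazonlinux:2"
        command = ["echo", "hello from AWS Batch on EKS"]
        resources {
          limits = {
            cpu    = "1"
            memory = "512Mi"
          }
        }
      }
    }
  }
}
```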
Conclusion
We are excited to get a complete infrastructure-as-code example into customers’ hands and can’t wait to hear your feedback. Just send a note to ask-hpc@amazon.com if you have something to say!
To get started using AWS Batch on your EKS clusters, check out the Data on Amazon EKS blueprint for AWS Batch.