AWS Startups Blog
From Zero to EKS with Terraform and Helm
Guest post by Pete Lesko, DevOps Engineer and Dan Richardson, Director of DevOps at Aledade
At Aledade, we perform ETL on the healthcare data of millions of patients from thousands of different sources. Airflow is the primary tool we leverage for workflow management.
Because the amount of data we process is growing exponentially, we quickly outgrew our ability to scale our Dockerized Airflow deployment horizontally. We decided to move Airflow onto Kubernetes to take advantage of its native support for scaling pods up and down as needed to handle tasks. We had zero experience running a Kubernetes cluster, and EKS allowed us to get up and running rapidly.
Here is how we did it.
There are a few tools that allow you to get up and running quickly on EKS. CloudFormation, Terraform, and eksctl are all good options, with eksctl probably being the quickest way to get started. We picked Terraform because we were already using it to manage our AWS infrastructure. Terraform provides a nice tutorial and sample code repository to help you create all the necessary AWS services to run EKS. Their sample code is a good starting place, and you can easily modify it to better suit your AWS environment.
NOTE: This tutorial will create a cluster in us-west-2 using the 10.0.0.0/16 subnet.
What You’ll Need
Before you get started, you’ll need a few tools installed. Terraform is a tool to create, change, and improve infrastructure. Helm is a package management tool for Kubernetes. You’ll need to install them both:
- terraform – https://www.terraform.io
- helm – https://helm.sh
Terraform
Let’s start by cloning Terraform’s EKS git repository from their AWS EKS Introduction. You’ll need the git client, a version control tool, installed for your operating system for the next command. On Ubuntu systems you can accomplish this with apt-get install git, and on RedHat-based systems with yum install git. Now you can clone the Terraform AWS repository:
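At the time this post was written, the sample code lived in the examples/eks-getting-started directory of the terraform-provider-aws repository; if it has since moved, adjust the path below accordingly:
$ git clone https://github.com/terraform-providers/terraform-provider-aws.git
$ cd terraform-provider-aws/examples/eks-getting-started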
Now, to see a detailed outline of the changes Terraform would make, run plan. This should include the EKS cluster, VPC, and the other AWS resources that will be provisioned in this project:
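If this is a fresh checkout, initialize the working directory first so Terraform can download the providers it needs, then run the plan:
$ terraform init
$ terraform plan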
Make sure to review the changes. The plan command will also warn you if there are any errors in your Terraform code. Assuming everything looks alright, since this is a fresh checkout, you should be able to apply the default configuration using the apply command:
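Run apply from the same directory:
$ terraform apply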
Terraform will prompt you to make sure that you want to apply the changes, since this will create resources that will incur charges on your AWS account. Because you already reviewed the changes with the plan command, go ahead and confirm by typing yes at the prompt.
By default, the resources are targeted to be created in us-west-2, so bear that in mind if you go looking for the resources created in your console. This apply step will create many of the resources you need to get up and running initially, including:
- VPC
- IAM roles
- Security groups
- An internet gateway
- Subnets
- Autoscaling group
- Route table
- EKS cluster
- Your kubectl configuration
Setting Up kubectl
You will need the configuration output from Terraform in order to use kubectl to interact with your new cluster. Create your kube configuration directory, and output the configuration from Terraform into the config file using the Terraform output command:
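Assuming the sample exposes its kubeconfig under the output name kubeconfig (run terraform output with no arguments to confirm the names in your checkout), something like this writes it to the default location:
$ mkdir -p ~/.kube
$ terraform output kubeconfig > ~/.kube/config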
You’ll need kubectl, a command line tool for running commands against Kubernetes clusters, for the next step. Installation instructions can be found here. Once you’ve got it installed, check that you’re connected to your cluster by running kubectl version. Your output may vary slightly here:
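If kubectl can reach the cluster, the command reports both a client and a server version; the exact values will depend on your installation:
$ kubectl version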
Now let’s add the ConfigMap to the cluster from Terraform as well. A ConfigMap is a Kubernetes configuration object, in this case used to grant access to our EKS cluster. This ConfigMap allows the EC2 instances in the cluster to communicate with the EKS master, and allows your user account to run commands against the cluster. You’ll run the Terraform output command to a file, and the kubectl apply command to apply that file:
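Assuming the sample exposes the map under the output name config_map_aws_auth (again, terraform output will list the exact names), the sequence looks roughly like this:
$ terraform output config_map_aws_auth > config_map_aws_auth.yaml
$ kubectl apply -f config_map_aws_auth.yaml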
Once this is complete, you should see the nodes from your autoscaling group either starting to join or already joined to the cluster. Once the second column reads Ready, the node can have deployments pushed to it. Again, your output may vary here:
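List the nodes and check the STATUS column:
$ kubectl get nodes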
At this point, your EKS cluster is up, the nodes have joined, and they are ready for a deployment!
Helm
Next, you’ll install Helm. First you need to create a Kubernetes ServiceAccount for tiller, which allows Helm to talk to the cluster:
cat >tiller-user.yaml <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system
EOF
Now, apply the ServiceAccount with kubectl and install Helm with the init command:
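With the manifest above saved as tiller-user.yaml, apply it and initialize Helm. Note that helm init and tiller are specific to Helm 2, which was current when this post was written:
$ kubectl apply -f tiller-user.yaml
$ helm init --service-account tiller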
You will need a way for your Airflow deployment to communicate with the outside world. For this, you will install nginx-ingress, an ingress controller that uses a ConfigMap to store nginx configurations. Nginx is industry-standard web and proxy server software; we will use its proxy feature to serve up the Airflow web interface. Install nginx-ingress via its helm chart:
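The release and namespace names below are only examples, and the chart comes from the stable repository that existed at the time this was written:
$ helm install stable/nginx-ingress --name nginx-ingress --namespace nginx-ingress --set rbac.create=true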
Airflow
You need to override some values in the Airflow chart to tell it to use the nginx ingress controller. You’ll want to replace airflow-k8s.aledade.com with a hostname of your own:
cat >values.yaml <<EOF
ingress:
  enabled: true
  web:
    path: "/"
    host: "airflow-k8s.aledade.com"
    tls:
      enabled: true
    annotations:
      kubernetes.io/ingress.class: "nginx"
EOF
Finally, you install Airflow via its helm chart and the values file you just created, using the helm install command:
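Assuming the stable/airflow chart that existed at the time (the stable repository has since been deprecated), and using airflow as both the release and the namespace name:
$ helm install stable/airflow --name airflow --namespace airflow -f values.yaml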
It may take a few moments before all of the pods are ready; you can monitor the progress with:
$ watch "kubectl get pods -n airflow"
Even after the pods are running, I’ve found it takes at least five minutes for everything to completely spin up.
You can find the internet-accessible endpoint by querying the services and looking for the LoadBalancer Ingress entry:
$ kubectl describe services |grep ^LoadBalancer
LoadBalancer Ingress:   8a69022e5f102be1072e5fb1087f5fbe-e907efv7e8.us-west-2.elb.amazonaws.com
If you visit this URL, you will find the flower interface, a web tool for monitoring and administering Celery clusters.
To reach the Airflow administrative interface, you will need to add an entry to /etc/hosts. First, get the IP address behind that LoadBalancer Ingress hostname, and then add it to your /etc/hosts:
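Using the example hostname from the output above (yours will differ), you could resolve it and append an entry along these lines; keep in mind that ELB IP addresses can change over time, so this is only a convenience for testing:
$ dig +short 8a69022e5f102be1072e5fb1087f5fbe-e907efv7e8.us-west-2.elb.amazonaws.com
$ echo "<ip-from-dig-above> airflow-k8s.aledade.com" | sudo tee -a /etc/hosts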
Afterwards, you can reach the Airflow administrative interface at http://airflow-k8s.aledade.com in your browser. In a production environment, you would replace airflow-k8s.aledade.com with an FQDN that you can add as an alias in Route 53 pointing to the ELB created by the LoadBalancer Ingress.
Cleaning Up
To destroy these resources, delete the helm deployments and then issue a destroy with Terraform:
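Assuming the release names used above, the cleanup looks something like this:
$ helm delete --purge airflow
$ helm delete --purge nginx-ingress
$ terraform destroy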
Conclusion + Future
At Aledade, we help transform primary care by delivering more efficient technology-enabled workflows to primary care providers. We analyze data with Python, Docker, and Terraform and have a CI/CD pipeline into EKS. If that sounds good to you, consider joining our team!