Last Update ⏰ October 2024
Introduction
This post describes the creation of a multi-zone Kubernetes Cluster in AWS, using Terraform with some AWS modules. Specifically, we are going to use infrastructure as code to create:
- A new VPC with multi-zone public & private Subnets, and a single NAT gateway.
- A Kubernetes Cluster, based on Spot EC2 instances running in private Subnets, with an Autoscaling Group based on average CPU usage.
- An Application Load Balancer (ALB) to accept public HTTP calls and route them to the Kubernetes nodes, as well as run health checks against the registered targets.
- An AWS Load Balancer Controller inside the Cluster, to receive & forward HTTP requests from the outside world into Kubernetes pods.
- A DNS zone with an SSL certificate to provide HTTPS for each Kubernetes service. This zone will be managed from Kubernetes by a service called External DNS.
- A sample application to deploy into our Cluster, using a small Helm Chart.

Using official Terraform modules gives us a simple way to code AWS components such as private networks or Kubernetes Clusters while following the best practices of verified providers (a.k.a. not reinventing the wheel).
Project structure
All Terraform definitions in this example are distributed between two modules:
- Base: terraform module that creates VPC & EKS resources in AWS.
- Config: terraform module that configures the Kubernetes components in the EKS Cluster (ingress controller, namespaces, spot termination handler, …).
The reason behind this structure is to split the creation and the configuration of the infrastructure into "different projects" (treating infrastructure as code as distributed units rather than a monolith). It also avoids some issues where a Terraform provider in one module depends on resources created elsewhere in the same project. This approach keeps the door open to configuring the infrastructure with other tools or platforms in the future.
Requirements
- AWS Account, with programmatic access. We will use these credentials to configure some environment variables later.
- Terraform CLI or Terraform Cloud. In this document we use version 1.1.9, but feel free to use a newer version if you want to. My recommendation is to use a Docker image or tfenv, to simplify installing and using a specific version.
- A terminal to run the Terraform CLI, or a source control repo if you are using Terraform Cloud. In my case I use a CI pipeline, to avoid depending on a single computer to run Terraform commands and to keep a history of past deployments.
💰 Budget callout: creating VPC, EKS & DNS resources will probably add some cost to your monthly AWS bill, since some resources may go beyond the free tier. So, be aware of this before applying any Terraform plans!
Terraform Configuration
After this short introduction, let's get into our infrastructure as code! We will see small snippets of the Terraform configuration required at each step; feel free to copy them and try applying these plans on your own. But if you are curious or impatient to get this done, take a look at the repository with all Terraform configurations, managed by CI pipelines that apply them.
The very first step in Terraform is to define Terraform configurations, related to version, providers and state file backend:
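As an illustrative sketch, this configuration could look like the following (the provider versions and the default tag values are assumptions to adapt to your own project):
terraform {
  required_version = "~> 1.1.9"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.11"
    }
  }

  # Left (almost) empty on purpose: the actual values come from backend.tfvars
  backend "s3" {}
}

provider "aws" {
  # These tags are automatically applied to every resource created by this provider
  default_tags {
    tags = {
      Project     = "my-vibrant-and-nifty-app"
      Environment = terraform.workspace
      ManagedBy   = "terraform"
    }
  }
}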
✅ Recommendation: It is a good idea to declare the version of Terraform to be used while coding our infrastructure, to avoid breaking changes that could affect our code if a newer/older version is used to run Terraform in the future.
✅ Recommendation: Resource providers can be handled automatically by Terraform when running the init command. However, it is a good idea to define them explicitly with version numbers (as shown above) to avoid data source/resource breaking changes introduced by future versions.
✅ Recommendation: The AWS Terraform provider configuration includes a default_tags definition, which is a great option to automatically apply tags to every resource created in AWS. These tags are not mandatory, but they are highly recommended to track costs in an organized way.
✅ Recommendation: The backend configuration is almost empty, and that is on purpose. It is recommended to externalize this setup into separate files if required (e.g. one config per environment). In this case we will use a single S3 backend, with a separate state file per Terraform workspace:
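For example, a backend.tfvars file could look like this (the region is an illustrative assumption and must match the region where the bucket lives):
bucket               = "my-vibrant-and-nifty-app-infra"
key                  = "infra.json"
workspace_key_prefix = "environment"
region               = "us-west-2"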
Which means that we will use an S3 bucket called "my-vibrant-and-nifty-app-infra" which will look like this:
s3://my-vibrant-and-nifty-app-infra/
|_environment/
  |_development/
  | |_infra.json
  |_staging/
  | |_infra.json
  |_production/
    |_infra.json
Where each folder represents an environment (development, staging & production) hosting the Terraform remote state file.
On the other hand, it is recommended to avoid defining AWS credentials in provider blocks. Instead we could use environment variables for this purpose, which will be automatically used by Terraform to authenticate against AWS APIs:
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
AWS_DEFAULT_REGION=us-east-1
⚠️ Important: The S3 bucket defined here will not be created by Terraform if it does not exist in AWS. This bucket has to be created externally, either by manual action or by a CI/CD tool running a command like this:
aws s3 mb s3://my-vibrant-and-nifty-app-infra --region us-west-2
⚠️ Important: Bear in mind that S3 bucket names must be unique worldwide, across AWS accounts and regions. Use a custom name for your bucket when running the aws s3 mb command, and also when defining the backend.tfvars file. That is why I chose a very customized name such as "my-vibrant-and-nifty-app-infra".
To initialize each workspace, for instance development, we should run the following command:
terraform init -backend-config=backend.tfvars
terraform workspace new development
In future executions, we can select our existing workspace using the following command:
terraform init -backend-config=backend.tfvars
terraform workspace select development
Now, we're ready to start writing our Infrastructure as code!
Local Modules Structure
As mentioned before, all Terraform resources described in this article are distributed across two modules hosted in a single repo, called Base and Config. They will be called from the root folder as follows:
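A simplified sketch of those module calls from the root main.tf, where everything except the module.base.cluster_* outputs is an illustrative placeholder:
module "base" {
  source = "./base"

  cluster_name       = var.cluster_name
  availability_zones = var.availability_zones
  # ...the rest of the VPC & EKS variables
}

module "config" {
  source = "./config"

  # Outputs of the Base module: referencing them creates the implicit dependency
  cluster_name     = module.base.cluster_name
  cluster_endpoint = module.base.cluster_endpoint
  # ...the rest of the configuration variables
}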
This usage of modules will create an implicit dependency in Terraform by using module.base.cluster_* as an input for the second module, which is an output of the first module and depends on the creation of the EKS Cluster. In other words, all resources defined in Config module will depend on the Kubernetes Cluster created in Base module without using the boring-and-ugly depends_on option.
All variables used above are defined in a single file called variables.tf, located in the root of the project. They are also declared inside each module, and will be explained in detail in the next sections.
[Base module] VPC Creation
Let's start by creating a new VPC to isolate our EKS-related resources in a safe place, using the official VPC terraform module published by AWS in our Infrastructure Terraform project:
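Something along these lines, where variable names like var.vpc_cidr or var.availability_zones are assumptions to rename as you prefer:
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "${var.cluster_name}-vpc"
  cidr = var.vpc_cidr

  azs             = var.availability_zones
  private_subnets = var.private_subnet_cidrs
  public_subnets  = var.public_subnet_cidrs

  # A single NAT gateway (instead of one per AZ) to save some costs
  enable_nat_gateway   = true
  single_nat_gateway   = true
  enable_dns_hostnames = true

  # Tags required by EKS & the AWS Load Balancer Controller to discover subnets
  public_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/elb"                    = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/cluster/${var.cluster_name}" = "shared"
    "kubernetes.io/role/internal-elb"           = "1"
  }
}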
As noted in the previous code block, we will create a new VPC with subnets in each specified Availability Zone and a single NAT Gateway to save some costs, adding the tags required by EKS. Remember to also define a variable values file (e.g. one per environment) for the previous block:
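For instance, a base-network-development.tfvars file could contain values like these (all of them illustrative):
cluster_name         = "my-eks-development"
vpc_cidr             = "10.0.0.0/16"
availability_zones   = ["us-east-1a", "us-east-1b", "us-east-1c"]
private_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
public_subnet_cidrs  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]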
Now, we should be ready to create these VPC resources with Terraform. If we already ran the init command, we can examine the resources to be created or updated by Terraform using the plan command:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars
And then, we can apply those changes using the apply command:
terraform apply development.tfplan
[Base module] EKS Cluster
The next move is to use the official EKS Terraform module to create a new Kubernetes Cluster:
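A possible shape for this block, using the terraform-aws-modules/eks/aws module; node group names, sizes and AMI types are assumptions to tune to your needs:
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = var.cluster_name
  cluster_version = var.cluster_version

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  cluster_endpoint_public_access = true

  # Add-on used later to map IAM roles to Kubernetes Service Accounts
  cluster_addons = {
    eks-pod-identity-agent = {
      most_recent = true
    }
  }

  eks_managed_node_groups = {
    # x86 Spot instances (an average-CPU autoscaling policy can be attached to the node group ASG separately)
    spot-x86 = {
      capacity_type  = "SPOT"
      instance_types = var.spot_instance_types
      min_size       = var.spot_min_size
      max_size       = var.spot_max_size
      desired_size   = var.spot_desired_size
    }
    # On-demand ARM (Graviton) instances, which need an ARM-based AMI type
    on-demand-arm = {
      capacity_type  = "ON_DEMAND"
      ami_type       = "AL2023_ARM_64_STANDARD"
      instance_types = var.arm_instance_types
      min_size       = 1
      max_size       = 2
      desired_size   = 1
    }
  }
}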
As shown in the previous code block, we are creating an EKS Cluster that uses EC2 autoscaling groups for Kubernetes, composed of Spot instances scaled out/in based on average CPU usage.
The Cluster includes the EKS Pod Identity Agent add-on, which simplifies the process for cluster administrators to configure Kubernetes applications with AWS IAM permissions. We'll go deeper into this later.
Bear in mind that this Terraform configuration block uses some variables defined in the previous Terraform blocks, so it has to be stored as a new file in the same folder as the VPC definition file. Since we are using new variables as well, we are going to define them in a new variables file as follows:
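For example, a base-eks-development.tfvars file could look like this (version, instance types and sizes are just examples):
cluster_version     = "1.30"
spot_instance_types = ["t3.medium", "t3a.medium"]
spot_min_size       = 1
spot_max_size       = 5
spot_desired_size   = 2
arm_instance_types  = ["t4g.medium"]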
As we may see above, we are defining two EKS node groups (which will handle the EC2 instances for Kubernetes in the background): one with x86 Spot instances (Intel & AMD) and one with on-demand ARM instances (Graviton). This split is just a personal choice; feel free to change it according to your own infrastructure requirements, budget and constraints.
✅ Recommendation: to keep the code readable and the variable files easy to use, it is a good idea to declare all variables in a single Terraform configuration file (e.g. variables.tf) and then define several variable values files:
- A single terraform.tfvars file (automatically loaded by Terraform commands) with all generic variable values, which do not have customized or environment-specific values.
- Environment-or-case-specific *.tfvars files with all variable values that are specific to a particular case or environment, to be passed explicitly when running the terraform plan command.
However, we will skip these rules here to keep each step in the creation of AWS resources easier to follow. This means that we will run the terraform plan command passing every variable values file, as we write new configuration blocks:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars
terraform apply development.tfplan
Once the plan is applied, we have a brand-new EKS cluster in AWS! Let's move on with the configuration of this Kubernetes Cluster, using the other Terraform module.
[Config module] EKS Cluster
Since we are now in another Terraform module and the EKS Cluster already exists, we will have to fetch its data using a data source. Let's do that, along with some other configuration:
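A sketch of that configuration could be the following, using the aws-node-termination-handler chart from the eks-charts repository as the Spot termination handler (variable names are assumptions):
data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}

# Kubernetes & Helm providers authenticated against the existing EKS Cluster
provider "kubernetes" {
  host                   = data.aws_eks_cluster.cluster.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.cluster.token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}

# Drains nodes gracefully when AWS reclaims the Spot instances behind them
resource "helm_release" "spot_termination_handler" {
  name       = "aws-node-termination-handler"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-node-termination-handler"
  version    = var.spot_termination_handler_chart_version
  namespace  = "kube-system"
}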
As you may see, in here we are:
- Getting our existing EKS Cluster as a data source, in order to configure the Kubernetes & Helm Terraform providers.
- Deploying a Helm Chart for the EC2 Kubernetes Spot termination handler, which takes care of reallocating Kubernetes objects when Spot instances get automatically terminated by AWS.
Using the variables required for both things:
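For instance, a config-eks-development.tfvars file (the chart version is only an illustrative value):
cluster_name                           = "my-eks-development"
spot_termination_handler_chart_version = "0.21.0"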
We can plan & apply as usual:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars \
-var-file=config-eks-development.tfvars
terraform apply development.tfplan
And Terraform will take care of running "helm install" commands for us.
[Config module] IAM Access
The next step is to grant the required AWS users access to the EKS Cluster. We will make use of the aws-auth ConfigMap for this purpose:
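One possible way to express this, assuming the aws-auth submodule of the EKS Terraform module and a var.developer_user_arns variable (both the variable and the group name are assumptions):
# Maps IAM users to Kubernetes usernames and groups inside the aws-auth ConfigMap
module "aws_auth" {
  source  = "terraform-aws-modules/eks/aws//modules/aws-auth"
  version = "~> 20.0"

  manage_aws_auth_configmap = true

  aws_auth_users = [
    for user_arn in var.developer_user_arns : {
      userarn  = user_arn
      username = element(split("/", user_arn), 1)
      groups   = ["developers"]
    }
  ]
}

# RBAC: allow the "developers" group to get and port-forward pods
resource "kubernetes_cluster_role" "developers" {
  metadata {
    name = "developers"
  }

  rule {
    api_groups = [""]
    resources  = ["pods"]
    verbs      = ["get", "list"]
  }

  rule {
    api_groups = [""]
    resources  = ["pods/portforward"]
    verbs      = ["create"]
  }
}

resource "kubernetes_cluster_role_binding" "developers" {
  metadata {
    name = "developers"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = kubernetes_cluster_role.developers.metadata[0].name
  }

  subject {
    api_group = "rbac.authorization.k8s.io"
    kind      = "Group"
    name      = "developers"
  }
}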
If we read the aws-auth documentation, we will see that we also need to configure RBAC access inside our Kubernetes Cluster. That's why we create a ClusterRoleBinding object for our developer users, with custom permissions to get and port-forward Kubernetes pods.
The variables file for these resources would look like this:
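For instance, with fictitious account IDs and usernames:
developer_user_arns = [
  "arn:aws:iam::111111111111:user/jane.doe",
  "arn:aws:iam::111111111111:user/john.doe"
]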
⚠️ Note: The user IDs displayed above are fictitious, and of course they have to be customized according to the user groups present in your AWS account. Keep in mind that these usernames do not have to exist as AWS IAM identities at the moment of creating the EKS Cluster or assigning RBAC access, since they live only inside the Kubernetes Cluster. The IAM-to-Kubernetes username mapping is handled by the AWS CLI when authenticating against the EKS Cluster.
To plan & apply these definitions, we should run something like:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars \
-var-file=config-eks-development.tfvars \
-var-file=config-iam-development.tfvars
terraform apply development.tfplan
[Config module] Load Balancer
Now we can move on to creating an Application Load Balancer (ALB) to handle HTTP requests to our services. The creation of the ALB(s) will be the responsibility of the AWS Load Balancer Controller service, which will be deployed using Helm:
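A sketch of these resources could be the following; the wildcard certificate and every variable name besides dns_base_domain are assumptions:
# Re-use an existing Route53 zone managed outside this workspace
data "aws_route53_zone" "base_domain" {
  name = var.dns_base_domain
}

# AWS-issued SSL certificate used by the ALB to serve HTTPS
resource "aws_acm_certificate" "this" {
  domain_name               = var.dns_base_domain
  subject_alternative_names = ["*.${var.dns_base_domain}"]
  validation_method         = "DNS"
}

# DNS records proving domain ownership, so ACM can validate the certificate
resource "aws_route53_record" "certificate_validation" {
  for_each = {
    for dvo in aws_acm_certificate.this.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      type   = dvo.resource_record_type
      record = dvo.resource_record_value
    }
  }

  allow_overwrite = true
  zone_id         = data.aws_route53_zone.base_domain.zone_id
  name            = each.value.name
  type            = each.value.type
  ttl             = 300
  records         = [each.value.record]
}

# Watches Ingress objects and creates/updates ALBs accordingly
resource "helm_release" "aws_load_balancer_controller" {
  name       = "aws-load-balancer-controller"
  repository = "https://aws.github.io/eks-charts"
  chart      = "aws-load-balancer-controller"
  version    = var.load_balancer_controller_chart_version
  namespace  = "kube-system"

  set {
    name  = "clusterName"
    value = var.cluster_name
  }
}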
As you may see above, we issue a new AWS SSL certificate to provide HTTPS on the ALB that will be put in front of our Kubernetes pods; each application's Ingress definition will reference it, together with some annotations required by the Load Balancer Controller.
The Kubernetes service installed by this Helm Chart will not create any ALBs nor DNS records in AWS. These resources will be created later by the Load Balancer Controller and External DNS services respectively, as soon as new Ingress objects get created in Kubernetes using their required annotations. We will dive on these annotations at the end of the post.
IAM permissions required to manage ALBs and Route53 records were already granted to LB Controller and External DNS services during the creation of the EKS Cluster.
⚠️ Note: The Terraform project shown here reuses a DNS zone created outside of this Terraform workspace (defined in the "dns_base_domain" variable). That is why we use a data source to fetch an existing Route53 zone instead of creating a new resource. Feel free to change this if required, and create new DNS resources if you do not have one already.
Like the other Terraform configuration files, this one also uses some new variables. So, let's define them for our "development" environment:
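For example, a config-ingress-development.tfvars file (domain and chart version are placeholders):
dns_base_domain                        = "dev.my-vibrant-and-nifty-app.com"
load_balancer_controller_chart_version = "1.8.1"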
And then run terraform plan & apply:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars \
-var-file=config-eks-development.tfvars \
-var-file=config-iam-development.tfvars \
-var-file=config-ingress-development.tfvars
terraform apply development.tfplan
[Config module] External DNS
The next step is to deploy the ExternalDNS service, which will be responsible for managing the Route53 records requested by each Ingress definition in Kubernetes:
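A possible helm_release for the kubernetes-sigs external-dns chart, with a minimal subset of values (variable names are assumptions):
resource "helm_release" "external_dns" {
  name       = "external-dns"
  repository = "https://kubernetes-sigs.github.io/external-dns/"
  chart      = "external-dns"
  version    = var.external_dns_chart_version
  namespace  = "kube-system"

  set {
    name  = "provider"
    value = "aws"
  }

  # Restrict the records ExternalDNS is allowed to manage to our base domain
  set {
    name  = "domainFilters[0]"
    value = var.dns_base_domain
  }

  set {
    name  = "logLevel"
    value = var.external_dns_log_level
  }

  # How often ExternalDNS reconciles Ingress objects against Route53
  set {
    name  = "interval"
    value = var.external_dns_sync_interval
  }
}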
As we may see above, the external-dns Helm Chart is configured with the Route53 base domain it is allowed to manage (the same domain covered by the new ACM certificate generated for SSL connections), so it can create/modify/delete records from Kubernetes.
Some other variables are required by the terraform definitions to set log level and synchronization frequency:
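For example, a config-external-dns-development.tfvars file with illustrative values:
external_dns_chart_version = "1.14.5"
external_dns_log_level     = "info"
external_dns_sync_interval = "3m"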
And they will be applied as follows:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars \
-var-file=config-eks-development.tfvars \
-var-file=config-iam-development.tfvars \
-var-file=config-ingress-development.tfvars \
-var-file=config-external-dns-development.tfvars
terraform apply development.tfplan
[Config module] Kubernetes Namespaces
The final step — not really mandatory but recommended — is to define some Kubernetes namespaces to separate our Deployments and have better management & visibility of applications in our Cluster:
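A small sketch of this resource, looping over a var.namespaces list (the variable name is an assumption):
resource "kubernetes_namespace_v1" "namespaces" {
  for_each = toset(var.namespaces)

  metadata {
    name = each.value
  }
}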
This configuration file expects a list of namespaces to be created in our EKS Cluster:
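For example (any namespace other than sample-apps is a placeholder):
namespaces = ["sample-apps", "monitoring"]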
Which could be applied as:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars \
-var-file=config-eks-development.tfvars \
-var-file=config-iam-development.tfvars \
-var-file=config-ingress-development.tfvars \
-var-file=config-external-dns-development.tfvars \
-var-file=config-namespaces-development.tfvars
terraform apply development.tfplan
[Config module] EKS Pod Identity (optional)
As you may remember from above, we enabled the EKS Pod Identity Agent add-on while using the EKS module to create the Cluster. The purpose of this add-on is to let workloads in our cluster access AWS resources outside EKS when needed (e.g. an S3 bucket or an SNS topic), authorized through Kubernetes Service Accounts. In other words, the idea of the add-on is to simplify the association between Kubernetes Service Accounts and IAM roles, so that Pods can access external AWS resources:
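A sketch of these definitions, assuming a var.pod_identity_associations map shaped as shown in the next snippet (the variable name and structure are assumptions):
# One IAM role per association, trusted by the EKS Pod Identity service
resource "aws_iam_role" "pod_identity" {
  for_each = var.pod_identity_associations

  name = "${var.cluster_name}-${each.key}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Service = "pods.eks.amazonaws.com"
      }
      Action = ["sts:AssumeRole", "sts:TagSession"]
    }]
  })
}

resource "aws_iam_role_policy_attachment" "pod_identity" {
  for_each = var.pod_identity_associations

  role       = aws_iam_role.pod_identity[each.key].name
  policy_arn = each.value.policy_arn
}

# Associates each Kubernetes Service Account with its IAM role
resource "aws_eks_pod_identity_association" "this" {
  for_each = var.pod_identity_associations

  cluster_name    = var.cluster_name
  namespace       = each.value.namespace
  service_account = each.value.service_account
  role_arn        = aws_iam_role.pod_identity[each.key].arn
}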
The code above is based on the example shown in the Terraform registry, and loops over a map containing all required IAM policies, which are mapped to IAM roles and then to Service Accounts:
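For instance, a config-pod-identity-development.tfvars file could contain:
pod_identity_associations = {
  "s3-readonly" = {
    namespace       = "sample-apps"
    service_account = "s3-readonly"
    policy_arn      = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  }
}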
As we may see from the variable above, we're creating a Service Account association in the Kubernetes sample-apps namespace that will be automatically mapped to a new IAM role with AmazonS3ReadOnlyAccess permissions, thanks to the EKS Pod Identity Agent.
🚩 Note: Bear in mind that setting Pod Identities is optional, as you may not need any IAM permissions to be mapped in Kubernetes. If that's your case, which is perfectly possible, feel free to leave this variable empty:
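For example:
pod_identity_associations = {}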
In any case, the Terraform definitions can be applied issuing the following command:
terraform plan -out=development.tfplan \
-var-file=base-network-development.tfvars \
-var-file=base-eks-development.tfvars \
-var-file=config-eks-development.tfvars \
-var-file=config-iam-development.tfvars \
-var-file=config-ingress-development.tfvars \
-var-file=config-external-dns-development.tfvars \
-var-file=config-namespaces-development.tfvars \
-var-file=config-pod-identity-development.tfvars
terraform apply development.tfplan
Then, in order to leverage any defined IAM permissions we need to create the Service Accounts themselves in the corresponding Kubernetes namespaces and configure our Deployments to use these Service Accounts:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-readonly
  namespace: sample-apps
---
apiVersion: v1
kind: Pod
metadata:
  name: sample-app
  namespace: sample-apps
spec:
  containers:
    - name: aws-cli
      image: amazon/aws-cli:latest
      command: ['sleep', '36000']
  restartPolicy: Never
  serviceAccountName: s3-readonly
That's it! We finally have a production-ready EKS Cluster, able to host applications with public access 🎉. Remember to visit the sample repository for a complete look at all these Terraform configurations, and a sample CI pipeline to apply them in AWS.
[Bonus] Sample application Deployment
As a bonus, I will leave a link to a sample application, which deploys a very small container into our new Kubernetes Cluster using Helm, based on this docker image. It also contains some CI jobs that could help you get familiar with the aws eks and helm commands.

As mentioned before, the Ingress object inside each application holds the magic that creates ALBs and manages Route53 records:
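For illustration, here is roughly what such an Ingress could look like when written as a Terraform kubernetes_ingress_v1 resource (in the sample repository it lives as a YAML template inside the application's Helm chart; the hostname, certificate variable and service name are placeholders):
resource "kubernetes_ingress_v1" "sample_app" {
  metadata {
    name      = "sample-app"
    namespace = "sample-apps"

    annotations = {
      # Handled by the AWS Load Balancer Controller
      "alb.ingress.kubernetes.io/scheme"          = "internet-facing"
      "alb.ingress.kubernetes.io/target-type"     = "ip"
      "alb.ingress.kubernetes.io/listen-ports"    = jsonencode([{ HTTPS = 443 }])
      "alb.ingress.kubernetes.io/certificate-arn" = var.ssl_certificate_arn

      # Handled by ExternalDNS
      "external-dns.alpha.kubernetes.io/hostname" = "sample-app.${var.dns_base_domain}"
    }
  }

  spec {
    ingress_class_name = "alb"

    rule {
      host = "sample-app.${var.dns_base_domain}"

      http {
        path {
          path      = "/"
          path_type = "Prefix"

          backend {
            service {
              name = "sample-app"
              port {
                number = 80
              }
            }
          }
        }
      }
    }
  }
}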
The annotations starting with alb.ingress.kubernetes.io take care of managing ALBs in AWS via the LB Controller (reference), and the annotations starting with external-dns.alpha.kubernetes.io take care of managing Route53 records via ExternalDNS (reference).
Wrapping up
That's it for now! I hope this page helped you understand some key concepts behind a basic Kubernetes Cluster in AWS, and get hands-on with some good practices for Terraform configuration files.
I would really appreciate any kind of feedback, questions or comments. Feel free to ping me here, or leave a comment on this post.