EKS Deep Dive — Running Production Kubernetes on AWS

8 min read
Goel Academy
DevOps & Cloud Learning Hub

You know Kubernetes. You've run minikube start a hundred times, maybe even wrestled with kubeadm on bare metal. But running Kubernetes in production is a different animal — one where the control plane going down at 3 AM means your pager goes off instead of someone else's. Amazon EKS takes that control plane problem off your plate and lets you focus on what actually matters: deploying and scaling your workloads.

EKS Architecture — What AWS Actually Manages

EKS is a managed Kubernetes service, but "managed" has a very specific scope. AWS runs the control plane — the API server, etcd, scheduler, and controller manager — across three availability zones. You never see these nodes, never patch them, never worry about etcd backups. AWS handles all of it with a 99.95% SLA.

What AWS does NOT manage is your data plane (worker nodes). You're responsible for the EC2 instances or Fargate tasks that actually run your pods. This split is important to understand:

| Component | Managed By | You Handle |
|---|---|---|
| API Server | AWS | kubectl access, RBAC |
| etcd | AWS | Nothing |
| Scheduler | AWS | Scheduling policies, affinities |
| Worker Nodes | You (EC2/Fargate) | Scaling, patching, instance types |
| Networking | Shared | VPC design, security groups, NACLs |
| Add-ons | Shared | Installation, configuration |

The control plane costs $0.10/hour (~$73/month) regardless of cluster size. Your real costs come from the worker nodes.
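That $73 figure is just the hourly rate multiplied out; a quick sanity check, assuming an average month of roughly 730 hours:

```shell
# EKS control plane: $0.10/hour, ~730 hours in an average month
awk 'BEGIN { printf "$%.0f/month\n", 0.10 * 730 }'
# $73/month
```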

Creating a Cluster with eksctl

eksctl is the official CLI tool for EKS, and it turns what would be a 200-line CloudFormation template into a single command:

# Install eksctl
curl --silent --location \
"https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
| tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin

# Create a production-ready cluster
eksctl create cluster \
--name production \
--version 1.29 \
--region us-east-1 \
--zones us-east-1a,us-east-1b,us-east-1c \
--nodegroup-name workers \
--node-type t3.large \
--nodes 3 \
--nodes-min 2 \
--nodes-max 10 \
--managed \
--asg-access \
--with-oidc

This single command creates a VPC with public and private subnets across three AZs, an EKS cluster, a managed node group with autoscaling, and an OIDC provider for IAM integration. Takes about 15-20 minutes.

For more control, use a config file:

# cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1
  version: "1.29"

iam:
  withOIDC: true

vpc:
  cidr: 10.0.0.0/16
  nat:
    gateway: HighlyAvailable # One NAT GW per AZ

managedNodeGroups:
  - name: general
    instanceType: t3.large
    minSize: 2
    maxSize: 10
    desiredCapacity: 3
    volumeSize: 50
    privateNetworking: true
    labels:
      workload-type: general
    tags:
      Environment: production

  - name: compute-intensive
    instanceType: c5.2xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      workload-type: compute
    taints:
      - key: dedicated
        value: compute
        effect: NoSchedule

Apply it with:

eksctl create cluster -f cluster-config.yaml
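
Pods won't land on the compute-intensive group unless they tolerate its taint and select its label. A minimal pod spec targeting that group might look like this (the pod name and image are placeholders):

```yaml
# Hypothetical pod that opts in to the tainted compute nodes
apiVersion: v1
kind: Pod
metadata:
  name: number-cruncher
spec:
  nodeSelector:
    workload-type: compute    # matches the node group's label
  tolerations:
    - key: dedicated
      operator: Equal
      value: compute
      effect: NoSchedule      # tolerates the taint set in the config above
  containers:
    - name: worker
      image: my-batch-job:latest   # placeholder image
```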

Managed Node Groups vs Fargate Profiles

EKS gives you two ways to run pods, and each has a sweet spot.

Managed Node Groups are EC2 instances that AWS helps you manage. AWS handles the AMI updates, drains nodes during upgrades, and integrates with Auto Scaling groups. You still pick instance types, sizes, and scaling policies.

Fargate Profiles are fully serverless — no EC2 instances at all. You define which pods run on Fargate using namespace and label selectors, and AWS provisions the compute automatically.

# Fargate profile — run all pods in the "batch" namespace on Fargate
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1

fargateProfiles:
  - name: batch-jobs
    selectors:
      - namespace: batch
      - namespace: monitoring
        labels:
          compute: fargate

| Feature | Managed Node Groups | Fargate |
|---|---|---|
| Pricing | EC2 pricing (cheaper at scale) | Per-pod vCPU + memory pricing |
| Scaling | Cluster Autoscaler / Karpenter | Automatic per pod |
| DaemonSets | Supported | Not supported |
| GPUs | Supported | Not supported |
| Persistent Volumes | EBS + EFS | EFS only |
| Startup Time | Seconds (existing nodes) | 30-90 seconds |
| Best For | Steady-state workloads | Burst, batch, dev/test |

My recommendation: use managed node groups as your baseline and Fargate for burst workloads or namespaces where you want zero node management.
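To make the pricing comparison concrete, here is a rough monthly estimate for a single always-on Fargate pod. The per-vCPU and per-GB rates below are illustrative us-east-1 numbers from memory, not authoritative; check the current AWS pricing page before relying on them.

```shell
# Rough Fargate cost for one pod: 0.5 vCPU, 1 GB memory, running 24/7
# Rates are assumed/illustrative; verify against current AWS pricing
awk 'BEGIN {
  vcpu_rate = 0.04048   # $ per vCPU-hour (assumed)
  mem_rate  = 0.004445  # $ per GB-hour (assumed)
  hours     = 730
  printf "~$%.2f/month\n", (0.5 * vcpu_rate + 1 * mem_rate) * hours
}'
# ~$18.02/month
```

At that rate, a handful of steady pods is cheap, but dozens of always-on pods quickly cost more than an equivalent EC2 node, which is why the table above calls EC2 "cheaper at scale".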

EKS Networking — The VPC CNI Plugin

This is where EKS diverges from vanilla Kubernetes. Instead of overlay networks like Calico or Flannel, EKS uses the VPC CNI plugin by default. Every pod gets a real IP address from your VPC subnet — no NAT, no encapsulation.

# Check the CNI plugin version
kubectl describe daemonset aws-node -n kube-system | grep Image

# See real VPC IPs assigned to pods
kubectl get pods -o wide
# NAME           READY   IP           NODE
# nginx-abc123   1/1     10.0.3.47    ip-10-0-1-15.ec2.internal
# redis-def456   1/1     10.0.4.112   ip-10-0-2-28.ec2.internal

The benefit is huge — your pods are directly reachable from anything in the VPC. RDS, ElastiCache, other EC2 instances can talk to pods using their real IPs. Security groups work at the pod level.

The tradeoff: you burn through IP addresses fast. A t3.large can host about 35 pods (based on ENI limits). Plan your VPC CIDR accordingly — /16 is the minimum I'd recommend for production.
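That ~35 figure comes from the standard VPC CNI formula: max pods = ENIs * (IPv4 addresses per ENI - 1) + 2. A t3.large supports 3 ENIs with 12 addresses each:

```shell
# Max pods = ENIs * (IPs per ENI - 1) + 2
# t3.large: 3 ENIs, 12 IPv4 addresses per ENI
echo $(( 3 * (12 - 1) + 2 ))
# 35
```

Larger instances support more ENIs and more IPs per ENI, so pod density scales with instance size, not just CPU and memory.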

ALB Ingress Controller

The AWS Load Balancer Controller creates Application Load Balancers for your Kubernetes Ingress resources:

# Install the AWS Load Balancer Controller
helm repo add eks https://aws.github.io/eks-charts
helm repo update

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName=production \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller

# ingress.yaml — creates an ALB automatically
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456:certificate/abc-123
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80

IAM Roles for Service Accounts (IRSA)

IRSA is one of the best features of EKS. Instead of attaching IAM roles to your worker nodes (which gives every pod on that node the same permissions), you bind IAM roles to specific Kubernetes service accounts:

# Create an IAM role and associate it with a Kubernetes service account
eksctl create iamserviceaccount \
--name s3-reader \
--namespace default \
--cluster production \
--attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
--approve

# Use the service account in your pod
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader-pod
spec:
  serviceAccountName: s3-reader
  containers:
    - name: app
      image: my-app:latest
      # This container can now call S3 APIs — no access keys needed

This is the principle of least privilege done right. Your monitoring pods get CloudWatch access, your app pods get S3 access, and nothing else bleeds across.
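Under the hood, the eksctl command above creates an IAM role whose trust policy references the cluster's OIDC provider, then annotates the service account with the role ARN. The resulting object looks roughly like this (the account ID and role name are placeholders; eksctl generates the actual role name):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: default
  annotations:
    # Placeholder ARN — eksctl fills in the real role it created
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/eksctl-production-s3-reader
```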

Cluster Autoscaler and Cost Management

The Cluster Autoscaler watches for pods that can't be scheduled due to insufficient resources and adds nodes:

# Install Cluster Autoscaler
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
--namespace kube-system \
--set autoDiscovery.clusterName=production \
--set awsRegion=us-east-1 \
--set rbac.serviceAccount.name=cluster-autoscaler \
--set extraArgs.balance-similar-node-groups=true \
--set extraArgs.skip-nodes-with-local-storage=false \
--set extraArgs.expander=least-waste

For cost optimization, consider Karpenter — AWS's newer, faster autoscaler that provisions the right instance type based on pod requirements rather than using fixed node groups.

EKS cost breakdown for a typical production cluster:

| Component | Monthly Cost |
|---|---|
| EKS Control Plane | $73 |
| 3x t3.large (on-demand) | ~$180 |
| NAT Gateways (3 AZs) | ~$100 |
| ALB | ~$25 + data |
| EBS Volumes | ~$30 |
| Total baseline | ~$408/month |

Use Spot instances for non-critical workloads to cut node costs by 60-70%.
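With eksctl, a Spot-backed managed node group is a flag plus a list of interchangeable instance types. A sketch (the instance types and names here are examples, not recommendations):

```yaml
managedNodeGroups:
  - name: spot-workers
    spot: true              # request Spot capacity for this group
    instanceTypes:          # offering several similar sizes improves Spot availability
      - t3.large
      - t3a.large
      - m5.large
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    labels:
      lifecycle: spot       # lets workloads target (or avoid) Spot nodes
```

Listing multiple instance types matters: if one Spot pool is reclaimed, the Auto Scaling group can draw from another.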

Monitoring with CloudWatch Container Insights

# Enable Container Insights with the CloudWatch agent
aws eks create-addon \
--cluster-name production \
--addon-name amazon-cloudwatch-observability \
--region us-east-1

Container Insights gives you CPU/memory metrics at the cluster, node, pod, and container level. You get pre-built CloudWatch dashboards and can set alarms on pod restarts, CPU throttling, or memory pressure.

EKS vs ECS — When to Use What

| Factor | EKS | ECS |
|---|---|---|
| Learning Curve | Steep (Kubernetes) | Moderate (AWS-native) |
| Portability | Multi-cloud, on-prem | AWS only |
| Community | Massive (CNCF ecosystem) | AWS ecosystem |
| Control Plane Cost | $73/month | Free |
| Networking | VPC CNI, Calico, etc. | awsvpc mode |
| Service Mesh | Istio, Linkerd, App Mesh | App Mesh, Cloud Map |
| Best For | Complex microservices, multi-cloud | AWS-native apps, simpler architectures |

Choose EKS if you need Kubernetes portability, already have Kubernetes expertise, or require the CNCF ecosystem (Helm, Istio, Argo, etc.).

Choose ECS if you're all-in on AWS, want simpler operations, or are running straightforward container workloads.


EKS removes the hardest part of Kubernetes — keeping the control plane alive and healthy. But it doesn't remove the need to understand Kubernetes itself. Invest time in learning pod scheduling, resource requests and limits, network policies, and RBAC. The control plane being managed means you can focus on these application-level concerns instead of debugging etcd corruption at midnight. That's a trade worth making.