# EKS Deep Dive — Running Production Kubernetes on AWS
You know Kubernetes. You've run `minikube start` a hundred times, maybe even wrestled with `kubeadm` on bare metal. But running Kubernetes in production is a different animal — one where the control plane going down at 3 AM means your pager goes off instead of someone else's. Amazon EKS takes that control plane problem off your plate and lets you focus on what actually matters: deploying and scaling your workloads.
## EKS Architecture — What AWS Actually Manages
EKS is a managed Kubernetes service, but "managed" has a very specific scope. AWS runs the control plane — the API server, etcd, scheduler, and controller manager — across three availability zones. You never see these nodes, never patch them, never worry about etcd backups. AWS handles all of it with a 99.95% SLA.
What AWS does NOT manage is your data plane (worker nodes). You're responsible for the EC2 instances or Fargate tasks that actually run your pods. This split is important to understand:
| Component | Managed By | You Handle |
|---|---|---|
| API Server | AWS | kubectl access, RBAC |
| etcd | AWS | Nothing |
| Scheduler | AWS | Scheduling policies, affinities |
| Worker Nodes | You (EC2/Fargate) | Scaling, patching, instance types |
| Networking | Shared | VPC design, security groups, NACLs |
| Add-ons | Shared | Installation, configuration |
The control plane costs $0.10/hour (~$73/month) regardless of cluster size. Your real costs come from the worker nodes.
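That flat rate is easy to sanity-check with quick arithmetic (using roughly 730 hours per month):

```bash
# Control plane cost: $0.10/hour, ~730 hours in a month.
# Integer cents avoid floating point in plain shell arithmetic.
hourly_cents=10
monthly_dollars=$(( hourly_cents * 730 / 100 ))
echo "~\$${monthly_dollars}/month"   # ~$73/month
```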
## Creating a Cluster with eksctl
`eksctl` is the official CLI tool for EKS, and it turns what would be a 200-line CloudFormation template into a single command:
```bash
# Install eksctl
curl --silent --location \
  "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
  | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
```

```bash
# Create a production-ready cluster
eksctl create cluster \
  --name production \
  --version 1.29 \
  --region us-east-1 \
  --zones us-east-1a,us-east-1b,us-east-1c \
  --nodegroup-name workers \
  --node-type t3.large \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 10 \
  --managed \
  --asg-access \
  --with-oidc
```
This single command creates a VPC with public and private subnets across three AZs, an EKS cluster, a managed node group with autoscaling, and an OIDC provider for IAM integration. Takes about 15-20 minutes.
For more control, use a config file:
```yaml
# cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1
  version: "1.29"

iam:
  withOIDC: true

vpc:
  cidr: 10.0.0.0/16
  nat:
    gateway: HighlyAvailable  # One NAT GW per AZ

managedNodeGroups:
  - name: general
    instanceType: t3.large
    minSize: 2
    maxSize: 10
    desiredCapacity: 3
    volumeSize: 50
    privateNetworking: true
    labels:
      workload-type: general
    tags:
      Environment: production
  - name: compute-intensive
    instanceType: c5.2xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      workload-type: compute
    taints:
      - key: dedicated
        value: compute
        effect: NoSchedule
```

```bash
eksctl create cluster -f cluster-config.yaml
```
## Managed Node Groups vs Fargate Profiles
EKS gives you two ways to run pods, and each has a sweet spot.
Managed Node Groups are EC2 instances that AWS helps you manage. AWS handles the AMI updates, drains nodes during upgrades, and integrates with Auto Scaling groups. You still pick instance types, sizes, and scaling policies.
Fargate Profiles are fully serverless — no EC2 instances at all. You define which pods run on Fargate using namespace and label selectors, and AWS provisions the compute automatically.
```yaml
# Fargate profile — run all pods in the "batch" namespace on Fargate
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1

fargateProfiles:
  - name: batch-jobs
    selectors:
      - namespace: batch
      - namespace: monitoring
        labels:
          compute: fargate
```
| Feature | Managed Node Groups | Fargate |
|---|---|---|
| Pricing | EC2 pricing (cheaper at scale) | Per-pod vCPU + memory pricing |
| Scaling | Cluster Autoscaler / Karpenter | Automatic per pod |
| DaemonSets | Supported | Not supported |
| GPUs | Supported | Not supported |
| Persistent Volumes | EBS + EFS | EFS only |
| Startup Time | Seconds (existing nodes) | 30-90 seconds |
| Best For | Steady-state workloads | Burst, batch, dev/test |
My recommendation: use managed node groups as your baseline and Fargate for burst workloads or namespaces where you want zero node management.
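To make the node-group split concrete: a pod that should land on the tainted `compute-intensive` group from the earlier config file needs both a toleration for the taint and a selector for the node label. A minimal sketch (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical example pod
spec:
  nodeSelector:
    workload-type: compute      # label defined on the node group
  tolerations:
    - key: dedicated            # matches the node group's taint
      operator: Equal
      value: compute
      effect: NoSchedule
  containers:
    - name: worker
      image: my-app:latest      # placeholder image
```

Without the toleration the pod can never schedule onto those nodes; without the node selector it might schedule elsewhere and never trigger the scale-up.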
## EKS Networking — The VPC CNI Plugin
This is where EKS diverges from vanilla Kubernetes. Instead of overlay networks like Calico or Flannel, EKS uses the VPC CNI plugin by default. Every pod gets a real IP address from your VPC subnet — no NAT, no encapsulation.
```bash
# Check the CNI plugin version
kubectl describe daemonset aws-node -n kube-system | grep Image

# See real VPC IPs assigned to pods
kubectl get pods -o wide
# NAME           READY   IP           NODE
# nginx-abc123   1/1     10.0.3.47    ip-10-0-1-15.ec2.internal
# redis-def456   1/1     10.0.4.112   ip-10-0-2-28.ec2.internal
```
The benefit is huge — your pods are directly reachable from anything in the VPC. RDS, ElastiCache, and other EC2 instances can talk to pods using their real IPs, and security groups can be applied at the pod level.
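Pod-level security groups are opt-in via the `SecurityGroupPolicy` custom resource that ships with the VPC CNI. A sketch, where the label and security group ID are placeholders:

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: rds-access
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api                     # placeholder label
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0       # placeholder security group ID
```

Note that security groups for pods also require Nitro-based instance types and `ENABLE_POD_ENI=true` on the `aws-node` DaemonSet.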
The tradeoff: you burn through IP addresses fast. A t3.large can host about 35 pods (based on ENI limits). Plan your VPC CIDR accordingly — /16 is the minimum I'd recommend for production.
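That per-instance ceiling comes from a simple formula: max pods = ENIs × (IPv4 addresses per ENI - 1) + 2. A t3.large supports 3 ENIs with 12 addresses each:

```bash
# Max pods per node with the VPC CNI:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# (one IP per ENI is reserved for the ENI's primary address)
enis=3
ips_per_eni=12
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "t3.large max pods: $max_pods"   # 35
```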
## Ingress with the AWS Load Balancer Controller
The AWS Load Balancer Controller creates Application Load Balancers for your Kubernetes Ingress resources:
```bash
# Install the AWS Load Balancer Controller
# (serviceAccount.create=false assumes the IRSA-backed service
#  account already exists — see the IRSA section below)
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=production \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```
```yaml
# ingress.yaml — creates an ALB automatically
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456:certificate/abc-123
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
spec:
  ingressClassName: alb  # replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
```
## IAM Roles for Service Accounts (IRSA)
IRSA is one of the best features of EKS. Instead of attaching IAM roles to your worker nodes (which gives every pod on that node the same permissions), you bind IAM roles to specific Kubernetes service accounts:
```bash
# Create an IAM role and associate it with a Kubernetes service account
eksctl create iamserviceaccount \
  --name s3-reader \
  --namespace default \
  --cluster production \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve
```
```yaml
# Use the service account in your pod
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader-pod
spec:
  serviceAccountName: s3-reader
  containers:
    - name: app
      image: my-app:latest
      # This container can now call S3 APIs — no access keys needed
```
This is the principle of least privilege done right. Your monitoring pods get CloudWatch access, your app pods get S3 access, and nothing else bleeds across.
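Under the hood, `eksctl create iamserviceaccount` boils down to a service account annotated with a role ARN (the account ID and role name below are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: default
  annotations:
    # Placeholder ARN — eksctl fills in the role it created
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/s3-reader-role
```

EKS's pod identity webhook sees this annotation and injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the pod, which the AWS SDKs pick up automatically.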
## Cluster Autoscaler and Cost Management
The Cluster Autoscaler watches for pods that can't be scheduled due to insufficient resources and adds nodes:
```bash
# Install Cluster Autoscaler
# Auto-discovery matches ASGs tagged with
#   k8s.io/cluster-autoscaler/enabled and
#   k8s.io/cluster-autoscaler/production
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=production \
  --set awsRegion=us-east-1 \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-local-storage=false \
  --set extraArgs.expander=least-waste
```
For cost optimization, consider Karpenter — AWS's newer, faster autoscaler that provisions the right instance type based on pod requirements rather than using fixed node groups.
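For a flavor of Karpenter's model, here is a sketch of a `NodePool` using the `v1beta1` API. The names and limits are illustrative, and the schema varies across Karpenter versions, so check the docs for yours:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default            # references an EC2NodeClass
  limits:
    cpu: 100                     # cap total provisioned CPU
  disruption:
    consolidationPolicy: WhenUnderutilized
```

Instead of fixed node groups, Karpenter picks an instance type that fits the pending pods' requests from anything matching these requirements.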
EKS cost breakdown for a typical production cluster:
| Component | Monthly Cost |
|---|---|
| EKS Control Plane | $73 |
| 3x t3.large (on-demand) | ~$180 |
| NAT Gateway (3 AZs) | ~$100 |
| ALB | ~$25 + data |
| EBS Volumes | ~$30 |
| Total baseline | ~$408/month |
Use Spot instances for non-critical workloads to cut node costs by 60-70%.
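With eksctl, a Spot node group is one flag plus a list of interchangeable instance types. A sketch (the node group name and label are illustrative):

```yaml
managedNodeGroups:
  - name: spot-workers          # illustrative name
    instanceTypes:              # several similar types improve Spot availability
      - t3.large
      - t3a.large
      - m5.large
    spot: true
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    labels:
      capacity-type: spot
```

Pair Spot nodes with PodDisruptionBudgets and workloads that tolerate interruption, since AWS can reclaim the instances with two minutes' notice.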
## Monitoring with CloudWatch Container Insights
```bash
# Enable Container Insights with the CloudWatch agent
aws eks create-addon \
  --cluster-name production \
  --addon-name amazon-cloudwatch-observability \
  --region us-east-1
```
Container Insights gives you CPU/memory metrics at the cluster, node, pod, and container level. You get pre-built CloudWatch dashboards and can set alarms on pod restarts, CPU throttling, or memory pressure.
## EKS vs ECS — When to Use What
| Factor | EKS | ECS |
|---|---|---|
| Learning Curve | Steep (Kubernetes) | Moderate (AWS-native) |
| Portability | Multi-cloud, on-prem | AWS only |
| Community | Massive (CNCF ecosystem) | AWS ecosystem |
| Control Plane Cost | $73/month | Free |
| Networking | VPC CNI, Calico, etc. | awsvpc mode |
| Service Mesh | Istio, Linkerd, App Mesh | App Mesh, Cloud Map |
| Best For | Complex microservices, multi-cloud | AWS-native apps, simpler architectures |
Choose EKS if you need Kubernetes portability, already have Kubernetes expertise, or require the CNCF ecosystem (Helm, Istio, Argo, etc.).
Choose ECS if you're all-in on AWS, want simpler operations, or are running straightforward container workloads.
EKS removes the hardest part of Kubernetes — keeping the control plane alive and healthy. But it doesn't remove the need to understand Kubernetes itself. Invest time in learning pod scheduling, resource requests and limits, network policies, and RBAC. The control plane being managed means you can focus on these application-level concerns instead of debugging etcd corruption at midnight. That's a trade worth making.
