# EKS Deep Dive — Running Production Kubernetes on AWS
You know Kubernetes. You've run `minikube start` a hundred times, maybe even wrestled with `kubeadm` on bare metal. But running Kubernetes in production is a different animal — one where the control plane going down at 3 AM means your pager goes off instead of someone else's. Amazon EKS takes that control plane problem off your plate and lets you focus on what actually matters: deploying and scaling your workloads.
## EKS Architecture — What AWS Actually Manages
EKS is a managed Kubernetes service, but "managed" has a very specific scope. AWS runs the control plane — the API server, etcd, scheduler, and controller manager — across three availability zones. You never see these nodes, never patch them, never worry about etcd backups. AWS handles all of it with a 99.95% SLA.
What AWS does NOT manage is your data plane (worker nodes). You're responsible for the EC2 instances or Fargate tasks that actually run your pods. This split is important to understand:
| Component | Managed By | You Handle |
|---|---|---|
| API Server | AWS | kubectl access, RBAC |
| etcd | AWS | Nothing |
| Scheduler | AWS | Scheduling policies, affinities |
| Worker Nodes | You (EC2/Fargate) | Scaling, patching, instance types |
| Networking | Shared | VPC design, security groups, NACLs |
| Add-ons | Shared | Installation, configuration |
The control plane costs $0.10/hour (~$73/month) regardless of cluster size. Your real costs come from the worker nodes.
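That flat rate is easy to sanity-check with quick arithmetic (using roughly 730 hours per month):

```bash
# Control plane cost: $0.10/hour, ~730 hours in a month.
# Integer cents avoid floating point in plain shell arithmetic.
hourly_cents=10
monthly_dollars=$(( hourly_cents * 730 / 100 ))
echo "~\$${monthly_dollars}/month"   # ~$73/month
```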
## Creating a Cluster with eksctl
`eksctl` is the official CLI tool for EKS, and it turns what would be a 200-line CloudFormation template into a single command:
```bash
# Install eksctl
curl --silent --location \
  "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" \
  | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
```

```bash
# Create a production-ready cluster
eksctl create cluster \
  --name production \
  --version 1.29 \
  --region us-east-1 \
  --zones us-east-1a,us-east-1b,us-east-1c \
  --nodegroup-name workers \
  --node-type t3.large \
  --nodes 3 \
  --nodes-min 2 \
  --nodes-max 10 \
  --managed \
  --asg-access \
  --with-oidc
```
This single command creates a VPC with public and private subnets across three AZs, an EKS cluster, a managed node group with autoscaling, and an OIDC provider for IAM integration. Takes about 15-20 minutes.
For more control, use a config file:
```yaml
# cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1
  version: "1.29"

iam:
  withOIDC: true

vpc:
  cidr: 10.0.0.0/16
  nat:
    gateway: HighlyAvailable  # One NAT GW per AZ

managedNodeGroups:
  - name: general
    instanceType: t3.large
    minSize: 2
    maxSize: 10
    desiredCapacity: 3
    volumeSize: 50
    privateNetworking: true
    labels:
      workload-type: general
    tags:
      Environment: production
  - name: compute-intensive
    instanceType: c5.2xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 0
    labels:
      workload-type: compute
    taints:
      - key: dedicated
        value: compute
        effect: NoSchedule
```

```bash
eksctl create cluster -f cluster-config.yaml
```
## Managed Node Groups vs Fargate Profiles
EKS gives you two ways to run pods, and each has a sweet spot.
Managed Node Groups are EC2 instances that AWS helps you manage. AWS handles the AMI updates, drains nodes during upgrades, and integrates with Auto Scaling groups. You still pick instance types, sizes, and scaling policies.
Fargate Profiles are fully serverless — no EC2 instances at all. You define which pods run on Fargate using namespace and label selectors, and AWS provisions the compute automatically.
```yaml
# Fargate profile — run all pods in the "batch" namespace on Fargate
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: production
  region: us-east-1

fargateProfiles:
  - name: batch-jobs
    selectors:
      - namespace: batch
      - namespace: monitoring
        labels:
          compute: fargate
```
| Feature | Managed Node Groups | Fargate |
|---|---|---|
| Pricing | EC2 pricing (cheaper at scale) | Per-pod vCPU + memory pricing |
| Scaling | Cluster Autoscaler / Karpenter | Automatic per pod |
| DaemonSets | Supported | Not supported |
| GPUs | Supported | Not supported |
| Persistent Volumes | EBS + EFS | EFS only |
| Startup Time | Seconds (existing nodes) | 30-90 seconds |
| Best For | Steady-state workloads | Burst, batch, dev/test |
My recommendation: use managed node groups as your baseline and Fargate for burst workloads or namespaces where you want zero node management.
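To make the node-group split concrete: a pod that should land on the tainted `compute-intensive` group from the earlier config file needs both a toleration for the taint and a selector for the node label. A minimal sketch (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker            # hypothetical example pod
spec:
  nodeSelector:
    workload-type: compute      # label defined on the node group
  tolerations:
    - key: dedicated            # matches the node group's taint
      operator: Equal
      value: compute
      effect: NoSchedule
  containers:
    - name: worker
      image: my-app:latest      # placeholder image
```

Without the toleration the pod can never schedule onto those nodes; without the node selector it might schedule elsewhere and never trigger the scale-up.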
## EKS Networking — The VPC CNI Plugin
This is where EKS diverges from vanilla Kubernetes. Instead of overlay networks like Calico or Flannel, EKS uses the VPC CNI plugin by default. Every pod gets a real IP address from your VPC subnet — no NAT, no encapsulation.
```bash
# Check the CNI plugin version
kubectl describe daemonset aws-node -n kube-system | grep Image

# See real VPC IPs assigned to pods
kubectl get pods -o wide
# NAME           READY   IP           NODE
# nginx-abc123   1/1     10.0.3.47    ip-10-0-1-15.ec2.internal
# redis-def456   1/1     10.0.4.112   ip-10-0-2-28.ec2.internal
```
The benefit is huge — your pods are directly reachable from anything in the VPC. RDS, ElastiCache, and other EC2 instances can talk to pods using their real IPs, and security groups can be applied at the pod level.
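Pod-level security groups are opt-in via the `SecurityGroupPolicy` custom resource that ships with the VPC CNI. A sketch, where the label and security group ID are placeholders:

```yaml
apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: rds-access
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: api                     # placeholder label
  securityGroups:
    groupIds:
      - sg-0123456789abcdef0       # placeholder security group ID
```

Note that security groups for pods also require Nitro-based instance types and `ENABLE_POD_ENI=true` on the `aws-node` DaemonSet.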
The tradeoff: you burn through IP addresses fast. A t3.large can host about 35 pods (based on ENI limits). Plan your VPC CIDR accordingly — /16 is the minimum I'd recommend for production.
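That per-instance ceiling comes from a simple formula: max pods = ENIs × (IPv4 addresses per ENI - 1) + 2. A t3.large supports 3 ENIs with 12 addresses each:

```bash
# Max pods per node with the VPC CNI:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
# (one IP per ENI is reserved for the ENI's primary address)
enis=3
ips_per_eni=12
max_pods=$(( enis * (ips_per_eni - 1) + 2 ))
echo "t3.large max pods: $max_pods"   # 35
```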
## Ingress with the AWS Load Balancer Controller
The AWS Load Balancer Controller creates Application Load Balancers for your Kubernetes Ingress resources:
```bash
# Install the AWS Load Balancer Controller
# (serviceAccount.create=false assumes the IRSA-backed service
#  account already exists — see the IRSA section below)
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=production \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```
```yaml
# ingress.yaml — creates an ALB automatically
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456:certificate/abc-123
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: "443"
spec:
  ingressClassName: alb  # replaces the deprecated kubernetes.io/ingress.class annotation
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: frontend-service
                port:
                  number: 80
```
## IAM Roles for Service Accounts (IRSA)
IRSA is one of the best features of EKS. Instead of attaching IAM roles to your worker nodes (which gives every pod on that node the same permissions), you bind IAM roles to specific Kubernetes service accounts:
```bash
# Create an IAM role and associate it with a Kubernetes service account
eksctl create iamserviceaccount \
  --name s3-reader \
  --namespace default \
  --cluster production \
  --attach-policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess \
  --approve
```
```yaml
# Use the service account in your pod
apiVersion: v1
kind: Pod
metadata:
  name: s3-reader-pod
spec:
  serviceAccountName: s3-reader
  containers:
    - name: app
      image: my-app:latest
      # This container can now call S3 APIs — no access keys needed
```
This is the principle of least privilege done right. Your monitoring pods get CloudWatch access, your app pods get S3 access, and nothing else bleeds across.
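Under the hood, `eksctl create iamserviceaccount` boils down to a service account annotated with a role ARN (the account ID and role name below are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-reader
  namespace: default
  annotations:
    # Placeholder ARN — eksctl fills in the role it created
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/s3-reader-role
```

EKS's pod identity webhook sees this annotation and injects `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` into the pod, which the AWS SDKs pick up automatically.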
## Cluster Autoscaler and Cost Management
The Cluster Autoscaler watches for pods that can't be scheduled due to insufficient resources and adds nodes:
```bash
# Install Cluster Autoscaler
# Auto-discovery matches ASGs tagged with
#   k8s.io/cluster-autoscaler/enabled and
#   k8s.io/cluster-autoscaler/production
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=production \
  --set awsRegion=us-east-1 \
  --set rbac.serviceAccount.name=cluster-autoscaler \
  --set extraArgs.balance-similar-node-groups=true \
  --set extraArgs.skip-nodes-with-local-storage=false \
  --set extraArgs.expander=least-waste
```
For cost optimization, consider Karpenter — AWS's newer, faster autoscaler that provisions the right instance type based on pod requirements rather than using fixed node groups.
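For a flavor of Karpenter's model, here is a sketch of a `NodePool` using the `v1beta1` API. The names and limits are illustrative, and the schema varies across Karpenter versions, so check the docs for yours:

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: general
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
      nodeClassRef:
        name: default            # references an EC2NodeClass
  limits:
    cpu: 100                     # cap total provisioned CPU
  disruption:
    consolidationPolicy: WhenUnderutilized
```

Instead of fixed node groups, Karpenter picks an instance type that fits the pending pods' requests from anything matching these requirements.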
EKS cost breakdown for a typical production cluster:
| Component | Monthly Cost |
|---|---|
| EKS Control Plane | $73 |
| 3x t3.large (on-demand) | ~$180 |
| NAT Gateway (3 AZs) | ~$100 |
| ALB | ~$25 + data |
| EBS Volumes | ~$30 |
| Total baseline | ~$408/month |
Use Spot instances for non-critical workloads to cut node costs by 60-70%.
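With eksctl, a Spot node group is one flag plus a list of interchangeable instance types. A sketch (the node group name and label are illustrative):

```yaml
managedNodeGroups:
  - name: spot-workers          # illustrative name
    instanceTypes:              # several similar types improve Spot availability
      - t3.large
      - t3a.large
      - m5.large
    spot: true
    minSize: 0
    maxSize: 10
    desiredCapacity: 2
    labels:
      capacity-type: spot
```

Pair Spot nodes with PodDisruptionBudgets and workloads that tolerate interruption, since AWS can reclaim the instances with two minutes' notice.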
## Monitoring with CloudWatch Container Insights
```bash
# Enable Container Insights with the CloudWatch agent
aws eks create-addon \
  --cluster-name production \
  --addon-name amazon-cloudwatch-observability \
  --region us-east-1
```
Container Insights gives you CPU/memory metrics at the cluster, node, pod, and container level. You get pre-built CloudWatch dashboards and can set alarms on pod restarts, CPU throttling, or memory pressure.
## EKS vs ECS — When to Use What
| Factor | EKS | ECS |
|---|---|---|
| Learning Curve | Steep (Kubernetes) | Moderate (AWS-native) |
| Portability | Multi-cloud, on-prem | AWS only |
| Community | Massive (CNCF ecosystem) | AWS ecosystem |
| Control Plane Cost | $73/month | Free |
| Networking | VPC CNI, Calico, etc. | awsvpc mode |
| Service Mesh | Istio, Linkerd, App Mesh | App Mesh, Cloud Map |
| Best For | Complex microservices, multi-cloud | AWS-native apps, simpler architectures |
Choose EKS if you need Kubernetes portability, already have Kubernetes expertise, or require the CNCF ecosystem (Helm, Istio, Argo, etc.).
Choose ECS if you're all-in on AWS, want simpler operations, or are running straightforward container workloads.
EKS removes the hardest part of Kubernetes — keeping the control plane alive and healthy. But it doesn't remove the need to understand Kubernetes itself. Invest time in learning pod scheduling, resource requests and limits, network policies, and RBAC. The control plane being managed means you can focus on these application-level concerns instead of debugging etcd corruption at midnight. That's a trade worth making.
