
Kubernetes Cost Optimization — Right-Size, Spot Instances, and Karpenter

· 8 min read
Goel Academy
DevOps & Cloud Learning Hub

Here is a number that should make you uncomfortable: the average Kubernetes cluster runs at 20-35% resource utilization. That means you are paying for three nodes but only using one node's worth of compute. Multiply that across dev, staging, and production clusters, and you are burning thousands of dollars a month on idle capacity. The good news — most of this waste is fixable with the right tools and a few YAML changes.

Where Kubernetes Costs Come From

Before optimizing, you need to understand what you are paying for:

| Cost Component | % of Total | Optimization Lever |
|---|---|---|
| Compute (EC2/VMs) | 60-75% | Right-size nodes, spot instances, autoscaling |
| Storage (EBS/Disks) | 10-20% | Right-size PVCs, delete unused volumes |
| Network (NAT, LB) | 5-15% | Reduce cross-AZ traffic, consolidate LBs |
| Control plane | 2-5% | Fixed cost on managed K8s (EKS, GKE, AKS) |

Compute is where the money is. That is where we will focus.

Right-Sizing Pods — Requests and Limits Analysis

The number one cause of wasted resources: developers set requests: cpu: 1, memory: 2Gi on every pod because they do not know the actual usage, and they are afraid of OOMKills. The result is massively over-provisioned clusters.

Check Actual Usage vs Requests

# See actual CPU and memory usage per pod
kubectl top pods -n production --sort-by=memory
# NAME                       CPU(cores)   MEMORY(bytes)
# payment-api-7d9b4-x2k8m    15m          128Mi
# user-service-5c8b9-rn2vl   8m           64Mi
# frontend-6f7c2-abc12       3m           48Mi

# Compare with requested resources
kubectl get pods -n production -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
CPU_LIM:.spec.containers[0].resources.limits.cpu,\
MEM_LIM:.spec.containers[0].resources.limits.memory

# NAME                       CPU_REQ   MEM_REQ   CPU_LIM   MEM_LIM
# payment-api-7d9b4-x2k8m    500m      1Gi       1000m     2Gi     ← Using 15m/128Mi!
# user-service-5c8b9-rn2vl   500m      512Mi     1000m     1Gi
# frontend-6f7c2-abc12       250m      256Mi     500m      512Mi

The payment-api requests 500m CPU but uses 15m. That is 97% waste on a single pod.

Vertical Pod Autoscaler (VPA) for Recommendations

VPA analyzes historical usage and recommends right-sized requests:

# Install VPA from the official autoscaler repo (there is no standalone
# release manifest; the vpa-up.sh script applies the CRDs, RBAC, and deployments)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Create a VPA in recommendation mode (does NOT auto-resize)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  updatePolicy:
    updateMode: "Off"   # Only recommend, do not auto-apply

# Check recommendations after a few hours of data
kubectl get vpa payment-api-vpa -n production -o yaml
# recommendation:
#   containerRecommendations:
#   - containerName: payment-api
#     lowerBound: {cpu: 10m, memory: 90Mi}
#     target: {cpu: 25m, memory: 150Mi}        ← Use this
#     upperBound: {cpu: 80m, memory: 300Mi}
#     uncappedTarget: {cpu: 25m, memory: 150Mi}

Apply the VPA's target recommendation as your new requests. Set limits at 2-3x the target for burst headroom.
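Concretely, the sample recommendation above would turn into manifest values like the following (the numbers come from the example output; take yours from your own VPA):

```yaml
# payment-api container resources after right-sizing
resources:
  requests:
    cpu: 25m        # VPA target
    memory: 150Mi   # VPA target
  limits:
    cpu: 75m        # ~3x target for burst headroom
    memory: 450Mi
```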

Spot Instances for Non-Critical Workloads

Spot instances (AWS) / Preemptible VMs (GCP) / Spot VMs (Azure) cost 60-90% less than on-demand. The tradeoff: the cloud provider can reclaim them on short notice (a 2-minute warning on AWS, as little as 30 seconds on GCP and Azure).

Safe for spot: Stateless services, batch jobs, CI runners, dev/staging environments, workers that handle retries.

Not safe for spot: Databases, stateful services, single-replica deployments.

# EKS managed node group with spot instances
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: spot-workers
    instanceTypes:   # Multiple instance types for availability
      - m5.large
      - m5a.large
      - m5d.large
      - m4.large
    spot: true
    minSize: 2
    maxSize: 20
    desiredCapacity: 5
    labels:
      node-type: spot
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule

Use tolerations to schedule specific workloads on spot nodes:

# Deploy workers on spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: image-processor
  template:
    metadata:
      labels:
        app: image-processor
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        node-type: spot
      terminationGracePeriodSeconds: 30   # Handle spot interruption gracefully
      containers:
        - name: processor
          image: myapp/image-processor:v1
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
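A reclaim can take out several spot nodes at once, so it is worth pairing spot workloads with a PodDisruptionBudget that keeps a floor of replicas up while nodes drain. A minimal sketch, assuming the deployment's pods carry an app: image-processor label:

```yaml
# Keep at least 6 of the 10 image-processor replicas running during drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: image-processor-pdb
spec:
  minAvailable: 6
  selector:
    matchLabels:
      app: image-processor   # Assumed pod label; match your deployment
```

Note that a PDB only limits voluntary disruptions (cordon-and-drain); the spot termination warning itself is not blocked, but node termination handlers that drain gracefully will respect the budget.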

Karpenter — Just-in-Time Node Provisioning

Karpenter is an open-source node autoscaler, originally built by AWS, that replaces the Cluster Autoscaler with a faster, smarter approach. Instead of scaling pre-defined node groups, Karpenter provisions the exact right instance type for your pending pods.

# Install Karpenter (EKS): the chart is published to a public OCI registry.
# Karpenter also needs IAM roles and node permissions; see the official
# getting-started guide for the full prerequisites.
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --set settings.clusterName=production
# Define a NodePool (what Karpenter can provision)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]   # General, compute, memory optimized
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]             # 5th gen or newer
      nodeClassRef:
        name: default
  limits:
    cpu: 100        # Max 100 vCPUs total
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h   # Replace nodes every 30 days
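The nodeClassRef above points at an EC2NodeClass, which tells Karpenter how to configure the instances it launches. A minimal sketch (the IAM role name and discovery tags are assumptions; match them to your cluster setup):

```yaml
# EC2NodeClass referenced by the NodePool's nodeClassRef
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2                       # Amazon Linux 2 AMIs
  role: KarpenterNodeRole-production   # IAM role for nodes (assumed name)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: production   # Assumed discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: production
```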

Karpenter advantages over Cluster Autoscaler:

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scale-up speed | 2-5 minutes | 30-60 seconds |
| Instance selection | Fixed per node group | Dynamic, best-fit |
| Bin packing | Basic | Aggressive consolidation |
| Spot handling | Via mixed instance groups | Built-in, automatic fallback |
| Node consolidation | Manual | Automatic (replaces underutilized nodes) |

Cluster Autoscaler Tuning

If you are not on AWS or cannot use Karpenter, tune the Cluster Autoscaler for better efficiency:

# Cluster Autoscaler deployment args
args:
  - --balance-similar-node-groups=true        # Spread across AZs
  - --skip-nodes-with-local-storage=false     # Scale down nodes with emptyDir
  - --scale-down-delay-after-add=5m           # Wait 5 min after scale-up before scale-down
  - --scale-down-unneeded-time=5m             # Node must be underutilized for 5 min
  - --scale-down-utilization-threshold=0.5    # Scale down if < 50% utilized
  - --max-graceful-termination-sec=600        # 10 min for pod draining
  - --expander=least-waste                    # Pick the node group with least waste

Pod Priority and Preemption

When resources are scarce, Kubernetes should evict low-priority workloads to make room for critical ones:

# Define priority classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Production-critical workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch
value: 100
globalDefault: false
description: "Batch processing, can be evicted"
---
# Assign to pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
        - name: payment-api
          image: myapp/payment-api:v2
---
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator
spec:
  template:
    spec:
      priorityClassName: batch   # Will be evicted if resources needed
      restartPolicy: Never       # Jobs require Never or OnFailure
      containers:
        - name: generator
          image: myapp/reports:v1

Resource Quotas Per Team

Prevent one team from consuming the entire cluster by enforcing quotas per namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
    persistentvolumeclaims: "10"
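A quota on requests rejects pods that do not declare requests at all, so it pairs well with a LimitRange that fills in defaults for containers that omit them. A sketch with illustrative values:

```yaml
# Default requests/limits for containers that do not set their own
apiVersion: v1
kind: LimitRange
metadata:
  name: team-payments-defaults
  namespace: team-payments
spec:
  limits:
    - type: Container
      defaultRequest:   # Applied when requests are omitted
        cpu: 100m
        memory: 128Mi
      default:          # Applied when limits are omitted
        cpu: 500m
        memory: 512Mi
```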

Cost Monitoring with OpenCost and Kubecost

You cannot optimize what you cannot measure. Install a cost monitoring tool:

# OpenCost (free, open-source)
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace

# Access the dashboard
kubectl port-forward svc/opencost -n opencost 9090:9090

# Kubecost (free tier available, more features)
helm repo add kubecost https://kubecost.github.io/cost-analyzer
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="your-token"

Both tools show cost per namespace, deployment, and pod — making it easy to identify the top cost drivers and hold teams accountable.

Scale-to-Zero for Dev Environments

Dev and staging clusters running 24/7 waste money when no one is working. Use KEDA or CronJobs to scale down:

# KEDA ScaledObject — scale to zero when no HTTP traffic
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-frontend
  namespace: dev
spec:
  scaleTargetRef:
    name: frontend
  minReplicaCount: 0   # Scale to zero!
  maxReplicaCount: 3
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_total
        query: sum(rate(http_requests_total{namespace="dev",service="frontend"}[5m]))
        threshold: "1"

Or use a simple CronJob to scale down at night:

# Scale all deployments in dev to 0 replicas at 8 PM
kubectl scale deployment --all -n dev --replicas=0

# Scale back up at 8 AM
kubectl scale deployment --all -n dev --replicas=1
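Wrapped in an actual CronJob, the evening scale-down could look like this sketch (the dev-scaler service account and kubectl image tag are assumptions; the service account needs RBAC permission to scale deployments in the dev namespace):

```yaml
# Scale dev to zero at 8 PM on weekdays (cluster time zone)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-scale-down
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dev-scaler   # Assumed SA with scale permissions
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.29  # Pin to your cluster's version
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - -n
                - dev
                - --replicas=0
```

A mirror-image CronJob scheduled at 8 AM with --replicas=1 brings the environment back before the workday starts.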

Wrapping Up

Kubernetes cost optimization is not a one-time exercise — it is an ongoing discipline. Start by right-sizing pods with VPA recommendations (the biggest quick win), move non-critical workloads to spot instances, use Karpenter or a tuned Cluster Autoscaler for efficient node scaling, enforce resource quotas per team, and install OpenCost or Kubecost to make costs visible.

The combination of right-sizing and spot instances alone typically reduces compute costs by 40-60%. Add Karpenter's bin-packing and scale-to-zero for dev environments, and you can reach 60-70% savings — without sacrificing reliability.

This wraps up the Kubernetes deep-dive series. From pods and deployments to monitoring, security, GitOps, operators, service meshes, and cost optimization — you now have a solid foundation for running production Kubernetes. Keep building, keep experimenting, and keep shipping.