
Kubernetes Cost Optimization — Right-Size, Spot Instances, and Karpenter

· 8 min read
Goel Academy
DevOps & Cloud Learning Hub

Here is a number that should make you uncomfortable: the average Kubernetes cluster runs at 20-35% resource utilization. That means you are paying for three nodes but only using one node's worth of compute. Multiply that across dev, staging, and production clusters, and you are burning thousands of dollars a month on idle capacity. The good news — most of this waste is fixable with the right tools and a few YAML changes.

Where Kubernetes Costs Come From

Before optimizing, you need to understand what you are paying for:

| Cost Component | % of Total | Optimization Lever |
|---|---|---|
| Compute (EC2/VMs) | 60-75% | Right-size nodes, spot instances, autoscaling |
| Storage (EBS/Disks) | 10-20% | Right-size PVCs, delete unused volumes |
| Network (NAT, LB) | 5-15% | Reduce cross-AZ traffic, consolidate LBs |
| Control plane | 2-5% | Fixed cost on managed K8s (EKS, GKE, AKS) |

Compute is where the money is. That is where we will focus.

Right-Sizing Pods — Requests and Limits Analysis

The number one cause of wasted resources: developers set requests: cpu: 1, memory: 2Gi on every pod because they do not know the actual usage, and they are afraid of OOMKills. The result is massively over-provisioned clusters.

Check Actual Usage vs Requests

# See actual CPU and memory usage per pod
kubectl top pods -n production --sort-by=memory
# NAME                       CPU(cores)   MEMORY(bytes)
# payment-api-7d9b4-x2k8m    15m          128Mi
# user-service-5c8b9-rn2vl   8m           64Mi
# frontend-6f7c2-abc12       3m           48Mi

# Compare with requested resources
kubectl get pods -n production -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
CPU_LIM:.spec.containers[0].resources.limits.cpu,\
MEM_LIM:.spec.containers[0].resources.limits.memory

# NAME                       CPU_REQ   MEM_REQ   CPU_LIM   MEM_LIM
# payment-api-7d9b4-x2k8m    500m      1Gi       1000m     2Gi     ← Using 15m/128Mi!
# user-service-5c8b9-rn2vl   500m      512Mi     1000m     1Gi
# frontend-6f7c2-abc12       250m      256Mi     500m      512Mi

The payment-api requests 500m CPU but uses 15m. That is 97% waste on a single pod.

Vertical Pod Autoscaler (VPA) for Recommendations

VPA analyzes historical usage and recommends right-sized requests:

# Install VPA from the official autoscaler repo (there is no standalone
# release manifest; the vpa-up.sh script applies the CRDs, RBAC, and deployments)
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh
# Create a VPA in recommendation mode (does NOT auto-resize)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payment-api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payment-api
  updatePolicy:
    updateMode: "Off"   # Only recommend, do not auto-apply

# Check recommendations after a few hours of data
kubectl get vpa payment-api-vpa -n production -o yaml
# recommendation:
#   containerRecommendations:
#   - containerName: payment-api
#     lowerBound: {cpu: 10m, memory: 90Mi}
#     target: {cpu: 25m, memory: 150Mi}        ← Use this
#     upperBound: {cpu: 80m, memory: 300Mi}
#     uncappedTarget: {cpu: 25m, memory: 150Mi}

Apply the VPA's target recommendation as your new requests. Set limits at 2-3x the target for burst headroom.
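Concretely, the sample recommendation above would turn into manifest values like the following (the numbers come from the example output; take yours from your own VPA):

```yaml
# payment-api container resources after right-sizing
resources:
  requests:
    cpu: 25m        # VPA target
    memory: 150Mi   # VPA target
  limits:
    cpu: 75m        # ~3x target for burst headroom
    memory: 450Mi
```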

Spot Instances for Non-Critical Workloads

Spot instances (AWS) / Preemptible VMs (GCP) / Spot VMs (Azure) cost 60-90% less than on-demand. The tradeoff: the cloud provider can reclaim them on short notice (a 2-minute warning on AWS, as little as 30 seconds on GCP and Azure).

Safe for spot: Stateless services, batch jobs, CI runners, dev/staging environments, workers that handle retries.

Not safe for spot: Databases, stateful services, single-replica deployments.

# EKS managed node group with spot instances
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: production
  region: us-east-1
managedNodeGroups:
  - name: spot-workers
    instanceTypes:   # Multiple instance types for availability
      - m5.large
      - m5a.large
      - m5d.large
      - m4.large
    spot: true
    minSize: 2
    maxSize: 20
    desiredCapacity: 5
    labels:
      node-type: spot
    taints:
      - key: spot
        value: "true"
        effect: NoSchedule

Use tolerations to schedule specific workloads on spot nodes:

# Deploy workers on spot instances
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-processor
spec:
  replicas: 10
  selector:
    matchLabels:
      app: image-processor
  template:
    metadata:
      labels:
        app: image-processor
    spec:
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      nodeSelector:
        node-type: spot
      terminationGracePeriodSeconds: 30   # Handle spot interruption gracefully
      containers:
        - name: processor
          image: myapp/image-processor:v1
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
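A reclaim can take out several spot nodes at once, so it is worth pairing spot workloads with a PodDisruptionBudget that keeps a floor of replicas up while nodes drain. A minimal sketch, assuming the deployment's pods carry an app: image-processor label:

```yaml
# Keep at least 6 of the 10 image-processor replicas running during drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: image-processor-pdb
spec:
  minAvailable: 6
  selector:
    matchLabels:
      app: image-processor   # Assumed pod label; match your deployment
```

Note that a PDB only limits voluntary disruptions (cordon-and-drain); the spot termination warning itself is not blocked, but node termination handlers that drain gracefully will respect the budget.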

Karpenter — Just-in-Time Node Provisioning

Karpenter is an open-source node autoscaler, originally built by AWS, that replaces the Cluster Autoscaler with a faster, smarter approach. Instead of scaling pre-defined node groups, Karpenter provisions the exact right instance type for your pending pods.

# Install Karpenter (EKS): the chart is published to a public OCI registry.
# Karpenter also needs IAM roles and node permissions; see the official
# getting-started guide for the full prerequisites.
helm install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter \
  --create-namespace \
  --set settings.clusterName=production
# Define a NodePool (what Karpenter can provision)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand", "spot"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c", "r"]   # General, compute, memory optimized
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]             # 5th gen or newer
      nodeClassRef:
        name: default
  limits:
    cpu: 100        # Max 100 vCPUs total
    memory: 400Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h   # Replace nodes every 30 days
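The nodeClassRef above points at an EC2NodeClass, which tells Karpenter how to configure the instances it launches. A minimal sketch (the IAM role name and discovery tags are assumptions; match them to your cluster setup):

```yaml
# EC2NodeClass referenced by the NodePool's nodeClassRef
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2                       # Amazon Linux 2 AMIs
  role: KarpenterNodeRole-production   # IAM role for nodes (assumed name)
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: production   # Assumed discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: production
```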

Karpenter advantages over Cluster Autoscaler:

| Feature | Cluster Autoscaler | Karpenter |
|---|---|---|
| Scale-up speed | 2-5 minutes | 30-60 seconds |
| Instance selection | Fixed per node group | Dynamic, best-fit |
| Bin packing | Basic | Aggressive consolidation |
| Spot handling | Via mixed instance groups | Built-in, automatic fallback |
| Node consolidation | Manual | Automatic (replaces underutilized nodes) |

Cluster Autoscaler Tuning

If you are not on AWS or cannot use Karpenter, tune the Cluster Autoscaler for better efficiency:

# Cluster Autoscaler deployment args
args:
  - --balance-similar-node-groups=true        # Spread across AZs
  - --skip-nodes-with-local-storage=false     # Scale down nodes with emptyDir
  - --scale-down-delay-after-add=5m           # Wait 5 min after scale-up before scale-down
  - --scale-down-unneeded-time=5m             # Node must be underutilized for 5 min
  - --scale-down-utilization-threshold=0.5    # Scale down if < 50% utilized
  - --max-graceful-termination-sec=600        # 10 min for pod draining
  - --expander=least-waste                    # Pick the node group with least waste

Pod Priority and Preemption

When resources are scarce, Kubernetes should evict low-priority workloads to make room for critical ones:

# Define priority classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical
value: 1000000
globalDefault: false
description: "Production-critical workloads"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: batch
value: 100
globalDefault: false
description: "Batch processing, can be evicted"
---
# Assign to pods
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-api
spec:
  template:
    spec:
      priorityClassName: critical
      containers:
        - name: payment-api
          image: myapp/payment-api:v2
---
apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator
spec:
  template:
    spec:
      priorityClassName: batch   # Will be evicted if resources needed
      restartPolicy: Never       # Jobs require Never or OnFailure
      containers:
        - name: generator
          image: myapp/reports:v1

Resource Quotas Per Team

Prevent one team from consuming the entire cluster by enforcing quotas per namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-payments-quota
  namespace: team-payments
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "50"
    persistentvolumeclaims: "10"
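A quota on requests rejects pods that do not declare requests at all, so it pairs well with a LimitRange that fills in defaults for containers that omit them. A sketch with illustrative values:

```yaml
# Default requests/limits for containers that do not set their own
apiVersion: v1
kind: LimitRange
metadata:
  name: team-payments-defaults
  namespace: team-payments
spec:
  limits:
    - type: Container
      defaultRequest:   # Applied when requests are omitted
        cpu: 100m
        memory: 128Mi
      default:          # Applied when limits are omitted
        cpu: 500m
        memory: 512Mi
```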

Cost Monitoring with OpenCost and Kubecost

You cannot optimize what you cannot measure. Install a cost monitoring tool:

# OpenCost (free, open-source)
helm repo add opencost https://opencost.github.io/opencost-helm-chart
helm install opencost opencost/opencost \
--namespace opencost \
--create-namespace

# Access the dashboard
kubectl port-forward svc/opencost -n opencost 9090:9090

# Kubecost (free tier available, more features)
helm repo add kubecost https://kubecost.github.io/cost-analyzer
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--create-namespace \
--set kubecostToken="your-token"

Both tools show cost per namespace, deployment, and pod — making it easy to identify the top cost drivers and hold teams accountable.

Scale-to-Zero for Dev Environments

Dev and staging clusters running 24/7 waste money when no one is working. Use KEDA or CronJobs to scale down:

# KEDA ScaledObject — scale to zero when no HTTP traffic
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: dev-frontend
  namespace: dev
spec:
  scaleTargetRef:
    name: frontend
  minReplicaCount: 0   # Scale to zero!
  maxReplicaCount: 3
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_total
        query: sum(rate(http_requests_total{namespace="dev",service="frontend"}[5m]))
        threshold: "1"

Or use a simple CronJob to scale down at night:

# Scale all deployments in dev to 0 replicas at 8 PM
kubectl scale deployment --all -n dev --replicas=0

# Scale back up at 8 AM
kubectl scale deployment --all -n dev --replicas=1
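Wrapped in an actual CronJob, the evening scale-down could look like this sketch (the dev-scaler service account and kubectl image tag are assumptions; the service account needs RBAC permission to scale deployments in the dev namespace):

```yaml
# Scale dev to zero at 8 PM on weekdays (cluster time zone)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-scale-down
  namespace: dev
spec:
  schedule: "0 20 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: dev-scaler   # Assumed SA with scale permissions
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:1.29  # Pin to your cluster's version
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - -n
                - dev
                - --replicas=0
```

A mirror-image CronJob scheduled at 8 AM with --replicas=1 brings the environment back before the workday starts.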

Wrapping Up

Kubernetes cost optimization is not a one-time exercise — it is an ongoing discipline. Start by right-sizing pods with VPA recommendations (the biggest quick win), move non-critical workloads to spot instances, use Karpenter or a tuned Cluster Autoscaler for efficient node scaling, enforce resource quotas per team, and install OpenCost or Kubecost to make costs visible.

The combination of right-sizing and spot instances alone typically reduces compute costs by 40-60%. Add Karpenter's bin-packing and scale-to-zero for dev environments, and you can reach 60-70% savings — without sacrificing reliability.

This wraps up the Kubernetes deep-dive series. From pods and deployments to monitoring, security, GitOps, operators, service meshes, and cost optimization — you now have a solid foundation for running production Kubernetes. Keep building, keep experimenting, and keep shipping.