Kubernetes Resource Management — Requests, Limits, and QoS Classes
Your pod gets killed with OOMKilled and you have no idea why. Or your app crawls because Kubernetes is throttling its CPU to a fraction of what it needs. Resource management is one of the most misunderstood areas in Kubernetes, and getting it wrong means wasted money, poor performance, or unexpected crashes.
CPU vs Memory — Two Different Beasts
Kubernetes manages two primary resource types, and they behave very differently:
| Resource | Type | What Happens When Exceeded | Unit |
|---|---|---|---|
| CPU | Compressible | Container is throttled — slowed down, not killed | Millicores (1000m = 1 vCPU) |
| Memory | Incompressible | Container is OOMKilled — terminated immediately | Bytes (Mi, Gi) |
CPU is forgiving. If your container tries to use more CPU than its limit, it gets throttled but keeps running. Memory is not forgiving. If your container exceeds its memory limit, the kernel's OOM killer terminates it instantly.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        cpu: "250m"       # 0.25 vCPU — used for scheduling
        memory: "256Mi"   # 256 MiB — used for scheduling
      limits:
        cpu: "500m"       # 0.5 vCPU — hard ceiling (throttled beyond this)
        memory: "512Mi"   # 512 MiB — hard ceiling (OOMKilled beyond this)
```
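The quantity strings above follow Kubernetes conventions: millicores for CPU and binary suffixes (Ki, Mi, Gi) for memory. A minimal sketch of the conversion, with illustrative helper names that are not part of any Kubernetes client library:

```python
# Sketch: converting Kubernetes quantity strings to plain numbers.
# Helper names are illustrative, not from an official client library.

def parse_cpu(quantity: str) -> float:
    """Return CPU in whole cores ("250m" -> 0.25, "2" -> 2.0)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

def parse_memory(quantity: str) -> int:
    """Return memory in bytes for binary suffixes ("256Mi" -> 268435456)."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)  # plain bytes

print(parse_cpu("250m"))      # 0.25
print(parse_memory("256Mi"))  # 268435456
```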
Requests vs Limits
These two settings control different things:
| Setting | Purpose | Scheduling | Enforcement |
|---|---|---|---|
| Requests | Minimum guaranteed resources | Yes — scheduler uses this to find a node | Soft — container can use more if available |
| Limits | Maximum allowed resources | No — not used for scheduling | Hard — CPU throttled, memory OOMKilled |
```bash
# See resource requests and limits for all pods
kubectl get pods -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
CPU_LIM:.spec.containers[0].resources.limits.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
MEM_LIM:.spec.containers[0].resources.limits.memory

# Check actual resource usage vs requests
kubectl top pods
# NAME         CPU(cores)   MEMORY(bytes)
# web-app      45m          128Mi
# api-server   120m         256Mi
```
The scheduler places pods on nodes based on requests, not limits. If a node has 2 CPU cores and 4Gi of memory available, and your pod requests 500m CPU and 1Gi memory, the scheduler counts that capacity as consumed — even if the pod only uses 100m CPU and 200Mi memory.
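The fit check described above can be sketched in a few lines: a pod fits a node if allocatable capacity minus the sum of already-scheduled requests covers the new pod's requests. All numbers here are illustrative (CPU in millicores, memory in MiB), and the real scheduler considers far more than raw fit.

```python
# Sketch of the scheduler's fit check: only *requests* count against
# node capacity; limits play no part in scheduling.

def fits(node_allocatable, scheduled_requests, pod_request):
    free = {k: node_allocatable[k] - sum(r[k] for r in scheduled_requests)
            for k in node_allocatable}
    return all(pod_request[k] <= free[k] for k in pod_request)

node = {"cpu": 2000, "memory": 4096}      # 2 cores, 4Gi allocatable
running = [{"cpu": 500, "memory": 1024}]  # one pod requesting 500m / 1Gi

print(fits(node, running, {"cpu": 1500, "memory": 3072}))  # True
print(fits(node, running, {"cpu": 1600, "memory": 1024}))  # False: only 1500m free
```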
QoS Classes
Kubernetes assigns every pod a Quality of Service class based on how you configure requests and limits. The QoS class determines which pods get killed first when a node runs out of memory.
| QoS Class | Condition | Eviction Priority | When to Use |
|---|---|---|---|
| Guaranteed | Requests = Limits for all containers | Last to be evicted | Critical workloads (databases, payment services) |
| Burstable | At least one request or limit set, but not meeting the Guaranteed criteria (e.g. requests < limits, or only requests set) | Evicted after BestEffort | Most application workloads |
| BestEffort | No requests or limits set | First to be evicted | Batch jobs, non-critical tasks |
```yaml
# Guaranteed QoS — requests equal limits
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

```yaml
# Burstable QoS — requests less than limits
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
```

```yaml
# BestEffort QoS — no requests or limits specified at all
# (just omit the resources block entirely)
```

```bash
# Check a pod's QoS class
kubectl get pod my-app -o jsonpath='{.status.qosClass}'
# Burstable
```
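The classification rules from the table above can be sketched as a small function. This handles only the single-container case; real Kubernetes evaluates every container in the pod, and requests left unset default to the limits.

```python
# Sketch of QoS classification for one container's resources dict.

def qos_class(resources: dict) -> str:
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed: CPU and memory limits set, and requests equal limits
    # (an unset request defaults to the corresponding limit).
    if (set(limits) >= {"cpu", "memory"}
            and all(requests.get(k, limits[k]) == limits[k] for k in limits)):
        return "Guaranteed"
    return "Burstable"

print(qos_class({}))                                     # BestEffort
print(qos_class({"requests": {"cpu": "250m"}}))          # Burstable
print(qos_class({"requests": {"cpu": "500m", "memory": "512Mi"},
                 "limits":   {"cpu": "500m", "memory": "512Mi"}}))  # Guaranteed
```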
OOMKilled — What Happens and Why
When a container exceeds its memory limit, the Linux kernel's OOM killer terminates it:
```bash
# Spot OOMKilled pods
kubectl get pods
# NAME     READY   STATUS      RESTARTS   AGE
# my-app   0/1     OOMKilled   5          10m

# Get details
kubectl describe pod my-app | grep -A 5 "Last State"
# Last State:  Terminated
#   Reason:    OOMKilled
#   Exit Code: 137

# Check memory usage before the kill
kubectl top pod my-app
```
Common causes of OOMKilled: memory leaks in the application, JVM heap set larger than the container limit, loading large files into memory, or simply underestimating memory requirements.
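The JVM-heap cause above is worth a sanity check: the heap must fit inside the container limit with room left for non-heap memory. A minimal sketch, where the 25% overhead figure for metaspace, thread stacks, and direct buffers is a rough illustrative assumption, not a JVM constant:

```python
# Sketch: check a JVM max-heap setting against the container memory limit.
# overhead_fraction = 0.25 is an assumed rule of thumb, not a JVM constant.

def heap_fits(container_limit_mib: int, max_heap_mib: int,
              overhead_fraction: float = 0.25) -> bool:
    budget = container_limit_mib * (1 - overhead_fraction)
    return max_heap_mib <= budget

print(heap_fits(512, 512))  # False: heap alone already equals the limit
print(heap_fits(512, 384))  # True:  384 <= 512 * 0.75
```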
CPU Throttling
When a container hits its CPU limit, it is not killed — it is throttled. The kernel CFS (Completely Fair Scheduler) restricts the container's CPU time, causing latency spikes.
```bash
# Check for CPU throttling (from inside the container or node)
# cgroup v1 path shown; on cgroup v2 the file is /sys/fs/cgroup/cpu.stat
cat /sys/fs/cgroup/cpu/cpu.stat
# nr_periods 50000
# nr_throttled 12000          # <-- 24% of periods were throttled
# throttled_time 3400000000   # nanoseconds

# Or check with kubectl
kubectl top pods
# NAME     CPU(cores)   MEMORY(bytes)
# my-app   500m         256Mi          # Pinned at the limit = throttled
```
A common pattern is to set CPU requests but remove CPU limits entirely, letting pods burst when the node has spare capacity. This avoids throttling without affecting scheduling.
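The throttling arithmetic behind the cpu.stat output above is simple enough to sketch: the fraction of scheduling periods in which the container was throttled.

```python
# Sketch: compute the throttled-period ratio from cpu.stat text
# (cgroup v1 field names, as in the sample output above).

def throttle_ratio(cpu_stat: str) -> float:
    stats = dict(line.split() for line in cpu_stat.strip().splitlines())
    periods = int(stats["nr_periods"])
    return int(stats["nr_throttled"]) / periods if periods else 0.0

sample = """nr_periods 50000
nr_throttled 12000
throttled_time 3400000000"""

print(f"{throttle_ratio(sample):.0%}")  # 24%
```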
LimitRange — Default Limits Per Namespace
A LimitRange sets default requests and limits for containers that do not specify their own. This prevents developers from deploying pods with no resource constraints.
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:            # Default limits (if none specified)
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:     # Default requests (if none specified)
      cpu: "100m"
      memory: "128Mi"
    max:                # Maximum allowed
      cpu: "4"
      memory: "8Gi"
    min:                # Minimum allowed
      cpu: "50m"
      memory: "64Mi"
```
```bash
# Apply and verify
kubectl apply -f limitrange.yaml
kubectl describe limitrange default-limits -n production

# Now deploy a pod with no resources — it gets the defaults
kubectl run test --image=nginx -n production
kubectl get pod test -n production -o jsonpath='{.spec.containers[0].resources}'
# {"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}
```
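The defaulting behavior can be sketched as a merge at admission time: fields the container sets win, and anything missing is filled from the LimitRange. This is a simplified model, assuming the default values shown above; real admission also validates against min and max.

```python
# Sketch of LimitRange defaulting: container-specified values win,
# missing fields are filled from the namespace defaults above.

DEFAULTS = {"limits":   {"cpu": "500m", "memory": "512Mi"},
            "requests": {"cpu": "100m", "memory": "128Mi"}}

def apply_defaults(resources: dict) -> dict:
    merged = {}
    for section, defaults in DEFAULTS.items():
        merged[section] = {**defaults, **resources.get(section, {})}
    return merged

print(apply_defaults({}))
# {'limits': {'cpu': '500m', 'memory': '512Mi'}, 'requests': {'cpu': '100m', 'memory': '128Mi'}}
```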
ResourceQuota — Namespace-Level Caps
While LimitRange controls individual containers, ResourceQuota caps total resource consumption for an entire namespace.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"        # Total CPU requests across all pods
    requests.memory: "20Gi"   # Total memory requests
    limits.cpu: "20"          # Total CPU limits
    limits.memory: "40Gi"     # Total memory limits
    pods: "50"                # Max number of pods
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"
```
```bash
# Check quota usage
kubectl get resourcequota team-quota -n team-alpha
# NAME         AGE   REQUEST                                             LIMIT
# team-quota   5d    requests.cpu: 3200m/10, requests.memory: 8Gi/20Gi   limits.cpu: 6/20, limits.memory: 16Gi/40Gi

# Detailed view
kubectl describe resourcequota team-quota -n team-alpha
```
When a ResourceQuota caps compute resources (requests.cpu, limits.memory, and so on), every pod in the namespace must specify those requests and limits; pods that omit them are rejected at admission. Combine with a LimitRange to set sane defaults so that developers do not have to spell them out every time.
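The admission check itself is bookkeeping: the new pod's requests, added to what the namespace already consumes, must stay within the hard caps. A minimal sketch with illustrative numbers (millicores and MiB):

```python
# Sketch of the ResourceQuota admission check: current usage plus the
# new pod's requests must not exceed any hard cap.

def admits(hard, used, pod_requests):
    return all(used.get(k, 0) + v <= hard[k] for k, v in pod_requests.items())

hard = {"requests.cpu": 10000, "requests.memory": 20480}  # 10 CPU, 20Gi
used = {"requests.cpu": 9600,  "requests.memory": 8192}   # current usage

print(admits(hard, used, {"requests.cpu": 300, "requests.memory": 1024}))  # True
print(admits(hard, used, {"requests.cpu": 500, "requests.memory": 1024}))  # False: CPU cap hit
```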
Right-Sizing with kubectl top and Metrics Server
Setting the right requests and limits requires observing actual usage over time:
```bash
# Install metrics-server (if not already installed)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# View current resource usage per pod
kubectl top pods -n production
# NAME         CPU(cores)   MEMORY(bytes)
# api-server   85m          210Mi
# web-app      12m          95Mi
# worker       320m         1.2Gi

# View usage per node
kubectl top nodes
# NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# node-1   1200m        60%    6.2Gi           77%
# node-2   800m         40%    4.8Gi           60%

# View resource requests vs capacity per node
kubectl describe node node-1 | grep -A 10 "Allocated resources"
```
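One way to turn observed usage into a request is a percentile plus headroom. The sketch below uses roughly the p95 of observed samples plus 20% headroom; both figures are rules of thumb assumed here, not an official Kubernetes recommendation.

```python
# Sketch: derive a memory request from usage samples (MiB) collected
# over time. p95 + 20% headroom is an assumed rule of thumb.

def recommend_request(samples_mib: list, headroom: float = 0.2) -> int:
    ordered = sorted(samples_mib)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return int(p95 * (1 + headroom))

week_of_samples = [95, 110, 120, 118, 130, 125, 140, 122, 128, 135]
print(recommend_request(week_of_samples))  # 168
```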
Vertical Pod Autoscaler (VPA)
VPA observes actual resource usage and recommends (or automatically sets) the right requests and limits.
VPA operates in three modes:
| Mode | Behavior | Use Case |
|---|---|---|
| Off | Only generates recommendations, no changes | Start here — observe before acting |
| Initial | Sets resources only at pod creation time | Avoid live disruptions |
| Auto | Updates running pods (evicts and recreates) | Full automation |
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # Start with Off to observe recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
```
```bash
# Check VPA recommendations
kubectl get vpa api-vpa -n production -o yaml | grep -A 20 recommendation
# recommendation:
#   containerRecommendations:
#   - containerName: api
#     lowerBound:
#       cpu: 120m
#       memory: 200Mi
#     target:               # <-- Set your requests to this
#       cpu: 250m
#       memory: 384Mi
#     upperBound:           # <-- Set your limits near this
#       cpu: 800m
#       memory: 1.2Gi
```
The general workflow: deploy with rough estimates, run VPA in Off mode for a week, read the recommendations, update your manifests, then optionally switch to Auto for ongoing adjustments.
Next, we will explore Kubernetes workload types beyond Deployments — Jobs for batch processing, CronJobs for scheduling, DaemonSets for node-level agents, and StatefulSets for stateful applications.
