Kubernetes Resource Management — Requests, Limits, and QoS Classes
Your pod gets killed with OOMKilled and you have no idea why. Or your app crawls because Kubernetes is throttling its CPU to a fraction of what it needs. Resource management is one of the most misunderstood areas in Kubernetes, and getting it wrong means wasted money, poor performance, or unexpected crashes.
CPU vs Memory — Two Different Beasts
Kubernetes manages two primary resource types, and they behave very differently:
| Resource | Type | What Happens When Exceeded | Unit |
|---|---|---|---|
| CPU | Compressible | Container is throttled — slowed down, not killed | Millicores (1000m = 1 vCPU) |
| Memory | Incompressible | Container is OOMKilled — terminated immediately | Bytes (Mi, Gi) |
CPU is forgiving. If your container tries to use more CPU than its limit, it gets throttled but keeps running. Memory is not forgiving. If your container exceeds its memory limit, the kernel's OOM killer terminates it instantly.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        cpu: "250m"       # 0.25 vCPU — used for scheduling
        memory: "256Mi"   # 256 MiB — used for scheduling
      limits:
        cpu: "500m"       # 0.5 vCPU — hard ceiling (throttled beyond this)
        memory: "512Mi"   # 512 MiB — hard ceiling (OOMKilled beyond this)
```
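The quantity strings above follow Kubernetes conventions: millicores for CPU and binary suffixes (Ki, Mi, Gi) for memory. A minimal sketch of the conversion, with illustrative helper names that are not part of any Kubernetes client library:

```python
# Sketch: converting Kubernetes quantity strings to plain numbers.
# Helper names are illustrative, not from an official client library.

def parse_cpu(quantity: str) -> float:
    """Return CPU in whole cores ("250m" -> 0.25, "2" -> 2.0)."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)

def parse_memory(quantity: str) -> int:
    """Return memory in bytes for binary suffixes ("256Mi" -> 268435456)."""
    units = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
    for suffix, factor in units.items():
        if quantity.endswith(suffix):
            return int(quantity[:-2]) * factor
    return int(quantity)  # plain bytes

print(parse_cpu("250m"))      # 0.25
print(parse_memory("256Mi"))  # 268435456
```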
Requests vs Limits
These two settings control different things:
| Setting | Purpose | Scheduling | Enforcement |
|---|---|---|---|
| Requests | Minimum guaranteed resources | Yes — scheduler uses this to find a node | Soft — container can use more if available |
| Limits | Maximum allowed resources | No — not used for scheduling | Hard — CPU throttled, memory OOMKilled |
```bash
# See resource requests and limits for all pods
kubectl get pods -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
CPU_LIM:.spec.containers[0].resources.limits.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
MEM_LIM:.spec.containers[0].resources.limits.memory

# Check actual resource usage vs requests
kubectl top pods
# NAME         CPU(cores)   MEMORY(bytes)
# web-app      45m          128Mi
# api-server   120m         256Mi
```
The scheduler places pods on nodes based on requests, not limits. If a node has 2 CPU cores and 4Gi of memory available, and your pod requests 500m CPU and 1Gi memory, the scheduler counts that capacity as consumed — even if the pod only uses 100m CPU and 200Mi memory.
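The fit check described above can be sketched in a few lines: a pod fits a node if allocatable capacity minus the sum of already-scheduled requests covers the new pod's requests. All numbers here are illustrative (CPU in millicores, memory in MiB), and the real scheduler considers far more than raw fit.

```python
# Sketch of the scheduler's fit check: only *requests* count against
# node capacity; limits play no part in scheduling.

def fits(node_allocatable, scheduled_requests, pod_request):
    free = {k: node_allocatable[k] - sum(r[k] for r in scheduled_requests)
            for k in node_allocatable}
    return all(pod_request[k] <= free[k] for k in pod_request)

node = {"cpu": 2000, "memory": 4096}      # 2 cores, 4Gi allocatable
running = [{"cpu": 500, "memory": 1024}]  # one pod requesting 500m / 1Gi

print(fits(node, running, {"cpu": 1500, "memory": 3072}))  # True
print(fits(node, running, {"cpu": 1600, "memory": 1024}))  # False: only 1500m free
```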
QoS Classes
Kubernetes assigns every pod a Quality of Service class based on how you configure requests and limits. The QoS class determines which pods get killed first when a node runs out of memory.
| QoS Class | Condition | Eviction Priority | When to Use |
|---|---|---|---|
| Guaranteed | Requests = Limits for all containers | Last to be evicted | Critical workloads (databases, payment services) |
| Burstable | At least one request or limit set, but not meeting the Guaranteed criteria (e.g. requests < limits, or only requests set) | Evicted after BestEffort | Most application workloads |
| BestEffort | No requests or limits set | First to be evicted | Batch jobs, non-critical tasks |
```yaml
# Guaranteed QoS — requests equal limits
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

```yaml
# Burstable QoS — requests less than limits
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"
```

```yaml
# BestEffort QoS — no requests or limits specified at all
# (just omit the resources block entirely)
```

```bash
# Check a pod's QoS class
kubectl get pod my-app -o jsonpath='{.status.qosClass}'
# Burstable
```
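The classification rules from the table above can be sketched as a small function. This handles only the single-container case; real Kubernetes evaluates every container in the pod, and requests left unset default to the limits.

```python
# Sketch of QoS classification for one container's resources dict.

def qos_class(resources: dict) -> str:
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    if not requests and not limits:
        return "BestEffort"
    # Guaranteed: CPU and memory limits set, and requests equal limits
    # (an unset request defaults to the corresponding limit).
    if (set(limits) >= {"cpu", "memory"}
            and all(requests.get(k, limits[k]) == limits[k] for k in limits)):
        return "Guaranteed"
    return "Burstable"

print(qos_class({}))                                     # BestEffort
print(qos_class({"requests": {"cpu": "250m"}}))          # Burstable
print(qos_class({"requests": {"cpu": "500m", "memory": "512Mi"},
                 "limits":   {"cpu": "500m", "memory": "512Mi"}}))  # Guaranteed
```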
OOMKilled — What Happens and Why
When a container exceeds its memory limit, the Linux kernel's OOM killer terminates it:
```bash
# Spot OOMKilled pods
kubectl get pods
# NAME     READY   STATUS      RESTARTS   AGE
# my-app   0/1     OOMKilled   5          10m

# Get details
kubectl describe pod my-app | grep -A 5 "Last State"
# Last State:  Terminated
#   Reason:    OOMKilled
#   Exit Code: 137

# Check memory usage before the kill
kubectl top pod my-app
```
Common causes of OOMKilled: memory leaks in the application, JVM heap set larger than the container limit, loading large files into memory, or simply underestimating memory requirements.
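The JVM-heap cause above is worth a sanity check: the heap must fit inside the container limit with room left for non-heap memory. A minimal sketch, where the 25% overhead figure for metaspace, thread stacks, and direct buffers is a rough illustrative assumption, not a JVM constant:

```python
# Sketch: check a JVM max-heap setting against the container memory limit.
# overhead_fraction = 0.25 is an assumed rule of thumb, not a JVM constant.

def heap_fits(container_limit_mib: int, max_heap_mib: int,
              overhead_fraction: float = 0.25) -> bool:
    budget = container_limit_mib * (1 - overhead_fraction)
    return max_heap_mib <= budget

print(heap_fits(512, 512))  # False: heap alone already equals the limit
print(heap_fits(512, 384))  # True:  384 <= 512 * 0.75
```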
CPU Throttling
When a container hits its CPU limit, it is not killed — it is throttled. The kernel CFS (Completely Fair Scheduler) restricts the container's CPU time, causing latency spikes.
```bash
# Check for CPU throttling (from inside the container or node)
# cgroup v1 path shown; on cgroup v2 the file is /sys/fs/cgroup/cpu.stat
cat /sys/fs/cgroup/cpu/cpu.stat
# nr_periods 50000
# nr_throttled 12000          # <-- 24% of periods were throttled
# throttled_time 3400000000   # nanoseconds

# Or check with kubectl
kubectl top pods
# NAME     CPU(cores)   MEMORY(bytes)
# my-app   500m         256Mi          # Pinned at the limit = throttled
```
A common pattern is to set CPU requests but remove CPU limits entirely, letting pods burst when the node has spare capacity. This avoids throttling without affecting scheduling.
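The throttling arithmetic behind the cpu.stat output above is simple enough to sketch: the fraction of scheduling periods in which the container was throttled.

```python
# Sketch: compute the throttled-period ratio from cpu.stat text
# (cgroup v1 field names, as in the sample output above).

def throttle_ratio(cpu_stat: str) -> float:
    stats = dict(line.split() for line in cpu_stat.strip().splitlines())
    periods = int(stats["nr_periods"])
    return int(stats["nr_throttled"]) / periods if periods else 0.0

sample = """nr_periods 50000
nr_throttled 12000
throttled_time 3400000000"""

print(f"{throttle_ratio(sample):.0%}")  # 24%
```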
LimitRange — Default Limits Per Namespace
A LimitRange sets default requests and limits for containers that do not specify their own. This prevents developers from deploying pods with no resource constraints.
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:            # Default limits (if none specified)
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:     # Default requests (if none specified)
      cpu: "100m"
      memory: "128Mi"
    max:                # Maximum allowed
      cpu: "4"
      memory: "8Gi"
    min:                # Minimum allowed
      cpu: "50m"
      memory: "64Mi"
```
```bash
# Apply and verify
kubectl apply -f limitrange.yaml
kubectl describe limitrange default-limits -n production

# Now deploy a pod with no resources — it gets the defaults
kubectl run test --image=nginx -n production
kubectl get pod test -n production -o jsonpath='{.spec.containers[0].resources}'
# {"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}
```
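The defaulting behavior can be sketched as a merge at admission time: fields the container sets win, and anything missing is filled from the LimitRange. This is a simplified model, assuming the default values shown above; real admission also validates against min and max.

```python
# Sketch of LimitRange defaulting: container-specified values win,
# missing fields are filled from the namespace defaults above.

DEFAULTS = {"limits":   {"cpu": "500m", "memory": "512Mi"},
            "requests": {"cpu": "100m", "memory": "128Mi"}}

def apply_defaults(resources: dict) -> dict:
    merged = {}
    for section, defaults in DEFAULTS.items():
        merged[section] = {**defaults, **resources.get(section, {})}
    return merged

print(apply_defaults({}))
# {'limits': {'cpu': '500m', 'memory': '512Mi'}, 'requests': {'cpu': '100m', 'memory': '128Mi'}}
```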
ResourceQuota — Namespace-Level Caps
While LimitRange controls individual containers, ResourceQuota caps total resource consumption for an entire namespace.
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"        # Total CPU requests across all pods
    requests.memory: "20Gi"   # Total memory requests
    limits.cpu: "20"          # Total CPU limits
    limits.memory: "40Gi"     # Total memory limits
    pods: "50"                # Max number of pods
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"
```
```bash
# Check quota usage
kubectl get resourcequota team-quota -n team-alpha
# NAME         AGE   REQUEST                                             LIMIT
# team-quota   5d    requests.cpu: 3200m/10, requests.memory: 8Gi/20Gi   limits.cpu: 6/20, limits.memory: 16Gi/40Gi

# Detailed view
kubectl describe resourcequota team-quota -n team-alpha
```
When a ResourceQuota caps compute resources (requests.cpu, limits.memory, and so on), every pod in the namespace must specify those requests and limits; pods that omit them are rejected at admission. Combine with a LimitRange to set sane defaults so that developers do not have to spell them out every time.
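The admission check itself is bookkeeping: the new pod's requests, added to what the namespace already consumes, must stay within the hard caps. A minimal sketch with illustrative numbers (millicores and MiB):

```python
# Sketch of the ResourceQuota admission check: current usage plus the
# new pod's requests must not exceed any hard cap.

def admits(hard, used, pod_requests):
    return all(used.get(k, 0) + v <= hard[k] for k, v in pod_requests.items())

hard = {"requests.cpu": 10000, "requests.memory": 20480}  # 10 CPU, 20Gi
used = {"requests.cpu": 9600,  "requests.memory": 8192}   # current usage

print(admits(hard, used, {"requests.cpu": 300, "requests.memory": 1024}))  # True
print(admits(hard, used, {"requests.cpu": 500, "requests.memory": 1024}))  # False: CPU cap hit
```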
Right-Sizing with kubectl top and Metrics Server
Setting the right requests and limits requires observing actual usage over time:
```bash
# Install metrics-server (if not already installed)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# View current resource usage per pod
kubectl top pods -n production
# NAME         CPU(cores)   MEMORY(bytes)
# api-server   85m          210Mi
# web-app      12m          95Mi
# worker       320m         1.2Gi

# View usage per node
kubectl top nodes
# NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
# node-1   1200m        60%    6.2Gi           77%
# node-2   800m         40%    4.8Gi           60%

# View resource requests vs capacity per node
kubectl describe node node-1 | grep -A 10 "Allocated resources"
```
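One way to turn observed usage into a request is a percentile plus headroom. The sketch below uses roughly the p95 of observed samples plus 20% headroom; both figures are rules of thumb assumed here, not an official Kubernetes recommendation.

```python
# Sketch: derive a memory request from usage samples (MiB) collected
# over time. p95 + 20% headroom is an assumed rule of thumb.

def recommend_request(samples_mib: list, headroom: float = 0.2) -> int:
    ordered = sorted(samples_mib)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return int(p95 * (1 + headroom))

week_of_samples = [95, 110, 120, 118, 130, 125, 140, 122, 128, 135]
print(recommend_request(week_of_samples))  # 168
```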
Vertical Pod Autoscaler (VPA)
VPA observes actual resource usage and recommends (or automatically sets) the right requests and limits.
VPA operates in three modes:
| Mode | Behavior | Use Case |
|---|---|---|
| Off | Only generates recommendations, no changes | Start here — observe before acting |
| Initial | Sets resources only at pod creation time | Avoid live disruptions |
| Auto | Updates running pods (evicts and recreates) | Full automation |
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"   # Start with Off to observe recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"
```
```bash
# Check VPA recommendations
kubectl get vpa api-vpa -n production -o yaml | grep -A 20 recommendation
# recommendation:
#   containerRecommendations:
#   - containerName: api
#     lowerBound:
#       cpu: 120m
#       memory: 200Mi
#     target:               # <-- Set your requests to this
#       cpu: 250m
#       memory: 384Mi
#     upperBound:           # <-- Set your limits near this
#       cpu: 800m
#       memory: 1.2Gi
```
The general workflow: deploy with rough estimates, run VPA in Off mode for a week, read the recommendations, update your manifests, then optionally switch to Auto for ongoing adjustments.
Next, we will explore Kubernetes workload types beyond Deployments — Jobs for batch processing, CronJobs for scheduling, DaemonSets for node-level agents, and StatefulSets for stateful applications.
