Kubernetes Resource Management — Requests, Limits, and QoS Classes

· 7 min read
Goel Academy
DevOps & Cloud Learning Hub

Your pod gets killed with OOMKilled and you have no idea why. Or your app crawls because Kubernetes is throttling its CPU to a fraction of what it needs. Resource management is one of the most misunderstood areas in Kubernetes, and getting it wrong means wasted money, poor performance, or unexpected crashes.

CPU vs Memory — Two Different Beasts

Kubernetes manages two primary resource types, and they behave very differently:

| Resource | Type | What Happens When Exceeded | Unit |
|---|---|---|---|
| CPU | Compressible | Container is throttled — slowed down, not killed | Millicores (1000m = 1 vCPU) |
| Memory | Incompressible | Container is OOMKilled — terminated immediately | Bytes (Mi, Gi) |

CPU is forgiving. If your container tries to use more CPU than its limit, it gets throttled but keeps running. Memory is not forgiving. If your container exceeds its memory limit, the kernel's OOM killer terminates it instantly.

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        cpu: "250m"        # 0.25 vCPU — used for scheduling
        memory: "256Mi"    # 256 MiB — used for scheduling
      limits:
        cpu: "500m"        # 0.5 vCPU — hard ceiling (throttled beyond this)
        memory: "512Mi"    # 512 MiB — hard ceiling (OOMKilled beyond this)

Requests vs Limits

These two settings control different things:

| Setting | Purpose | Scheduling | Enforcement |
|---|---|---|---|
| Requests | Minimum guaranteed resources | Yes — the scheduler uses this to find a node | Soft — container can use more if available |
| Limits | Maximum allowed resources | No — not used for scheduling | Hard — CPU throttled, memory OOMKilled |

# See resource requests and limits for all pods
kubectl get pods -o custom-columns=\
NAME:.metadata.name,\
CPU_REQ:.spec.containers[0].resources.requests.cpu,\
CPU_LIM:.spec.containers[0].resources.limits.cpu,\
MEM_REQ:.spec.containers[0].resources.requests.memory,\
MEM_LIM:.spec.containers[0].resources.limits.memory

# Check actual resource usage vs requests
kubectl top pods
# NAME CPU(cores) MEMORY(bytes)
# web-app 45m 128Mi
# api-server 120m 256Mi

The scheduler places pods on nodes based on requests, not limits. If a node has 2 CPU cores and 4Gi of memory available, and your pod requests 500m CPU and 1Gi memory, the scheduler counts that capacity as consumed — even if the pod only uses 100m CPU and 200Mi memory.
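This bookkeeping can be sketched in a few lines of Python. A minimal illustration of the request-based scheduling math, not the real scheduler — the parsing helpers and node numbers are assumptions for the example:

```python
# Illustrative sketch: the scheduler fits pods by *requests*, never by
# limits or actual usage. Helper names are hypothetical.

def cpu_millicores(v):
    """Convert a CPU quantity like '500m' or '2' to millicores."""
    return int(v[:-1]) if v.endswith("m") else int(float(v) * 1000)

def mem_mib(v):
    """Convert a memory quantity like '512Mi' or '2Gi' to MiB."""
    for suffix, factor in (("Mi", 1), ("Gi", 1024)):
        if v.endswith(suffix):
            return int(float(v[:-2]) * factor)
    raise ValueError("unsupported unit: " + v)

def fits(node_free_cpu_m, node_free_mem_mi, pod):
    """A pod fits if its requests fit; actual usage is irrelevant."""
    return (cpu_millicores(pod["cpu"]) <= node_free_cpu_m
            and mem_mib(pod["memory"]) <= node_free_mem_mi)

# Node with 2 cores / 4Gi free; pod requests 500m CPU and 1Gi memory
print(fits(2000, 4096, {"cpu": "500m", "memory": "1Gi"}))   # True
# Once placed, 500m / 1Gi counts as consumed even if the pod idles:
print(fits(2000 - 500, 4096 - 1024, {"cpu": "1600m", "memory": "1Gi"}))  # False
```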

QoS Classes

Kubernetes assigns every pod a Quality of Service class based on how you configure requests and limits. The QoS class determines which pods get killed first when a node runs out of memory.

| QoS Class | Condition | Eviction Priority | When to Use |
|---|---|---|---|
| Guaranteed | Requests = Limits for all containers | Last to be evicted | Critical workloads (databases, payment services) |
| Burstable | Requests < Limits (or only requests set) | Evicted after BestEffort | Most application workloads |
| BestEffort | No requests or limits set | First to be evicted | Batch jobs, non-critical tasks |

# Guaranteed QoS — requests equal limits
resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

# Burstable QoS — requests less than limits
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "1Gi"

# BestEffort QoS — no resources specified at all
# (just omit the resources block entirely)

# Check a pod's QoS class
kubectl get pod my-app -o jsonpath='{.status.qosClass}'
# Burstable
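
The classification rule itself is simple enough to sketch. A simplified single-container version in Python (the real rule also treats "limits set, requests omitted" as Guaranteed, because requests then default to the limits):

```python
# Sketch of the QoS classification rule for one container (simplified).

def qos_class(resources):
    """No requests/limits -> BestEffort; requests == limits for both
    cpu and memory -> Guaranteed; anything in between -> Burstable."""
    if not resources:
        return "BestEffort"
    req = resources.get("requests", {})
    lim = resources.get("limits", {})
    if not req and not lim:
        return "BestEffort"
    if (req.get("cpu") and req.get("memory")
            and req.get("cpu") == lim.get("cpu")
            and req.get("memory") == lim.get("memory")):
        return "Guaranteed"
    return "Burstable"

print(qos_class(None))  # BestEffort
print(qos_class({"requests": {"cpu": "500m", "memory": "512Mi"},
                 "limits":   {"cpu": "500m", "memory": "512Mi"}}))  # Guaranteed
print(qos_class({"requests": {"cpu": "250m", "memory": "256Mi"},
                 "limits":   {"cpu": "1000m", "memory": "1Gi"}}))   # Burstable
```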

OOMKilled — What Happens and Why

When a container exceeds its memory limit, the Linux kernel's OOM killer terminates it:

# Spot OOMKilled pods
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# my-app 0/1 OOMKilled 5 10m

# Get details
kubectl describe pod my-app | grep -A 5 "Last State"
# Last State: Terminated
# Reason: OOMKilled
# Exit Code: 137

# Check memory usage before the kill
kubectl top pod my-app

Common causes of OOMKilled: memory leaks in the application, JVM heap set larger than the container limit, loading large files into memory, or simply underestimating memory requirements.
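
For the JVM case in particular, a common fix is to cap the heap at a fraction of the container limit so off-heap memory (metaspace, thread stacks, buffers) has room. The 75% fraction below is an illustrative rule of thumb, not a fixed rule:

```python
# Sketch: pick a JVM max heap that leaves headroom under the container
# limit. The 0.75 fraction is an assumption; tune it per workload.

def max_heap_mib(container_limit_mib, heap_fraction=0.75):
    return int(container_limit_mib * heap_fraction)

limit = 512  # container memory limit in MiB (matches the 512Mi example)
print("-Xmx%dm" % max_heap_mib(limit))  # -Xmx384m
# Modern JVMs can do this themselves: -XX:MaxRAMPercentage=75.0
```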

CPU Throttling

When a container hits its CPU limit, it is not killed — it is throttled. The kernel's CFS (Completely Fair Scheduler) bandwidth controller restricts the container's CPU time within each scheduling period, causing latency spikes.

# Check for CPU throttling (cgroup v1 path shown; on cgroup v2 the
# equivalent file is /sys/fs/cgroup/cpu.stat)
cat /sys/fs/cgroup/cpu/cpu.stat
# nr_periods 50000
# nr_throttled 12000 # <-- 24% of periods were throttled
# throttled_time 3400000000

# Or check with kubectl
kubectl top pods
# NAME CPU(cores) MEMORY(bytes)
# my-app 500m 256Mi # Pinned at the limit = throttled
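
The throttling ratio from those counters is simple arithmetic. A minimal sketch, using the numbers from the cpu.stat output above:

```python
# Sketch: compute the throttled fraction from cgroup cpu.stat counters.

def throttle_ratio(nr_periods, nr_throttled):
    """Fraction of CFS periods in which the container hit its quota."""
    return nr_throttled / nr_periods if nr_periods else 0.0

ratio = throttle_ratio(nr_periods=50000, nr_throttled=12000)
print("{:.0%} of periods throttled".format(ratio))  # 24% of periods throttled
```

As a rough guide, sustained ratios in the double digits usually mean the limit is too tight for the workload.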

A common pattern is to set CPU requests but remove CPU limits entirely, letting pods burst when the node has spare capacity. This avoids throttling without affecting scheduling.

LimitRange — Default Limits Per Namespace

A LimitRange sets default requests and limits for containers that do not specify their own. This prevents developers from deploying pods with no resource constraints.

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
  - type: Container
    default:             # Default limits (if none specified)
      cpu: "500m"
      memory: "512Mi"
    defaultRequest:      # Default requests (if none specified)
      cpu: "100m"
      memory: "128Mi"
    max:                 # Maximum allowed
      cpu: "4"
      memory: "8Gi"
    min:                 # Minimum allowed
      cpu: "50m"
      memory: "64Mi"

# Apply and verify
kubectl apply -f limitrange.yaml
kubectl describe limitrange default-limits -n production

# Now deploy a pod with no resources — it gets the defaults
kubectl run test --image=nginx -n production
kubectl get pod test -n production -o jsonpath='{.spec.containers[0].resources}'
# {"limits":{"cpu":"500m","memory":"512Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}

ResourceQuota — Namespace-Level Caps

While LimitRange controls individual containers, ResourceQuota caps total resource consumption for an entire namespace.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "10"        # Total CPU requests across all pods
    requests.memory: "20Gi"   # Total memory requests
    limits.cpu: "20"          # Total CPU limits
    limits.memory: "40Gi"     # Total memory limits
    pods: "50"                # Max number of pods
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"

# Check quota usage
kubectl get resourcequota team-quota -n team-alpha
# NAME AGE REQUEST LIMIT
# team-quota 5d requests.cpu: 3200m/10, requests.memory: 8Gi/20Gi limits.cpu: 6/20, limits.memory: 16Gi/40Gi

# Detailed view
kubectl describe resourcequota team-quota -n team-alpha

When a ResourceQuota caps a compute resource, every new pod in the namespace must specify a request or limit for that resource — otherwise the API server rejects the pod. Combine with a LimitRange to set sane defaults.
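
The admission check itself is just bounded addition. A minimal sketch of the idea (the numbers mirror the quota and usage shown above):

```python
# Sketch of quota admission: a pod is rejected if adding its requests
# would push the namespace totals past the ResourceQuota's hard caps.

def admits(used_cpu_m, used_mem_gi, hard_cpu_m, hard_mem_gi,
           pod_cpu_m, pod_mem_gi):
    return (used_cpu_m + pod_cpu_m <= hard_cpu_m
            and used_mem_gi + pod_mem_gi <= hard_mem_gi)

# Quota: requests.cpu 10 (10000m), requests.memory 20Gi; used 3200m / 8Gi
print(admits(3200, 8, 10000, 20, pod_cpu_m=500, pod_mem_gi=1))   # True
print(admits(3200, 8, 10000, 20, pod_cpu_m=8000, pod_mem_gi=1))  # False
```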

Right-Sizing with kubectl top and Metrics Server

Setting the right requests and limits requires observing actual usage over time:

# Install metrics-server (if not already installed)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# View current resource usage per pod
kubectl top pods -n production
# NAME CPU(cores) MEMORY(bytes)
# api-server 85m 210Mi
# web-app 12m 95Mi
# worker 320m 1.2Gi

# View usage per node
kubectl top nodes
# NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
# node-1 1200m 60% 6.2Gi 77%
# node-2 800m 40% 4.8Gi 60%

# View resource requests vs capacity per node
kubectl describe node node-1 | grep -A 10 "Allocated resources"
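
Once you have a series of usage samples, turning them into request/limit candidates is a small calculation. A sketch under an assumed heuristic (request near the 95th percentile of observed usage, limit with ~20% headroom on top — the heuristic and sample data are illustrative, not a Kubernetes rule):

```python
# Sketch: derive request/limit suggestions from observed CPU samples.
# The p95 + headroom heuristic is an assumption; pick what fits your SLOs.

def percentile(samples, p):
    s = sorted(samples)
    idx = min(len(s) - 1, int(p / 100 * (len(s) - 1)))
    return s[idx]

def suggest(samples_m):
    p95 = percentile(samples_m, 95)
    return {"request_m": p95, "limit_m": p95 * 12 // 10}  # ~20% headroom

cpu_samples = [45, 60, 85, 90, 120, 80, 70, 95, 110, 100]  # millicores over time
print(suggest(cpu_samples))  # {'request_m': 110, 'limit_m': 132}
```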

Vertical Pod Autoscaler (VPA)

VPA observes actual resource usage and recommends (or automatically sets) the right requests and limits.

VPA operates in three modes:

| Mode | Behavior | Use Case |
|---|---|---|
| Off | Only generates recommendations, no changes | Start here — observe before acting |
| Initial | Sets resources only at pod creation time | Avoid live disruptions |
| Auto | Updates running pods (evicts and recreates) | Full automation |

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"    # Start with Off to observe recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: api
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "4"
        memory: "8Gi"

# Check VPA recommendations
kubectl get vpa api-vpa -n production -o yaml | grep -A 20 recommendation
#   recommendation:
#     containerRecommendations:
#     - containerName: api
#       lowerBound:
#         cpu: 120m
#         memory: 200Mi
#       target:
#         cpu: 250m
#         memory: 384Mi    # <-- Set your requests to this
#       upperBound:
#         cpu: 800m
#         memory: 1.2Gi    # <-- Set your limits near this

The general workflow: deploy with rough estimates, run VPA in Off mode for a week, read the recommendations, update your manifests, then optionally switch to Auto for ongoing adjustments.


Next, we will explore Kubernetes workload types beyond Deployments — Jobs for batch processing, CronJobs for scheduling, DaemonSets for node-level agents, and StatefulSets for stateful applications.