Kubernetes Health Probes — Liveness, Readiness, and Startup Explained
Your pod shows Running status, but the app inside stopped responding five minutes ago. Users get 502 errors while Kubernetes happily reports everything is fine. Without health probes, Kubernetes has no idea whether your application is actually working; it only knows the process is alive.
Why Health Probes Matter
By default, Kubernetes considers a container "healthy" as long as the main process (PID 1) is running. But a running process does not mean a functioning application. Your web server could be stuck in a deadlock, your database connection pool exhausted, or your app caught in an infinite loop. Health probes give Kubernetes eyes into your application's actual state.
# Without probes: pod shows Running even though the app is broken
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# my-app 1/1 Running 0 45m
# ^ Looks healthy, but returning 500 errors to every request
Kubernetes provides three types of probes, each serving a different purpose in the pod lifecycle.
Liveness Probe — Restart Unhealthy Containers
The liveness probe answers one question: is this container still working? If the liveness probe fails, Kubernetes kills the container and restarts it according to the pod's restartPolicy.
Use liveness probes for detecting deadlocks, infinite loops, or corrupted state that only a restart can fix.
apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: app
      image: my-app:1.0
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
        failureThreshold: 3
        timeoutSeconds: 5
# When liveness fails, you see restarts incrementing
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# web-app 1/1 Running 3 10m
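# Check why it restarted
kubectl describe pod web-app | grep -A 10 "Last State"
# Last State: Terminated
# Reason: Error
# Exit Code: 137
# The describe output reports how the container exited, not the probe result;
# the "Liveness probe failed" message appears in the pod's Events.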
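On the application side, a liveness endpoint only needs to reflect internal progress. The sketch below is one possible shape for a /healthz handler, using only Python's standard library: a worker loop records a heartbeat, and the endpoint returns 503 once the heartbeat goes stale. The `Heartbeat` class, the 30-second limit, and the handler name are illustrative assumptions, not anything Kubernetes prescribes.

```python
import time
from http.server import BaseHTTPRequestHandler
from typing import Optional


class Heartbeat:
    """Tracks when the app's main loop last made progress (hypothetical helper)."""

    def __init__(self, max_age_seconds: float = 30.0):
        self.max_age_seconds = max_age_seconds
        self.last_beat = time.monotonic()

    def beat(self) -> None:
        # Call this from the worker loop each time it finishes a unit of work.
        self.last_beat = time.monotonic()

    def is_alive(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return (now - self.last_beat) <= self.max_age_seconds


HEARTBEAT = Heartbeat()


def healthz_status(heartbeat: Heartbeat, now: Optional[float] = None) -> int:
    """HTTP status for /healthz: 200 while the loop progresses, 503 if it looks stuck."""
    return 200 if heartbeat.is_alive(now) else 503


class HealthzHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_error(404)
            return
        status = healthz_status(HEARTBEAT)
        self.send_response(status)
        self.end_headers()
        self.wfile.write(b"ok" if status == 200 else b"stuck")
```

Because 503 falls outside the 200-399 success range, three stale responses in a row would trigger the restart configured by the manifest above. Note that the handler checks nothing outside the process.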
Readiness Probe — Control Traffic Routing
The readiness probe answers: is this container ready to accept traffic? If the readiness probe fails, Kubernetes removes the pod from all Service endpoints. The container is not restarted — it just stops receiving traffic until the probe passes again.
Use readiness probes for warm-up periods, dependency checks, and temporary overload situations.
apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
    - name: api
      image: my-api:2.0
      ports:
        - containerPort: 3000
      readinessProbe:
        httpGet:
          path: /ready
          port: 3000
        initialDelaySeconds: 5
        periodSeconds: 5
        failureThreshold: 3
        successThreshold: 1
        timeoutSeconds: 3
# When readiness fails, READY shows 0/1 — no traffic routed
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# api-server 0/1 Running 0 2m
# Endpoints are removed from the Service
kubectl get endpoints api-service
# NAME ENDPOINTS
# api-service <none> # No backends — traffic goes nowhere
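A /ready endpoint typically combines a warm-up flag with dependency checks. The following is a minimal sketch of that decision, assuming a hypothetical `ready_status` helper and caller-supplied check functions; it is not tied to any particular web framework.

```python
from typing import Callable, Dict, Tuple


def ready_status(
    warmed_up: bool, checks: Dict[str, Callable[[], bool]]
) -> Tuple[int, Dict[str, bool]]:
    """Return (HTTP status, per-check results) for a /ready endpoint."""
    results = {"warmed_up": warmed_up}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises counts as "not ready" instead of crashing the endpoint.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

Returning 503 while a dependency is unavailable takes the pod out of rotation without restarting it, and the pod rejoins the Service as soon as the checks pass again.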
Startup Probe — Slow-Starting Containers
The startup probe protects slow-starting containers. While the startup probe is running, liveness and readiness probes are disabled. Once the startup probe succeeds, Kubernetes hands control to the other probes.
This is essential for Java applications, legacy apps, or anything that needs more than 30 seconds to initialize.
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
    - name: app
      image: legacy-java-app:1.0
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 10
        failureThreshold: 30  # 30 * 10 = 300 seconds max startup time
        timeoutSeconds: 5
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 3
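Without a startup probe, you would need either a large initialDelaySeconds on the liveness probe, which leaves the container unchecked for that whole window after every start, or a large failureThreshold, which slows down failure detection once the app is running.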
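To see why the startup probe matters, here is a simplified model of the failure budget. It assumes the first check fires one period after the initial delay and then every period, and that a check passes once the app has finished booting; real kubelet timing includes jitter. The `Probe` dataclass and `restart_time` function are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Probe:
    period_seconds: int = 10
    failure_threshold: int = 3
    initial_delay_seconds: int = 0


def restart_time(boot_seconds: float, probe: Probe) -> Optional[float]:
    """Time at which the container would be restarted, or None if the app
    finishes booting before the failure budget runs out."""
    last_failure = None
    t = probe.initial_delay_seconds + probe.period_seconds
    for _ in range(probe.failure_threshold):
        if t >= boot_seconds:
            return None  # the check passes, so no restart happens
        last_failure = t
        t += probe.period_seconds
    return last_failure


# A 90-second boot guarded only by a default liveness probe (3 failures, 10 s apart)
print(restart_time(90, Probe()))                      # 30
# The same boot under the startup probe from the manifest above (30 failures allowed)
print(restart_time(90, Probe(failure_threshold=30)))  # None
```

Under these assumptions the liveness-only container is restarted at the 30-second mark and never finishes booting, while the startup probe gives it up to 300 seconds.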
Probe Types Comparison
Kubernetes supports four mechanisms for performing health checks:
| Probe Type | How It Works | Best For | Example |
|---|---|---|---|
| HTTP GET | Sends HTTP request, 200-399 = success | Web servers, REST APIs | httpGet: {path: /healthz, port: 8080} |
| TCP Socket | Opens TCP connection, success if port is open | Databases, Redis, non-HTTP services | tcpSocket: {port: 5432} |
| Exec | Runs command inside container, exit code 0 = success | Custom checks, file-based health | exec: {command: ["/bin/check"]} |
| gRPC | gRPC health check protocol (K8s 1.24+) | gRPC services | grpc: {port: 50051} |
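The HTTP GET rule in the table can be approximated locally to test an endpoint before deploying it. This is a rough sketch using Python's standard library, not the kubelet's actual implementation; the function names are assumptions for illustration.

```python
import urllib.error
import urllib.request


def status_is_success(status: int) -> bool:
    """An httpGet probe treats any status from 200 through 399 as success."""
    return 200 <= status < 400


def http_probe_succeeds(url: str, timeout_seconds: float = 1.0) -> bool:
    """Approximate an httpGet probe against the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises on 4xx/5xx responses
    except OSError:
        # Connection refused, DNS failure, or timeout: the probe fails.
        return False
    return status_is_success(status)
```

For example, `http_probe_succeeds("http://localhost:8080/healthz", timeout_seconds=5)` mirrors the liveness probe shown earlier, assuming the app is reachable on localhost.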
TCP Socket Probe
livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 15
  periodSeconds: 10
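What a TCP socket check verifies can be reproduced in a few lines. The sketch below, with a hypothetical `tcp_probe_succeeds` helper, only confirms that something accepts connections on the port; it says nothing about whether the service behind it can answer queries.

```python
import socket


def tcp_probe_succeeds(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    """Approximate a tcpSocket probe: True if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

That limitation is why an exec or HTTP check is often a better fit when a real client command or health endpoint is available.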
Exec Probe
livenessProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - pg_isready -U postgres -h localhost
  initialDelaySeconds: 30
  periodSeconds: 10
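An exec check passes only when the command exits with code 0 within the timeout. A small local harness can help verify a check command before putting it in a manifest; the sketch below is an approximation that runs the command on the local machine rather than inside a container, and `exec_probe_succeeds` is a hypothetical name.

```python
import subprocess
from typing import Sequence


def exec_probe_succeeds(command: Sequence[str], timeout_seconds: float = 1.0) -> bool:
    """Approximate an exec probe: True only if the command exits with code 0 in time."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or the command could not be started at all.
        return False
    return completed.returncode == 0
```

For instance, `exec_probe_succeeds(["/bin/sh", "-c", "pg_isready -U postgres -h localhost"], timeout_seconds=5)` runs the same command as the manifest above, assuming pg_isready is installed locally.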
gRPC Probe
livenessProbe:
  grpc:
    port: 50051
    service: my.custom.HealthService  # Optional, defaults to ""
  initialDelaySeconds: 10
  periodSeconds: 10
Configuration Parameters
Every probe shares these tuning parameters:
| Parameter | Default | Description |
|---|---|---|
| initialDelaySeconds | 0 | Seconds to wait after the container starts before the first probe |
| periodSeconds | 10 | How often to run the probe, in seconds |
| timeoutSeconds | 1 | Seconds before the probe times out |
| failureThreshold | 3 | Consecutive failures before action is taken |
| successThreshold | 1 | Consecutive successes needed to be considered healthy again (must be 1 for liveness and startup probes) |
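As a rough upper bound, the time before Kubernetes takes action on a container that is unhealthy from the start is: initialDelaySeconds + (periodSeconds * failureThreshold). Once the container is running, a new failure is acted on within about periodSeconds * failureThreshold.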
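Applying the formula to the defaults and to the example manifests earlier in this article gives a feel for the numbers. The helper name below is an assumption for illustration, and the result is an estimate that ignores probe timeouts and scheduling jitter.

```python
def seconds_until_action(
    period_seconds: int = 10,
    failure_threshold: int = 3,
    initial_delay_seconds: int = 0,
) -> int:
    """initialDelaySeconds + (periodSeconds * failureThreshold), using the documented defaults."""
    return initial_delay_seconds + period_seconds * failure_threshold


print(seconds_until_action())                                  # defaults: 30
print(seconds_until_action(10, 3, initial_delay_seconds=15))   # web-app liveness: 45
print(seconds_until_action(5, 3, initial_delay_seconds=5))     # api-server readiness: 20
print(seconds_until_action(10, 30))                            # legacy-app startup: 300
```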
Real-World Patterns
Java / Spring Boot
Spring Boot applications often take 30-90 seconds to start. Use a startup probe and the built-in Actuator liveness and readiness endpoints (available since Spring Boot 2.3):
startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
Node.js / Express
Node.js apps start fast, so startup probes are usually unnecessary:
livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 3
  periodSeconds: 5
Python / Django / FastAPI
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
Common Mistakes to Avoid
Mistake 1: Liveness probe checks external dependencies. If your liveness probe calls the database and the database is down, Kubernetes restarts your container. But the database is still down, so the restarted container fails too. Now you have a restart loop that makes recovery harder.
# BAD: liveness checks database connectivity
livenessProbe:
  httpGet:
    path: /health/full   # Checks DB, Redis, S3...
    port: 8080
# GOOD: liveness checks only internal state
livenessProbe:
  httpGet:
    path: /healthz       # Only checks "is the process responsive?"
    port: 8080
Mistake 2: Aggressive timeouts. A timeoutSeconds: 1 might work in dev but fail under production load when the app takes 2 seconds to respond. This triggers unnecessary restarts.
Mistake 3: No startup probe for slow apps. Without it, you set initialDelaySeconds: 120 on the liveness probe, meaning after a crash, Kubernetes waits 2 minutes before checking again.
Mistake 4: Same endpoint for liveness and readiness. They serve different purposes. Readiness should check dependencies. Liveness should only check if the process itself is stuck.
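The effect of an aggressive timeout (Mistake 2) is easy to model: each response slower than timeoutSeconds counts as a failure, and failureThreshold consecutive failures trigger the action. The sketch below is a simplified model with made-up latencies and a hypothetical function name; a single success resets the failure streak.

```python
from typing import List, Optional


def first_action_index(
    latencies: List[float],
    timeout_seconds: float = 1.0,
    failure_threshold: int = 3,
) -> Optional[int]:
    """Index of the probe attempt that would trigger a restart, or None if none does."""
    consecutive_failures = 0
    for index, latency in enumerate(latencies):
        if latency > timeout_seconds:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                return index
        else:
            consecutive_failures = 0  # any success resets the streak
    return None


# Illustrative response times in seconds under production load
load = [0.3, 2.0, 2.1, 0.4, 2.2, 2.0, 2.3]
print(first_action_index(load, timeout_seconds=1))  # 6: the third slow response in a row
print(first_action_index(load, timeout_seconds=3))  # None: no restart
```

With the same traffic, raising the timeout from 1 to 3 seconds is the difference between a restart and no restart in this model.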
Probe Decision Flowchart
Use this to decide which probes your app needs:
- Does your app take more than 10 seconds to start? Yes → add a startup probe.
- Can your app get into a broken state that only a restart fixes? Yes → add a liveness probe.
- Does your app need warm-up time or depend on external services? Yes → add a readiness probe.
- Most apps need at least readiness and liveness. Start with readiness, add liveness, and add startup if the app is slow to boot.
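The checklist above can be written down as a small function, which is handy for reviewing many services at once. This is only a sketch of the same three questions; the function and parameter names are assumptions, and the 10-second cutoff comes from the first bullet.

```python
from typing import List


def recommend_probes(
    startup_seconds: float,
    can_wedge: bool,
    needs_warmup_or_dependencies: bool,
) -> List[str]:
    """Map the three checklist questions to the probes worth configuring."""
    probes = []
    if needs_warmup_or_dependencies:
        probes.append("readinessProbe")
    if can_wedge:  # a broken state that only a restart fixes
        probes.append("livenessProbe")
    if startup_seconds > 10:
        probes.append("startupProbe")
    return probes


print(recommend_probes(60, can_wedge=True, needs_warmup_or_dependencies=True))
# ['readinessProbe', 'livenessProbe', 'startupProbe']
```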
# Debugging probes: check events for probe failures
kubectl describe pod my-app | grep -A 5 "Events"
# Watch probe status in real time
kubectl get pods -w
# Test your health endpoint manually
kubectl exec my-app -- curl -s localhost:8080/healthz
kubectl exec my-app -- curl -s localhost:8080/ready
Next, we will cover Kubernetes resource management — CPU and memory requests, limits, QoS classes, and how to right-size your workloads to avoid OOMKilled surprises.
