
Kubernetes Health Probes — Liveness, Readiness, and Startup Explained

7 min read
Goel Academy
DevOps & Cloud Learning Hub

Your pod shows Running status, but the app inside crashed five minutes ago. Users get 502 errors while Kubernetes happily reports everything is fine. Without health probes, Kubernetes has no idea whether your application is actually working — it only knows the process is alive.

Why Health Probes Matter

By default, Kubernetes considers a container "healthy" as long as the main process (PID 1) is running. But a running process does not mean a functioning application. Your web server could be stuck in a deadlock, your database connection pool exhausted, or your app caught in an infinite loop. Health probes give Kubernetes eyes into your application's actual state.

# Without probes: pod shows Running even though the app is broken
kubectl get pods
# NAME     READY   STATUS    RESTARTS   AGE
# my-app   1/1     Running   0          45m
# ^ Looks healthy, but returning 500 errors to every request

Kubernetes provides three types of probes, each serving a different purpose in the pod lifecycle.

Liveness Probe — Restart Unhealthy Containers

The liveness probe answers one question: is this container still working? If the liveness probe fails, Kubernetes kills the container and restarts it according to the pod's restartPolicy.

Use liveness probes for detecting deadlocks, infinite loops, or corrupted state that only a restart can fix.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: app
    image: my-app:1.0
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      failureThreshold: 3
      timeoutSeconds: 5

# When liveness fails, you see restarts incrementing
kubectl get pods
# NAME      READY   STATUS    RESTARTS   AGE
# web-app   1/1     Running   3          10m

# Check why it restarted
kubectl describe pod web-app | grep -A 10 "Last State"
#   Last State:  Terminated
#     Reason:    Error
#     Exit Code: 137   (SIGKILL: container killed after failing the liveness probe)
# The probe failure itself is recorded under Events as "Liveness probe failed"

Readiness Probe — Control Traffic Routing

The readiness probe answers: is this container ready to accept traffic? If the readiness probe fails, Kubernetes removes the pod from all Service endpoints. The container is not restarted — it just stops receiving traffic until the probe passes again.

Use readiness probes for warm-up periods, dependency checks, and temporary overload situations.

apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: api
    image: my-api:2.0
    ports:
    - containerPort: 3000
    readinessProbe:
      httpGet:
        path: /ready
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 3

# When readiness fails, READY shows 0/1 — no traffic routed
kubectl get pods
# NAME         READY   STATUS    RESTARTS   AGE
# api-server   0/1     Running   0          2m

# The pod is removed from the Service's endpoints
kubectl get endpoints api-service
# NAME          ENDPOINTS
# api-service   <none>      # No backends — traffic goes nowhere

Startup Probe — Slow-Starting Containers

The startup probe protects slow-starting containers. While the startup probe is running, liveness and readiness probes are disabled. Once the startup probe succeeds, Kubernetes hands control to the other probes.

This is essential for Java applications, legacy apps, or anything that needs more than 30 seconds to initialize.

apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
  - name: app
    image: legacy-java-app:1.0
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 30   # 30 * 10 = 300 seconds max startup time
      timeoutSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3

Without a startup probe, you would need a large initialDelaySeconds on the liveness probe, which slows down failure detection after the app is running.
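
To make the trade-off concrete, here is a sketch of that workaround (the 120-second value is illustrative, not taken from the manifests above):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120   # covers worst-case boot time...
  periodSeconds: 10          # ...but also delays the first check after every restart
  failureThreshold: 3

Because initialDelaySeconds applies every time the container starts, this padding slows detection after a crash as well as at first boot; a startup probe confines the long wait to startup only.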

Probe Mechanisms

Kubernetes supports four mechanisms for performing health checks:

Mechanism    | How It Works                                            | Best For                             | Example
HTTP GET     | Sends an HTTP request; status 200-399 = success         | Web servers, REST APIs               | httpGet: {path: /healthz, port: 8080}
TCP Socket   | Opens a TCP connection; success if the port is open     | Databases, Redis, non-HTTP services  | tcpSocket: {port: 5432}
Exec         | Runs a command in the container; exit code 0 = success  | Custom checks, file-based health     | exec: {command: ["/bin/check"]}
gRPC         | gRPC health checking protocol (K8s 1.24+)               | gRPC services                        | grpc: {port: 50051}

TCP Socket Probe

livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 15
  periodSeconds: 10

Exec Probe

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - pg_isready -U postgres -h localhost
  initialDelaySeconds: 30
  periodSeconds: 10

gRPC Probe

livenessProbe:
  grpc:
    port: 50051
    service: my.custom.HealthService   # Optional, defaults to ""
  initialDelaySeconds: 10
  periodSeconds: 10

Configuration Parameters

Every probe shares these tuning parameters:

Parameter           | Default | Description
initialDelaySeconds | 0       | Seconds to wait before the first probe
periodSeconds       | 10      | How often to run the probe
timeoutSeconds      | 1       | Seconds before the probe times out
failureThreshold    | 3       | Consecutive failures before action is taken
successThreshold    | 1       | Consecutive successes to count as healthy again (must be 1 for liveness and startup probes)

The worst-case time before Kubernetes takes action is roughly: initialDelaySeconds + (periodSeconds * failureThreshold); each failing attempt can add up to timeoutSeconds on top of that.
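
As a worked example, plugging in the liveness values from the first manifest in this post (initialDelaySeconds: 15, periodSeconds: 10, failureThreshold: 3):

```shell
# initialDelaySeconds + (periodSeconds * failureThreshold)
echo $(( 15 + 10 * 3 ))   # 45: worst-case seconds before the restart is triggered
```

So a container that hangs immediately after starting runs for roughly 45 seconds before Kubernetes restarts it.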

Real-World Patterns

Java / Spring Boot

Spring Boot applications often take 30-90 seconds to start. Use a startup probe and the built-in actuator endpoints:

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3

Node.js / Express

Node.js apps start fast, so startup probes are usually unnecessary:

livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 3
  periodSeconds: 5

Python / Django / FastAPI

livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5

Common Mistakes to Avoid

Mistake 1: Liveness probe checks external dependencies. If your liveness probe calls the database and the database is down, Kubernetes restarts your pod. But the database is still down, so the new pod fails too. Now you have a restart loop that makes recovery harder.

# BAD: liveness checks database connectivity
livenessProbe:
  httpGet:
    path: /health/full   # Checks DB, Redis, S3...
    port: 8080

# GOOD: liveness checks only internal state
livenessProbe:
  httpGet:
    path: /healthz       # Only checks "is the process responsive?"
    port: 8080

Mistake 2: Aggressive timeouts. A timeoutSeconds: 1 might work in dev but fail under production load when the app takes 2 seconds to respond. This triggers unnecessary restarts.
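
A more forgiving configuration might look like this (the values are illustrative; tune them against your app's real latency under load):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  timeoutSeconds: 5      # headroom for the occasional 2-second response under load
  periodSeconds: 10
  failureThreshold: 3    # still restarts after ~30s of sustained failure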

Mistake 3: No startup probe for slow apps. Without it, you set initialDelaySeconds: 120 on the liveness probe, meaning after a crash, Kubernetes waits 2 minutes before checking again.

Mistake 4: Same endpoint for liveness and readiness. They serve different purposes. Readiness should check dependencies. Liveness should only check if the process itself is stuck.
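
A minimal sketch of the separation, reusing the endpoint paths from the earlier examples:

livenessProbe:
  httpGet:
    path: /healthz       # internal only: is the process responsive?
    port: 8080
readinessProbe:
  httpGet:
    path: /ready         # may also check DB, cache, and downstream services
    port: 8080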

Probe Decision Flowchart

Use this to decide which probes your app needs:

  1. Does your app take more than 10 seconds to start? Yes → add a startup probe.
  2. Can your app get into a broken state that only a restart fixes? Yes → add a liveness probe.
  3. Does your app need warm-up time or depend on external services? Yes → add a readiness probe.
  4. Most apps need all three. Start with readiness, add liveness, add startup if the app is slow to boot.

# Debugging probes: check events for probe failures
kubectl describe pod my-app | grep -A 5 "Events"

# Watch probe status in real time
kubectl get pods -w

# Test your health endpoints manually (assumes curl is available in the image)
kubectl exec my-app -- curl -s localhost:8080/healthz
kubectl exec my-app -- curl -s localhost:8080/ready

Next, we will cover Kubernetes resource management — CPU and memory requests, limits, QoS classes, and how to right-size your workloads to avoid OOMKilled surprises.