
Kubernetes Health Probes — Liveness, Readiness, and Startup Explained

7 min read
Goel Academy
DevOps & Cloud Learning Hub

Your pod shows Running status, but the app inside crashed five minutes ago. Users get 502 errors while Kubernetes happily reports everything is fine. Without health probes, Kubernetes has no idea whether your application is actually working — it only knows the process is alive.

Why Health Probes Matter

By default, Kubernetes considers a container "healthy" as long as the main process (PID 1) is running. But a running process does not mean a functioning application. Your web server could be stuck in a deadlock, your database connection pool exhausted, or your app caught in an infinite loop. Health probes give Kubernetes eyes into your application's actual state.

# Without probes: pod shows Running even though the app is broken
kubectl get pods
# NAME     READY   STATUS    RESTARTS   AGE
# my-app   1/1     Running   0          45m
# ^ Looks healthy, but returning 500 errors to every request

Kubernetes provides three types of probes, each serving a different purpose in the pod lifecycle.

Liveness Probe — Restart Unhealthy Containers

The liveness probe answers one question: is this container still working? If the liveness probe fails, Kubernetes kills the container and restarts it according to the pod's restartPolicy.

Use liveness probes for detecting deadlocks, infinite loops, or corrupted state that only a restart can fix.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
  - name: app
    image: my-app:1.0
    ports:
    - containerPort: 8080
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 10
      failureThreshold: 3
      timeoutSeconds: 5

# When liveness fails, you see restarts incrementing
kubectl get pods
# NAME      READY   STATUS    RESTARTS   AGE
# web-app   1/1     Running   3          10m

# Check why it restarted
kubectl describe pod web-app | grep -A 10 "Last State"
#   Last State:  Terminated
#     Reason:    Error
#     Exit Code: 137   (SIGKILL: container killed after failing the liveness probe)
# The probe failure itself is recorded under Events as "Liveness probe failed"

Readiness Probe — Control Traffic Routing

The readiness probe answers: is this container ready to accept traffic? If the readiness probe fails, Kubernetes removes the pod from all Service endpoints. The container is not restarted — it just stops receiving traffic until the probe passes again.

Use readiness probes for warm-up periods, dependency checks, and temporary overload situations.

apiVersion: v1
kind: Pod
metadata:
  name: api-server
spec:
  containers:
  - name: api
    image: my-api:2.0
    ports:
    - containerPort: 3000
    readinessProbe:
      httpGet:
        path: /ready
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
      successThreshold: 1
      timeoutSeconds: 3

# When readiness fails, READY shows 0/1 — no traffic routed
kubectl get pods
# NAME         READY   STATUS    RESTARTS   AGE
# api-server   0/1     Running   0          2m

# The pod is removed from the Service's endpoints
kubectl get endpoints api-service
# NAME          ENDPOINTS
# api-service   <none>      # No backends — traffic goes nowhere

Startup Probe — Slow-Starting Containers

The startup probe protects slow-starting containers. While the startup probe is running, liveness and readiness probes are disabled. Once the startup probe succeeds, Kubernetes hands control to the other probes.

This is essential for Java applications, legacy apps, or anything that needs more than 30 seconds to initialize.

apiVersion: v1
kind: Pod
metadata:
  name: legacy-app
spec:
  containers:
  - name: app
    image: legacy-java-app:1.0
    ports:
    - containerPort: 8080
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
      failureThreshold: 30   # 30 * 10 = 300 seconds max startup time
      timeoutSeconds: 5
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3

Without a startup probe, you would need a large initialDelaySeconds on the liveness probe, which slows down failure detection after the app is running.
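
To make the trade-off concrete, here is a sketch of that workaround (the 120-second value is illustrative, not taken from the manifests above):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 120   # covers worst-case boot time...
  periodSeconds: 10          # ...but also delays the first check after every restart
  failureThreshold: 3

Because initialDelaySeconds applies every time the container starts, this padding slows detection after a crash as well as at first boot; a startup probe confines the long wait to startup only.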

Probe Mechanisms

Kubernetes supports four mechanisms for performing health checks:

Mechanism    | How It Works                                            | Best For                             | Example
HTTP GET     | Sends an HTTP request; status 200-399 = success         | Web servers, REST APIs               | httpGet: {path: /healthz, port: 8080}
TCP Socket   | Opens a TCP connection; success if the port is open     | Databases, Redis, non-HTTP services  | tcpSocket: {port: 5432}
Exec         | Runs a command in the container; exit code 0 = success  | Custom checks, file-based health     | exec: {command: ["/bin/check"]}
gRPC         | gRPC health checking protocol (K8s 1.24+)               | gRPC services                        | grpc: {port: 50051}

TCP Socket Probe

livenessProbe:
  tcpSocket:
    port: 5432
  initialDelaySeconds: 15
  periodSeconds: 10

Exec Probe

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - pg_isready -U postgres -h localhost
  initialDelaySeconds: 30
  periodSeconds: 10

gRPC Probe

livenessProbe:
  grpc:
    port: 50051
    service: my.custom.HealthService   # Optional, defaults to ""
  initialDelaySeconds: 10
  periodSeconds: 10

Configuration Parameters

Every probe shares these tuning parameters:

Parameter           | Default | Description
initialDelaySeconds | 0       | Seconds to wait before the first probe
periodSeconds       | 10      | How often to run the probe
timeoutSeconds      | 1       | Seconds before the probe times out
failureThreshold    | 3       | Consecutive failures before action is taken
successThreshold    | 1       | Consecutive successes to count as healthy again (must be 1 for liveness and startup probes)

The worst-case time before Kubernetes takes action is roughly: initialDelaySeconds + (periodSeconds * failureThreshold); each failing attempt can add up to timeoutSeconds on top of that.
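
As a worked example, plugging in the liveness values from the first manifest in this post (initialDelaySeconds: 15, periodSeconds: 10, failureThreshold: 3):

```shell
# initialDelaySeconds + (periodSeconds * failureThreshold)
echo $(( 15 + 10 * 3 ))   # 45: worst-case seconds before the restart is triggered
```

So a container that hangs immediately after starting runs for roughly 45 seconds before Kubernetes restarts it.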

Real-World Patterns

Java / Spring Boot

Spring Boot applications often take 30-90 seconds to start. Use a startup probe and the built-in actuator endpoints:

startupProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 10
  failureThreshold: 30
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  periodSeconds: 5
  failureThreshold: 3

Node.js / Express

Node.js apps start fast, so startup probes are usually unnecessary:

livenessProbe:
  httpGet:
    path: /healthz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 3000
  initialDelaySeconds: 3
  periodSeconds: 5

Python / Django / FastAPI

livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5

Common Mistakes to Avoid

Mistake 1: Liveness probe checks external dependencies. If your liveness probe calls the database and the database is down, Kubernetes restarts your pod. But the database is still down, so the new pod fails too. Now you have a restart loop that makes recovery harder.

# BAD: liveness checks database connectivity
livenessProbe:
  httpGet:
    path: /health/full   # Checks DB, Redis, S3...
    port: 8080

# GOOD: liveness checks only internal state
livenessProbe:
  httpGet:
    path: /healthz       # Only checks "is the process responsive?"
    port: 8080

Mistake 2: Aggressive timeouts. A timeoutSeconds: 1 might work in dev but fail under production load when the app takes 2 seconds to respond. This triggers unnecessary restarts.
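
A more forgiving configuration might look like this (the values are illustrative; tune them against your app's real latency under load):

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  timeoutSeconds: 5      # headroom for the occasional 2-second response under load
  periodSeconds: 10
  failureThreshold: 3    # still restarts after ~30s of sustained failure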

Mistake 3: No startup probe for slow apps. Without it, you set initialDelaySeconds: 120 on the liveness probe, meaning after a crash, Kubernetes waits 2 minutes before checking again.

Mistake 4: Same endpoint for liveness and readiness. They serve different purposes. Readiness should check dependencies. Liveness should only check if the process itself is stuck.
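
A minimal sketch of the separation, reusing the endpoint paths from the earlier examples:

livenessProbe:
  httpGet:
    path: /healthz       # internal only: is the process responsive?
    port: 8080
readinessProbe:
  httpGet:
    path: /ready         # may also check DB, cache, and downstream services
    port: 8080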

Probe Decision Flowchart

Use this to decide which probes your app needs:

  1. Does your app take more than 10 seconds to start? Yes → add a startup probe.
  2. Can your app get into a broken state that only a restart fixes? Yes → add a liveness probe.
  3. Does your app need warm-up time or depend on external services? Yes → add a readiness probe.
  4. Most apps need all three. Start with readiness, add liveness, add startup if the app is slow to boot.

# Debugging probes: check events for probe failures
kubectl describe pod my-app | grep -A 5 "Events"

# Watch probe status in real time
kubectl get pods -w

# Test your health endpoints manually (assumes curl is available in the image)
kubectl exec my-app -- curl -s localhost:8080/healthz
kubectl exec my-app -- curl -s localhost:8080/ready

Next, we will cover Kubernetes resource management — CPU and memory requests, limits, QoS classes, and how to right-size your workloads to avoid OOMKilled surprises.