Monitor Docker Containers — cAdvisor, Prometheus, and Grafana
docker stats shows you what is happening right now. It does not show you what happened at 3 AM when response times spiked. It does not alert you when a container's memory is trending toward its limit. It does not graph CPU usage over the past week to help you right-size your resource limits. For real monitoring, you need metrics collection, storage, visualization, and alerting. The standard stack for Docker is cAdvisor + Prometheus + Grafana, and you can have it running in under fifteen minutes.
docker stats — The Starting Point
docker stats is built into Docker and requires no setup. It is good for quick glances but has fundamental limitations.
# Real-time stats for all containers
docker stats
# CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
# abc123 api 2.45% 245MiB / 512MiB 47.8% 1.2MB / 500kB 10MB / 0B 12
# def456 worker 78.2% 480MiB / 512MiB 93.7% 500kB / 200kB 5MB / 1MB 45
# ghi789 db 12.5% 1.2GiB / 2GiB 60.0% 3MB / 15MB 50MB / 200MB 28
# Specific containers with custom format
docker stats --no-stream --format \
"table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.PIDs}}"
Limitations of docker stats:
- No historical data — only shows the current moment.
- No alerting — you have to be watching when something goes wrong.
- No trend analysis — cannot see memory creeping up over hours.
- No dashboards — text output only.
- No correlation — cannot overlay CPU with response times.
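If you need a stopgap before the full stack is in place, docker stats can at least be sampled on a schedule. This is a sketch of a poor-man's history logger; the OUT path, the CSV columns, and the cron line are assumptions, not anything built into Docker.

```shell
#!/bin/sh
# Hypothetical stopgap: append timestamped docker stats snapshots to a CSV
# so there is at least something to grep when a container misbehaved overnight.
OUT=${OUT:-/var/log/docker-stats.csv}

snapshot() {
  # One CSV row per container, prefixed with a UTC timestamp
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  docker stats --no-stream --format '{{.Name}},{{.CPUPerc}},{{.MemUsage}}' |
    while IFS= read -r line; do
      printf '%s,%s\n' "$ts" "$line"
    done
}

# Run once and append to the log:
#   snapshot >> "$OUT"
# Or from cron, e.g. every minute:
#   * * * * * /usr/local/bin/stats-snapshot.sh
```

This gives you grep-able history but none of the querying, graphing, or alerting the rest of this post covers.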
cAdvisor — Container Metrics Collection
cAdvisor (Container Advisor) by Google collects resource usage and performance data from running containers. It exposes metrics in Prometheus format, which makes it the natural data source for the rest of this stack.
# Run cAdvisor as a container
docker run -d \
--name cadvisor \
--volume /:/rootfs:ro \
--volume /var/run:/var/run:ro \
--volume /sys:/sys:ro \
--volume /var/lib/docker/:/var/lib/docker:ro \
--volume /dev/disk/:/dev/disk:ro \
--publish 8080:8080 \
--privileged \
--device /dev/kmsg \
gcr.io/cadvisor/cadvisor:latest
# cAdvisor web UI: http://localhost:8080
# Prometheus metrics endpoint: http://localhost:8080/metrics
# Verify cAdvisor is collecting metrics
curl -s http://localhost:8080/metrics | head -20
# container_cpu_usage_seconds_total{name="api",...} 45.234
# container_memory_usage_bytes{name="api",...} 256901120
# container_network_receive_bytes_total{name="api",...} 1258291
# container_fs_usage_bytes{name="api",...} 10485760
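The /metrics endpoint and the metric names above are what cAdvisor actually exposes; here is a small helper (the function name is hypothetical) that pulls one container's current memory usage out of that output. The awk line is a sketch, not a full Prometheus text-format parser.

```shell
# Pull one container's container_memory_usage_bytes value from cAdvisor's
# Prometheus-format /metrics output (endpoint and metric name per cAdvisor).
CADVISOR=${CADVISOR:-http://localhost:8080}

container_mem_bytes() {
  # $1 = container name as it appears in the name="..." label
  curl -s "$CADVISOR/metrics" |
    awk -v n="name=\"$1\"" \
      'index($0, "container_memory_usage_bytes{") == 1 && index($0, n) { print $NF }'
}

# container_mem_bytes api   (prints the raw byte count, e.g. 256901120)
```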
Prometheus — Metrics Storage and Querying
Prometheus scrapes metrics from cAdvisor at regular intervals, stores them as time-series data, and provides a query language (PromQL) for analysis.
# prometheus.yml — Prometheus configuration
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "alerts.yml"
scrape_configs:
# Scrape cAdvisor for container metrics
- job_name: "cadvisor"
static_configs:
- targets: ["cadvisor:8080"]
# Scrape Prometheus itself
- job_name: "prometheus"
static_configs:
- targets: ["localhost:9090"]
  # Scrape Docker daemon metrics (if enabled; on Linux, add
  # extra_hosts: "host.docker.internal:host-gateway" to the Prometheus
  # service so this name resolves)
- job_name: "docker"
static_configs:
- targets: ["host.docker.internal:9323"]
# Enable Docker daemon metrics (optional)
# Add to /etc/docker/daemon.json
{
"metrics-addr": "0.0.0.0:9323",
"experimental": true
}
# Restart Docker: sudo systemctl restart docker
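After the daemon.json change and restart, it is worth confirming the engine is actually serving metrics on :9323. Docker's engine metrics are prefixed engine_daemon_; the helper function name here is just for illustration.

```shell
# Count engine_daemon_* series on the daemon metrics endpoint.
# A zero means metrics are not enabled (or the address/port is wrong).
docker_metrics_count() {
  curl -s "${1:-http://localhost:9323}/metrics" | grep -c '^engine_daemon_'
}

# docker_metrics_count
```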
The Complete Monitoring Stack
Here is a production-ready docker-compose file that runs the entire monitoring stack.
# docker-compose.monitoring.yml
services:
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
container_name: cadvisor
privileged: true
devices:
- /dev/kmsg:/dev/kmsg
volumes:
- /:/rootfs:ro
- /var/run:/var/run:ro
- /sys:/sys:ro
- /var/lib/docker/:/var/lib/docker:ro
- /dev/disk/:/dev/disk:ro
ports:
- "8080:8080"
restart: unless-stopped
networks:
- monitoring
prometheus:
image: prom/prometheus:latest
container_name: prometheus
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
- ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
- prometheus-data:/prometheus
ports:
- "9090:9090"
command:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--storage.tsdb.retention.time=30d"
- "--web.enable-lifecycle"
restart: unless-stopped
networks:
- monitoring
grafana:
image: grafana/grafana:latest
container_name: grafana
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=changeme
- GF_USERS_ALLOW_SIGN_UP=false
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning:ro
ports:
- "3000:3000"
restart: unless-stopped
depends_on:
- prometheus
networks:
- monitoring
volumes:
prometheus-data:
grafana-data:
networks:
monitoring:
driver: bridge
# Start the monitoring stack
docker compose -f docker-compose.monitoring.yml up -d
# Access the tools:
# cAdvisor: http://localhost:8080
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/changeme)
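Once the stack is up, confirm Prometheus is actually scraping its targets. The /api/v1/targets endpoint is part of Prometheus's stable HTTP API; the python one-liner just flattens the JSON (jq works equally well).

```shell
# List every active scrape target and its health as "job health" pairs.
scrape_health() {
  curl -s "${1:-http://localhost:9090}/api/v1/targets" |
    python3 -c 'import json, sys
for t in json.load(sys.stdin)["data"]["activeTargets"]:
    print(t["labels"]["job"], t["health"])'
}

# scrape_health   (every job — cadvisor, prometheus, docker — should report "up")
```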
Key Metrics to Track
| Metric | PromQL Query | What It Tells You | Alert Threshold |
|---|---|---|---|
| CPU usage | rate(container_cpu_usage_seconds_total[5m]) | CPU cores consumed per container | > 80% of limit |
| Memory usage | container_memory_usage_bytes | Current memory consumption | > 85% of limit |
| Memory limit % | container_memory_usage_bytes / container_spec_memory_limit_bytes * 100 | How close to OOM kill | > 90% |
| Network RX | rate(container_network_receive_bytes_total[5m]) | Incoming network bandwidth | Unusual spike |
| Network TX | rate(container_network_transmit_bytes_total[5m]) | Outgoing network bandwidth | Unusual spike |
| Disk read | rate(container_fs_reads_bytes_total[5m]) | Disk read throughput | Sustained high I/O |
| Disk write | rate(container_fs_writes_bytes_total[5m]) | Disk write throughput | Sustained high I/O |
| Restart count | changes(container_start_time_seconds[1h]) (on Kubernetes: kube_pod_container_status_restarts_total) | How often a container restarts | > 3 in 10 minutes |
| Container uptime | time() - container_start_time_seconds | How long since last restart | Unexpected restart |
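Every query in the table can also be run outside Grafana via the Prometheus HTTP API (/api/v1/query is a stable endpoint). promq is a hypothetical helper name; --data-urlencode handles the PromQL escaping for you.

```shell
# Run an instant PromQL query against Prometheus and print the raw JSON result.
PROM=${PROM:-http://localhost:9090}

promq() {
  curl -s "$PROM/api/v1/query" --data-urlencode "query=$1"
}

# promq 'container_memory_usage_bytes{name="api"}'
# promq 'rate(container_cpu_usage_seconds_total{name="api"}[5m])'
```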
Grafana Dashboards for Docker
Grafana turns Prometheus queries into visual dashboards. The easiest way to get started is to import community dashboards.
# Auto-provision Grafana datasource
# ./grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: false
# Popular Grafana dashboard IDs for Docker:
# 893 — Docker and system monitoring
# 14282 — cAdvisor container metrics
# 1229 — Docker Prometheus monitoring
# Import via Grafana UI:
# 1. Go to Dashboards → Import
# 2. Enter dashboard ID (e.g., 14282)
# 3. Select Prometheus as the data source
# 4. Click Import
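The four UI steps can also be scripted (a sketch follows): grafana.com serves each community dashboard's JSON, and POST /api/dashboards/import is a real Grafana endpoint. The DS_PROMETHEUS input name is the common convention for these dashboards but is not guaranteed for all of them; the credentials match the compose file above.

```shell
# Download a community dashboard from grafana.com and import it into the
# local Grafana, binding its Prometheus input to our "Prometheus" datasource.
import_dashboard() {
  # $1 = grafana.com dashboard ID, e.g. 14282
  json=$(curl -s "https://grafana.com/api/dashboards/$1/revisions/latest/download")
  curl -s -u admin:changeme -H 'Content-Type: application/json' \
    -d "{\"dashboard\": $json, \"overwrite\": true,
         \"inputs\": [{\"name\": \"DS_PROMETHEUS\", \"type\": \"datasource\",
                       \"pluginId\": \"prometheus\", \"value\": \"Prometheus\"}]}" \
    http://localhost:3000/api/dashboards/import
}

# import_dashboard 14282
```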
Useful PromQL Queries
# Top 5 containers by CPU usage
topk(5, rate(container_cpu_usage_seconds_total{name!=""}[5m]))
# Containers approaching memory limit (> 80%)
container_memory_usage_bytes{name!=""} /
container_spec_memory_limit_bytes{name!=""} * 100 > 80
# Network receive throughput per container (MiB/s)
rate(container_network_receive_bytes_total{name!=""}[5m]) / 1024 / 1024
# Container restart rate (restarts per hour; use changes(), not increase():
# the start time is a gauge, not a counter)
changes(container_start_time_seconds{name!=""}[1h])
# Disk I/O per container
rate(container_fs_writes_bytes_total{name!=""}[5m])
+ rate(container_fs_reads_bytes_total{name!=""}[5m])
Alerting on Container Health
Set up Prometheus alerting rules to get notified before problems become outages.
# prometheus/alerts.yml
groups:
- name: container_alerts
rules:
- alert: ContainerHighMemory
expr: |
container_memory_usage_bytes{name!=""} /
container_spec_memory_limit_bytes{name!=""} * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.name }} memory usage > 85%"
description: "{{ $labels.name }} is using {{ $value | printf \"%.1f\" }}% of its memory limit."
      - alert: ContainerHighCPU
        # 0.8 here means 0.8 CPU cores, not 80% of a configured limit
        expr: rate(container_cpu_usage_seconds_total{name!=""}[5m]) > 0.8
for: 10m
labels:
severity: warning
annotations:
summary: "Container {{ $labels.name }} high CPU usage"
      - alert: ContainerRestarting
        # changes() counts jumps in the start-time gauge; increase() only works on counters
        expr: changes(container_start_time_seconds{name!=""}[10m]) > 3
labels:
severity: critical
annotations:
summary: "Container {{ $labels.name }} restarting frequently"
      # absent() returns a series with no labels, so {{ $labels.name }} would be
      # empty here; name each watched container explicitly instead
      - alert: ContainerDown
        expr: absent(container_last_seen{name="api"})
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container api is down (no metrics for 1 minute)"
# Verify alerts are loaded in Prometheus
curl http://localhost:9090/api/v1/rules
# Check Prometheus UI → Alerts tab: http://localhost:9090/alerts
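The same information is available programmatically: /api/v1/alerts is a stable Prometheus endpoint listing pending and firing alerts. The python one-liner is just a sketch for flattening the JSON response.

```shell
# Print each active alert as "alertname state" (e.g. "ContainerHighMemory firing").
firing_alerts() {
  curl -s "${1:-http://localhost:9090}/api/v1/alerts" |
    python3 -c 'import json, sys
for a in json.load(sys.stdin)["data"]["alerts"]:
    print(a["labels"]["alertname"], a["state"])'
}

# firing_alerts
```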
Container Resource Usage Trends
The real value of Prometheus comes from tracking trends over time. A container that uses 300 MB today and 350 MB tomorrow is probably leaking memory, and that trend is invisible to docker stats.
# Memory usage trend over 24 hours (predict OOM in the next 4 hours)
predict_linear(container_memory_usage_bytes{name="api"}[24h], 4 * 3600)
> container_spec_memory_limit_bytes{name="api"}
# CPU usage comparison: this week vs last week
rate(container_cpu_usage_seconds_total{name="api"}[5m])
- rate(container_cpu_usage_seconds_total{name="api"}[5m] offset 7d)
# Average memory usage over the past week (for right-sizing)
avg_over_time(container_memory_usage_bytes{name="api"}[7d])
Monitoring: Production vs Development
| Aspect | Development | Production |
|---|---|---|
| Scrape interval | 30s (less overhead) | 15s (more granularity) |
| Retention | 7 days | 30-90 days |
| Alerting | Disabled or Slack only | PagerDuty / OpsGenie |
| Dashboards | Basic overview | Per-service dashboards |
| cAdvisor | Optional (use docker stats) | Required |
| Grafana auth | Default admin/admin | SSO / LDAP |
| Prometheus storage | Local volume | Remote write to Thanos/Mimir |
| Network | Bridge | Overlay with monitoring network |
# Production additions to the monitoring stack
services:
alertmanager:
image: prom/alertmanager:latest
volumes:
- ./alertmanager/config.yml:/etc/alertmanager/config.yml:ro
ports:
- "9093:9093"
networks:
- monitoring
node-exporter:
image: prom/node-exporter:latest
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
- "--path.rootfs=/rootfs"
ports:
- "9100:9100"
networks:
- monitoring
Add Node Exporter for host-level metrics (disk space, system CPU, available memory) alongside cAdvisor for container-level metrics. Together they give you complete visibility.
Wrapping Up
docker stats is a flashlight. Prometheus + Grafana is a security camera system with motion detection. The flashlight is useful for quick checks, but it cannot tell you what happened while you were not looking. With cAdvisor collecting container metrics, Prometheus storing and querying them, and Grafana visualizing trends, you get historical analysis, predictive alerting, and the data you need to right-size your resource limits. The fifteen minutes it takes to set up the monitoring stack will save you hours of debugging when something goes wrong at 3 AM.
In the next post, we will cover Docker Compose in Production — profiles, depends_on health conditions, restart policies, resource limits, and when it is actually appropriate to use Compose for production workloads.
