
Monitor Docker Containers — cAdvisor, Prometheus, and Grafana

7 min read
Goel Academy
DevOps & Cloud Learning Hub

docker stats shows you what is happening right now. It does not show you what happened at 3 AM when response times spiked. It does not alert you when a container's memory is trending toward its limit. It does not graph CPU usage over the past week to help you right-size your resource limits. For real monitoring, you need metrics collection, storage, visualization, and alerting. The standard stack for Docker is cAdvisor + Prometheus + Grafana, and you can have it running in under fifteen minutes.

docker stats — The Starting Point

docker stats is built into Docker and requires no setup. It is good for quick glances but has fundamental limitations.

# Real-time stats for all containers
docker stats

# CONTAINER ID   NAME     CPU %    MEM USAGE / LIMIT   MEM %    NET I/O         BLOCK I/O      PIDS
# abc123         api      2.45%    245MiB / 512MiB     47.8%    1.2MB / 500kB   10MB / 0B      12
# def456         worker   78.2%    480MiB / 512MiB     93.7%    500kB / 200kB   5MB / 1MB      45
# ghi789         db       12.5%    1.2GiB / 2GiB       60.0%    3MB / 15MB      50MB / 200MB   28

# Specific containers with custom format
docker stats --no-stream --format \
"table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.MemPerc}}\t{{.NetIO}}\t{{.PIDs}}"

Limitations of docker stats:

  • No historical data — only shows the current moment.
  • No alerting — you have to be watching when something goes wrong.
  • No trend analysis — cannot see memory creeping up over hours.
  • No dashboards — text output only.
  • No correlation — cannot overlay CPU with response times.
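Until a real monitoring stack is in place, a crude stopgap for the "no historical data" problem is to snapshot docker stats to a log file on an interval. A sketch, assuming the Docker CLI is available; the log path is a placeholder:

```shell
#!/bin/sh
# Append a timestamped docker stats snapshot every 60 seconds.
# A crude substitute for real history until Prometheus is running.
LOG_FILE="${LOG_FILE:-/var/log/docker-stats.log}"   # hypothetical path
while true; do
  {
    date -u +"%Y-%m-%dT%H:%M:%SZ"
    docker stats --no-stream --format \
      "table {{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}\t{{.PIDs}}"
  } >> "$LOG_FILE"
  sleep 60
done
```

This gives you something to grep through after an incident, but nothing more — no querying, no graphing, no alerting.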

cAdvisor — Container Metrics Collection

cAdvisor (Container Advisor) by Google collects resource usage and performance data from running containers. It exposes metrics in Prometheus format, making it the standard data source for this stack.

# Run cAdvisor as a container
docker run -d \
--name cadvisor \
--volume /:/rootfs:ro \
--volume /var/run:/var/run:ro \
--volume /sys:/sys:ro \
--volume /var/lib/docker/:/var/lib/docker:ro \
--volume /dev/disk/:/dev/disk:ro \
--publish 8080:8080 \
--privileged \
--device /dev/kmsg \
gcr.io/cadvisor/cadvisor:latest

# cAdvisor web UI: http://localhost:8080
# Prometheus metrics endpoint: http://localhost:8080/metrics
# Verify cAdvisor is collecting metrics
curl -s http://localhost:8080/metrics | head -20

# container_cpu_usage_seconds_total{name="api",...} 45.234
# container_memory_usage_bytes{name="api",...} 256901120
# container_network_receive_bytes_total{name="api",...} 1258291
# container_fs_usage_bytes{name="api",...} 10485760
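Because the endpoint is plain Prometheus text format, you can pull a single container's metric out with standard tools. A sketch, assuming a container named api is running and cAdvisor is on the default port:

```shell
# Pull one container's current memory usage (bytes) out of cAdvisor's
# Prometheus text output; the metric value is the last field on the line.
curl -s http://localhost:8080/metrics \
  | awk '/^container_memory_usage_bytes[{].*name="api"/ {print $NF}'
```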

Prometheus — Metrics Storage and Querying

Prometheus scrapes metrics from cAdvisor at regular intervals, stores them as time-series data, and provides a query language (PromQL) for analysis.

# prometheus.yml — Prometheus configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alerts.yml"

scrape_configs:
  # Scrape cAdvisor for container metrics
  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]

  # Scrape Prometheus itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Scrape Docker daemon metrics (if enabled)
  - job_name: "docker"
    static_configs:
      - targets: ["host.docker.internal:9323"]

# Enable Docker daemon metrics (optional)
# Add to /etc/docker/daemon.json:
{
  "metrics-addr": "0.0.0.0:9323",
  "experimental": true
}
# Then restart Docker: sudo systemctl restart docker
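After the restart, you can confirm the daemon is serving metrics. A quick check — the engine_daemon_ prefix is what recent Docker versions use for daemon metrics, so treat the exact metric names as an assumption:

```shell
# Confirm the Docker daemon is exporting Prometheus metrics on :9323;
# daemon metrics are prefixed with engine_daemon_ in recent versions.
curl -s http://localhost:9323/metrics | grep -m1 '^engine_daemon_' \
  && echo "daemon metrics OK"
```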

The Complete Monitoring Stack

Here is a production-ready docker-compose file that runs the entire monitoring stack.

# docker-compose.monitoring.yml
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    privileged: true
    devices:
      - /dev/kmsg:/dev/kmsg
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    ports:
      - "8080:8080"
    restart: unless-stopped
    networks:
      - monitoring

  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./prometheus/alerts.yml:/etc/prometheus/alerts.yml:ro
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--storage.tsdb.retention.time=30d"
      - "--web.enable-lifecycle"
    restart: unless-stopped
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=changeme
      - GF_USERS_ALLOW_SIGN_UP=false
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning:ro
    ports:
      - "3000:3000"
    restart: unless-stopped
    depends_on:
      - prometheus
    networks:
      - monitoring

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring:
    driver: bridge

# Start the monitoring stack
docker compose -f docker-compose.monitoring.yml up -d

# Access the tools:
# cAdvisor: http://localhost:8080
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3000 (admin/changeme)
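Once the stack is up, it is worth smoke-testing each component, and because the compose file passes --web.enable-lifecycle, Prometheus can reload its configuration without a restart. A sketch, assuming the default ports from the compose file above:

```shell
# Smoke-test the stack (default ports from the compose file)
curl -sf http://localhost:8080/healthz                && echo "cAdvisor OK"
curl -sf http://localhost:9090/-/healthy              && echo "Prometheus OK"
curl -sf http://localhost:3000/api/health >/dev/null  && echo "Grafana OK"

# Hot-reload Prometheus after editing prometheus.yml or alerts.yml
# (possible because the compose file passes --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```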

Key Metrics to Track

| Metric | PromQL Query | What It Tells You | Alert Threshold |
|---|---|---|---|
| CPU usage | rate(container_cpu_usage_seconds_total[5m]) | CPU cores consumed per container | > 80% of limit |
| Memory usage | container_memory_usage_bytes | Current memory consumption | > 85% of limit |
| Memory limit % | container_memory_usage_bytes / container_spec_memory_limit_bytes * 100 | How close to OOM kill | > 90% |
| Network RX | rate(container_network_receive_bytes_total[5m]) | Incoming network bandwidth | Unusual spike |
| Network TX | rate(container_network_transmit_bytes_total[5m]) | Outgoing network bandwidth | Unusual spike |
| Disk read | rate(container_fs_reads_bytes_total[5m]) | Disk read throughput | Sustained high I/O |
| Disk write | rate(container_fs_writes_bytes_total[5m]) | Disk write throughput | Sustained high I/O |
| Restart count | kube_pod_container_status_restarts_total or container events | How often a container restarts | > 3 in 10 minutes |
| Container uptime | time() - container_start_time_seconds | How long since last restart | Unexpected restart |

Grafana Dashboards for Docker

Grafana turns Prometheus queries into visual dashboards. The easiest way to get started is to import community dashboards.

# Auto-provision the Grafana datasource
# ./grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
# Popular Grafana dashboard IDs for Docker:
# 893 — Docker and system monitoring
# 14282 — cAdvisor container metrics
# 1229 — Docker Prometheus monitoring

# Import via Grafana UI:
# 1. Go to Dashboards → Import
# 2. Enter dashboard ID (e.g., 14282)
# 3. Select Prometheus as the data source
# 4. Click Import
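Dashboards can also be provisioned from files instead of imported by hand, which keeps them in version control. A sketch of a dashboards provider — it assumes you drop the dashboard JSON files into the provisioning directory that the compose file already mounts:

```yaml
# grafana/provisioning/dashboards/default.yml
apiVersion: 1
providers:
  - name: "Docker dashboards"
    type: file
    options:
      # Directory inside the container where dashboard JSON files live;
      # assumes they are placed under the existing provisioning mount.
      path: /etc/grafana/provisioning/dashboards
```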

Useful PromQL Queries

# Top 5 containers by CPU usage
topk(5, rate(container_cpu_usage_seconds_total{name!=""}[5m]))

# Containers approaching memory limit (> 80%)
container_memory_usage_bytes{name!=""} /
container_spec_memory_limit_bytes{name!=""} * 100 > 80

# Network throughput per container (MB/s)
rate(container_network_receive_bytes_total{name!=""}[5m]) / 1024 / 1024

# Container restart count (restarts per hour)
# changes() counts how often the start timestamp changed; increase()
# on a timestamp would return seconds, not a restart count
changes(container_start_time_seconds{name!=""}[1h])

# Disk I/O per container
rate(container_fs_writes_bytes_total{name!=""}[5m])
+ rate(container_fs_reads_bytes_total{name!=""}[5m])

Alerting on Container Health

Set up Prometheus alerting rules to get notified before problems become outages.

# prometheus/alerts.yml
groups:
  - name: container_alerts
    rules:
      - alert: ContainerHighMemory
        expr: |
          container_memory_usage_bytes{name!=""} /
          container_spec_memory_limit_bytes{name!=""} * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} memory usage > 85%"
          description: "{{ $labels.name }} is using {{ $value | printf \"%.1f\" }}% of its memory limit."

      - alert: ContainerHighCPU
        expr: rate(container_cpu_usage_seconds_total{name!=""}[5m]) > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Container {{ $labels.name }} high CPU usage"

      - alert: ContainerRestarting
        # changes() counts restarts; increase() on a start timestamp
        # would return elapsed seconds, not a restart count
        expr: changes(container_start_time_seconds{name!=""}[10m]) > 3
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} restarting frequently"

      - alert: ContainerDown
        # time() - container_last_seen keeps the name label in the alert;
        # absent() would fire without telling you which container vanished
        expr: time() - container_last_seen{name!=""} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container {{ $labels.name }} is down"
# Verify alerts are loaded in Prometheus
curl http://localhost:9090/api/v1/rules
# Check Prometheus UI → Alerts tab: http://localhost:9090/alerts

The real value of Prometheus comes from tracking trends over time. A container that uses 300 MB today and 350 MB tomorrow has a memory leak that docker stats will never catch.

# Memory usage trend over 24 hours (predict OOM in the next 4 hours)
predict_linear(container_memory_usage_bytes{name="api"}[24h], 4 * 3600)
> container_spec_memory_limit_bytes{name="api"}

# CPU usage comparison: this week vs last week
rate(container_cpu_usage_seconds_total{name="api"}[5m])
- rate(container_cpu_usage_seconds_total{name="api"}[5m] offset 7d)

# Average memory usage over the past week (for right-sizing)
avg_over_time(container_memory_usage_bytes{name="api"}[7d])
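The predict_linear query above can be promoted to an alerting rule so you hear about the leak before the OOM kill. A sketch that would live alongside the rules in alerts.yml, widened to all named containers:

```yaml
# Predictive rule: fire when the 24h memory trend crosses the limit
# within the next 4 hours (sketch; tune windows to your workload)
- alert: ContainerMemoryLeakPredicted
  expr: |
    predict_linear(container_memory_usage_bytes{name!=""}[24h], 4 * 3600)
    > container_spec_memory_limit_bytes{name!=""}
  for: 30m
  labels:
    severity: warning
  annotations:
    summary: "Container {{ $labels.name }} predicted to hit its memory limit within 4 hours"
```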

Monitoring: Production vs Development

| Aspect | Development | Production |
|---|---|---|
| Scrape interval | 30s (less overhead) | 15s (more granularity) |
| Retention | 7 days | 30-90 days |
| Alerting | Disabled or Slack only | PagerDuty / OpsGenie |
| Dashboards | Basic overview | Per-service dashboards |
| cAdvisor | Optional (use docker stats) | Required |
| Grafana auth | Default admin/admin | SSO / LDAP |
| Prometheus storage | Local volume | Remote write to Thanos/Mimir |
| Network | Bridge | Overlay with monitoring network |

# Production additions to the monitoring stack
services:
  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      - ./alertmanager/config.yml:/etc/alertmanager/config.yml:ro
    ports:
      - "9093:9093"
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - "--path.procfs=/host/proc"
      - "--path.sysfs=/host/sys"
      - "--path.rootfs=/rootfs"
    ports:
      - "9100:9100"
    networks:
      - monitoring

Add Node Exporter for host-level metrics (disk space, system CPU, available memory) alongside cAdvisor for container-level metrics. Together they give you complete visibility.
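Both additions also need wiring into prometheus.yml: scrape node-exporter like any other target, and point Prometheus at Alertmanager so firing alerts are actually routed. A sketch, assuming the service names from the compose snippet above:

```yaml
# Additions to prometheus.yml for the production services above
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: "node-exporter"
    static_configs:
      - targets: ["node-exporter:9100"]
```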

Wrapping Up

docker stats is a flashlight. Prometheus + Grafana is a security camera system with motion detection. The flashlight is useful for quick checks, but it cannot tell you what happened while you were not looking. With cAdvisor collecting container metrics, Prometheus storing and querying them, and Grafana visualizing trends, you get historical analysis, predictive alerting, and the data you need to right-size your resource limits. The fifteen minutes it takes to set up the monitoring stack will save you hours of debugging when something goes wrong at 3 AM.

In the next post, we will cover Docker Compose in Production — profiles, depends_on health conditions, restart policies, resource limits, and when it is actually appropriate to use Compose for production workloads.