Monitor Kubernetes with Prometheus and Grafana

· 6 min read
Goel Academy
DevOps & Cloud Learning Hub

Your cluster is running thirty microservices, and one of them is silently eating all the memory on node-3. By the time someone notices, the node is in NotReady state and pods are getting evicted left and right. Without proper monitoring, you are flying blind in production — and Kubernetes gives you zero visibility out of the box.

Why Prometheus for Kubernetes

Prometheus was built for dynamic, container-based environments. It uses a pull-based model (scraping HTTP endpoints), supports automatic service discovery through the Kubernetes API, and stores time-series data efficiently. Combined with Grafana for visualization and AlertManager for alerts, it forms the de facto monitoring stack for Kubernetes.

The key components:

| Component | Role | Deployed As |
|---|---|---|
| Prometheus | Scrapes and stores metrics | StatefulSet |
| Grafana | Dashboards and visualization | Deployment |
| AlertManager | Routes and manages alerts | StatefulSet |
| node-exporter | Node-level hardware/OS metrics | DaemonSet |
| kube-state-metrics | Kubernetes object state metrics | Deployment |
| Prometheus Operator | Manages Prometheus lifecycle via CRDs | Deployment |

Installing kube-prometheus-stack with Helm

The kube-prometheus-stack Helm chart bundles everything you need. One command gives you Prometheus, Grafana, AlertManager, node-exporter, kube-state-metrics, and dozens of pre-configured recording rules and alerts.

# Add the Prometheus community Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install the full monitoring stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=strongpassword \
  --set prometheus.prometheusSpec.retention=15d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi
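For anything beyond a quick test, the same settings are easier to maintain in a values file than as `--set` flags. A sketch whose key paths mirror the flags above:

```yaml
# values.yaml — equivalent to the --set flags above
grafana:
  adminPassword: strongpassword   # use a secret manager in real deployments
prometheus:
  prometheusSpec:
    retention: 15d
    storageSpec:
      volumeClaimTemplate:
        spec:
          resources:
            requests:
              storage: 50Gi
```

Apply it with `helm upgrade --install monitoring prometheus-community/kube-prometheus-stack -n monitoring --create-namespace -f values.yaml`, which also makes future upgrades idempotent.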

Verify everything is running:

kubectl get pods -n monitoring
# NAME READY STATUS AGE
# monitoring-grafana-7d9b4d5f6-x2k8m 3/3 Running 2m
# monitoring-kube-prometheus-operator-6c8bd9f7d-4jhbn 1/1 Running 2m
# monitoring-kube-state-metrics-5f8b9c6d7-rn2vl 1/1 Running 2m
# monitoring-prometheus-node-exporter-abc12 1/1 Running 2m
# alertmanager-monitoring-kube-prometheus-alertmanager-0 2/2 Running 2m
# prometheus-monitoring-kube-prometheus-prometheus-0 2/2 Running 2m

ServiceMonitor — How Prometheus Discovers Targets

The Prometheus Operator introduces ServiceMonitor CRDs that tell Prometheus which services to scrape. Instead of editing a static config file, you create a Kubernetes resource.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  namespace: monitoring
  labels:
    release: monitoring  # Must match Prometheus serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics  # Named port on the Service
      interval: 15s
      path: /metrics

Your application's Service needs a named port matching the port field above:

apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app
spec:
  ports:
    - name: metrics
      port: 9090
      targetPort: 9090
  selector:
    app: my-app

Prometheus will automatically start scraping your app every 15 seconds. No restart needed.
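Of course, the application behind that Service has to actually serve Prometheus' text exposition format on port 9090. In practice you would use a client library such as prometheus_client; this stdlib-only sketch (the metric name `myapp_requests_total` and the counter variable are illustrative) shows what the endpoint returns:

```python
# Minimal /metrics endpoint in the Prometheus text exposition format,
# using only the standard library. A real app would use prometheus_client.
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNT = 0  # hypothetical counter, incremented by application code


def render_metrics() -> str:
    # Exposition format: "# HELP" and "# TYPE" lines, then one sample per line.
    return (
        "# HELP myapp_requests_total Total HTTP requests handled.\n"
        "# TYPE myapp_requests_total counter\n"
        f"myapp_requests_total {REQUEST_COUNT}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()


def serve(port: int = 9090) -> None:
    # Port 9090 matches the Service's targetPort above.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Run `serve()` alongside your app, and `curl localhost:9090/metrics` returns the counter in a format Prometheus can scrape.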

Essential Kubernetes Metrics

There are hundreds of metrics available. Here are the ones that actually matter for day-to-day operations:

Container Metrics (from cAdvisor)

# CPU usage per pod (cores)
sum(rate(container_cpu_usage_seconds_total{namespace="production"}[5m])) by (pod)

# Memory usage per pod (bytes)
sum(container_memory_working_set_bytes{namespace="production"}) by (pod)

# Container restarts (from kube-state-metrics) — early warning for CrashLoopBackOff
sum(rate(kube_pod_container_status_restarts_total[1h])) by (pod, namespace) > 0

Cluster and Node Metrics (from node-exporter)

# Node CPU utilization percentage
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Node memory utilization percentage
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage percentage
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100

Kubernetes State Metrics (from kube-state-metrics)

# Pods not in Running state
kube_pod_status_phase{phase!="Running", phase!="Succeeded"} == 1

# Deployments with unavailable replicas
kube_deployment_status_replicas_unavailable > 0

# PVCs in Pending state (storage issues)
kube_persistentvolumeclaim_status_phase{phase="Pending"} == 1

Grafana Dashboards

The kube-prometheus-stack ships with excellent pre-built dashboards. Access Grafana and you will find these already configured:

# Port-forward to access Grafana
kubectl port-forward svc/monitoring-grafana -n monitoring 3000:80

# Open http://localhost:3000
# Login: admin / strongpassword

The most useful built-in dashboards:

| Dashboard | ID | Shows |
|---|---|---|
| Kubernetes / Compute Resources / Cluster | 3119 | Cluster-wide CPU, memory, network |
| Kubernetes / Compute Resources / Namespace (Pods) | 12740 | Per-namespace resource usage |
| Kubernetes / Compute Resources / Pod | 6879 | Individual pod CPU, memory, network |
| Node Exporter Full | 1860 | Detailed node hardware metrics |
| Kubernetes / Networking / Cluster | 12124 | Cluster network traffic |

You can also import community dashboards from grafana.com/grafana/dashboards using the dashboard ID.

AlertManager Rules for Kubernetes

Alerts are useless if they fire on every minor blip. Here is a set of practical alert rules that catch real problems without creating alert fatigue:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: kubernetes-pod-alerts
      rules:
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} is crash-looping"
            description: "Pod {{ $labels.pod }} in {{ $labels.namespace }} has restarted within the last 15 minutes."

        - alert: PodNotReady
          expr: kube_pod_status_ready{condition="true"} == 0
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} not ready for 10m"

        - alert: NodeHighMemory
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 85
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} memory above 85%"

        - alert: PersistentVolumeFillingUp
          expr: kubelet_volume_stats_available_bytes / kubelet_volume_stats_capacity_bytes < 0.15
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is 85% full"
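These rules only decide when an alert fires; routing it somewhere is AlertManager's job. A hedged sketch of a routing config — the receiver names, Slack channel, and webhook/integration keys are all placeholders, and with kube-prometheus-stack this would typically be supplied through the chart's `alertmanager.config` value:

```yaml
# AlertManager config sketch: warnings go to Slack, criticals page on-call.
route:
  receiver: slack-default
  group_by: [alertname, namespace]
  group_wait: 30s        # wait to batch related alerts into one notification
  repeat_interval: 4h    # re-notify for still-firing alerts
  routes:
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall
receivers:
  - name: slack-default
    slack_configs:
      - channel: "#k8s-alerts"
        api_url: https://hooks.slack.com/services/XXX   # placeholder webhook
  - name: pagerduty-oncall
    pagerduty_configs:
      - service_key: REPLACE-WITH-INTEGRATION-KEY       # placeholder key
```

The `group_by` and `repeat_interval` settings do most of the fatigue reduction: related pods collapse into one notification, and a firing alert pings at most every four hours.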

Recording Rules for Performance

When Prometheus evaluates complex queries on every dashboard load, it gets slow. Recording rules pre-compute expensive queries and store the results as new time series:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: recording-rules
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: k8s-resource-recording
      interval: 30s
      rules:
        - record: namespace:container_cpu_usage:sum_rate5m
          expr: sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace)

        - record: namespace:container_memory_working_set:sum
          expr: sum(container_memory_working_set_bytes{image!=""}) by (namespace)

        - record: node:node_cpu_utilization:avg5m
          expr: avg by (node) (1 - rate(node_cpu_seconds_total{mode="idle"}[5m]))

        - record: cluster:pod_count_by_phase:sum
          expr: sum(kube_pod_status_phase) by (phase)

These pre-computed metrics load instantly in dashboards. Use the naming convention level:metric:operations — for example, namespace:container_cpu_usage:sum_rate5m tells you it is aggregated at the namespace level, measures CPU usage, and uses a sum of the 5-minute rate.
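In a Grafana panel, the swap is then a one-line change:

```promql
# Before: the full aggregation, re-evaluated on every dashboard refresh
sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace)

# After: a cheap lookup of the series the recording rule already computed
namespace:container_cpu_usage:sum_rate5m
```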

Practical Tips

Retention and storage. Set retention to 15-30 days for operational monitoring. For long-term storage, ship metrics to Thanos or Cortex.

Resource limits. Prometheus itself can consume significant memory. A cluster with 500 active time series per pod and 200 pods generates roughly 100,000 time series. Budget 2-4 GB of memory per 100K series.

Label cardinality. Avoid high-cardinality labels like user IDs or request IDs in metrics. Each unique label combination creates a new time series, and this is the number one cause of Prometheus OOM kills.
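The danger is that cardinality multiplies across labels. A back-of-the-envelope check (the label names and counts here are illustrative):

```python
# Worst-case series count for one metric name: each unique label
# combination is its own time series, so cardinalities multiply.
from math import prod


def series_count(label_cardinalities: list[int]) -> int:
    """Upper bound on time series produced by one metric."""
    return prod(label_cardinalities)


# A latency histogram labeled by method (5), path (50), status (10),
# across 12 histogram buckets:
ok = series_count([5, 50, 10, 12])  # 30,000 series — manageable

# Add a user_id label for 100,000 users and it explodes:
bad = series_count([5, 50, 10, 12, 100_000])  # 3,000,000,000 series
```

Three billion series is far beyond what a single Prometheus can hold in memory, which is why a single careless label can take the whole monitoring stack down.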

Scrape intervals. 15-30 seconds is standard. Faster intervals mean more storage and CPU. Slower intervals risk missing short-lived spikes.

Wrapping Up

With the kube-prometheus-stack, you get production-ready monitoring in a single Helm install — Prometheus for collection, Grafana for dashboards, AlertManager for notifications, and dozens of pre-built rules that catch real incidents. The ServiceMonitor CRD makes adding new scrape targets as easy as applying a YAML file.

But metrics only tell you what is happening. To understand why, you need logs. In the next post, we will build a centralized logging pipeline with the EFK stack and Loki to capture everything your applications write to stdout.