DevOps Metrics That Matter — DORA, Lead Time, and Change Failure Rate
Google's DORA team spent seven years studying thousands of engineering organizations to answer one question: what separates elite performers from everyone else? The answer was not better tools or bigger budgets. It was four specific metrics that capture the speed and stability of software delivery. If you measure nothing else, measure these.
The Four DORA Metrics Explained
DORA (DevOps Research and Assessment) identified four key metrics that predict software delivery performance and organizational outcomes:
        Speed                        Stability
┌──────────────────────┐   ┌──────────────────────┐
│ Deployment Frequency │   │ Change Failure Rate  │
│ How often do you     │   │ What % of changes    │
│ deploy to production?│   │ cause failures?      │
├──────────────────────┤   ├──────────────────────┤
│ Lead Time for Changes│   │ Time to Restore      │
│ How long from commit │   │ How long to recover  │
│ to production?       │   │ from a failure?      │
└──────────────────────┘   └──────────────────────┘
Key insight: Elite teams are BOTH faster AND more stable.
Speed and stability are NOT trade-offs — they reinforce each other.
Performance Benchmarks
The Accelerate State of DevOps Report defines four performance clusters (the exact thresholds shift slightly between annual editions):
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | On-demand (multiple deploys/day) | Between once/day and once/week | Between once/week and once/month | Between once/month and once every 6 months |
| Lead Time for Changes | Less than 1 hour | Between 1 day and 1 week | Between 1 week and 1 month | Between 1 month and 6 months |
| Change Failure Rate | 0-5% | 5-10% | 10-15% | 16-30% |
| Time to Restore Service | Less than 1 hour | Less than 1 day | Between 1 day and 1 week | More than 1 week |
Per the 2021 report, elite performers deploy 973x more frequently than low performers, with 6,570x faster lead time and a 3x lower change failure rate.
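These tiers are easy to encode. A minimal sketch in Python (function names and boundary handling are mine, taken from the table above) that buckets a team's numbers:

```python
# Bucket a team's DORA numbers into the performance tiers from the table
# above. Thresholds and function names are illustrative, not an official API.

def classify_lead_time(hours: float) -> str:
    """Map lead time for changes (in hours) to a performance tier."""
    if hours < 1:
        return "Elite"
    if hours <= 7 * 24:   # up to one week
        return "High"
    if hours <= 30 * 24:  # up to one month
        return "Medium"
    return "Low"

def classify_change_failure_rate(cfr: float) -> str:
    """Map change failure rate (0.0-1.0) to a performance tier."""
    if cfr <= 0.05:
        return "Elite"
    if cfr <= 0.10:
        return "High"
    if cfr <= 0.15:
        return "Medium"
    return "Low"

print(classify_lead_time(4.5))              # a few hours -> High
print(classify_change_failure_rate(0.067))  # 6.7% -> High
```

The same shape extends to the other two metrics; the point is that a tier label only means something when you track it over time, not as a one-off grade.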
Measuring Each Metric
1. Deployment Frequency
# Simple: list recent production deploys from git tags
git tag --list 'release-*' --sort=-creatordate | head -30

# Rough heuristic: count deploy/release commits in the last 30 days
DEPLOYS=$(git log --oneline --since="30 days ago" \
  --grep="deploy\|release" | wc -l)
echo "Deployment Frequency: $DEPLOYS deploys in last 30 days"

# From CI/CD: query successful GitHub Actions deploy runs
# (GNU date shown; on macOS use `date -v-30d +%Y-%m-%d`)
gh run list --workflow=deploy.yml --status=success \
  --created=">$(date -d '30 days ago' +%Y-%m-%d)" \
  --json conclusion,createdAt | jq length
2. Lead Time for Changes
# lead_time.py — Calculate lead time from commit to deploy
import subprocess
from datetime import datetime

def get_lead_time(deploy_sha, deploy_time):
    """Return hours between the first commit in a deploy and the deploy.

    deploy_time must be timezone-aware, since git's %aI timestamps are.
    """
    # For a merge commit, SHA~1..SHA covers the merge plus every commit
    # brought in from the merged branch (i.e. the PR's commits).
    result = subprocess.run(
        ["git", "log", "--format=%H %aI", f"{deploy_sha}~1..{deploy_sha}"],
        capture_output=True, text=True,
    )
    commits = []
    for line in result.stdout.strip().split("\n"):
        if not line:
            continue
        _sha, timestamp = line.split(" ", 1)
        commits.append(datetime.fromisoformat(timestamp))
    if commits:
        first_commit = min(commits)
        lead_time = deploy_time - first_commit
        return lead_time.total_seconds() / 3600  # hours
    return None
# Example output:
# PR #142: Lead time = 4.2 hours (commit → production)
# PR #143: Lead time = 18.7 hours
# Median lead time (30 days): 6.5 hours
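Rolling the per-deploy numbers up into the 30-day median needs nothing beyond the stdlib; a sketch with hypothetical sample values:

```python
from statistics import median

# Hypothetical per-deploy lead times (hours) collected over 30 days,
# e.g. from repeated get_lead_time() calls.
lead_times_hours = [4.2, 18.7, 2.1, 6.5, 9.3, 3.8]

# Median beats mean here: one pathological deploy (say, a revert that
# shipped week-old commits) would skew an average badly.
print(f"Median lead time (30 days): {median(lead_times_hours):.1f} hours")
```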
3. Change Failure Rate
# Track change failures in your incident management
# change_failure_tracking.yml
# Method 1: Tag failed deploys in CI
# After each deploy, track outcome:
deploys:
total_30_days: 45
failed_30_days: 3 # Rolled back or caused incident
change_failure_rate: "6.7%" # 3/45
# Method 2: Link incidents to deploy SHAs
# In your incident tracker:
incidents:
- id: INC-2025-089
caused_by_deploy: "release-2025-12-15-02"
severity: P2
description: "Payment API 500 errors after deploy"
- id: INC-2025-091
caused_by_deploy: "release-2025-12-18-01"
severity: P3
description: "Search latency regression"
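Joining those two records yields CFR directly. A sketch assuming the field names above, where any deploy named in an incident's caused_by_deploy counts as failed (all data here is illustrative):

```python
# Compute change failure rate by linking incidents back to the deploys
# that caused them. Field names mirror the YAML records above.
deploys = [f"release-2025-12-{day:02d}-01" for day in range(1, 21)]  # 20 deploys
incidents = [
    {"id": "INC-2025-089", "caused_by_deploy": "release-2025-12-15-01"},
    {"id": "INC-2025-091", "caused_by_deploy": "release-2025-12-18-01"},
]

# A deploy that caused any incident counts once, even with multiple incidents.
failed = {inc["caused_by_deploy"] for inc in incidents} & set(deploys)
cfr = len(failed) / len(deploys)
print(f"Change failure rate: {cfr:.1%}")  # 2 failed / 20 deploys -> 10.0%
```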
4. Time to Restore Service (MTTR)
# Calculate MTTR from PagerDuty/Opsgenie data
# Using PagerDuty's Analytics API:
curl -s "https://api.pagerduty.com/analytics/metrics/incidents/all" \
  -H "Authorization: Token token=$PD_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "created_at_start": "2025-11-01T00:00:00Z",
      "created_at_end": "2025-12-01T00:00:00Z"
    },
    "aggregate_unit": "month"
  }' | jq '.data[0].mean_seconds_to_resolve / 3600'
# Output: 2.4 (hours — "High" performer bracket)
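If your incident tracker can export open/resolve timestamps, MTTR reduces to the mean of the resolution durations. A stdlib sketch with hypothetical data:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident export: (opened, resolved) ISO-8601 pairs.
incidents = [
    ("2025-11-03T09:15:00+00:00", "2025-11-03T10:05:00+00:00"),  # 50 min
    ("2025-11-12T22:40:00+00:00", "2025-11-13T01:10:00+00:00"),  # 150 min
    ("2025-11-20T14:00:00+00:00", "2025-11-20T14:30:00+00:00"),  # 30 min
]

# Resolution duration in hours for each incident.
durations_h = [
    (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600
    for start, end in incidents
]
print(f"MTTR: {mean(durations_h):.2f} hours")
```

As with lead time, consider reporting the median alongside the mean: a single multi-day incident can dominate an average.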
Tools for Measuring DORA Metrics
| Tool | Type | DORA Support | Pricing |
|---|---|---|---|
| LinearB | Engineering metrics | Full DORA + flow metrics | Free tier available |
| Sleuth | Deploy tracking | Full DORA, auto-detection | Free for small teams |
| Faros AI | Engineering intelligence | DORA + custom metrics | Enterprise |
| Backstage + plugins | Developer portal | DORA via plugins | Open source |
| Propelo (Harness SEI) | Engineering insights | Full DORA + SPACE | Enterprise |
| Custom (Prometheus) | DIY | Whatever you build | Free (your time) |
Building a Custom Metrics Dashboard
# prometheus-dora-rules.yml
# Custom recording rules for DORA metrics in Prometheus
groups:
  - name: dora_metrics
    interval: 1h
    rules:
      # Deployment Frequency (deploys per day, 30-day rolling)
      - record: dora:deployment_frequency:rate30d
        expr: |
          sum(increase(deployments_total{env="production"}[30d])) / 30

      # Lead Time (median, 30-day rolling) — requires a histogram
      - record: dora:lead_time_hours:p50
        expr: |
          histogram_quantile(0.50,
            sum(rate(deploy_lead_time_seconds_bucket{env="production"}[30d])) by (le)
          ) / 3600

      # Change Failure Rate (30-day rolling)
      - record: dora:change_failure_rate:ratio30d
        expr: |
          sum(increase(deployments_total{env="production",result="failure"}[30d]))
          /
          sum(increase(deployments_total{env="production"}[30d]))

      # Time to Restore (median, 30-day rolling)
      - record: dora:time_to_restore_hours:p50
        expr: |
          histogram_quantile(0.50,
            sum(rate(incident_resolution_seconds_bucket[30d])) by (le)
          ) / 3600
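These rules assume your pipeline already exports a deployments_total counter and the two duration histograms; those metric names are this post's convention, not a standard exporter's. One low-tech way to emit the counter is the Prometheus text exposition format, pushed to a Pushgateway or written to a node_exporter textfile-collector directory; a sketch:

```python
# Render deploy outcomes in Prometheus text exposition format, matching
# the metric and label names the recording rules above expect.
# Values are hypothetical; in practice your CI job would POST this to a
# Pushgateway or write it to a textfile-collector directory.

def deploy_metric(env: str, result: str, count: int) -> str:
    """One exposition-format sample line for the deployments counter."""
    return f'deployments_total{{env="{env}",result="{result}"}} {count}'

lines = [
    "# TYPE deployments_total counter",
    deploy_metric("production", "success", 42),
    deploy_metric("production", "failure", 3),
]
print("\n".join(lines))
```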
// Grafana dashboard panel — DORA Scorecard
{
  "panels": [
    {
      "title": "DORA Scorecard",
      "type": "table",
      "targets": [
        {
          "expr": "dora:deployment_frequency:rate30d",
          "legendFormat": "Deploy Frequency (per day)"
        },
        {
          "expr": "dora:lead_time_hours:p50",
          "legendFormat": "Lead Time (hours, p50)"
        },
        {
          "expr": "dora:change_failure_rate:ratio30d * 100",
          "legendFormat": "Change Failure Rate (%)"
        },
        {
          "expr": "dora:time_to_restore_hours:p50",
          "legendFormat": "Time to Restore (hours, p50)"
        }
      ],
      "fieldConfig": {
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "Change Failure Rate (%)" },
            "properties": [
              {
                "id": "thresholds",
                "value": {
                  "steps": [
                    { "color": "green", "value": 0 },
                    { "color": "yellow", "value": 10 },
                    { "color": "red", "value": 15 }
                  ]
                }
              }
            ]
          }
        ]
      }
    }
  ]
}
Beyond DORA: Flow Metrics
DORA tells you how fast and stable your delivery is. Flow metrics tell you whether you are delivering the right things:
Flow Metrics (from Value Stream Management):
Flow Velocity — How many items completed per unit time?
Flow Efficiency — Active time / (Active time + Wait time)
Flow Time — Total time from "started" to "done"
Flow Load — Work in progress (WIP) at any point
Flow Distribution — % of work across features/defects/debt/risk
Example:
A team completes 20 items/sprint (velocity looks good)
But flow efficiency is 15% (items wait 85% of the time)
And flow distribution is 70% defects, 10% features
→ The team is "productive" but spending most effort on
rework, and items spend most of their time waiting.
The real bottleneck is quality and handoff queues.
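The flow-efficiency arithmetic in that example is worth making concrete. A sketch assuming each work item logs its active and waiting hours (field names and data are hypothetical):

```python
# Flow efficiency = active time / (active time + wait time),
# computed in aggregate across a batch of completed work items.
items = [
    {"id": "PAY-101", "active_h": 6, "wait_h": 34},
    {"id": "PAY-102", "active_h": 3, "wait_h": 17},
    {"id": "PAY-103", "active_h": 9, "wait_h": 51},
]

total_active = sum(i["active_h"] for i in items)
total_wait = sum(i["wait_h"] for i in items)
efficiency = total_active / (total_active + total_wait)
print(f"Flow efficiency: {efficiency:.0%}")  # 18 active / 120 total -> 15%
```

The hard part in practice is not the division; it is instrumenting your tracker so "waiting" states (in review, blocked, queued for QA) are actually recorded.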
Vanity Metrics to Avoid
| Vanity Metric | Why It Is Misleading | Better Alternative |
|---|---|---|
| Lines of code | More code is not better code | Deployment frequency |
| Number of commits | Encourages micro-commits | Lead time for changes |
| Story points completed | Inflated over time, not comparable | Flow velocity (items completed) |
| Test count | More tests does not mean better coverage | Mutation testing score |
| Uptime percentage (without SLOs) | 99.9% means nothing without context | Error budget burn rate |
| Number of deploys (without CFR) | Fast but breaking everything | Deploy frequency AND change failure rate together |
| Mean time between failures | Encourages avoiding change | MTTR (recover fast, not fail rarely) |
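The error-budget burn rate mentioned in the table is a one-line calculation worth spelling out; a sketch assuming a 99.9% availability SLO (all numbers illustrative):

```python
# Burn rate = observed error rate / error budget allowed by the SLO.
# A burn rate of 1.0 spends the budget exactly over the SLO window;
# above 1.0, you will exhaust it before the window ends.
slo = 0.999                   # 99.9% availability target
error_budget = 1 - slo        # 0.1% of requests may fail
observed_error_rate = 0.0005  # 0.05% of requests failing right now

burn_rate = observed_error_rate / error_budget
print(f"Burn rate: {burn_rate:.1f}x")  # 0.5x — within budget
```

Unlike a bare uptime percentage, burn rate carries its own context: it says how fast you are consuming the failure allowance your SLO grants you.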
Using Metrics for Improvement, Not Punishment
This is the most important section in this post. Metrics weaponized against teams will destroy your DevOps culture faster than any technical debt:
WRONG: "Team A has a 15% change failure rate. They need to improve."
→ Team A stops deploying frequently to reduce failures.
→ Lead time increases. Batch sizes grow. Failures get WORSE.
RIGHT: "Team A's change failure rate increased from 8% to 15%.
Let's look at what changed and how we can help."
→ Team A discovers they skipped staging for a deadline.
→ They invest in better test environments.
→ CFR drops to 6%.
Rules for healthy metric usage:
1. Teams own their metrics — no cross-team comparison
2. Metrics drive conversations, not consequences
3. Always look at trends, never snapshots
4. Pair speed metrics with stability metrics
5. Celebrate improvement, not absolute numbers
Metric-Driven Retrospectives
# retrospective_template.yml
# Run this monthly with your DORA data
retrospective:
  date: "2025-12-20"
  team: "payments-squad"

  metrics_review:
    deployment_frequency:
      current: "3.2/day"
      previous: "2.8/day"
      trend: "improving"
    lead_time:
      current: "4.5 hours"
      previous: "6.2 hours"
      trend: "improving"
      action: "Parallel test stages reduced CI time by 25 min"
    change_failure_rate:
      current: "12%"
      previous: "8%"
      trend: "degrading"
      action: "3 config-related failures — need config validation in CI"
    time_to_restore:
      current: "45 min"
      previous: "52 min"
      trend: "stable"

  improvement_actions:
    - owner: "Sarah"
      action: "Add config schema validation to CI pipeline"
      due: "2026-01-03"
      expected_impact: "Reduce CFR by ~5%"
    - owner: "Mike"
      action: "Set up automated rollback on error rate spike"
      due: "2026-01-10"
      expected_impact: "Reduce MTTR to < 15 min for deploy failures"
Closing Note
Metrics are a compass, not a scorecard. The four DORA metrics work because they capture the fundamental tension in software delivery: going fast versus staying stable. Elite teams prove these are not trade-offs — they reinforce each other. Start by measuring just one metric accurately, make it visible to your team, and use it to drive one improvement per sprint. The numbers will follow the culture, not the other way around.
