DevOps Metrics That Matter — DORA, Lead Time, and Change Failure Rate
Google's DORA team spent seven years studying thousands of engineering organizations to answer one question: what separates elite performers from everyone else? The answer was not better tools or bigger budgets. It was four specific metrics that capture the speed and stability of software delivery. If you measure nothing else, measure these.
The Four DORA Metrics Explained
DORA (DevOps Research and Assessment) identified four key metrics that predict software delivery performance and organizational outcomes:
        Speed                        Stability
┌──────────────────────┐   ┌──────────────────────┐
│ Deployment Frequency │   │ Change Failure Rate  │
│ How often do you     │   │ What % of changes    │
│ deploy to production?│   │ cause failures?      │
├──────────────────────┤   ├──────────────────────┤
│ Lead Time for Changes│   │ Time to Restore      │
│ How long from commit │   │ How long to recover  │
│ to production?       │   │ from a failure?      │
└──────────────────────┘   └──────────────────────┘
Key insight: Elite teams are BOTH faster AND more stable.
Speed and stability are NOT trade-offs — they reinforce each other.
Performance Benchmarks
The Accelerate State of DevOps Report defines four performance clusters (the exact thresholds shift slightly between annual editions):
| Metric | Elite | High | Medium | Low |
|---|---|---|---|---|
| Deployment Frequency | On-demand (multiple deploys/day) | Between once/day and once/week | Between once/week and once/month | Between once/month and once every 6 months |
| Lead Time for Changes | Less than 1 hour | Between 1 day and 1 week | Between 1 week and 1 month | Between 1 month and 6 months |
| Change Failure Rate | 0-5% | 5-10% | 10-15% | 16-30% |
| Time to Restore Service | Less than 1 hour | Less than 1 day | Between 1 day and 1 week | More than 1 week |
Per the 2021 report, elite performers deploy 973x more frequently than low performers, with 6,570x faster lead time and a 3x lower change failure rate.
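These tiers are easy to encode. A minimal sketch in Python (function names and boundary handling are mine, taken from the table above) that buckets a team's numbers:

```python
# Bucket a team's DORA numbers into the performance tiers from the table
# above. Thresholds and function names are illustrative, not an official API.

def classify_lead_time(hours: float) -> str:
    """Map lead time for changes (in hours) to a performance tier."""
    if hours < 1:
        return "Elite"
    if hours <= 7 * 24:   # up to one week
        return "High"
    if hours <= 30 * 24:  # up to one month
        return "Medium"
    return "Low"

def classify_change_failure_rate(cfr: float) -> str:
    """Map change failure rate (0.0-1.0) to a performance tier."""
    if cfr <= 0.05:
        return "Elite"
    if cfr <= 0.10:
        return "High"
    if cfr <= 0.15:
        return "Medium"
    return "Low"

print(classify_lead_time(4.5))              # a few hours -> High
print(classify_change_failure_rate(0.067))  # 6.7% -> High
```

The same shape extends to the other two metrics; the point is that a tier label only means something when you track it over time, not as a one-off grade.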
Measuring Each Metric
1. Deployment Frequency
# Simple: list recent production deploys from git tags
git tag --list 'release-*' --sort=-creatordate | head -30

# Rough heuristic: count deploy/release commits in the last 30 days
DEPLOYS=$(git log --oneline --since="30 days ago" \
  --grep="deploy\|release" | wc -l)
echo "Deployment Frequency: $DEPLOYS deploys in last 30 days"

# From CI/CD: query successful GitHub Actions deploy runs
# (GNU date shown; on macOS use `date -v-30d +%Y-%m-%d`)
gh run list --workflow=deploy.yml --status=success \
  --created=">$(date -d '30 days ago' +%Y-%m-%d)" \
  --json conclusion,createdAt | jq length
2. Lead Time for Changes
# lead_time.py — Calculate lead time from commit to deploy
import subprocess
from datetime import datetime

def get_lead_time(deploy_sha, deploy_time):
    """Return hours between the first commit in a deploy and the deploy.

    deploy_time must be timezone-aware, since git's %aI timestamps are.
    """
    # For a merge commit, SHA~1..SHA covers the merge plus every commit
    # brought in from the merged branch (i.e. the PR's commits).
    result = subprocess.run(
        ["git", "log", "--format=%H %aI", f"{deploy_sha}~1..{deploy_sha}"],
        capture_output=True, text=True,
    )
    commits = []
    for line in result.stdout.strip().split("\n"):
        if not line:
            continue
        _sha, timestamp = line.split(" ", 1)
        commits.append(datetime.fromisoformat(timestamp))
    if commits:
        first_commit = min(commits)
        lead_time = deploy_time - first_commit
        return lead_time.total_seconds() / 3600  # hours
    return None
# Example output:
# PR #142: Lead time = 4.2 hours (commit → production)
# PR #143: Lead time = 18.7 hours
# Median lead time (30 days): 6.5 hours
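Rolling the per-deploy numbers up into the 30-day median needs nothing beyond the stdlib; a sketch with hypothetical sample values:

```python
from statistics import median

# Hypothetical per-deploy lead times (hours) collected over 30 days,
# e.g. from repeated get_lead_time() calls.
lead_times_hours = [4.2, 18.7, 2.1, 6.5, 9.3, 3.8]

# Median beats mean here: one pathological deploy (say, a revert that
# shipped week-old commits) would skew an average badly.
print(f"Median lead time (30 days): {median(lead_times_hours):.1f} hours")
```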
3. Change Failure Rate
# Track change failures in your incident management
# change_failure_tracking.yml
# Method 1: Tag failed deploys in CI
# After each deploy, track outcome:
deploys:
total_30_days: 45
failed_30_days: 3 # Rolled back or caused incident
change_failure_rate: "6.7%" # 3/45
# Method 2: Link incidents to deploy SHAs
# In your incident tracker:
incidents:
- id: INC-2025-089
caused_by_deploy: "release-2025-12-15-02"
severity: P2
description: "Payment API 500 errors after deploy"
- id: INC-2025-091
caused_by_deploy: "release-2025-12-18-01"
severity: P3
description: "Search latency regression"
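Joining those two records yields CFR directly. A sketch assuming the field names above, where any deploy named in an incident's caused_by_deploy counts as failed (all data here is illustrative):

```python
# Compute change failure rate by linking incidents back to the deploys
# that caused them. Field names mirror the YAML records above.
deploys = [f"release-2025-12-{day:02d}-01" for day in range(1, 21)]  # 20 deploys
incidents = [
    {"id": "INC-2025-089", "caused_by_deploy": "release-2025-12-15-01"},
    {"id": "INC-2025-091", "caused_by_deploy": "release-2025-12-18-01"},
]

# A deploy that caused any incident counts once, even with multiple incidents.
failed = {inc["caused_by_deploy"] for inc in incidents} & set(deploys)
cfr = len(failed) / len(deploys)
print(f"Change failure rate: {cfr:.1%}")  # 2 failed / 20 deploys -> 10.0%
```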
4. Time to Restore Service (MTTR)
# Calculate MTTR from PagerDuty/Opsgenie data
# Using PagerDuty's Analytics API:
curl -s "https://api.pagerduty.com/analytics/metrics/incidents/all" \
  -H "Authorization: Token token=$PD_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "filters": {
      "created_at_start": "2025-11-01T00:00:00Z",
      "created_at_end": "2025-12-01T00:00:00Z"
    },
    "aggregate_unit": "month"
  }' | jq '.data[0].mean_seconds_to_resolve / 3600'
# Output: 2.4 (hours — "High" performer bracket)
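If your incident tracker can export open/resolve timestamps, MTTR reduces to the mean of the resolution durations. A stdlib sketch with hypothetical data:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident export: (opened, resolved) ISO-8601 pairs.
incidents = [
    ("2025-11-03T09:15:00+00:00", "2025-11-03T10:05:00+00:00"),  # 50 min
    ("2025-11-12T22:40:00+00:00", "2025-11-13T01:10:00+00:00"),  # 150 min
    ("2025-11-20T14:00:00+00:00", "2025-11-20T14:30:00+00:00"),  # 30 min
]

# Resolution duration in hours for each incident.
durations_h = [
    (datetime.fromisoformat(end) - datetime.fromisoformat(start)).total_seconds() / 3600
    for start, end in incidents
]
print(f"MTTR: {mean(durations_h):.2f} hours")
```

As with lead time, consider reporting the median alongside the mean: a single multi-day incident can dominate an average.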
Tools for Measuring DORA Metrics
| Tool | Type | DORA Support | Pricing |
|---|---|---|---|
| LinearB | Engineering metrics | Full DORA + flow metrics | Free tier available |
| Sleuth | Deploy tracking | Full DORA, auto-detection | Free for small teams |
| Faros AI | Engineering intelligence | DORA + custom metrics | Enterprise |
| Backstage + plugins | Developer portal | DORA via plugins | Open source |
| Propelo (Harness SEI) | Engineering insights | Full DORA + SPACE | Enterprise |
| Custom (Prometheus) | DIY | Whatever you build | Free (your time) |
Building a Custom Metrics Dashboard
# prometheus-dora-rules.yml
# Custom recording rules for DORA metrics in Prometheus
groups:
  - name: dora_metrics
    interval: 1h
    rules:
      # Deployment Frequency (deploys per day, 30-day rolling)
      - record: dora:deployment_frequency:rate30d
        expr: |
          sum(increase(deployments_total{env="production"}[30d])) / 30

      # Lead Time (median, 30-day rolling) — requires a histogram
      - record: dora:lead_time_hours:p50
        expr: |
          histogram_quantile(0.50,
            sum(rate(deploy_lead_time_seconds_bucket{env="production"}[30d])) by (le)
          ) / 3600

      # Change Failure Rate (30-day rolling)
      - record: dora:change_failure_rate:ratio30d
        expr: |
          sum(increase(deployments_total{env="production",result="failure"}[30d]))
          /
          sum(increase(deployments_total{env="production"}[30d]))

      # Time to Restore (median, 30-day rolling)
      - record: dora:time_to_restore_hours:p50
        expr: |
          histogram_quantile(0.50,
            sum(rate(incident_resolution_seconds_bucket[30d])) by (le)
          ) / 3600
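These rules assume your pipeline already exports a deployments_total counter and the two duration histograms; those metric names are this post's convention, not a standard exporter's. One low-tech way to emit the counter is the Prometheus text exposition format, pushed to a Pushgateway or written to a node_exporter textfile-collector directory; a sketch:

```python
# Render deploy outcomes in Prometheus text exposition format, matching
# the metric and label names the recording rules above expect.
# Values are hypothetical; in practice your CI job would POST this to a
# Pushgateway or write it to a textfile-collector directory.

def deploy_metric(env: str, result: str, count: int) -> str:
    """One exposition-format sample line for the deployments counter."""
    return f'deployments_total{{env="{env}",result="{result}"}} {count}'

lines = [
    "# TYPE deployments_total counter",
    deploy_metric("production", "success", 42),
    deploy_metric("production", "failure", 3),
]
print("\n".join(lines))
```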
// Grafana dashboard panel — DORA Scorecard
{
  "panels": [
    {
      "title": "DORA Scorecard",
      "type": "table",
      "targets": [
        {
          "expr": "dora:deployment_frequency:rate30d",
          "legendFormat": "Deploy Frequency (per day)"
        },
        {
          "expr": "dora:lead_time_hours:p50",
          "legendFormat": "Lead Time (hours, p50)"
        },
        {
          "expr": "dora:change_failure_rate:ratio30d * 100",
          "legendFormat": "Change Failure Rate (%)"
        },
        {
          "expr": "dora:time_to_restore_hours:p50",
          "legendFormat": "Time to Restore (hours, p50)"
        }
      ],
      "fieldConfig": {
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "Change Failure Rate (%)" },
            "properties": [
              {
                "id": "thresholds",
                "value": {
                  "steps": [
                    { "color": "green", "value": 0 },
                    { "color": "yellow", "value": 10 },
                    { "color": "red", "value": 15 }
                  ]
                }
              }
            ]
          }
        ]
      }
    }
  ]
}
Beyond DORA: Flow Metrics
DORA tells you how fast and stable your delivery is. Flow metrics tell you whether you are delivering the right things:
Flow Metrics (from Value Stream Management):
Flow Velocity — How many items completed per unit time?
Flow Efficiency — Active time / (Active time + Wait time)
Flow Time — Total time from "started" to "done"
Flow Load — Work in progress (WIP) at any point
Flow Distribution — % of work across features/defects/debt/risk
Example:
A team completes 20 items/sprint (velocity looks good)
But flow efficiency is 15% (items wait 85% of the time)
And flow distribution is 70% defects, 10% features
→ The team is "productive" but spending most effort on
rework, and items spend most of their time waiting.
The real bottleneck is quality and handoff queues.
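The flow-efficiency arithmetic in that example is worth making concrete. A sketch assuming each work item logs its active and waiting hours (field names and data are hypothetical):

```python
# Flow efficiency = active time / (active time + wait time),
# computed in aggregate across a batch of completed work items.
items = [
    {"id": "PAY-101", "active_h": 6, "wait_h": 34},
    {"id": "PAY-102", "active_h": 3, "wait_h": 17},
    {"id": "PAY-103", "active_h": 9, "wait_h": 51},
]

total_active = sum(i["active_h"] for i in items)
total_wait = sum(i["wait_h"] for i in items)
efficiency = total_active / (total_active + total_wait)
print(f"Flow efficiency: {efficiency:.0%}")  # 18 active / 120 total -> 15%
```

The hard part in practice is not the division; it is instrumenting your tracker so "waiting" states (in review, blocked, queued for QA) are actually recorded.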
Vanity Metrics to Avoid
| Vanity Metric | Why It Is Misleading | Better Alternative |
|---|---|---|
| Lines of code | More code is not better code | Deployment frequency |
| Number of commits | Encourages micro-commits | Lead time for changes |
| Story points completed | Inflated over time, not comparable | Flow velocity (items completed) |
| Test count | More tests does not mean better coverage | Mutation testing score |
| Uptime percentage (without SLOs) | 99.9% means nothing without context | Error budget burn rate |
| Number of deploys (without CFR) | Fast but breaking everything | Deploy frequency AND change failure rate together |
| Mean time between failures | Encourages avoiding change | MTTR (recover fast, not fail rarely) |
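The error-budget burn rate mentioned in the table is a one-line calculation worth spelling out; a sketch assuming a 99.9% availability SLO (all numbers illustrative):

```python
# Burn rate = observed error rate / error budget allowed by the SLO.
# A burn rate of 1.0 spends the budget exactly over the SLO window;
# above 1.0, you will exhaust it before the window ends.
slo = 0.999                   # 99.9% availability target
error_budget = 1 - slo        # 0.1% of requests may fail
observed_error_rate = 0.0005  # 0.05% of requests failing right now

burn_rate = observed_error_rate / error_budget
print(f"Burn rate: {burn_rate:.1f}x")  # 0.5x — within budget
```

Unlike a bare uptime percentage, burn rate carries its own context: it says how fast you are consuming the failure allowance your SLO grants you.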
Using Metrics for Improvement, Not Punishment
This is the most important section in this post. Metrics weaponized against teams will destroy your DevOps culture faster than any technical debt:
WRONG: "Team A has a 15% change failure rate. They need to improve."
→ Team A stops deploying frequently to reduce failures.
→ Lead time increases. Batch sizes grow. Failures get WORSE.
RIGHT: "Team A's change failure rate increased from 8% to 15%.
Let's look at what changed and how we can help."
→ Team A discovers they skipped staging for a deadline.
→ They invest in better test environments.
→ CFR drops to 6%.
Rules for healthy metric usage:
1. Teams own their metrics — no cross-team comparison
2. Metrics drive conversations, not consequences
3. Always look at trends, never snapshots
4. Pair speed metrics with stability metrics
5. Celebrate improvement, not absolute numbers
Metric-Driven Retrospectives
# retrospective_template.yml
# Run this monthly with your DORA data
retrospective:
  date: "2025-12-20"
  team: "payments-squad"

  metrics_review:
    deployment_frequency:
      current: "3.2/day"
      previous: "2.8/day"
      trend: "improving"
    lead_time:
      current: "4.5 hours"
      previous: "6.2 hours"
      trend: "improving"
      action: "Parallel test stages reduced CI time by 25 min"
    change_failure_rate:
      current: "12%"
      previous: "8%"
      trend: "degrading"
      action: "3 config-related failures — need config validation in CI"
    time_to_restore:
      current: "45 min"
      previous: "52 min"
      trend: "stable"

  improvement_actions:
    - owner: "Sarah"
      action: "Add config schema validation to CI pipeline"
      due: "2026-01-03"
      expected_impact: "Reduce CFR by ~5%"
    - owner: "Mike"
      action: "Set up automated rollback on error rate spike"
      due: "2026-01-10"
      expected_impact: "Reduce MTTR to < 15 min for deploy failures"
Closing Note
Metrics are a compass, not a scorecard. The four DORA metrics work because they capture the fundamental tension in software delivery: going fast versus staying stable. Elite teams prove these are not trade-offs — they reinforce each other. Start by measuring just one metric accurately, make it visible to your team, and use it to drive one improvement per sprint. The numbers will follow the culture, not the other way around.
