
DevOps Is Not a Tool — Culture, CALMS, and the Three Ways

Goel Academy
DevOps & Cloud Learning Hub

Your company just posted a job listing for a "DevOps Engineer" who must know Jenkins, Terraform, Kubernetes, and 47 other tools. Congratulations — you have completely missed the point of DevOps.

The Origin Story: How a Frustrated Belgian Changed Everything

In 2008, Andrew Shafer proposed a talk called "Agile Infrastructure" at the Agile Conference in Toronto. Almost nobody showed up. But one person did — Patrick Debois, a Belgian IT consultant who was exhausted by the wall between dev and ops teams.

Debois had been working on a large data center migration. The developers would throw code over the wall, operations would scramble to deploy it, things would break at 2 AM, and everyone would blame each other. Sound familiar?

In 2009, inspired by "10+ Deploys Per Day," a Velocity conference talk by John Allspaw and Paul Hammond of Flickr, Debois organized the first DevOpsDays conference in Ghent, Belgium. The Twitter hashtag #DevOps was born, and the rest is history.

But here is the critical thing everyone forgets: DevOps was never about tools. It was about breaking down silos between teams.

What DevOps Actually Is

DevOps is a cultural and professional movement that emphasizes collaboration between software developers and IT operations. It aims to shorten the systems development lifecycle while delivering features, fixes, and updates frequently and reliably.

Let me be blunt about what DevOps is NOT:

# DevOps is NOT:
# ❌ A job title
# ❌ A team name
# ❌ A tool or product you can buy
# ❌ Just automation
# ❌ Just CI/CD

# DevOps IS:
# ✅ A culture of shared responsibility
# ✅ Breaking down silos between teams
# ✅ Continuous improvement
# ✅ Automating everything you can
# ✅ Measuring what matters

The CALMS Framework

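The CALMS framework started life as CAMS, coined by John Willis and Damon Edwards in 2010. Jez Humble (co-author of Continuous Delivery) later added the L for Lean. It is a useful way to assess whether an organization is truly adopting DevOps or just slapping a new label on the same old problems.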

| Pillar | What It Means | Red Flag If Missing |
|---|---|---|
| Culture | Shared responsibility, blameless postmortems, trust between teams | Finger-pointing after outages, "that's not my job" mentality |
| Automation | Automate builds, tests, deployments, infrastructure provisioning | Manual deployments, "works on my machine" syndrome |
| Lean | Limit WIP, eliminate waste, value stream mapping, small batch sizes | Huge releases every quarter, features nobody asked for |
| Measurement | Track deployment frequency, lead time, MTTR, change failure rate | No metrics, gut-feel decision making, vanity dashboards |
| Sharing | Knowledge sharing, cross-functional teams, open communication | Knowledge hoarded by individuals, tribal knowledge |

Here is a quick self-assessment you can run with your team:

# Quick CALMS Self-Assessment
# Rate each pillar 1-5. Be honest.

echo "=== CALMS Assessment ==="
echo "Culture: Do devs and ops share on-call? (1-5)"
echo "Automation: Can you deploy with one command? (1-5)"
echo "Lean: Is your WIP limit defined and enforced? (1-5)"
echo "Measurement: Do you track DORA metrics? (1-5)"
echo "Sharing: Does your team do blameless postmortems? (1-5)"
echo ""
echo "Score 20+: You are doing DevOps"
echo "Score 15-19: Getting there"
echo "Score <15: You have a DevOps title, not DevOps culture"
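The checklist above only prints the questions, so the adding up is left to you. Here is a minimal sketch that tallies five ratings and applies the same score bands. The function name `calms_score` and the argument order are my own assumptions for illustration, not part of any standard tool.

```shell
#!/usr/bin/env bash
# Tally a CALMS self-assessment. Ratings are passed as five arguments
# in the order Culture, Automation, Lean, Measurement, Sharing.
# calms_score is a hypothetical helper, not part of any standard tool.
calms_score() {
  if [ "$#" -ne 5 ]; then
    echo "usage: calms_score C A L M S (each 1-5)" >&2
    return 2
  fi
  local total=0 rating
  for rating in "$@"; do
    case "$rating" in
      [1-5]) total=$((total + rating)) ;;
      *) echo "invalid rating: $rating (expected 1-5)" >&2; return 2 ;;
    esac
  done
  # Same bands as the checklist: 20 and up, 15 to 19, below 15.
  if [ "$total" -ge 20 ]; then
    echo "$total: You are doing DevOps"
  elif [ "$total" -ge 15 ]; then
    echo "$total: Getting there"
  else
    echo "$total: You have a DevOps title, not DevOps culture"
  fi
}

# Example ratings, made up for illustration.
calms_score 4 3 2 3 4
```

With the example ratings the total is 16, which lands in the "Getting there" band. Have each team member score separately and compare; the disagreements are usually more interesting than the total.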

The Three Ways of DevOps

The Phoenix Project, by Gene Kim, Kevin Behr, and George Spafford, introduced The Three Ways, the foundational principles of DevOps.

The First Way: Flow (Systems Thinking)

Optimize the entire system, not individual silos. Work should flow left-to-right from Dev to Ops to the customer with minimal friction.

# The First Way in practice: a deployment pipeline
# Work flows continuously from commit to production

pipeline:
  - stage: commit
    action: "Developer pushes code"
  - stage: build
    action: "Automated build triggers"
  - stage: test
    action: "Unit + integration tests run"
  - stage: staging
    action: "Deploy to staging automatically"
  - stage: production
    action: "One-click deploy to production"

# Key principle: Make work visible, limit WIP,
# reduce batch sizes, eliminate bottlenecks
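The pipeline above is descriptive rather than executable. To make the flow idea concrete, here is a rough sketch of a runner that executes stages in order and stops at the first failure, so broken work never moves downstream. The `run_pipeline` function and the `name=command` format are assumptions for illustration, and the stage commands are placeholders.

```shell
#!/usr/bin/env bash
# run_pipeline is a hypothetical helper: it runs "name=command" stages in
# order and stops at the first failure, so broken work never flows downstream.
run_pipeline() {
  local stage name cmd
  for stage in "$@"; do
    name=${stage%%=*}   # text before the first "="
    cmd=${stage#*=}     # text after the first "="
    echo "[$name] running"
    if ! sh -c "$cmd"; then
      echo "[$name] failed, stopping pipeline"
      return 1
    fi
  done
  echo "pipeline succeeded"
}

# Placeholder commands; swap in real build, test, and deploy steps.
run_pipeline "build=true" "test=true" "staging=true"
```

A real CI system gives you this behavior for free. The point of the sketch is only that each stage is a gate, and a failure early on is cheaper than a failure in production.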

The Second Way: Feedback (Amplify Feedback Loops)

Create fast, constant feedback from right-to-left. When production breaks, the team that built it should know immediately — not three weeks later from a customer complaint.

# The Second Way in practice: fast feedback

# Monitoring that alerts the right people
# Instead of: ops gets paged, opens ticket, waits 3 days
# Do this: the team that deployed gets immediate feedback

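# Example: Set up alerting that goes to the dev team
# PAGERDUTY_API_KEY, the From address, and SERVICE_ID are placeholders.
# The PagerDuty REST API expects an API token and a valid user email here.
curl -X POST https://api.pagerduty.com/incidents \
  -H "Content-Type: application/json" \
  -H "Accept: application/vnd.pagerduty+json;version=2" \
  -H "Authorization: Token token=$PAGERDUTY_API_KEY" \
  -H "From: oncall@example.com" \
  -d '{
    "incident": {
      "type": "incident",
      "title": "High error rate after deploy v2.3.1",
      "service": {
        "id": "SERVICE_ID",
        "type": "service_reference"
      },
      "urgency": "high"
    }
  }'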
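Before you page anyone, something has to decide that the deploy is unhealthy. Here is a minimal sketch of that decision, assuming you can get a failed-request count and a total-request count from your monitoring system. The `should_alert` function and the numbers are hypothetical.

```shell
#!/usr/bin/env bash
# should_alert is a hypothetical helper. It takes the number of failed
# requests, the total requests, and a threshold in percent, and succeeds
# (exit status 0) when the error rate is strictly above the threshold.
should_alert() {
  local errors=$1 total=$2 threshold_pct=$3
  # No traffic means there is no rate to evaluate.
  [ "$total" -gt 0 ] || return 1
  # Integer form of: errors / total > threshold_pct / 100
  [ $((errors * 100)) -gt $((threshold_pct * total)) ]
}

# Illustrative numbers: 120 errors out of 1000 requests, 5% threshold.
if should_alert 120 1000 5; then
  echo "ALERT: error rate above 5% after deploy, page the deploying team"
else
  echo "OK: error rate within threshold"
fi
```

The comparison is done with integers on purpose, since shell arithmetic has no floating point. In practice the check would live in your monitoring tool's alert rules rather than in a script.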

The Third Way: Continuous Learning (Experimentation and Learning)

Foster a culture of experimentation and learning from failure. Allocate time for improvement, run game days, and conduct blameless postmortems.

# Blameless Postmortem Template (The Third Way)

## Incident Summary
- **Date:** 2025-02-20
- **Duration:** 45 minutes
- **Impact:** 12% of users saw 500 errors on checkout

## Timeline
- 14:02 — Deploy v2.3.1 rolled out
- 14:15 — Monitoring alert fired (error rate > 5%)
- 14:18 — On-call engineer acknowledged
- 14:32 — Root cause identified (DB connection pool exhaustion)
- 14:47 — Rollback completed, errors resolved

## Root Cause
The connection pool was sized for 50 connections, and the new feature opened 3x more connections per request.

## Action Items
- [ ] Add connection pool metrics to dashboard
- [ ] Load test new features before deploy
- [ ] Add circuit breaker for DB connections

## What Went Well
- Alert fired within 13 minutes
- Rollback was one command

## What We Learned
- Need better capacity planning for DB-heavy features
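A postmortem timeline also gives you numbers worth tracking from one incident to the next. As a rough sketch, the intervals in the template's timeline can be computed like this. The helper names `to_minutes` and `minutes_between` are my own, and the sketch assumes same-day, zero-padded HH:MM timestamps.

```shell
#!/usr/bin/env bash
# to_minutes is a hypothetical helper: it converts a zero-padded HH:MM
# timestamp to minutes since midnight.
to_minutes() {
  local h=${1%%:*} m=${1##*:}
  # Strip one leading zero so "08" is not parsed as an octal number.
  h=${h#0}; m=${m#0}
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# minutes_between START END prints the minutes elapsed on the same day.
minutes_between() {
  echo $(( $(to_minutes "$2") - $(to_minutes "$1") ))
}

# Timestamps taken from the example timeline in the template.
deployed="14:02"; alerted="14:15"; acknowledged="14:18"; restored="14:47"

echo "Time to detect:      $(minutes_between "$deployed" "$alerted") min"
echo "Time to acknowledge: $(minutes_between "$alerted" "$acknowledged") min"
echo "Time to restore:     $(minutes_between "$deployed" "$restored") min"
```

For the example incident this gives 13 minutes to detect, 3 to acknowledge, and 45 to restore. Detection took more than four times as long as acknowledgement, which is a good prompt for the action items.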

Anti-Patterns: You Are Doing DevOps Wrong If...

Watch out for these common traps that organizations fall into:

1. Renaming Ops to DevOps. You took your operations team, changed their title to "DevOps Engineers," and declared victory. Nothing else changed. Developers still throw code over the wall.

2. The DevOps Team silo. You created a new "DevOps Team" that sits between Dev and Ops. You just added a third silo. Congratulations.

3. Tool worship. "We use Kubernetes, so we do DevOps." No. You can run Kubernetes and still have a terrible deployment process with zero collaboration.

4. Automation without culture change. You automated deployments but developers are still not allowed to deploy to production. That is just faster waterfall.

# The anti-pattern detector
# If any of these are true, you have a problem:

echo "Anti-Pattern Checklist:"
echo "[ ] Devs cannot deploy their own code"
echo "[ ] Only one person knows how the pipeline works"
echo "[ ] Postmortems assign blame to individuals"
echo "[ ] 'DevOps' is a separate team, not a practice"
echo "[ ] Release day is still a stressful event"
echo "[ ] You measure lines of code instead of outcomes"

DevOps vs SRE vs Platform Engineering

These three approaches are related but distinct. Here is how they compare:

| Aspect | DevOps | SRE | Platform Engineering |
|---|---|---|---|
| Origin | Patrick Debois, 2009 | Google, 2003 | Evolution of DevOps, ~2020 |
| Focus | Culture + collaboration | Reliability + engineering | Developer experience + self-service |
| Key Metric | Deployment frequency | Error budgets, SLOs | Developer productivity |
| Who Does It | Everyone (culture shift) | Dedicated SRE team | Platform team |
| Approach | Break down silos | "Class SRE implements DevOps" | Build internal platforms |
| Tools | CI/CD, IaC, monitoring | SLI/SLO frameworks, toil tracking | Internal developer portals (Backstage) |
| Risk Model | Move fast, iterate | Error budgets allow controlled risk | Golden paths reduce risk |

As Ben Treynor Sloss (VP Engineering at Google) said: "SRE is what happens when you ask a software engineer to design an operations team."
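The error budget idea in the comparison is easier to grasp with numbers. Here is a minimal sketch that turns an availability SLO into allowed downtime. The `error_budget_minutes` function, the SLO targets, and the 30-day window are assumptions for illustration.

```shell
#!/usr/bin/env bash
# error_budget_minutes is a hypothetical helper: given an availability SLO
# in percent and a window in days, it prints the allowed downtime in minutes.
error_budget_minutes() {
  awk -v slo="$1" -v days="$2" \
    'BEGIN { printf "%.1f\n", (100 - slo) / 100 * days * 24 * 60 }'
}

# Illustrative targets over a 30-day window.
echo "99.9%  SLO: $(error_budget_minutes 99.9 30) minutes of downtime allowed"
echo "99.99% SLO: $(error_budget_minutes 99.99 30) minutes of downtime allowed"
```

A 99.9% target over 30 days leaves about 43 minutes of budget. While budget remains, the team can keep shipping. Once it is spent, the usual SRE practice is to prioritize reliability work over new releases.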

DORA Metrics: Measuring What Matters

The DevOps Research and Assessment (DORA) team identified four key metrics that predict software delivery performance:

# The Four DORA Metrics

# 1. Deployment Frequency
# How often do you deploy to production?
# Elite: Multiple times per day | Low: Less than once per month

# 2. Lead Time for Changes
# Time from commit to production
# Elite: Less than 1 hour | Low: More than 6 months

# 3. Change Failure Rate
# % of deployments causing a failure
# Elite: 0-15% | Low: 46-60%

# 4. Mean Time to Restore (MTTR)
# How long to recover from a failure
# Elite: Less than 1 hour | Low: More than 6 months

# Check your team's performance:
echo "=== DORA Metrics Quick Check ==="
echo "Deployment Frequency: ______ (daily/weekly/monthly/quarterly)"
echo "Lead Time for Changes: ______ (hours/days/weeks/months)"
echo "Change Failure Rate: ______% "
echo "MTTR: ______ (minutes/hours/days/weeks)"

These metrics are not vanity numbers. Google's DORA research, in its 2019 State of DevOps report, found that elite performers deploy 208x more frequently and have 106x faster lead times than low performers. And they are more stable, not less.
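The quick check above asks you to fill in the blanks by hand. If you keep even a simple log of production deploys, two of the four metrics fall out of it directly. Here is a rough sketch. The log format (one line per deploy, a date and `ok` or `failed`) and the `dora_summary` function are assumptions for illustration, and the sample data is made up.

```shell
#!/usr/bin/env bash
# dora_summary is a hypothetical helper. It reads lines of the form
# "<YYYY-MM-DD> <ok|failed>" on stdin, one per production deploy, and prints
# the deploy count, deploys per active day, and change failure rate.
dora_summary() {
  awk '
    NF >= 2 {
      deploys++
      if (!($1 in days)) { days[$1] = 1; active_days++ }
      if ($2 == "failed") failures++
    }
    END {
      if (deploys == 0) { print "no deploys recorded"; exit }
      printf "deploys=%d per_active_day=%.1f change_failure_rate=%.0f%%\n",
             deploys, deploys / active_days, failures * 100 / deploys
    }
  '
}

# Made-up sample log for illustration only.
dora_summary <<'EOF'
2025-02-18 ok
2025-02-18 ok
2025-02-19 failed
2025-02-20 ok
2025-02-20 ok
EOF
```

Note that this counts deploys per day on which a deploy happened, not per calendar day, so it flatters teams that deploy in bursts. Lead time and time to restore need timestamps from your version control and incident tooling, which this sketch does not cover.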

Where to Start

If you are reading this and thinking "my organization does none of this," do not panic. Start small:

  1. Pick one CALMS pillar and improve it this quarter
  2. Measure your DORA metrics — you cannot improve what you do not measure
  3. Run a blameless postmortem after your next incident
  4. Automate one manual step in your deployment process
  5. Share knowledge — write a runbook, do a lunch-and-learn

DevOps is a journey, not a destination. The organizations that win are the ones that never stop improving.


Next up in our DevOps series, we will dive into Git Workflows — comparing Trunk-Based Development, GitFlow, and GitHub Flow to help you pick the right branching strategy for your team.