
The Complete DevOps Learning Roadmap — From Zero to SRE

Goel Academy
DevOps & Cloud Learning Hub

This is the final post in our 30-part DevOps series. Over the past year, we have covered everything from CI/CD pipelines to chaos engineering, from YAML syntax to supply chain security. This post ties it all together into a structured 12-month learning plan with clear milestones, certification paths, and career guidance.

The 12-Month Plan at a Glance

Month  1-2:  Linux + Networking          (Foundation)
Month  3-4:  Git + CI/CD                 (Delivery)
Month  5-6:  Containers + Kubernetes     (Runtime)
Month  7-8:  IaC + Cloud                 (Infrastructure)
Month  9-10: Monitoring + Security       (Operations)
Month 11-12: SRE + Platform Engineering  (Mastery)

Foundation ──► Delivery ──► Runtime ──► Infrastructure
                                              │
                                              ▼
                              Mastery ◄── Operations

Month 1-2: Linux and Networking

Every server you will manage, every container you will debug, and every pipeline you will fix runs on Linux. This is non-negotiable.

# Skills to master:
# File system navigation and permissions
ls -la /etc/nginx/
chmod 755 script.sh
sudo chown -R www-data:www-data /var/www/ # -R applies ownership to everything underneath

# Process management
ps aux | grep nginx
systemctl status nginx
journalctl -u nginx -f

# Networking fundamentals
sudo ss -tlnp # Which process is listening on which TCP port? (-p needs root to see every process)
curl -I https://example.com # HTTP headers
dig example.com # DNS resolution
traceroute example.com # Network path
ip addr show # Network interfaces

# Text processing (you will use these daily)
grep -r "ERROR" /var/log/
awk '{print $1, $4}' access.log # client IP and timestamp (common/combined log format)
tail -f /var/log/syslog | grep --line-buffered "error"

What to learn:

  • Linux file system hierarchy, permissions, users/groups
  • Package management (apt, yum/dnf)
  • Bash scripting fundamentals
  • Networking: TCP/UDP, DNS, HTTP/HTTPS, SSH, firewalls
  • Process management, systemd, cron

Milestones:

  • SSH into a remote server and configure Nginx from scratch
  • Write a bash script that monitors disk usage and sends alerts
  • Explain the difference between TCP and UDP without Googling
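
The disk-usage milestone can start as a few lines of bash and awk. This is a minimal sketch, assuming POSIX `df -P` output (column 5 is the usage percentage, column 6 the mount point) and an 80% threshold; mount points containing spaces would break the simple column split, and the "alert" is just a line on stderr where your real notifier (mail, a Slack webhook, a pager) would go.

```bash
#!/usr/bin/env bash
# Disk-usage alert sketch. Assumes POSIX `df -P` output, where column 5 is
# the usage percentage and column 6 the mount point.

# Reads `df -P` output on stdin and prints one line per filesystem whose
# usage is at or above the threshold percentage (default 80).
disk_alerts() {
  local threshold="${1:-80}"
  awk -v limit="$threshold" 'NR > 1 {
    use = $5               # e.g. "85%"
    sub(/%/, "", use)      # strip the percent sign
    if (use + 0 >= limit + 0)
      printf "ALERT: %s is at %s%%\n", $6, use
  }'
}

# Check this machine and report hits on stderr; swap the echo for a real
# notifier in your own setup.
alerts="$(df -P | disk_alerts 80)"
if [[ -n "$alerts" ]]; then
  echo "$alerts" >&2
fi
```

Saved as an executable script, it could run from cron every five minutes, for example `*/5 * * * * /usr/local/bin/disk-check.sh` (the path is a placeholder).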

Certification: Linux Foundation Certified System Administrator (LFCS) or CompTIA Linux+


Month 3-4: Git and CI/CD

Version control and automated pipelines are the heart of DevOps. Master these and you can contribute to any team.

# Your first GitHub Actions pipeline
# .github/workflows/ci.yml
name: CI Pipeline
on:
  push:
    branches: [main]
  pull_request:

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build
          path: dist/

What to learn:

  • Git fundamentals: branches, merges, rebases, cherry-picks
  • Branching strategies: trunk-based development, GitFlow, GitHub Flow
  • CI/CD concepts: build, test, deploy stages
  • At least two CI/CD tools: GitHub Actions + Jenkins (or GitLab CI)
  • YAML syntax and best practices
  • Testing in CI: unit, integration, linting, security scans
  • Artifact management basics

Milestones:

  • Set up a CI pipeline that runs tests on every PR
  • Implement a CD pipeline that deploys to a staging environment
  • Resolve a merge conflict in a real team workflow
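
To rehearse the merge-conflict milestone before it happens on a real team, you can manufacture a conflict in a throwaway repository. This is a minimal sketch that assumes `git` is installed; the file name `config.yml`, the branch name `feature`, and the commit messages are arbitrary placeholders.

```bash
#!/usr/bin/env bash
# Builds a throwaway Git repo that is left mid-merge with a conflict in config.yml.
set -euo pipefail

make_conflict_repo() {
  local dir base
  dir="$(mktemp -d)"
  cd "$dir"
  git init -q
  git config user.name "Practice"              # local identity, only for this sandbox
  git config user.email "practice@example.com"

  echo "timeout: 30" > config.yml
  git add config.yml
  git commit -q -m "base config"
  base="$(git rev-parse --abbrev-ref HEAD)"    # "main" or "master", depending on git defaults

  git checkout -q -b feature                   # one side raises the timeout
  echo "timeout: 60" > config.yml
  git commit -q -a -m "feature: raise timeout"

  git checkout -q "$base"                      # the other side lowers it
  echo "timeout: 10" > config.yml
  git commit -q -a -m "base: lower timeout"

  # Both branches changed the same line, so this merge stops with a conflict.
  git merge feature >/dev/null 2>&1 || true
  echo "$dir"
}
```

To practice, run `cd "$(make_conflict_repo)"`, inspect `git status`, edit `config.yml` to keep the value you want, then finish with `git add config.yml && git commit --no-edit`.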

Month 5-6: Containers and Kubernetes

Containers are how modern applications are packaged and shipped. Kubernetes is how they are orchestrated at scale.

# Multi-stage Dockerfile (production-ready)
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime packages reach the final image
RUN npm prune --omit=dev

FROM node:20-alpine AS runtime
WORKDIR /app
ENV NODE_ENV=production
RUN addgroup -S app && adduser -S app -G app
COPY --from=build /app/package.json ./
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
USER app
EXPOSE 3000
CMD ["node", "dist/server.js"]

# Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: myorg/api:v1.2.3
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 5
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

What to learn:

  • Docker: images, containers, Dockerfiles, multi-stage builds, volumes, networking
  • Docker Compose for local development
  • Kubernetes architecture: control plane, nodes, etcd
  • K8s workloads: Pods, Deployments, StatefulSets, DaemonSets, Jobs
  • K8s networking: Services, Ingress, Network Policies
  • K8s storage: PersistentVolumes, StorageClasses
  • Helm for package management

Milestones:

  • Containerize a multi-service application
  • Deploy it to a Kubernetes cluster (minikube or kind locally, then EKS/AKS/GKE)
  • Scale it up, roll out an update, and roll back
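
The scale, update, and roll-back milestone maps onto a handful of standard `kubectl` commands. The function below is a sketch that assumes the `api` Deployment and container name from the manifest above, a working kubeconfig, and a hypothetical new tag such as `myorg/api:v1.2.4`. It deliberately rolls back even when the update succeeds, because the point of the drill is to practice the undo path.

```bash
#!/usr/bin/env bash
# Rollout drill sketch: scale up, ship a new image, then roll back.
# Usage: rollout_drill <deployment> <container> <new-image> [replicas]
rollout_drill() {
  local deploy="$1" container="$2" image="$3" replicas="${4:-5}"

  # 1. Scale out to the requested replica count.
  kubectl scale "deployment/$deploy" --replicas="$replicas"

  # 2. Roll out the new image and wait for it to become ready (or time out).
  kubectl set image "deployment/$deploy" "$container=$image"
  if ! kubectl rollout status "deployment/$deploy" --timeout=120s; then
    echo "rollout of $image did not become ready in time" >&2
  fi

  # 3. Roll back to the previous revision and confirm it is healthy again.
  kubectl rollout undo "deployment/$deploy"
  kubectl rollout status "deployment/$deploy" --timeout=120s
}
```

For example, `rollout_drill api api myorg/api:v1.2.4`; afterwards `kubectl rollout history deployment/api` shows the revisions the drill created. Scaling does not create a revision, so the undo reverts the image while keeping the new replica count.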

Certification: Certified Kubernetes Administrator (CKA)


Month 7-8: Infrastructure as Code and Cloud

IaC makes infrastructure reproducible, reviewable, and versionable. Pick one major cloud and go deep.

# Terraform: Deploy a production-ready VPC + EKS cluster
provider "aws" {
  region = "us-east-1"
}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "production"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  enable_nat_gateway = true
  # One shared NAT gateway saves cost but is a single point of failure;
  # set single_nat_gateway = false (one NAT per AZ) for real production traffic.
  single_nat_gateway = true
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "production"
  cluster_version = "1.29"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    general = {
      instance_types = ["m5.xlarge"]
      min_size       = 2
      max_size       = 10
      desired_size   = 3
    }
  }
}

What to learn:

  • Terraform: HCL, state management, modules, workspaces
  • At least one cloud deeply: AWS (VPC, EC2, EKS, S3, IAM, RDS) or Azure (VNet, AKS, Blob, Microsoft Entra ID)
  • Configuration management basics (Ansible)
  • GitOps: ArgoCD or Flux for Kubernetes
  • Secrets management: Vault, cloud-native secret stores

Milestones:

  • Provision a full environment (VPC + K8s + database) with Terraform
  • Implement GitOps: changes merged to main auto-deploy to the cluster
  • Destroy and recreate the entire environment from code in under 30 minutes
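
For the "destroy and recreate in under 30 minutes" milestone, bash's built-in `SECONDS` counter is enough to time the round trip. This sketch assumes it runs in the Terraform root module from the example above with cloud credentials already configured; never point it at a state that holds real production resources.

```bash
#!/usr/bin/env bash
# Times a full destroy + re-apply of the current Terraform root module and
# checks the result against a time budget in minutes (default 30).
rebuild_env() {
  local budget_min="${1:-30}"
  local start=$SECONDS elapsed

  terraform init -input=false || return 1
  terraform destroy -auto-approve -input=false || return 1
  terraform apply -auto-approve -input=false || return 1

  elapsed=$(( SECONDS - start ))
  printf 'rebuild took %dm%02ds (budget %dm)\n' $(( elapsed / 60 )) $(( elapsed % 60 )) "$budget_min"
  # Exit status is 0 only if the rebuild fit inside the budget.
  (( elapsed <= budget_min * 60 ))
}
```

A non-zero exit status means either a Terraform step failed or the rebuild ran over budget, so the function can also serve as a scheduled CI job.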

Certification: HashiCorp Certified: Terraform Associate, AWS Solutions Architect Associate (or Azure AZ-104)


Month 9-10: Monitoring, Observability, and Security

You cannot improve what you cannot see. And you cannot ship safely without security built in.

# Prometheus + Grafana monitoring stack
# prometheus-values.yml (Helm)
prometheus:
  prometheusSpec:
    retention: 30d
    resources:
      requests:
        cpu: "500m"
        memory: "2Gi"
    serviceMonitorSelector:
      matchLabels:
        team: platform

alertmanager:
  config:
    global:
      # Placeholder: Slack receivers need a webhook URL, set here or per receiver
      slack_api_url: 'https://hooks.slack.com/services/REPLACE_ME'
    receivers:
      # Keep 'null': the chart's default route still sends the Watchdog alert to it
      - name: 'null'
      - name: slack-critical
        slack_configs:
          - channel: '#alerts-critical'
            send_resolved: true
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: slack-critical

grafana:
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: 'default'
          folder: 'DevOps'
          type: file
          options:
            path: /var/lib/grafana/dashboards
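
The values file only takes effect once it is handed to Helm. Its keys (`prometheusSpec`, `alertmanager.config`, `grafana.dashboardProviders`) match the community kube-prometheus-stack chart, so this sketch assumes that chart; the release name `monitoring` and the default namespace are example choices.

```bash
#!/usr/bin/env bash
# Installs or upgrades the monitoring stack with the values file above.
# Assumes the kube-prometheus-stack chart; "monitoring" is an example release name.
install_monitoring() {
  local values="${1:-prometheus-values.yml}" namespace="${2:-monitoring}"
  helm repo add prometheus-community https://prometheus-community.github.io/helm-charts || return 1
  helm repo update || return 1
  # upgrade --install installs on the first run and upgrades on later runs.
  helm upgrade --install monitoring prometheus-community/kube-prometheus-stack \
    --namespace "$namespace" --create-namespace \
    -f "$values"
}
```

Re-running the same function after editing the values file rolls the change out, which keeps the monitoring stack itself under version control.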

What to learn:

  • Monitoring: Prometheus, Grafana, alerting strategies
  • Observability: distributed tracing (Jaeger, OpenTelemetry), structured logging
  • DORA metrics and how to measure them
  • Incident management: on-call, escalation, blameless post-mortems
  • Security: DevSecOps, SAST/DAST, container scanning, SBOM
  • Supply chain security: image signing, SLSA framework
  • Network security: mTLS, network policies, zero-trust
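
DORA metrics can start as simple arithmetic over a deployment log. The sketch below assumes a hypothetical CSV export with one `YYYY-MM-DD,outcome` line per production deploy (outcome `ok` or `failed`) and computes two of the four DORA metrics, deployment frequency and change failure rate. Lead time for changes and time to restore service need commit and incident timestamps that this format does not carry.

```bash
#!/usr/bin/env bash
# Summarises deployment frequency and change failure rate from a deploy log.
# stdin: CSV lines "YYYY-MM-DD,outcome", outcome "ok" or "failed" (hypothetical format)
# $1:    length of the reporting window in days (default 30)
dora_summary() {
  local days="${1:-30}"
  awk -F, -v days="$days" '
    NF >= 2 { total++; if ($2 == "failed") failed++ }
    END {
      if (total == 0) { print "no deployments in window"; exit }
      printf "deploys/day: %.2f\n", total / days
      printf "change failure rate: %.1f%%\n", 100 * failed / total
    }'
}
```

For example, `dora_summary 30 < deploys.csv` over a 30-day export (the file name is a placeholder).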

Milestones:

  • Deploy a full monitoring stack and create dashboards for a production app
  • Set up alerting with escalation policies
  • Run a security scan pipeline that catches a real vulnerability
  • Conduct a blameless post-mortem for a simulated incident
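
A security scan pipeline usually comes down to one gate step that fails the build on serious findings. This sketch assumes Trivy as the scanner (any tool that exits non-zero on findings works the same way) and reuses the image name from the Kubernetes example. Trivy also exits non-zero on operational errors, such as an image it cannot pull, and this gate treats those as failures too.

```bash
#!/usr/bin/env bash
# CI gate sketch: fail the step when the image has fixable HIGH or CRITICAL CVEs.
scan_gate() {
  local image="$1"
  # --exit-code 1 turns findings at the listed severities into a failing status;
  # --ignore-unfixed skips vulnerabilities that have no patched version yet.
  if trivy image --exit-code 1 --severity HIGH,CRITICAL --ignore-unfixed "$image"; then
    echo "scan passed: $image"
  else
    echo "scan failed: $image (HIGH/CRITICAL findings or scanner error)" >&2
    return 1
  fi
}
```

Calling `scan_gate myorg/api:v1.2.3` as a pipeline step after the image build is enough to block a vulnerable image from reaching the registry.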

Month 11-12: SRE, Platform Engineering, and Advanced Topics

This is where you go from "I can use the tools" to "I can design the systems."

SRE Principles to Internalize:

1. Embrace risk → Error budgets, not zero-defect targets
2. SLOs drive decisions → "Is the user happy?" not "Is the server healthy?"
3. Eliminate toil → If you do it twice, automate it
4. Monitor symptoms → Alert on user-facing impact, not CPU spikes
5. Release engineering → Every release is safe, fast, and repeatable
6. Simplicity → Boring technology > cutting-edge fragility
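
The error-budget idea in principle 1 is plain arithmetic: a 99.9% SLO over 30 days allows 0.1% of 43,200 minutes, or 43.2 minutes, of unreliability. The helper below is a sketch that treats "bad minutes" as the unit of failure; request-based SLOs would use failed and total request counts instead.

```bash
#!/usr/bin/env bash
# Error-budget calculator sketch.
# $1 = SLO target in percent (e.g. 99.9)
# $2 = SLO window in days (e.g. 30)
# $3 = minutes of user-facing failure so far (default 0)
error_budget() {
  awk -v slo="$1" -v days="$2" -v bad="${3:-0}" 'BEGIN {
    budget = (1 - slo / 100) * days * 24 * 60    # allowed bad minutes in the window
    if (budget <= 0) { print "a 100% SLO leaves no error budget"; exit 1 }
    printf "budget: %.1f min, used: %.1f%%, remaining: %.1f min\n",
           budget, 100 * bad / budget, budget - bad
  }'
}
```

For example, `error_budget 99.9 30 10` reports a 43.2-minute budget with 23.1% used, which is the number to watch when deciding whether to keep shipping or slow down.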

What to learn:

  • SRE principles: error budgets, SLOs, toil reduction
  • Platform engineering: internal developer platforms, golden paths, Backstage
  • Chaos engineering: controlled failure injection, GameDays
  • Advanced deployment: canary, blue-green, feature flags, progressive delivery
  • API gateways: Kong, Traefik, cloud-native gateways
  • MLOps basics: ML pipelines, model serving, AIOps
  • Career growth: leadership, communication, architecture thinking

Milestones:

  • Define SLOs for a real service and track error budget burn
  • Build a self-service deployment pipeline (developer pushes code, everything else is automated)
  • Run a chaos engineering experiment and discover a real weakness
  • Mentor someone who is in Months 1-2 of their journey
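
A first chaos experiment can be as small as killing one pod and checking whether users would notice. This sketch assumes `kubectl` access to the cluster, GNU coreutils `shuf`, and the `app=api` label from the Deployment above. The hypothesis to verify is that the ReplicaSet replaces the pod and the replacement only receives traffic once its readiness probe passes.

```bash
#!/usr/bin/env bash
# Pod-kill experiment sketch: delete one random pod matching a label selector.
# Usage: kill_random_pod <label-selector> [namespace]
kill_random_pod() {
  local selector="$1" namespace="${2:-default}" pod
  # `-o name` prints entries like "pod/api-7d9c..."; shuf picks one at random.
  pod="$(kubectl get pods -n "$namespace" -l "$selector" -o name | shuf -n 1)"
  if [[ -z "$pod" ]]; then
    echo "no pods match $selector in $namespace" >&2
    return 1
  fi
  echo "deleting $pod"
  # --wait=false returns immediately so you can watch the replacement happen.
  kubectl delete -n "$namespace" "$pod" --wait=false
}
```

Run `kill_random_pod app=api` during a GameDay, then watch `kubectl get pods -l app=api -w` alongside your dashboards and alerts to see whether the failure stayed invisible to users.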

Certification: Certified Kubernetes Security Specialist (CKS, which requires an active CKA), Google Professional Cloud DevOps Engineer


Certification Roadmap

Timeline    Certification                           Provider
─────────────────────────────────────────────────────────────
Month 2     CompTIA Linux+ / LFCS                   CompTIA / Linux Foundation
Month 4     GitHub Actions Certification            GitHub
Month 6     CKA (Certified Kubernetes Admin)        CNCF
Month 8     Terraform Associate                     HashiCorp
Month 8     AWS SAA / Azure AZ-104                  AWS / Microsoft
Month 10    AWS DevOps Professional / AZ-400        AWS / Microsoft
Month 12    CKS (Certified K8s Security)            CNCF

Optional (advanced):
- Google Professional Cloud DevOps Engineer
- Certified GitOps Associate (CGOA)
- Prometheus Certified Associate (PCA)

Skills Matrix

Mark your current level (Beginner, Intermediate, or Advanced) for each skill. Revisit quarterly.

Skill Area                    Beginner  Intermediate  Advanced
──────────────────────────────────────────────────────────────
Linux administration             □           □           □
Networking (TCP/IP, DNS, HTTP)   □           □           □
Git & version control            □           □           □
CI/CD pipelines                  □           □           □
Docker & containers              □           □           □
Kubernetes                       □           □           □
Terraform / IaC                  □           □           □
Cloud (AWS or Azure or GCP)      □           □           □
Monitoring & observability       □           □           □
Security (DevSecOps)             □           □           □
Incident management              □           □           □
SRE practices                    □           □           □
Platform engineering             □           □           □
Scripting (Bash + Python)        □           □           □

Career Paths

DevOps Engineer
│
├──► Senior DevOps Engineer
│    │
│    ├──► Staff DevOps Engineer ──► Principal Engineer
│    │
│    ├──► SRE (Site Reliability Engineer)
│    │    │
│    │    └──► Staff SRE ──► SRE Manager
│    │
│    ├──► Platform Engineer
│    │    │
│    │    └──► Staff Platform Engineer ──► Head of Platform
│    │
│    └──► Cloud Architect
│         │
│         └──► Principal Cloud Architect ──► CTO
│
└──► DevOps Manager ──► Director of Engineering

Typical Salary Ranges (US, 2025):
Junior DevOps: $80K - $120K
Mid-Level DevOps: $120K - $170K
Senior DevOps/SRE: $160K - $220K
Staff/Principal: $200K - $300K+
(Ranges vary significantly by location, company size, and industry)

The Complete Series Reference

All 30 posts in this DevOps series, in publication order, with the topics each one covers:

#   Post                                                            Topics
──────────────────────────────────────────────────────────────────────────────────────
1   CI/CD Pipelines: Building Your First Automated Pipeline         CI/CD fundamentals
2   DevOps Is Not a Tool — Culture, CALMS, and the Three Ways       Culture, CALMS
3   Git Workflows — Trunk-Based vs GitFlow vs GitHub Flow           Git, branching
4   GitHub Actions from Scratch                                     CI/CD, GitHub
5   Jenkins Pipeline — Declarative, Scripted, and Blue Ocean        CI/CD, Jenkins
6   YAML for DevOps — The Complete Guide                            YAML
7   Version Control Best Practices                                  Git, code review
8   Testing in DevOps — Unit, Integration, E2E, and Shift-Left      Testing
9   Monitoring 101 — Metrics, Logs, Traces, and the Golden Signals  Monitoring
10  Prometheus and Grafana — Production Monitoring in 15 Minutes    Monitoring
11  Artifact Management — JFrog, Nexus, and Container Registries    Artifacts
12  Configuration Management — Ansible, Chef, and Puppet            Config mgmt
13  GitOps — ArgoCD, Flux, and Git as Source of Truth               GitOps
14  SRE Principles — Error Budgets, SLOs, and Toil                  SRE
15  Incident Management — On-Call, Escalation, and Post-Mortems     Incidents
16  Secrets Management — Vault, SOPS, and Sealed Secrets            Security
17  Platform Engineering — Internal Developer Platforms Explained   Platform
18  Chaos Engineering — Break Your System Before It Breaks You      Chaos
19  DevSecOps — Shift Security Left Without Slowing Down            Security
20  Observability vs Monitoring — Distributed Tracing               Observability
21  Deployment Strategies — Blue-Green, Canary, Rolling             Deployment
22  Infrastructure Testing — Terratest, InSpec, ServerSpec          Testing, IaC
23  API Gateways — Kong, Traefik, and AWS API Gateway               Networking
24  Supply Chain Security — SBOM, Sigstore, and SLSA                Security
25  DevOps Maturity Model — Where Is Your Organization?             Assessment
26  DevOps Metrics That Matter — DORA and Beyond                    Metrics
27  Multi-Cloud DevOps — Terraform, K8s, Cross-Cloud CI/CD          Multi-cloud
28  MLOps and AIOps — DevOps for Machine Learning                   MLOps, AIOps
29  Top 50 DevOps Interview Questions                               Career
30  The Complete DevOps Roadmap — From Zero to SRE                  This post

Closing Note

A year ago, this series started with a simple CI/CD pipeline. Thirty posts later, we have covered the entire DevOps landscape — from culture and tooling to SRE principles and machine learning operations. But the most important thing is not what you have read. It is what you build next. Pick Month 1 of the roadmap, open a terminal, and start. The DevOps community is welcoming, the tools are free, and the career opportunities are extraordinary. Every senior engineer you admire started exactly where you are now — with a blank terminal and curiosity. Go build something.