Testing in DevOps — Unit, Integration, E2E, and Shift-Left

10 min read
Goel Academy
DevOps & Cloud Learning Hub

Your CI/CD pipeline deploys 15 times a day. It also has zero tests. Every deploy is a coin flip. You find out about bugs when customers tweet angry messages at your company. This is not DevOps — this is chaos with automation.

The Testing Pyramid

The testing pyramid is the foundational mental model for building a balanced test suite. It was introduced by Mike Cohn and tells you how many of each type of test you should write.

          /‾‾‾‾‾\
         /  E2E  \          Few — slow, expensive, high confidence
        /‾‾‾‾‾‾‾‾‾\
       /Integration\        Some — moderate speed, good confidence
      /‾‾‾‾‾‾‾‾‾‾‾‾‾\
     /  Unit Tests   \      Many — fast, cheap, focused
    /‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\

The idea is simple: have a wide base of fast unit tests, a middle layer of integration tests, and a small top layer of E2E tests. Most teams get this inverted — they have tons of slow E2E tests and barely any unit tests. That is the "ice cream cone" anti-pattern, and it will destroy your pipeline speed.

| Test Type   | Speed            | Confidence  | Cost      | Quantity           |
|-------------|------------------|-------------|-----------|--------------------|
| Unit        | ~1ms each        | Low-Medium  | Low       | Hundreds-Thousands |
| Integration | ~100ms-5s each   | Medium-High | Medium    | Dozens-Hundreds    |
| E2E         | ~10s-60s each    | High        | High      | Tens               |
| Manual      | Minutes-Hours    | Very High   | Very High | As few as possible |

Shift-Left Testing: Find Bugs Earlier, Fix Them Cheaper

"Shift-left" means moving testing earlier in the development lifecycle. The oft-cited rule of thumb: a bug caught in development costs $1 to fix, while the same bug caught in production costs $100. Whatever the exact ratio, the math is obvious.

Traditional (find bugs late):
Plan → Code → Build → Test → Deploy → BUGS FOUND HERE 💥

Shift-Left (find bugs early):
Plan → Code+Test → Build+Test → Deploy+Test → Monitor
           ↑            ↑             ↑
       unit tests   integration   smoke tests
       linting      contract      synthetic
       SAST         API tests     monitoring

Shift-left in practice means:

  • Developers write tests alongside code, not after
  • Linters and static analysis run on every commit
  • Security scanning (SAST) happens in CI on every build, not just before release
  • Infrastructure tests run before Terraform apply, not after

Unit Testing: The Foundation

Unit tests verify individual functions or methods in isolation. They should be fast, deterministic, and independent.

// Jest example — testing a price calculator
// src/pricing.js
function calculateTotal(items, discountCode) {
  const subtotal = items.reduce((sum, item) => sum + item.price * item.qty, 0);

  const discounts = { SAVE10: 0.10, SAVE20: 0.20, HALF: 0.50 };
  const discount = discounts[discountCode] || 0;

  const total = subtotal * (1 - discount);
  return Math.round(total * 100) / 100; // Round to 2 decimal places
}

module.exports = { calculateTotal };

// src/pricing.test.js
const { calculateTotal } = require('./pricing');

describe('calculateTotal', () => {
  test('calculates subtotal correctly', () => {
    const items = [
      { price: 10.00, qty: 2 },
      { price: 5.50, qty: 1 },
    ];
    expect(calculateTotal(items)).toBe(25.50);
  });

  test('applies discount code', () => {
    const items = [{ price: 100, qty: 1 }];
    expect(calculateTotal(items, 'SAVE10')).toBe(90.00);
    expect(calculateTotal(items, 'SAVE20')).toBe(80.00);
  });

  test('handles invalid discount code', () => {
    const items = [{ price: 100, qty: 1 }];
    expect(calculateTotal(items, 'INVALID')).toBe(100.00);
  });

  test('handles empty cart', () => {
    expect(calculateTotal([])).toBe(0);
  });

  test('rounds to 2 decimal places', () => {
    const items = [{ price: 10.33, qty: 3 }];
    expect(calculateTotal(items, 'SAVE10')).toBe(27.89);
  });
});
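
Unit tests should also be deterministic: a test that reads the real clock or a random source can pass today and fail tomorrow. The usual fix is to inject the nondeterministic input as a parameter. A minimal sketch (the `isExpired` helper below is hypothetical, not part of the pricing module above):

```javascript
// Hypothetical helper: accepts "now" as a parameter instead of
// calling Date.now() internally, so tests can pin the clock.
function isExpired(expiresAt, now = Date.now()) {
  return now >= expiresAt;
}

// Tests pass a fixed timestamp and get the same answer every run:
// isExpired(1000, 999)  -> false (not yet expired)
// isExpired(1000, 1000) -> true  (expired exactly at the deadline)
```

Production callers omit the second argument and get the real clock; tests supply a fixed one.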

# pytest example — testing a user service
# test_user_service.py
import pytest
from user_service import create_user, validate_email

class TestValidateEmail:
    def test_valid_email(self):
        assert validate_email("user@example.com") is True

    def test_invalid_email_no_at(self):
        assert validate_email("userexample.com") is False

    def test_invalid_email_no_domain(self):
        assert validate_email("user@") is False

    @pytest.mark.parametrize("email,expected", [
        ("test@gmail.com", True),
        ("test@.com", False),
        ("@gmail.com", False),
        ("test@sub.domain.com", True),
    ])
    def test_email_variations(self, email, expected):
        assert validate_email(email) is expected

class TestCreateUser:
    def test_creates_user_with_valid_data(self):
        user = create_user("alice", "alice@example.com")
        assert user.name == "alice"
        assert user.email == "alice@example.com"
        assert user.id is not None

    def test_rejects_duplicate_email(self):
        create_user("alice", "alice@example.com")
        with pytest.raises(ValueError, match="Email already exists"):
            create_user("bob", "alice@example.com")

Integration Tests with Docker

Integration tests verify that components work together — your app talks to a real database, a real cache, a real message queue.

# docker-compose.test.yml
version: "3.8"

services:
  app:
    build: .
    environment:
      DATABASE_URL: postgres://test:test@postgres:5432/testdb
      REDIS_URL: redis://redis:6379
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started

  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: testdb
      POSTGRES_USER: test
      POSTGRES_PASSWORD: test
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U test"]
      interval: 2s
      timeout: 5s
      retries: 10

  redis:
    image: redis:7-alpine

# Run integration tests with real dependencies
docker compose -f docker-compose.test.yml up --build --abort-on-container-exit

# Or use Testcontainers (spins up containers programmatically)
# Available for Java, Node.js, Python, Go, .NET
// Integration test with Testcontainers (Node.js)
const { PostgreSqlContainer } = require('@testcontainers/postgresql');
// createConnection is your own DB helper (e.g. wrapping pg or knex)

describe('User Repository', () => {
  let container;
  let db;

  beforeAll(async () => {
    container = await new PostgreSqlContainer('postgres:16-alpine').start();
    db = createConnection(container.getConnectionUri());
    await db.migrate();
  }, 60000); // image pulls can be slow — allow 60s

  afterAll(async () => {
    await db.close();
    await container.stop();
  });

  test('saves and retrieves a user', async () => {
    await db.query("INSERT INTO users (name, email) VALUES ('Alice', 'alice@test.com')");
    const result = await db.query("SELECT * FROM users WHERE email = 'alice@test.com'");
    expect(result.rows[0].name).toBe('Alice');
  });
});
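
A related pattern for keeping integration tests independent: wrap each test in a transaction and roll it back afterwards, so no test ever sees another's writes. A rough sketch, assuming a node-postgres-style client where `db.query(sql)` returns a Promise (as in the test above):

```javascript
// Run a test body inside BEGIN ... ROLLBACK. Everything the test
// writes is discarded afterwards, so tests stay order-independent.
// Assumes db.query(sql) returns a Promise (node-postgres style).
async function withRollback(db, testFn) {
  await db.query('BEGIN');
  try {
    await testFn(db);
  } finally {
    await db.query('ROLLBACK'); // discard the test's writes even on failure
  }
}
```

Each test then calls `withRollback(db, async (tx) => { ... })` instead of touching the shared connection directly.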

E2E Testing: The Confidence Layer

End-to-end tests simulate real user behavior in a real browser. They are slow and sometimes flaky, but they catch the bugs that unit tests cannot.

// Playwright E2E test
const { test, expect } = require('@playwright/test');

test.describe('User Login Flow', () => {
  test('successful login redirects to dashboard', async ({ page }) => {
    await page.goto('/login');

    await page.fill('[data-testid="email-input"]', 'user@example.com');
    await page.fill('[data-testid="password-input"]', 'securePassword123');
    await page.click('[data-testid="login-button"]');

    // Wait for navigation
    await expect(page).toHaveURL('/dashboard');
    await expect(page.locator('[data-testid="welcome-message"]'))
      .toContainText('Welcome back');
  });

  test('invalid credentials show error', async ({ page }) => {
    await page.goto('/login');

    await page.fill('[data-testid="email-input"]', 'user@example.com');
    await page.fill('[data-testid="password-input"]', 'wrongpassword');
    await page.click('[data-testid="login-button"]');

    await expect(page.locator('[data-testid="error-message"]'))
      .toContainText('Invalid email or password');
    await expect(page).toHaveURL('/login');
  });
});
# Run Playwright tests
npx playwright test

# Run with a visible browser (headed mode)
npx playwright test --headed

# Run a specific test file
npx playwright test tests/login.spec.ts

# Generate test report
npx playwright show-report

Contract Testing

Contract testing verifies that services agree on the shape of their API without requiring them to be running simultaneously. This is critical for microservices.

// Consumer-side contract (what the frontend expects from the API)
const { Pact } = require('@pact-foundation/pact');

describe('User API Contract', () => {
  const provider = new Pact({
    consumer: 'WebFrontend',
    provider: 'UserService',
  });

  // (Pact's setup/verify/finalize lifecycle calls omitted for brevity)
  test('get user by ID', async () => {
    await provider.addInteraction({
      state: 'user with ID 1 exists',
      uponReceiving: 'a request for user 1',
      withRequest: {
        method: 'GET',
        path: '/api/users/1',
      },
      willRespondWith: {
        status: 200,
        body: {
          id: 1,
          name: 'Alice',
          email: 'alice@example.com',
        },
      },
    });

    // Test your client code against the mock
    const user = await userClient.getUser(1);
    expect(user.name).toBe('Alice');
  });
});

Handling Flaky Tests

Flaky tests — tests that sometimes pass and sometimes fail without code changes — are a DevOps team's worst enemy. They erode trust in your test suite.

# Common causes and fixes for flaky tests:

# 1. Timing issues — use explicit waits, not sleep
# BAD:
# await sleep(3000);
# GOOD:
# await page.waitForSelector('[data-testid="loaded"]');

# 2. Test order dependency — each test must set up its own state
# BAD:
# test('delete user') — depends on 'create user' running first
# GOOD:
# beforeEach(() => createTestUser());

# 3. Shared state — use test isolation
# BAD:
# All tests share one database
# GOOD:
# Each test gets a transaction that rolls back after
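
The fix for timing flakiness generalizes beyond browser tests: poll the condition against a deadline instead of sleeping a fixed duration. Playwright's `expect` does this internally; for plain Node tests you can hand-roll a small helper like this sketch:

```javascript
// Poll a (possibly async) condition until it returns truthy,
// failing loudly if the deadline passes — never a blind sleep.
async function waitFor(condition, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await condition()) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`condition not met within ${timeoutMs}ms`);
}
```

The key property: a fast system passes in milliseconds, a slow one still passes within the timeout, and a genuinely broken one fails with a clear error instead of hanging.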

# Quarantine strategy in CI:
# .github/workflows/ci.yml
name: CI
on: [push]
jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci

      # Run stable tests (must all pass)
      - name: Stable tests
        run: npm test -- --testPathIgnorePatterns="flaky"

      # Run quarantined tests (failures don't block)
      - name: Quarantined tests
        run: npm test -- --testPathPattern="flaky"
        continue-on-error: true

Testing Infrastructure Code

Your Terraform modules and Ansible playbooks are code too. Test them.

// Terratest example (Go) — testing a Terraform module
package test

import (
	"testing"

	"github.com/gruntwork-io/terratest/modules/aws"
	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestS3BucketModule(t *testing.T) {
	t.Parallel()

	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/s3-bucket",
		Vars: map[string]interface{}{
			"bucket_name": "test-bucket-" + random.UniqueId(),
			"environment": "test",
		},
	}

	// Clean up after the test
	defer terraform.Destroy(t, terraformOptions)

	// Deploy the infrastructure
	terraform.InitAndApply(t, terraformOptions)

	// Get the bucket name from Terraform output
	bucketName := terraform.Output(t, terraformOptions, "bucket_name")

	// Verify the bucket exists
	aws.AssertS3BucketExists(t, "us-east-1", bucketName)

	// Verify the bucket policy denies public access
	bucketPolicy := aws.GetS3BucketPolicy(t, "us-east-1", bucketName)
	assert.Contains(t, bucketPolicy, "Deny")
}

Pipeline Example: Complete Test Stages

Here is a real-world CI/CD pipeline with proper test stages:

# .github/workflows/ci.yml
name: CI Pipeline

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run lint
      - run: npx tsc --noEmit # Type checking

  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm test -- --coverage
      - name: Upload coverage
        uses: actions/upload-artifact@v4
        with:
          name: coverage-report
          path: coverage/

  integration-tests:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_DB: testdb
          POSTGRES_USER: test
          POSTGRES_PASSWORD: test
        ports: ['5432:5432']
        options: >-
          --health-cmd pg_isready --health-interval 10s
          --health-timeout 5s --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npm run test:integration
        env:
          DATABASE_URL: postgres://test:test@localhost:5432/testdb

  e2e-tests:
    runs-on: ubuntu-latest
    needs: [lint, unit-tests] # Only run after fast checks pass
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run test:e2e
      - uses: actions/upload-artifact@v4
        if: failure()
        with:
          name: playwright-report
          path: playwright-report/

  deploy:
    runs-on: ubuntu-latest
    needs: [lint, unit-tests, integration-tests, e2e-tests]
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npm run build
      - name: Deploy
        run: ./scripts/deploy.sh production

      # Smoke test after deploy
      - name: Smoke test
        run: |
          sleep 10
          STATUS=$(curl -s -o /dev/null -w "%{http_code}" https://app.example.com/health)
          if [ "$STATUS" != "200" ]; then
            echo "Smoke test failed! Status: $STATUS"
            exit 1
          fi
          echo "Smoke test passed!"

Notice the structure: fast checks first (lint, unit tests), then slower checks (integration, E2E), and deploy only if everything passes. The E2E tests have a needs dependency on lint and unit tests — no point running expensive browser tests if the code does not even compile.

Test Coverage: The Nuance

Coverage metrics tell you what percentage of your code is executed by tests. But 100% coverage does not mean zero bugs.
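
A contrived sketch of why coverage alone is not enough (`applyDiscount` here is a hypothetical function, deliberately buggy, not from this post's pricing code):

```javascript
// BUG: subtracts the rate instead of applying it as a percentage.
function applyDiscount(price, rate) {
  return price - rate; // should be price * (1 - rate)
}

// This "test" executes every line of applyDiscount, so coverage
// tools report 100% — but it asserts nothing, and the bug survives.
function coveredButUseless() {
  applyDiscount(100, 0.10); // no assertion on the return value
}
coveredButUseless();
```

Coverage gates keep the floor honest; only real assertions catch this class of bug.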

# Generate coverage report
npm test -- --coverage

# Enforce minimum coverage in CI
npm test -- --coverage --coverageThreshold='{
  "global": {
    "branches": 80,
    "functions": 85,
    "lines": 85,
    "statements": 85
  }
}'

Aim for 80-90% line coverage. Below 80% means significant code paths are untested. Above 95% usually means you are writing pointless tests for getters and setters. Cover the logic, not the boilerplate.

Testing is not a phase — it is a practice woven into every stage of your pipeline. The teams that ship fastest are not the ones that skip tests. They are the ones that have fast, reliable tests running at every step.


This wraps up our DevOps foundations series. In the next set of posts, we will dive into containers, starting with Docker fundamentals — building, running, and debugging containers like a pro.