Terraform in CI/CD — Automated Plan, Apply, and Drift Detection

Goel Academy
DevOps & Cloud Learning Hub

Running terraform apply from your laptop works fine when you are the only engineer. The moment a second person joins the team, you need a pipeline. CI/CD for Terraform ensures every change is reviewed, planned, and applied through a consistent process — no more "I ran apply from my machine and forgot to commit the code."

Why Automate Terraform?

Manual Terraform workflows break in predictable ways:

  1. Forgotten plans — someone applies without running plan first, and a resource gets destroyed.
  2. Stale state — two engineers apply concurrently, and the state file gets corrupted.
  3. No audit trail — nobody knows who changed what, because applies happen from laptops.
  4. Drift goes unnoticed — someone clicks in the console, and your code no longer matches reality.

A CI/CD pipeline solves all four. Plan runs automatically on every pull request, apply only runs after merge with approval, and scheduled runs catch drift before it causes incidents.

GitHub Actions — Plan on PR, Apply on Merge

This is the most common Terraform CI/CD pattern. The workflow runs plan on pull requests and posts the output as a comment, then runs apply when the PR merges to main.

# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    branches: [main]
    paths: ['infra/**']
  push:
    branches: [main]
    paths: ['infra/**']

# The plan job needs write access to pull requests to post its comment.
permissions:
  contents: read
  pull-requests: write

env:
  TF_VAR_environment: production
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

jobs:
  plan:
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        working-directory: infra
        run: terraform init -input=false

      # continue-on-error lets the comment step run even when the plan fails.
      - name: Terraform Plan
        id: plan
        working-directory: infra
        run: terraform plan -no-color -input=false -out=tfplan
        continue-on-error: true

      - name: Save Plan Artifact
        if: steps.plan.outcome == 'success'
        uses: actions/upload-artifact@v4
        with:
          name: tfplan-${{ github.event.pull_request.number }}
          path: infra/tfplan

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        env:
          # Pass the output through the environment instead of interpolating it
          # into the script, where backticks in the plan would break the template.
          PLAN: ${{ steps.plan.outputs.stdout }}
        with:
          script: |
            const output = process.env.PLAN || '';
            const truncated = output.length > 60000
              ? output.substring(0, 60000) + '\n... (truncated)'
              : output;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `### Terraform Plan\n\`\`\`\n${truncated}\n\`\`\``
            });

      # Fail the job after commenting so a broken plan blocks the merge.
      - name: Fail on Plan Error
        if: steps.plan.outcome == 'failure'
        run: exit 1

  apply:
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0

      - name: Terraform Init
        working-directory: infra
        run: terraform init -input=false

      - name: Terraform Apply
        working-directory: infra
        run: terraform apply -auto-approve -input=false

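The environment: production line on the apply job is critical. It ties the job to a GitHub environment, and once that environment has protection rules configured in the repository settings (for example, required reviewers), the job pauses for manual approval before apply runs.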
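The comment step truncates the plan at 60,000 characters because GitHub caps the size of a comment body. If you prefer to build the comment text in a shell step, the same idea might look like the sketch below. The function name truncate_for_comment is a hypothetical helper, not part of GitHub or Terraform tooling, and it counts bytes rather than characters, so a multi-byte character at the cut point may be split.

```shell
# truncate_for_comment [LIMIT]: copy stdin to stdout, cutting after LIMIT bytes.
# Hypothetical helper; LIMIT defaults to the 60000 used in the workflow.
truncate_for_comment() (
  limit="${1:-60000}"
  input="$(cat)"
  # Keep at most LIMIT bytes; if nothing was cut, the two strings are equal.
  cut="$(printf '%s' "$input" | head -c "$limit")"
  if [ "$cut" != "$input" ]; then
    printf '%s\n... (truncated)\n' "$cut"
  else
    printf '%s\n' "$input"
  fi
)
```

A step could then run terraform show -no-color tfplan | truncate_for_comment > plan_comment.txt and post the file's contents.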

GitLab CI Pipeline

GitLab CI has a similar structure but uses stages and manual gates:

# .gitlab-ci.yml
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: infra
  TF_STATE_NAME: production

.terraform_base:
  image:
    name: hashicorp/terraform:1.7
    # The image's entrypoint is the terraform binary; clear it so GitLab can run shell scripts.
    entrypoint: [""]
  before_script:
    - cd "$TF_ROOT"
    - terraform init -input=false

validate:
  extends: .terraform_base
  stage: validate
  script:
    - terraform validate
    - terraform fmt -check

plan:
  extends: .terraform_base
  stage: plan
  script:
    - terraform plan -input=false -out=tfplan
  artifacts:
    paths:
      - $TF_ROOT/tfplan
    expire_in: 7 days
  rules:
    - if: $CI_MERGE_REQUEST_IID
    # Also plan on main, so the apply job in the same pipeline has a plan artifact to consume.
    - if: $CI_COMMIT_BRANCH == "main"

apply:
  extends: .terraform_base
  stage: apply
  script:
    # A saved plan is applied without prompting, so -auto-approve is not needed.
    - terraform apply -input=false tfplan
  dependencies:
    - plan
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual

The when: manual directive makes the apply job wait for a button click in GitLab's UI. Because the plan job also runs on main and hands its tfplan file to apply as an artifact, you apply exactly the plan shown in that pipeline's plan job, not a fresh plan computed at apply time that might differ.

Atlantis — PR-Driven Terraform

Atlantis is a self-hosted application that automates Terraform via pull request comments. Engineers type atlantis plan and atlantis apply directly in PR comments, and Atlantis executes the commands and posts the results.

# atlantis.yaml (repo-level config)
# Setting apply_requirements here only takes effect if the server-side repo
# config lists it under allowed_overrides.
version: 3
projects:
  - name: networking
    dir: infra/networking
    workspace: default
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
      enabled: true
    apply_requirements: [approved, mergeable]

  - name: compute
    dir: infra/compute
    workspace: default
    autoplan:
      when_modified: ["*.tf", "*.tfvars"]
      enabled: true
    apply_requirements: [approved, mergeable]

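The apply_requirements field is the governance layer. With [approved, mergeable], Atlantis refuses to run apply until the PR has at least one approval and the VCS host reports it as mergeable, which on GitHub typically means that required status checks pass and there are no merge conflicts.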

TF_VAR_ Environment Variables in CI

Never hardcode secrets or environment-specific values in your Terraform code. Use the TF_VAR_ prefix to pass variables through the environment:

# CI environment variables (GitHub Actions env block)
env:
  TF_VAR_db_password: ${{ secrets.DB_PASSWORD }}
  TF_VAR_environment: production
  TF_VAR_region: us-east-1
  TF_VAR_instance_type: t3.medium

# variables.tf: Terraform picks these up automatically
variable "db_password" {
  type      = string
  sensitive = true
}

variable "environment" {
  type = string
}

variable "region" {
  type    = string
  default = "us-east-1"
}

variable "instance_type" {
  type = string
}

Terraform automatically maps TF_VAR_db_password to var.db_password. No extra configuration needed.
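Because the mapping is purely name-based, a missing or misspelled environment variable only surfaces once Terraform reaches the variable. A small pre-flight check can fail the job earlier with a clearer message. The sketch below is an assumption about how you might do that; require_tf_vars is a hypothetical helper, and it treats an empty value the same as an unset one.

```shell
# require_tf_vars NAME...: fail if any TF_VAR_<NAME> is unset or empty.
# Hypothetical helper; names are passed without the TF_VAR_ prefix.
require_tf_vars() {
  _missing=0
  for _name in "$@"; do
    # Look up TF_VAR_<name> indirectly; the value itself is never printed.
    eval "_value=\${TF_VAR_${_name}:-}"
    if [ -z "$_value" ]; then
      echo "missing TF_VAR_${_name}" >&2
      _missing=1
    fi
  done
  unset _value
  return "$_missing"
}
```

Calling require_tf_vars db_password environment before terraform plan covers the two variables above that have no default; region can be left out because it falls back to us-east-1.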

State Locking in CI

State locking prevents two pipeline runs from modifying state simultaneously. When using S3 + DynamoDB, locking is automatic:

# backend.tf
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-state-locks"
    encrypt        = true
  }
}

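If a pipeline run crashes mid-apply, the lock may get stuck. A lock timeout lets a queued run wait for a live lock holder to finish instead of failing immediately, and a job timeout stops a hung apply from holding the lock indefinitely. A lock that is truly orphaned has to be released by hand with terraform force-unlock and the lock ID printed in the error message.

- name: Terraform Init
  run: terraform init -input=false

- name: Terraform Apply
  run: terraform apply -auto-approve -input=false -lock-timeout=5m
  timeout-minutes: 30

The -lock-timeout=5m flag tells Terraform to wait up to 5 minutes for an existing lock to release before failing.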

Drift Detection with Scheduled Runs

Drift happens. Someone changes a security group in the console, auto-scaling adjusts instance counts, or a different tool modifies a resource Terraform manages. Scheduled plan runs catch this:

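# .github/workflows/drift-detection.yml
# Backend and provider credentials (omitted here) must be available to this job,
# as in terraform.yml.
name: Drift Detection

on:
  schedule:
    - cron: '0 8 * * 1-5' # 8 AM UTC, weekdays
  workflow_dispatch: {}

jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.7.0
          # Call the real binary so the shell sees Terraform's own exit code.
          terraform_wrapper: false

      - name: Terraform Init
        working-directory: infra
        run: terraform init -input=false

      - name: Detect Drift
        id: drift
        working-directory: infra
        run: |
          # $? after a pipeline is tee's status, so read Terraform's from PIPESTATUS.
          terraform plan -detailed-exitcode -input=false -out=drift.tfplan 2>&1 | tee plan_output.txt
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"

      - name: Alert on Drift
        if: steps.drift.outputs.exitcode == '2'
        run: |
          curl -fsS -X POST "${{ secrets.SLACK_WEBHOOK }}" \
            -H 'Content-Type: application/json' \
            -d '{"text": "Terraform drift detected in production! Check the workflow run for details."}'

      # A plan that errored is not drift, but it should not pass silently either.
      - name: Fail on Plan Error
        if: steps.drift.outputs.exitcode == '1'
        run: exit 1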

The -detailed-exitcode flag is the key. Exit code 0 means no changes, 1 means error, and 2 means drift detected. You can route exit code 2 to Slack, PagerDuty, or a GitHub issue.
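The routing logic is easiest to get right when the exit code is captured without a pipe in between, since the status of a pipeline is that of its last command. The following sketch shows one way to do that in a plain script. The function names classify_plan_exit and detect_drift and the labels they print are illustrative assumptions; only the exit codes themselves come from Terraform.

```shell
# classify_plan_exit CODE: name the outcome of `terraform plan -detailed-exitcode`.
classify_plan_exit() {
  case "$1" in
    0) echo "clean" ;;    # plan succeeded, no changes
    1) echo "error" ;;    # the plan itself failed
    2) echo "drift" ;;    # plan succeeded, changes present
    *) echo "unknown" ;;
  esac
}

# detect_drift OUTFILE: run the plan, save its output, print the outcome label.
detect_drift() {
  _status=0
  # Redirecting to a file (no pipe) keeps Terraform's own exit status in $?.
  terraform plan -detailed-exitcode -input=false -no-color > "$1" 2>&1 || _status=$?
  classify_plan_exit "$_status"
}
```

A scheduled job could call detect_drift plan_output.txt and alert only when the printed label is drift, while treating error as a pipeline failure rather than as drift.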

Branch Protection Rules

Your pipeline is only as strong as your branch protection. Configure these rules on main:

| Rule | Purpose |
| --- | --- |
| Require pull request reviews | No direct pushes; every change gets reviewed |
| Require status checks to pass | Plan must succeed before merge |
| Require branches to be up to date | Plan runs against the latest main |
| Restrict who can push | Only the CI bot can push to main |
| Require linear history | No merge commits; cleaner git log |

Without branch protection, someone can push directly to main and trigger an apply that nobody reviewed. This defeats the entire purpose of the pipeline.

Storing Plan Artifacts

A subtle but dangerous problem: the plan you review on the PR may not be the plan that runs on apply. Between the PR approval and the merge, someone else might merge a different PR that changes the state. An apply job that runs terraform apply -auto-approve without a plan file computes a fresh plan against the new state, and nobody reviews that one.

To guarantee you apply exactly what was reviewed, store the plan file as an artifact and apply it directly:

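# Runs in the apply job. On a push event github.event.pull_request is empty and
# the artifact belongs to the PR's workflow run, so an earlier step (hypothetical
# id: pr, not shown) must look up the merged PR number and the run ID of its plan.
- name: Download Plan
  uses: actions/download-artifact@v4
  with:
    name: tfplan-${{ steps.pr.outputs.number }}
    path: infra
    run-id: ${{ steps.pr.outputs.plan_run_id }}
    github-token: ${{ secrets.GITHUB_TOKEN }}

- name: Apply Saved Plan
  working-directory: infra
  run: terraform apply -input=false tfplan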

This only works when plan and apply run close together. If the state changes between plan and apply, applying the saved plan will fail — which is exactly what you want. It forces you to re-plan.
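A thin guard around the apply step makes that failure mode explicit. This is a minimal sketch under the assumption that the plan file has already been downloaded into the working directory; apply_saved_plan is a hypothetical helper name. Terraform applies a saved plan file without prompting, so no -auto-approve is passed.

```shell
# apply_saved_plan [PLANFILE]: apply a previously saved plan, or fail loudly.
# Hypothetical helper; PLANFILE defaults to tfplan, the name used in this post.
apply_saved_plan() {
  _plan="${1:-tfplan}"
  if [ ! -f "$_plan" ]; then
    echo "no saved plan at $_plan; run plan again" >&2
    return 1
  fi
  # Terraform rejects the file if the state changed since the plan was made,
  # and that non-zero status is passed straight back to the caller.
  terraform apply -input=false "$_plan"
}
```

Because the function never falls back to a fresh plan, a missing or stale plan file stops the job instead of applying something nobody reviewed.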

Closing Notes

A disciplined Terraform pipeline is as important as the infrastructure code itself. Plan on PR, apply on merge, detect drift on a schedule, and lock everything down with branch protection. In the next post, we will explore Terraform testing — how to validate your configurations, run integration tests with Terratest, and use the built-in terraform test framework to catch bugs before they reach production.