Terraform Drift Detection — When Reality Doesn't Match Your Code

November 15, 2025 · 7 min read

DevOps & Cloud Learning Hub

You write perfect Terraform code. You apply it. Your infrastructure matches your configuration exactly. Then, two weeks later, someone logs into the AWS Console and changes a security group rule because "it was faster." Now your code says port 443 only, but reality says ports 443 and 8080. This gap between your Terraform code and actual cloud state is drift, and it is the silent killer of infrastructure as code.

What Is Drift?

Drift occurs when the actual state of a cloud resource diverges from what Terraform expects. Terraform tracks expected state in its state file, and actual state lives in the cloud provider's API. When these two disagree, you have drift.

There are two types:

State drift — the cloud resource changed, but Terraform's state file still reflects the old values. Running terraform plan reveals the difference.
Code drift — someone updated the Terraform code but never applied it. The state file and cloud agree, but the code describes something different.

Both are dangerous. State drift means your infrastructure is not what you think it is. Code drift means your next apply will make unexpected changes.
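
To tell the two apart in practice, one option is to run two plans with different refresh modes. The sketch below assumes a Terraform version that accepts `-refresh-only` together with `-detailed-exitcode`. `classify_drift` is a hypothetical helper name, not a Terraform command.

```bash
#!/usr/bin/env bash
# Sketch: classify drift by comparing two plan modes.
# classify_drift is a hypothetical helper, not part of Terraform.

classify_drift() {
  local state_rc=0 code_rc=0

  # State vs. cloud: a refresh-only plan does not propose configuration changes.
  terraform plan -refresh-only -detailed-exitcode -input=false \
    >/dev/null 2>&1 || state_rc=$?

  # Code vs. state: skipping the refresh ignores out-of-band changes.
  terraform plan -refresh=false -detailed-exitcode -input=false \
    >/dev/null 2>&1 || code_rc=$?

  case "${state_rc}:${code_rc}" in
    0:0) echo "clean" ;;
    2:0) echo "state drift" ;;
    0:2) echo "code drift" ;;
    2:2) echo "state drift and code drift" ;;
    *)   echo "error" >&2; return 1 ;;
  esac
}
```

Running `classify_drift` inside an initialized working directory prints one of the four labels. Any exit code other than 0 or 2 from either plan is treated as an error.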

Common Causes of Drift

Cause	Example	Frequency
Console changes	Engineer edits security group in AWS Console	Very common
Other tools	Ansible, CloudFormation, or scripts modify resources	Common
Auto-scaling	ASG changes instance count, EKS adds nodes	Expected
Provider updates	AWS adds default encryption, changes default settings	Occasional
Incident response	On-call engineer changes config to stop an outage	Situational
Shared resources	Another team modifies a shared VPC or IAM role	Common

Detecting Drift with terraform plan

The simplest drift detection tool is terraform plan. By default it first refreshes its view of each resource from the cloud API, then compares desired state (code) against actual state and reports any differences:

# Standard drift check
terraform plan -detailed-exitcode

# Exit codes:
# 0 = no changes (no drift)
# 1 = error
# 2 = changes detected (drift or pending code changes)

The -detailed-exitcode flag is essential for automation. Exit code 2 tells your script that drift exists without parsing plan output.

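#!/bin/bash
# Drift detection script
# No `set -e` here: exit code 2 is an expected result, not a failure
set -uo pipefail

terraform init -input=false || exit 1

terraform plan -detailed-exitcode -out=drift.tfplan 2>&1 | tee plan_output.txt

# PIPESTATUS[0] is terraform's exit code; plain $? would mix in tee's
EXIT_CODE=${PIPESTATUS[0]}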

if [ $EXIT_CODE -eq 0 ]; then
  echo "No drift detected"
elif [ $EXIT_CODE -eq 2 ]; then
  echo "DRIFT DETECTED — review plan output"
  # Count the changes
  ADDS=$(grep -c "will be created" plan_output.txt || true)
  CHANGES=$(grep -c "will be updated" plan_output.txt || true)
  DESTROYS=$(grep -c "will be destroyed" plan_output.txt || true)
  echo "Summary: +$ADDS ~$CHANGES -$DESTROYS"
else
  echo "ERROR: terraform plan failed"
  exit 1
fi
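
Counting "will be ..." lines misses resources that Terraform reports as "must be replaced". An alternative sketch reads the counts from the plan's own summary line instead. It assumes the human-readable plan output contains a line of the form `Plan: X to add, Y to change, Z to destroy.`; `plan_summary` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: extract add/change/destroy counts from saved plan output.
# plan_summary is a hypothetical helper, not part of Terraform.

plan_summary() {
  local file=$1 line
  local re='([0-9]+) to add, ([0-9]+) to change, ([0-9]+) to destroy'

  # Take the last summary line in case the log contains more than one plan.
  line=$(grep -E 'Plan: .*to add' "$file" | tail -n 1) || true

  if [[ $line =~ $re ]]; then
    echo "+${BASH_REMATCH[1]} ~${BASH_REMATCH[2]} -${BASH_REMATCH[3]}"
  else
    echo "no plan summary line found in ${file}" >&2
    return 1
  fi
}
```

Calling `plan_summary plan_output.txt` after the plan step prints a one-line summary. A replaced resource is counted once under add and once under destroy, which matches how Terraform itself summarizes the plan.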

terraform refresh

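Before Terraform 0.15.4, terraform refresh was the way to update the state file with actual cloud values without applying changes. The command still exists but is deprecated in favor of the -refresh-only planning mode:

# Update state to match reality (legacy command, deprecated since Terraform 0.15.4)
terraform refresh

# Modern equivalent (Terraform >= 0.15.4)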
terraform plan -refresh-only
terraform apply -refresh-only

The -refresh-only flag is safer because it shows you exactly what will change in the state file before you approve it. Raw terraform refresh updates the state silently, which can mask problems.

# See what refresh would change
terraform plan -refresh-only

# Output:
# Note: Objects have changed outside of Terraform
#
# Terraform detected the following changes made outside of
# Terraform since the last "terraform apply":
#
#   # aws_security_group.web has been changed
#   ~ resource "aws_security_group" "web" {
#         id   = "sg-0abc123"
#       ~ ingress {
#           + cidr_blocks = ["0.0.0.0/0"]
#           + from_port   = 8080
#           + to_port     = 8080
#           + protocol    = "tcp"
#         }
#     }

Scheduled Drift Detection in CI

Automate drift detection with a scheduled CI pipeline that runs daily or on weekdays:

# .github/workflows/drift-detection.yml
name: Drift Detection

on:
  schedule:
    - cron: '0 9 * * 1-5'  # 9 AM UTC weekdays
  workflow_dispatch: {}

# The issue-creation step needs write access to issues
permissions:
  contents: read
  issues: write

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

jobs:
  detect:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        workspace: [networking, compute, database, monitoring]
    steps:
      - uses: actions/checkout@v4
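      - uses: hashicorp/setup-terraform@v3
        with:
          # Disable the wrapper script so the shell sees terraform's raw exit code
          terraform_wrapper: false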

      - name: Init
        working-directory: environments/production/${{ matrix.workspace }}
        run: terraform init -input=false

      - name: Detect Drift
        id: drift
        working-directory: environments/production/${{ matrix.workspace }}
        run: |
          set +e
          terraform plan -detailed-exitcode -no-color > plan.txt 2>&1
          echo "exitcode=$?" >> $GITHUB_OUTPUT
        continue-on-error: true

      - name: Create Issue on Drift
        if: steps.drift.outputs.exitcode == '2'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync(
              `environments/production/${{ matrix.workspace }}/plan.txt`, 'utf8'
            );
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + '\n...(truncated)'
              : plan;
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Drift detected: production/${{ matrix.workspace }}`,
              body: `## Drift Report\n\nDrift detected in \`production/${{ matrix.workspace }}\`.\n\n\`\`\`\n${truncated}\n\`\`\``,
              labels: ['drift', 'infrastructure']
            });

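This creates a GitHub issue for each workspace that has drift, with the plan output (truncated to 60,000 characters) in the issue body. As written, it opens a new issue on every scheduled run for as long as the drift persists.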
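
If repeated issues for the same workspace become noisy, one option is to check for an open issue with the same title before creating another. The sketch below uses the GitHub CLI and assumes `gh` is installed and authenticated (in Actions, through a `GH_TOKEN` environment variable). `report_drift` is a hypothetical helper name, and the title format mirrors the workflow above.

```bash
#!/usr/bin/env bash
# Sketch: open a drift issue only when no open issue with the same title exists.
# report_drift is a hypothetical helper; it assumes an authenticated gh CLI.

report_drift() {
  local workspace=$1 plan_file=$2
  local title="Drift detected: production/${workspace}"
  local open_titles

  # Titles of open issues labeled "drift" (gh lists up to 30 by default).
  open_titles=$(gh issue list --state open --label drift \
    --json title --jq '.[].title') || return 1

  if grep -Fxq "$title" <<<"$open_titles"; then
    echo "open issue already exists for ${workspace}; skipping"
    return 0
  fi

  gh issue create --title "$title" \
    --label drift --label infrastructure \
    --body-file "$plan_file"
}
```

A workflow step could call `report_drift "${{ matrix.workspace }}" plan.txt` in place of the github-script step. Unlike that step, this sketch posts the plan file as the raw issue body, without truncation or a code fence.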

Terraform Cloud Drift Detection

Terraform Cloud (renamed HCP Terraform in 2024) has built-in drift detection (available on Plus tier and above). It automatically runs a refresh-only plan on a schedule and alerts you when drift is found:

# Enable drift detection on a workspace
resource "tfe_workspace" "production" {
  name         = "production-networking"
  organization = "my-company"

  # Enable health assessments (includes drift detection)
  assessments_enabled = true
}

TFC drift detection runs as a "health check" separate from normal runs. When it detects drift, it shows a notification in the workspace UI and can send alerts via webhooks.

Handling Drift — Three Strategies

When you discover drift, you have three options:

Strategy 1: Revert the Drift

Apply your Terraform code to overwrite the manual change and restore the desired state:

# Your code says port 443 only
# Someone added port 8080 in the console
# Apply reverts to port 443 only
terraform apply

This is the correct approach when the manual change was unauthorized or accidental.

Strategy 2: Accept the Drift

Update your Terraform code to match the new reality, then refresh the state:

# Update code to include the new port
resource "aws_security_group_rule" "web_8080" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.web.id
}

# The rule already exists in AWS, so import it into state before planning;
# otherwise Terraform will try to create a duplicate rule
terraform plan  # Verify clean

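This is correct when the manual change was intentional and should be permanent. If aws_security_group.web defines its rules inline with ingress blocks, add the new rule there instead: mixing inline rules and standalone aws_security_group_rule resources on the same group causes conflicts.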
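
Importing a standalone rule needs a composite ID rather than the security group ID alone. The sketch below builds that ID in the underscore-separated form the AWS provider documentation describes for aws_security_group_rule (group ID, type, protocol, from port, to port, source). Treat the exact format as an assumption to verify against your provider version. `sg_rule_import_id` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: build the import ID for a single-CIDR aws_security_group_rule.
# sg_rule_import_id is a hypothetical helper; the ID format is an assumption
# based on the AWS provider docs and should be checked for your version.

sg_rule_import_id() {
  local sg_id=$1 rule_type=$2 protocol=$3 from_port=$4 to_port=$5 cidr=$6
  echo "${sg_id}_${rule_type}_${protocol}_${from_port}_${to_port}_${cidr}"
}

# Usage (run inside the Terraform working directory):
#   terraform import aws_security_group_rule.web_8080 \
#     "$(sg_rule_import_id sg-0abc123 ingress tcp 8080 8080 0.0.0.0/0)"
```

After the import succeeds, terraform plan should report no changes for the rule.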

Strategy 3: Import the Drift

If someone created a new resource outside Terraform that should be managed going forward:

# Write the resource block in code first
# Then import the existing resource
terraform import aws_security_group.new_sg sg-0xyz789

# Verify
terraform plan  # Should show no changes
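
On Terraform 1.5 and later, an import block in configuration is an alternative to the terraform import command, and the import then shows up in the plan for review. The sketch below appends such a block to a file. `write_import_block` and the default file name `imports.tf` are hypothetical choices for illustration.

```bash
#!/usr/bin/env bash
# Sketch: append a Terraform import block (Terraform >= 1.5) to a .tf file.
# write_import_block and imports.tf are hypothetical names.

write_import_block() {
  local address=$1 id=$2 out_file=${3:-imports.tf}

  cat >> "$out_file" <<EOF
import {
  to = ${address}
  id = "${id}"
}
EOF
}

# Usage:
#   write_import_block aws_security_group.new_sg sg-0xyz789
#   terraform plan    # the plan lists the resource as "will be imported"
#   terraform apply
```

If the resource block has not been written yet, `terraform plan -generate-config-out=generated.tf` can draft one from the imported object. That flag is marked experimental, so review the generated configuration before committing it.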

Preventing Drift

The best drift is the drift that never happens. These controls reduce manual changes:

AWS Service Control Policies (SCPs):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyConsoleSecurityGroupChanges",
      "Effect": "Deny",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:ModifySecurityGroupRules"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/TerraformExecutionRole"
        }
      }
    }
  ]
}

This SCP blocks every principal except the Terraform execution role from modifying security group rules, whether through the console, the CLI, or the API. Engineers can still view security groups but cannot change them.

Azure Policy:

{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Network/networkSecurityGroups"
        },
        {
          "not": {
            "field": "tags['ManagedBy']",
            "equals": "terraform"
          }
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  }
}

This policy denies creating or updating any network security group that lacks a ManagedBy tag set to terraform. It enforces tagging rather than identity, so a manual change to a correctly tagged group still succeeds. Pair it with role-based access control to restrict who can write.

Read-only console access:

# IAM policy for engineers — read-only console, write via Terraform only
resource "aws_iam_policy" "read_only_console" {
  name = "ReadOnlyConsoleAccess"

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect   = "Allow"
        Action   = [
          "ec2:Describe*",
          "s3:Get*",
          "s3:List*",
          "rds:Describe*",
          "logs:Get*",
          "logs:Describe*",
          "cloudwatch:Get*",
          "cloudwatch:Describe*",
          "cloudwatch:List*"
        ]
        Resource = "*"
      }
    ]
  })
}

Real-World Drift Scenarios

Scenario 1: Auto-scaling changes instance count. Your Terraform code says desired_capacity = 3, but the ASG scaled to 5 during a traffic spike. Solution: use lifecycle { ignore_changes = [desired_capacity] } to tell Terraform to ignore this attribute.

resource "aws_autoscaling_group" "web" {
  desired_capacity = 3
  min_size         = 2
  max_size         = 10

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}

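Scenario 2: Someone enables S3 bucket logging in the console. Your code does not define logging. If logging is managed inline on aws_s3_bucket (AWS provider v3 and earlier), terraform plan shows it will remove the logging configuration. With provider v4 and later, logging lives in the separate aws_s3_bucket_logging resource, so Terraform does not report the change unless that resource is in your code. Solution: either add logging to your Terraform code (accept the drift) or revert it and explain why logging is not needed for this bucket.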

Scenario 3: EBS encryption by default is enabled for the account. Your code does not specify encryption, but new volumes in that region are now encrypted. Depending on the provider version, the plan can show differences on affected instances and volumes. Solution: explicitly set encrypted = true in your code so the configuration states the intended value and matches reality.

Closing Notes

Drift is inevitable in any organization that uses both Terraform and a cloud console. The goal is not zero drift — it is fast detection and clear resolution. Schedule daily drift checks in CI, use prevention controls (SCPs, Azure Policy) to block unauthorized changes, and train your team that the console is for reading, not writing. Drift left undetected for weeks becomes drift that is expensive and risky to fix. Detect it early, resolve it fast, and always update your code to reflect the final desired state.
