Terraform Drift Detection — When Reality Doesn't Match Your Code
You write perfect Terraform code. You apply it. Your infrastructure matches your configuration exactly. Then, two weeks later, someone logs into the AWS Console and changes a security group rule because "it was faster." Now your code says port 443 only, but reality says ports 443 and 8080. This gap between your Terraform code and actual cloud state is drift, and it is the silent killer of infrastructure as code.
What Is Drift?
Drift occurs when the actual state of a cloud resource diverges from what Terraform expects. Terraform tracks expected state in its state file, and actual state lives in the cloud provider's API. When these two disagree, you have drift.
There are two types:
- State drift: the cloud resource changed, but Terraform's state file still reflects the old values. Running terraform plan reveals the difference.
- Code drift: someone updated the Terraform code but never applied it. The state file and cloud agree, but the code describes something different.
Both are dangerous. State drift means your infrastructure is not what you think it is. Code drift means your next apply will make unexpected changes.
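One way to tell the two types apart in a script is to compare the exit codes of a refresh-only plan and a normal plan. The sketch below assumes both commands were run with -detailed-exitcode and that your Terraform version reports detected changes as exit code 2 in refresh-only mode. The function name classify_drift and its labels are illustrative, not part of Terraform.

```shell
#!/usr/bin/env bash
# classify_drift REFRESH_EXIT PLAN_EXIT
# Hypothetical helper: maps two -detailed-exitcode results to a label.
#   REFRESH_EXIT: exit code of `terraform plan -refresh-only -detailed-exitcode`
#   PLAN_EXIT:    exit code of `terraform plan -detailed-exitcode`
classify_drift() {
  local refresh_exit="$1" plan_exit="$2"

  if [ "$refresh_exit" -eq 1 ] || [ "$plan_exit" -eq 1 ]; then
    echo "error"
  elif [ "$refresh_exit" -eq 2 ]; then
    # The cloud no longer matches the state file.
    echo "state-drift"
  elif [ "$plan_exit" -eq 2 ]; then
    # State and cloud agree, but the code asks for something else.
    echo "code-drift"
  else
    echo "clean"
  fi
}

# Example usage (requires an initialized Terraform directory):
#   terraform plan -refresh-only -detailed-exitcode >/dev/null; r=$?
#   terraform plan -detailed-exitcode >/dev/null; p=$?
#   classify_drift "$r" "$p"
```

State drift is checked first because a normal plan usually reports changes in that case too.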
Common Causes of Drift
| Cause | Example | Frequency |
|---|---|---|
| Console changes | Engineer edits security group in AWS Console | Very common |
| Other tools | Ansible, CloudFormation, or scripts modify resources | Common |
| Auto-scaling | ASG changes instance count, EKS adds nodes | Expected |
| Provider updates | AWS adds default encryption, changes default settings | Occasional |
| Incident response | On-call engineer changes config to stop an outage | Situational |
| Shared resources | Another team modifies a shared VPC or IAM role | Common |
Detecting Drift with terraform plan
The simplest drift detection tool is terraform plan. By default it first refreshes its view of each resource from the cloud API, then compares the desired state (code) against that actual state and reports any differences:
# Standard drift check
terraform plan -detailed-exitcode
# Exit codes:
# 0 = no changes (no drift)
# 1 = error
# 2 = changes detected (drift or pending code changes)
The -detailed-exitcode flag is essential for automation. Exit code 2 tells your script that the plan is non-empty, meaning either drift or unapplied code changes, without parsing plan output.
#!/bin/bash
# Drift detection script
# `set -e` is deliberately omitted: exit code 2 from plan is an expected
# outcome, and -e would abort the script before the code could be inspected.
set -uo pipefail

terraform init -input=false || exit 1

terraform plan -detailed-exitcode -no-color -out=drift.tfplan 2>&1 | tee plan_output.txt
# Capture terraform's exit code, not tee's
EXIT_CODE=${PIPESTATUS[0]}

if [ "$EXIT_CODE" -eq 0 ]; then
  echo "No drift detected"
elif [ "$EXIT_CODE" -eq 2 ]; then
  echo "DRIFT DETECTED: review plan output"
  # Count the changes
  ADDS=$(grep -c "will be created" plan_output.txt || true)
  CHANGES=$(grep -c "will be updated" plan_output.txt || true)
  DESTROYS=$(grep -c "will be destroyed" plan_output.txt || true)
  echo "Summary: +$ADDS ~$CHANGES -$DESTROYS"
else
  echo "ERROR: terraform plan failed"
  exit 1
fi
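Counting phrases with grep works, but Terraform also prints a one-line summary of the form "Plan: 1 to add, 2 to change, 0 to destroy." The sketch below reads that line instead. The function name plan_summary is hypothetical, and it assumes the plan output was saved without color codes (for example with -no-color) and that the summary wording matches your Terraform release.

```shell
#!/usr/bin/env bash
# plan_summary FILE
# Hypothetical helper: prints "+ADDS ~CHANGES -DESTROYS" based on the
# "Plan: N to add, N to change, N to destroy." line in saved plan output.
# Returns 1 if the file contains no such line (for example "No changes.").
plan_summary() {
  local file="$1" line adds changes destroys

  # Use the last summary line in case the file holds several runs.
  line=$(grep -E 'Plan: [0-9]+ to ' "$file" | tail -n 1)
  [ -n "$line" ] || return 1

  adds=$(echo "$line" | sed -nE 's/.* ([0-9]+) to add.*/\1/p')
  changes=$(echo "$line" | sed -nE 's/.* ([0-9]+) to change.*/\1/p')
  destroys=$(echo "$line" | sed -nE 's/.* ([0-9]+) to destroy.*/\1/p')

  # Default to 0 if a field is missing from the summary line.
  echo "+${adds:-0} ~${changes:-0} -${destroys:-0}"
}

# Example usage:
#   plan_summary plan_output.txt
```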
terraform refresh
Before Terraform 0.15.4, terraform refresh was the way to update the state file with actual cloud values without applying changes. The command still exists but is deprecated in favor of the -refresh-only planning mode:
# Update state to match reality (legacy, deprecated since Terraform 0.15.4)
terraform refresh
# Modern equivalent (Terraform >= 0.15.4)
terraform plan -refresh-only
terraform apply -refresh-only
The -refresh-only flag is safer because it shows you exactly what will change in the state file before you approve it. Raw terraform refresh updates the state silently, which can mask problems.
# See what refresh would change
terraform plan -refresh-only
# Output:
# Note: Objects have changed outside of Terraform
#
# Terraform detected the following changes made outside of
# Terraform since the last "terraform apply":
#
#   # aws_security_group.web has been changed
#   ~ resource "aws_security_group" "web" {
#         id = "sg-0abc123"
#       ~ ingress {
#           + cidr_blocks = ["0.0.0.0/0"]
#           + from_port   = 8080
#           + to_port     = 8080
#           + protocol    = "tcp"
#         }
#     }
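The header comment for each entry in this output names the drifted resource, which makes it easy to build a short list for an alert. The sketch below assumes the "# ADDRESS has been changed" and "# ADDRESS has been deleted" wording shown by current Terraform releases and output produced with -no-color. The function name drifted_resources is hypothetical.

```shell
#!/usr/bin/env bash
# drifted_resources
# Hypothetical helper: reads `terraform plan -refresh-only -no-color` output
# on stdin and prints the address of each resource reported as changed or
# deleted outside Terraform, one per line, sorted and de-duplicated.
drifted_resources() {
  sed -nE 's/^[[:space:]]*# (.+) has been (changed|deleted)$/\1/p' | sort -u
}

# Example usage:
#   terraform plan -refresh-only -no-color | drifted_resources
```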
Scheduled Drift Detection in CI
Automate drift detection with a scheduled CI pipeline that runs daily or on weekdays:
# .github/workflows/drift-detection.yml
name: Drift Detection
on:
  schedule:
    - cron: '0 9 * * 1-5' # 9 AM UTC weekdays
  workflow_dispatch: {}
permissions:
  contents: read
  issues: write # needed by the "Create Issue on Drift" step
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
jobs:
  detect:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        workspace: [networking, compute, database, monitoring]
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
        with:
          # Call the terraform binary directly so $? is the real plan exit code
          terraform_wrapper: false
      - name: Init
        working-directory: environments/production/${{ matrix.workspace }}
        run: terraform init -input=false
      - name: Detect Drift
        id: drift
        working-directory: environments/production/${{ matrix.workspace }}
        run: |
          set +e
          terraform plan -detailed-exitcode -no-color > plan.txt 2>&1
          echo "exitcode=$?" >> "$GITHUB_OUTPUT"
        continue-on-error: true
      - name: Create Issue on Drift
        if: steps.drift.outputs.exitcode == '2'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync(
              `environments/production/${{ matrix.workspace }}/plan.txt`, 'utf8'
            );
            const truncated = plan.length > 60000
              ? plan.substring(0, 60000) + '\n...(truncated)'
              : plan;
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `Drift detected: production/${{ matrix.workspace }}`,
              body: `## Drift Report\n\nDrift detected in \`production/${{ matrix.workspace }}\`.\n\n\`\`\`\n${truncated}\n\`\`\``,
              labels: ['drift', 'infrastructure']
            });
This creates a GitHub issue for each workspace that has drift, with the plan output (truncated to 60,000 characters) in the issue body.
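A scheduled run like this opens a new issue on every run for as long as the drift persists. One way to avoid duplicates is to check the titles of open issues before creating another. The sketch below shows the idea; issue_exists is a hypothetical helper, and the commented usage assumes the GitHub CLI (gh) is installed and authenticated in the job.

```shell
#!/usr/bin/env bash
# issue_exists TITLE
# Hypothetical helper: reads open issue titles on stdin (one per line) and
# succeeds if TITLE is already among them.
issue_exists() {
  local title="$1"
  # -F: fixed string, -x: whole-line match, -q: quiet
  grep -Fxq -- "$title"
}

# Example usage with the GitHub CLI (assumes drift issues carry the
# "drift" label, as in the workflow above):
#   title="Drift detected: production/networking"
#   if gh issue list --state open --label drift --json title --jq '.[].title' \
#        | issue_exists "$title"; then
#     echo "Issue already open, skipping"
#   else
#     gh issue create --title "$title" --label drift --body-file plan.txt
#   fi
```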
Terraform Cloud Drift Detection
Terraform Cloud has built-in drift detection (available on Plus tier and above). It automatically runs a refresh-only plan on a schedule and alerts you when drift is found:
# Enable drift detection on a workspace
resource "tfe_workspace" "production" {
  name         = "production-networking"
  organization = "my-company"

  # Enable health assessments (includes drift detection)
  assessments_enabled = true
}
TFC drift detection runs as a "health check" separate from normal runs. When it detects drift, it shows a notification in the workspace UI and can send alerts via webhooks.
Handling Drift — Three Strategies
When you discover drift, you have three options:
Strategy 1: Revert the Drift
Apply your Terraform code to overwrite the manual change and restore the desired state:
# Your code says port 443 only
# Someone added port 8080 in the console
# Apply reverts to port 443 only
terraform apply
This is the correct approach when the manual change was unauthorized or accidental.
Strategy 2: Accept the Drift
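Update your Terraform code to match the new reality, then bring the existing rule under Terraform's management:
# Update code to include the new port
# (assumes aws_security_group.web does not also define inline ingress blocks,
# which conflict with standalone rule resources)
resource "aws_security_group_rule" "web_8080" {
  type              = "ingress"
  from_port         = 8080
  to_port           = 8080
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
  security_group_id = aws_security_group.web.id
}
# The rule already exists in the cloud, so import it into state instead of
# creating it. Afterwards, plan should show no changes.
terraform plan # Verify clean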
This is correct when the manual change was intentional and should be permanent.
Strategy 3: Import the Drift
If someone created a new resource outside Terraform that should be managed going forward:
# Write the resource block in code first
# Then import the existing resource
terraform import aws_security_group.new_sg sg-0xyz789
# Verify
terraform plan # Should show no changes
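Terraform 1.5 and later also accept declarative import blocks, which let an import go through the normal plan, review, and apply cycle instead of a one-off CLI command. The sketch below prints such a block for the example above. The helper name write_import_block and the file name imports.tf are hypothetical.

```shell
#!/usr/bin/env bash
# write_import_block ADDRESS ID
# Hypothetical helper: prints a Terraform (>= 1.5) import block that maps an
# existing cloud resource ID to a resource address in your configuration.
write_import_block() {
  local address="$1" id="$2"
  cat <<EOF
import {
  to = ${address}
  id = "${id}"
}
EOF
}

# Example usage (the resource block for ADDRESS must still be written,
# as described above):
#   write_import_block aws_security_group.new_sg sg-0xyz789 > imports.tf
#   terraform plan    # the resource should be listed as an import
#   terraform apply
```

Once the apply has succeeded, the import block can be removed from the configuration.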
Preventing Drift
The best drift is the drift that never happens. These controls reduce manual changes:
AWS Service Control Policies (SCPs):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyConsoleSecurityGroupChanges",
      "Effect": "Deny",
      "Action": [
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupEgress"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/TerraformExecutionRole"
        }
      }
    }
  ]
}
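This SCP denies security group rule changes for every principal except the Terraform execution role. Despite the Sid, the Deny applies to the CLI and SDKs as well as the console. Engineers can still view security groups but cannot change them. Note that SCPs do not apply to the organization's management account.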
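To see which principals the condition exempts, the wildcard match can be mimicked with bash glob matching. This is only an illustration of how the StringNotLike pattern behaves, not an IAM policy simulator. The function name scp_denies and the sample ARNs in the usage lines are made up for the example.

```shell
#!/usr/bin/env bash
# scp_denies PRINCIPAL_ARN
# Hypothetical helper: succeeds (exit 0) if the Deny statement above would
# apply to the given principal, fails (exit 1) if the principal is exempt.
scp_denies() {
  local arn="$1"
  local exempt='arn:aws:iam::*:role/TerraformExecutionRole'

  # The unquoted right-hand side makes [[ == ]] treat $exempt as a glob.
  if [[ "$arn" == $exempt ]]; then
    return 1 # matches the exempt role: the Deny does not apply
  fi
  return 0 # any other principal: the Deny applies
}

# Example usage:
#   scp_denies 'arn:aws:iam::123456789012:user/alice' && echo "denied"
#   scp_denies 'arn:aws:iam::123456789012:role/TerraformExecutionRole' || echo "exempt"
```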
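Azure Policy:
{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.Network/networkSecurityGroups"
        },
        {
          "not": {
            "field": "tags['ManagedBy']",
            "equals": "terraform"
          }
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  }
}
This policy denies creating or updating any network security group that lacks the tag ManagedBy = terraform. It enforces tagging rather than identity, so pair it with role assignments that limit who can write to tagged resources.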
Read-only console access:
# IAM policy for engineers: read-only console, write via Terraform only
resource "aws_iam_policy" "read_only_console" {
  name = "ReadOnlyConsoleAccess"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "ec2:Describe*",
          "s3:Get*",
          "s3:List*",
          "rds:Describe*",
          "logs:Get*",
          "logs:Describe*",
          "cloudwatch:Get*",
          "cloudwatch:Describe*",
          "cloudwatch:List*"
        ]
        Resource = "*"
      }
    ]
  })
}
Real-World Drift Scenarios
Scenario 1: Auto-scaling changes instance count. Your Terraform code says desired_capacity = 3, but the ASG scaled to 5 during a traffic spike. Solution: use lifecycle { ignore_changes = [desired_capacity] } to tell Terraform to ignore this attribute.
resource "aws_autoscaling_group" "web" {
  # Other required arguments omitted for brevity
  desired_capacity = 3
  min_size         = 2
  max_size         = 10

  lifecycle {
    ignore_changes = [desired_capacity]
  }
}
Scenario 2: Someone enables S3 bucket logging in the console. Your code does not define logging. Depending on your provider version and how the bucket is declared, terraform plan may show that it will remove the logging configuration. Solution: either add logging to your Terraform code (accept the drift) or revert it and explain why logging is not needed for this bucket.
Scenario 3: AWS enables default encryption on new EBS volumes. Your code does not specify encryption, but AWS started encrypting by default. The plan shows drift on the affected EC2 instances. Solution: explicitly set encrypted = true on the relevant volume blocks (for example root_block_device) to match the new default and eliminate the drift.
Closing Notes
Drift is inevitable in any organization that uses both Terraform and a cloud console. The goal is not zero drift — it is fast detection and clear resolution. Schedule daily drift checks in CI, use prevention controls (SCPs, Azure Policy) to block unauthorized changes, and train your team that the console is for reading, not writing. Drift left undetected for weeks becomes drift that is expensive and risky to fix. Detect it early, resolve it fast, and always update your code to reflect the final desired state.
