Terraform Troubleshooting — Debug Logs, Crash Recovery, and Common Errors
Terraform breaks. Not often, but when it does, the error messages can range from crystal clear to deeply cryptic. A state lock that will not release at 2 AM, a dependency cycle that appeared out of nowhere, a crash that leaves your state file in limbo — these are the moments that test your Terraform skills. This post is your troubleshooting playbook: the debug tools, the common errors, and the recovery procedures that get you back on track.
TF_LOG: Your First Debugging Tool
Terraform has built-in logging controlled by environment variables. When an error message is not enough, turn on debug logs.
| Level | What It Shows | When to Use |
|---|---|---|
| ERROR | Only errors | Minimal noise, start here |
| WARN | Errors + warnings | Deprecation notices, potential issues |
| INFO | General operational info | Understanding flow of operations |
| DEBUG | Detailed internal operations | Provider communication, state operations |
| TRACE | Everything, including HTTP calls | Last resort; very verbose |
# Set log level
export TF_LOG=DEBUG
# Save logs to a file (essential for TRACE — output is massive)
export TF_LOG_PATH="terraform-debug.log"
# Run your command
terraform plan
# When done, disable logging
unset TF_LOG
unset TF_LOG_PATH
You can also enable logging for specific components:
# Only show provider-level logs
export TF_LOG_PROVIDER=TRACE
# Only show core Terraform logs
export TF_LOG_CORE=DEBUG
This is invaluable when the issue is in provider communication (API errors, auth failures) versus Terraform core (state parsing, dependency resolution).
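Exported variables stay active for every later command in the shell until you unset them. As a minimal sketch, a wrapper can scope logging to a single command instead. The names tf_debug, TF_DEBUG_LEVEL, and TF_DEBUG_LOG are hypothetical, chosen for this example; TF_LOG and TF_LOG_PATH are the real variables described above.

```shell
# Hypothetical helper: run one Terraform command with logging enabled,
# without leaving TF_LOG set in the calling shell.
# The function body runs in a subshell, so the exports vanish on return.
tf_debug() (
  export TF_LOG="${TF_DEBUG_LEVEL:-DEBUG}"                   # TF_DEBUG_LEVEL is our own knob
  export TF_LOG_PATH="${TF_DEBUG_LOG:-terraform-debug.log}"  # TF_DEBUG_LOG is our own knob
  terraform "$@"
)

# Usage (not run here):
#   tf_debug plan
#   TF_DEBUG_LEVEL=TRACE tf_debug apply
```

Because nothing is exported into the parent shell, there is no unset step to forget.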
Common Errors and Fixes
1. State Lock Errors
Error: Error acquiring the state lock
Lock Info:
ID: a1b2c3d4-e5f6-7890-abcd-ef1234567890
Path: s3://my-bucket/terraform.tfstate
Operation: OperationTypePlan
Who: engineer@laptop
Created: 2026-01-24 10:30:00.000000 UTC
This means another Terraform process (or a crashed process) holds the lock.
Fix: First, check if someone else is running Terraform. If not, and the process crashed:
# Force unlock with the lock ID from the error message
terraform force-unlock a1b2c3d4-e5f6-7890-abcd-ef1234567890
Only use force-unlock when you are certain no other process is running. Removing the lock while another apply is still writing can corrupt state.
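Copying the lock ID by hand is error-prone when the message scrolls past in CI output. A small sketch that pulls the ID out of the error text, assuming the "ID:" line format shown in the message above; extract_lock_id is a hypothetical helper name.

```shell
# Hypothetical helper: read Terraform's lock error on stdin and print
# the value of the first "ID:" line. Prints nothing if no lock info is found.
extract_lock_id() {
  awk '$1 == "ID:" { print $2; exit }'
}

# Usage (not run here): capture the ID, confirm nobody else is running
# Terraform, then unlock.
#   lock_id="$(terraform plan 2>&1 | extract_lock_id)"
#   terraform force-unlock "$lock_id"
```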
2. Provider Authentication Failures
Error: error configuring Terraform AWS Provider: no valid credential sources found
Fix: Verify credentials are available:
# Check AWS credentials
aws sts get-caller-identity
# Check environment variables
echo $AWS_ACCESS_KEY_ID
echo $AWS_PROFILE
# For Azure
az account show
# For GCP
gcloud auth application-default print-access-token
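Common causes: expired SSO session, missing environment variable, or wrong profile. An IAM role that lacks the required permissions usually shows up as a different error (an AccessDenied response from the API call), not as missing credentials.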
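Echoing credential variables can leak their values into terminal scrollback or CI logs. A hedged alternative is to report only whether each variable is set. The variable names are the standard AWS ones; aws_env_report is a hypothetical helper name.

```shell
# Hypothetical helper: show which AWS credential variables are present
# without printing their values.
aws_env_report() {
  for name in AWS_PROFILE AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    eval "value=\${$name:-}"   # portable indirect lookup of the variable called $name
    if [ -n "$value" ]; then
      echo "$name=set"
    else
      echo "$name=unset"
    fi
  done
}
```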
3. Dependency Cycles
Error: Cycle: aws_security_group.web, aws_security_group.app
Terraform detected a circular dependency: resource A references resource B, which references resource A.
Fix: Break the cycle by using a separate aws_security_group_rule resource:
# BAD: Circular reference
resource "aws_security_group" "web" {
ingress {
security_groups = [aws_security_group.app.id] # References app
}
}
resource "aws_security_group" "app" {
ingress {
security_groups = [aws_security_group.web.id] # References web → CYCLE
}
}
# GOOD: Break the cycle with separate rules
resource "aws_security_group" "web" {
name = "web-sg"
}
resource "aws_security_group" "app" {
name = "app-sg"
}
resource "aws_security_group_rule" "web_from_app" {
type = "ingress"
security_group_id = aws_security_group.web.id
source_security_group_id = aws_security_group.app.id
from_port = 443
to_port = 443
protocol = "tcp"
}
resource "aws_security_group_rule" "app_from_web" {
type = "ingress"
security_group_id = aws_security_group.app.id
source_security_group_id = aws_security_group.web.id
from_port = 8080
to_port = 8080
protocol = "tcp"
}
4. Resource Already Exists
Error: creating S3 Bucket (my-bucket): BucketAlreadyOwnedByYou
Terraform tried to create a resource that already exists in the cloud but is not in state.
Fix: Import the existing resource:
terraform import aws_s3_bucket.my_bucket my-bucket
Then run terraform plan to verify alignment.
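If the import step runs from a script, it helps to skip addresses that are already tracked, since importing twice fails. A minimal sketch built on terraform state list and terraform import; import_if_missing is a hypothetical helper name.

```shell
# Hypothetical helper: import a resource only when its address is not
# already present in state.
#   $1 = resource address, $2 = provider-specific ID
import_if_missing() {
  addr="$1"
  id="$2"
  # -F: fixed string, -x: whole-line match, -q: quiet
  if terraform state list | grep -Fxq -- "$addr"; then
    echo "already in state: $addr"
  else
    terraform import "$addr" "$id"
  fi
}

# Usage (not run here):
#   import_if_missing aws_s3_bucket.my_bucket my-bucket
```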
5. Timeout Errors
Error: timeout while waiting for resource to be created
Fix: Increase timeouts in the resource configuration:
resource "aws_db_instance" "main" {
# ... configuration ...
timeouts {
create = "60m" # Default is usually 40m
update = "80m"
delete = "60m"
}
}
Crash Recovery
When Terraform crashes mid-apply, it may leave behind a partial state. Here is the recovery process.
Step 1: Check for crash.log
# Terraform writes a crash log on panic
ls -la crash.log
# Read it for clues
head -100 crash.log
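The crash log contains a Go stack trace. Look for the panic message near the top; it usually indicates which resource or operation caused the crash. Recent Terraform releases print the panic trace to the terminal instead of writing crash.log, so if the file is missing, check your terminal scrollback or CI job output.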
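The trace can be long, so jumping straight to the panic line saves time. A small sketch, assuming the trace contains a line with the Go runtime's "panic:" marker; first_panic_line is a hypothetical helper name, and the file argument can be crash.log or any saved copy of the terminal output.

```shell
# Hypothetical helper: print the first line mentioning "panic:" together
# with its line number, so you know where to start reading.
first_panic_line() {
  grep -n 'panic:' "${1:-crash.log}" | head -n 1
}
```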
Step 2: Check State Integrity
# List resources in state — does this work?
terraform state list
# If state is corrupted, check for backup
ls -la terraform.tfstate.backup
ls -la .terraform/terraform.tfstate.backup
With local state, Terraform writes terraform.tfstate.backup before each state-modifying operation. If your state is corrupted, replace it with the backup:
# Restore from backup
cp terraform.tfstate.backup terraform.tfstate
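Copying the backup over terraform.tfstate discards the broken file, which can still be useful for diagnosis. A hedged variant keeps a timestamped copy first. It assumes local state in the current directory; restore_state_backup and the .corrupt.<timestamp> suffix are hypothetical names chosen for this example.

```shell
# Hypothetical helper: restore terraform.tfstate from its backup, keeping
# the current (possibly corrupted) file under a timestamped name.
restore_state_backup() {
  if [ ! -f terraform.tfstate.backup ]; then
    echo "no terraform.tfstate.backup in $(pwd)" >&2
    return 1
  fi
  if [ -f terraform.tfstate ]; then
    cp terraform.tfstate "terraform.tfstate.corrupt.$(date +%Y%m%d%H%M%S)"
  fi
  cp terraform.tfstate.backup terraform.tfstate
}
```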
Step 3: Verify and Reconcile
# Refresh state from cloud reality (terraform refresh is deprecated in favor of -refresh-only)
terraform apply -refresh-only
# Check what Terraform thinks needs to happen
terraform plan
If terraform plan shows resources that need to be created but already exist in the cloud, import them. If it shows resources to be destroyed that should exist, the crash may have partially applied — verify in the cloud console.
State Corruption Recovery
For remote state stored in S3, you can recover from versioned buckets:
# List state file versions in S3
aws s3api list-object-versions \
--bucket my-terraform-state \
--prefix prod/terraform.tfstate \
--query 'Versions[*].{VersionId:VersionId,LastModified:LastModified,Size:Size}' \
--output table
# Download a specific version
aws s3api get-object \
--bucket my-terraform-state \
--key prod/terraform.tfstate \
--version-id "abc123..." \
terraform.tfstate.recovered
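# Verify it parses as JSON and note its serial and lineage.
# The -state flag is ignored when a remote backend is configured, so inspect the file directly (assumes jq is installed).
jq '{serial, lineage, resources: (.resources | length)}' terraform.tfstate.recovered
# Replace the current state (push recovered state to remote backend).
# Terraform rejects a push whose serial is lower than the remote's: raise "serial" in the
# recovered file above the current remote value, or pass -force once you are sure.
terraform state push terraform.tfstate.recovered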
Always enable versioning on your state bucket. It is the single most important safety net for Terraform state.
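Versioning can be checked and turned on from the CLI. A minimal sketch using the aws s3api get-bucket-versioning and put-bucket-versioning subcommands; ensure_bucket_versioning is a hypothetical helper name, and it assumes your credentials are allowed to change bucket settings.

```shell
# Hypothetical helper: enable versioning on a state bucket unless it is
# already enabled.
#   $1 = bucket name
ensure_bucket_versioning() {
  bucket="$1"
  # Prints "Enabled", "Suspended", or "None" (never configured)
  status="$(aws s3api get-bucket-versioning --bucket "$bucket" --query Status --output text)"
  if [ "$status" = "Enabled" ]; then
    echo "versioning already enabled on $bucket"
  else
    aws s3api put-bucket-versioning --bucket "$bucket" \
      --versioning-configuration Status=Enabled &&
      echo "versioning enabled on $bucket"
  fi
}

# Usage (not run here):
#   ensure_bucket_versioning my-terraform-state
```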
Slow Plans
When terraform plan takes 5+ minutes, your productivity drops. Common causes and fixes:
# Check how many resources are in state
terraform state list | wc -l
| Cause | Fix |
|---|---|
| Too many resources (500+) | Split into smaller root modules, each with its own state |
| Slow provider API calls | Use -refresh=false when safe (state is recent) |
| Low parallelism | Increase with -parallelism=20 (default is 10) |
| Unnecessary data sources | Remove data sources that query on every plan |
| Provider rate limiting | Reduce parallelism or raise the provider's retry setting (for example max_retries in the AWS provider) |
# Skip refresh when state is known to be current
terraform plan -refresh=false
# Target a specific resource (plan only what you need)
terraform plan -target=aws_instance.web
# Increase parallelism for large plans
terraform plan -parallelism=30
Use -target sparingly. It is a debugging tool, not a workflow. Frequent use means your state is too big and needs splitting.
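Before splitting state, it helps to see which resource types dominate it. A rough sketch that groups the output of terraform state list by type; state_type_counts is a hypothetical helper name, and it assumes addresses of the usual form (optional module path, then type.name, with optional index keys).

```shell
# Hypothetical helper: read resource addresses on stdin and print a count
# per resource type, largest first.
state_type_counts() {
  # Drop index keys such as [0] or ["a.b"] so dots inside keys do not
  # confuse the split, then take the second-to-last dotted component.
  sed 's/\[[^]]*\]//g' |
    awk -F. 'NF >= 2 { n[$(NF-1)]++ } END { for (t in n) print n[t], t }' |
    sort -rn
}

# Usage (not run here):
#   terraform state list | state_type_counts
```

Types with hundreds of instances are natural candidates for their own root module.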
"Known After Apply" Issues
# This is normal — not an error
+ endpoint = (known after apply)
This means Terraform cannot determine the value until the resource is created (e.g., an IP address, ARN, or generated ID). However, if you see (known after apply) on an attribute you explicitly set, something is wrong:
# If 'bucket' shows as (known after apply) despite being set:
resource "aws_s3_bucket" "test" {
bucket = "my-explicit-name"
# If 'bucket' is (known after apply) here, check provider version
}
Fix: Update your provider version. Older provider versions sometimes mark writable attributes as computed.
Common Deprecation Warnings
Terraform providers evolve. Watch for these:
# Warning: Argument is deprecated
# resource "aws_s3_bucket" "test" {
# acl = "private" # Deprecated in AWS provider v4+
# }
# Fix: Use separate resource
resource "aws_s3_bucket_acl" "test" {
bucket = aws_s3_bucket.test.id
acl = "private"
}
# Note: terraform.workspace is not deprecated, but it has historically behaved differently
# in Terraform Cloud remote runs than with local CLI workspaces
# Fix: Use var.environment or TFC workspace variables instead
Do not ignore deprecation warnings. Deprecated arguments are usually removed in a later major provider version, at which point the warning becomes an error.
Provider Version Conflicts
Error: Failed to query available provider packages
Could not retrieve the list of available versions for provider hashicorp/aws:
locked provider registry.terraform.io/hashicorp/aws 4.67.0 does not match
configured version constraint ~> 5.0
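Fix: Update the lock file:
# Re-select provider versions within the configured constraints and rewrite the lock file
terraform init -upgrade
# terraform init has no per-provider upgrade flag. To move only hashicorp/aws,
# pin the other providers to their current versions in required_providers first.
# Deleting .terraform.lock.hcl before init also works, but it discards every recorded version and checksum.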
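To confirm which version is actually locked before and after the upgrade, you can read it out of the lock file. A small sketch, assuming the usual .terraform.lock.hcl layout in which each provider block carries a version line; locked_provider_version is a hypothetical helper name.

```shell
# Hypothetical helper: print the locked version of one provider.
#   $1 = provider source suffix, e.g. hashicorp/aws
#   $2 = lock file path (defaults to .terraform.lock.hcl)
locked_provider_version() {
  awk -v p="$1" '
    # Entering a provider block: remember whether it is the one we want
    $1 == "provider" { in_block = (index($2, p "\"") > 0) }
    # First version line inside the matching block: strip quotes and print
    in_block && $1 == "version" { gsub(/"/, "", $3); print $3; exit }
  ' "${2:-.terraform.lock.hcl}"
}

# Usage (not run here):
#   locked_provider_version hashicorp/aws
```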
Debugging Checklist
When something goes wrong, work through this list:
# 1. Read the error message carefully — Terraform errors are usually specific
# Look for: resource name, attribute, API error code
# 2. Check your Terraform and provider versions
terraform version
# 3. Enable debug logging
export TF_LOG=DEBUG
export TF_LOG_PATH="debug.log"
# 4. Validate configuration syntax
terraform validate
# 5. Check state health
terraform state list
# 6. Refresh state from cloud (terraform refresh is deprecated)
terraform apply -refresh-only
# 7. Run plan and read every line of output
terraform plan
# 8. Check provider documentation for the specific resource
# https://registry.terraform.io/providers/hashicorp/aws/latest/docs
# 9. Search the provider's GitHub issues
# Many errors are known bugs with workarounds
# 10. Reproduce in a minimal configuration
# Strip everything except the failing resource
Closing Note
Terraform troubleshooting is a skill you build through exposure. The errors in this post cover most of what you will encounter in production use. The key habits are: always read the full error message before searching the internet, enable debug logging early, keep your state bucket versioned, and never force-unlock without confirming no other process is running. Terraform is deterministic: when something breaks, there is a reason, and almost always a path to recovery.
