AWS Cost Optimization — We Cut Our Bill by 60%, Here's How
A team I worked with was paying $14,000/month on AWS. After three weeks of analysis and changes, we brought it down to $5,600. No services were removed, no performance was sacrificed. The problem was never that AWS is expensive — it's that the defaults are expensive, and nobody was watching.
Start With Visibility — Cost Explorer and Budgets
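You can't optimize what you can't see. AWS Cost Explorer is free to use in the console (API requests cost $0.01 each), and it is enabled the first time you open it in the Billing and Cost Management console. The first step is understanding where the money goes: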
# Query last month's cost per service (Cost Explorer must already be enabled; End is exclusive)
aws ce get-cost-and-usage \
  --time-period Start=2025-07-01,End=2025-08-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE
# Output, summarized as cost per service
# (in the raw SERVICE grouping, NAT Gateway charges roll up into "EC2 - Other"):
# Amazon EC2: $4,200
# Amazon RDS: $3,100
# NAT Gateway: $2,800 <-- surprise!
# Amazon S3: $1,400
# Data Transfer: $1,200
# Other: $1,300
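If you copy the per-service numbers out of Cost Explorer, a few lines of awk turn them into shares of the bill. This is a minimal sketch: `cost_share` is a hypothetical helper, and its input is the summarized list above retyped as "cost, then service name" lines, not the raw JSON the API returns.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "COST SERVICE NAME" lines on stdin and prints
# each service's share of the total, largest first.
cost_share() {
  sort -k1,1nr | awk '
    {
      cost[NR] = $1
      $1 = ""                    # drop the cost field...
      name[NR] = substr($0, 2)   # ...and the leading space it leaves behind
      total += cost[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%5.1f%%  %8.2f  %s\n", 100 * cost[i] / total, cost[i], name[i]
    }'
}

cost_share <<'EOF'
4200 Amazon EC2
3100 Amazon RDS
2800 NAT Gateway
1400 Amazon S3
1200 Data Transfer
1300 Other
EOF
```

On these numbers NAT Gateway alone is 20% of the bill, more than S3 and data transfer combined.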
Set up budgets to get alerts before things spiral:
# Create a monthly budget with alert at 80%
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-infra",
    "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{
      "SubscriptionType": "EMAIL",
      "Address": "team@company.com"
    }]
  }]'
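The alert above is of type ACTUAL, so it fires only once 80% of the budget ($8,000) has actually been spent. AWS Budgets also accepts `"NotificationType": "FORECASTED"`, which alerts on projected spend instead. The sketch below shows why that can fire earlier. It uses a naive linear run-rate projection, which is an assumption for illustration and not the model AWS Budgets uses, and `project_month_end` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: naive linear month-end projection (NOT the AWS Budgets
# forecasting model; it only illustrates run-rate alerts).
# Usage: project_month_end SPEND_TO_DATE DAYS_ELAPSED DAYS_IN_MONTH BUDGET
project_month_end() {
  awk -v spend="$1" -v elapsed="$2" -v days="$3" -v budget="$4" 'BEGIN {
    projected = spend / elapsed * days      # assume the daily rate stays constant
    over = (projected > budget) ? "yes" : "no"
    printf "projected=%.2f over_budget=%s\n", projected, over
  }'
}

# $6,000 spent by July 15: the 80% ACTUAL alert has not fired yet,
# but the run rate points well past the $10,000 budget.
project_month_end 6000 15 31 10000   # prints: projected=12400.00 over_budget=yes
```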
Reserved Instances vs Savings Plans
If you're running EC2 instances 24/7 and paying On-Demand rates, you're leaving discounts of up to 72% on the table. Here's how the options compare:
| Option | Discount | Flexibility | Commitment | Best For |
|---|---|---|---|---|
| On-Demand | 0% | Full | None | Unpredictable workloads |
| Reserved Instances | Up to 72% | Instance type/region locked | 1 or 3 years | Stable, predictable workloads |
| Savings Plans (Compute) | Up to 66% | Any instance family/region/OS | 1 or 3 years | Flexible steady-state usage |
| Savings Plans (EC2) | Up to 72% | Locked to instance family/region | 1 or 3 years | Known instance family |
| Spot Instances | Up to 90% | Can be interrupted | None | Fault-tolerant batch jobs |
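A commitment discount only pays off if you actually use what you commit to. A simplified sketch: treat the commitment as a fixed charge at the discounted rate for every hour of a ~730-hour month, and compare it with On-Demand for only the hours used. The helper name and the 730-hour month are assumptions. Real Savings Plans commit to a dollar amount per hour rather than to a specific instance, but the break-even logic is the same: with a discount of d, you need more than (1 - d) utilization.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one month of On-Demand vs. a committed discounted rate.
# Usage: compare_commitment ON_DEMAND_HOURLY DISCOUNT_PCT HOURS_USED
compare_commitment() {
  awk -v rate="$1" -v discount="$2" -v used="$3" 'BEGIN {
    hours = 730                      # approximate hours in a month
    keep = 1 - discount / 100        # fraction of the On-Demand price still paid
    on_demand = rate * used          # pay only for the hours actually used
    committed = rate * keep * hours  # pay for every hour, used or not
    printf "on_demand=%.2f committed=%.2f breakeven_utilization=%.1f%%\n",
      on_demand, committed, keep * 100
  }'
}

compare_commitment 0.10 30 730   # runs 24/7: the commitment is cheaper
compare_commitment 0.10 30 365   # runs half the month: On-Demand is cheaper
```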
For EC2, Savings Plans are usually the better choice over RIs now: Compute Savings Plans apply across instance families and regions, and even to Fargate and Lambda:
# Check your Savings Plans recommendations
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS
# See current utilization of existing plans
aws ce get-savings-plans-utilization \
  --time-period Start=2025-07-01,End=2025-08-01
Spot Instances for Batch Workloads
Spot Instances offer 60-90% discounts, but AWS can reclaim them with two minutes' notice. They're a good fit for batch processing, CI/CD runners, data analysis, and any other workload that tolerates interruptions:
# Launch a one-time Spot Fleet request for batch processing.
# SpotPrice is omitted, so the maximum price defaults to the On-Demand price
# (a low cap such as 0.05 can leave the request unfulfilled).
# priceCapacityOptimized favors pools with spare capacity, which lowers
# interruption rates compared with lowestPrice.
aws ec2 request-spot-fleet \
  --spot-fleet-request-config '{
    "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",
    "TargetCapacity": 10,
    "AllocationStrategy": "priceCapacityOptimized",
    "Type": "request",
    "LaunchSpecifications": [
      {"ImageId": "ami-0abcdef1234567890", "InstanceType": "c5.xlarge", "SubnetId": "subnet-abc123"},
      {"ImageId": "ami-0abcdef1234567890", "InstanceType": "c5a.xlarge", "SubnetId": "subnet-abc123"},
      {"ImageId": "ami-0abcdef1234567890", "InstanceType": "c6i.xlarge", "SubnetId": "subnet-abc123"}
    ]
  }'
Pro tip: Specify multiple instance types. Spot capacity varies by instance type and Availability Zone, so diversifying across types (and across subnets in different AZs) reduces interruptions.
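Whether Spot is worth it for a given job depends on how much work an interruption throws away. Here is a rough sketch, assuming you can estimate that rework as a percentage of the job's total compute. `spot_effective_cost`, the $0.20/hr On-Demand rate, and the 70% discount in the example are all hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: effective Spot cost once interrupted work is redone.
# Usage: spot_effective_cost ON_DEMAND_HOURLY DISCOUNT_PCT REWORK_PCT
#   REWORK_PCT = extra compute spent re-running work lost to interruptions (an estimate)
spot_effective_cost() {
  awk -v rate="$1" -v discount="$2" -v rework="$3" 'BEGIN {
    keep = 1 - discount / 100
    spot = rate * keep                       # hourly Spot price
    effective = spot * (1 + rework / 100)    # cost per hour of useful work
    breakeven = (1 / keep - 1) * 100         # rework % at which Spot matches On-Demand
    printf "spot=%.4f effective=%.4f breakeven_rework=%.1f%%\n", spot, effective, breakeven
  }'
}

spot_effective_cost 0.20 70 10
```

With a 70% discount the break-even is about 233% rework. You would have to redo the job more than twice over before Spot cost more than On-Demand.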
Right-Sizing With Compute Optimizer
AWS Compute Optimizer analyzes your actual CPU, memory, and network usage over the past 14 days (the default lookback period; memory metrics require the CloudWatch agent) and recommends cheaper instance types:
# Enable Compute Optimizer (recommendations appear once enough metric data is collected)
aws compute-optimizer update-enrollment-status \
  --status Active
# Get EC2 recommendations, with the estimated monthly savings of the top option
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].{
    Instance: instanceArn,
    Current: currentInstanceType,
    Recommended: recommendationOptions[0].instanceType,
    MonthlySavings: recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value
  }' --output table
# Common finding: m5.2xlarge running at 15% CPU
# Recommendation: m5.large (saves ~$150/month per instance)
We found 6 instances running m5.2xlarge with average CPU under 20%. Switching to m5.large saved $900/month with zero performance impact.
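Going from m5.2xlarge (8 vCPUs) to m5.large (2 vCPUs) cuts vCPUs by 4x, so the average CPU figure rises by roughly 4x. Here is a back-of-the-envelope check. It assumes load scales linearly with vCPU count, which is a simplification that ignores memory, bursts, and per-core speed differences, and `projected_cpu` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rough average CPU after resizing, assuming the same work
# is spread over fewer vCPUs.
# Usage: projected_cpu CURRENT_AVG_PCT CURRENT_VCPUS TARGET_VCPUS
projected_cpu() {
  awk -v pct="$1" -v cur="$2" -v target="$3" 'BEGIN {
    printf "%.1f\n", pct * cur / target
  }'
}

projected_cpu 15 8 2   # m5.2xlarge -> m5.large at 15% average: 60.0
projected_cpu 20 8 2   # at the 20% ceiling: 80.0
```

At 20% average the smaller instance would sit near 80%, so it's worth checking peak CPU, not just the average, before resizing.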
S3 Lifecycle Policies
Most teams throw data into S3 Standard and forget about it. A simple lifecycle policy can cut storage costs by 60-80%:
# Apply lifecycle policy to transition and expire objects.
# GLACIER is the API name for Glacier Flexible Retrieval. By default, objects
# smaller than 128 KB are not transitioned, and NoncurrentVersionExpiration
# only matters if the bucket is versioned.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-app-logs \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "log-lifecycle",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }]
  }'
| Storage Class | Storage Cost (per GB-month, us-east-1) | Retrieval Cost | Best For |
|---|---|---|---|
| Standard | $0.023 | None | Frequently accessed |
| Standard-IA | $0.0125 | $0.01/GB | Monthly access |
| Glacier Instant | $0.004 | $0.03/GB | Quarterly access |
| Glacier Flexible | $0.0036 | $0.01/GB (3-5 hrs) | Annual access |
| Deep Archive | $0.00099 | $0.02/GB (12 hrs) | Compliance/archival |
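To sanity-check the savings claim, here is a steady-state estimate for a bucket that receives the same amount of log data every day and follows the lifecycle rule above, priced with the table's storage rates. Assumptions: constant daily ingest, storage charges only (no transition request fees, retrievals, or per-object overhead in the Glacier classes), and a baseline that keeps the same 730 days of logs in Standard. `lifecycle_vs_standard` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: steady-state monthly storage cost under the lifecycle
# rule vs. keeping everything in S3 Standard for the same 730 days.
# Usage: lifecycle_vs_standard GB_PER_DAY
lifecycle_vs_standard() {
  awk -v gb="$1" 'BEGIN {
    # GB held in each class = daily ingest x days an object spends there
    std  = gb * 30  * 0.023     # days 0-30:    Standard
    ia   = gb * 60  * 0.0125    # days 30-90:   Standard-IA
    gla  = gb * 275 * 0.0036    # days 90-365:  Glacier Flexible Retrieval
    deep = gb * 365 * 0.00099   # days 365-730: Deep Archive, then expired
    lifecycle = std + ia + gla + deep
    standard_only = gb * 730 * 0.023
    printf "lifecycle=%.2f standard_only=%.2f savings=%.0f%%\n",
      lifecycle, standard_only, 100 * (1 - lifecycle / standard_only)
  }'
}

lifecycle_vs_standard 10   # 10 GB of logs per day
```

For 10 GB/day this comes to roughly $28/month instead of $168/month. Transition requests are billed per 1,000 objects, so buckets full of small log files give back part of that difference.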
Kill Unused Resources
This is where the easiest money hides. Every AWS account accumulates waste:
# Find unattached Elastic IPs. Each public IPv4 address costs $0.005/hr (about $3.60/month);
# since February 2024 attached addresses are billed too, but unattached ones do nothing for you.
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].{IP: PublicIp, AllocationId: AllocationId}' \
  --output table
# Release them
aws ec2 release-address --allocation-id eipalloc-abc123
# Find old EBS snapshots (older than ~90 days; set the cutoff to today minus 90 days)
aws ec2 describe-snapshots --owner-ids self \
  --query "Snapshots[?StartTime<'2025-05-01'].{ID: SnapshotId, Size: VolumeSize, Created: StartTime}" \
  --output table
# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID: VolumeId, Size: Size, Type: VolumeType}' \
  --output table
# Find idle load balancers: count healthy targets in each of a load balancer's
# target groups (0 everywhere suggests the load balancer is idle)
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123 \
  --query "length(TargetHealthDescriptions[?TargetHealth.State=='healthy'])"
In one audit we found: 12 unattached EIPs ($43/month), 340 GB of orphaned EBS volumes ($34/month), 2.1 TB of old snapshots ($105/month), and 2 idle ALBs ($32/month). Small numbers individually, but $214/month adds up to $2,568/year.
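The snapshot query above hard-codes 2025-05-01 as its cutoff, which goes stale quickly. A small helper can compute "N days ago" instead. This is a sketch: `cutoff_date` is hypothetical, and it has to cope with GNU `date` (Linux) and BSD `date` (macOS), which use different flags for date arithmetic.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the UTC date N days ago as YYYY-MM-DD.
# Usage: cutoff_date DAYS
cutoff_date() {
  if date --version >/dev/null 2>&1; then
    date -u -d "$1 days ago" +%Y-%m-%d   # GNU date (most Linux distributions)
  else
    date -u -v-"$1"d +%Y-%m-%d           # BSD date (macOS)
  fi
}

# Example: snapshots created more than 90 days ago
# aws ec2 describe-snapshots --owner-ids self \
#   --query "Snapshots[?StartTime<'$(cutoff_date 90)'].{ID: SnapshotId, Size: VolumeSize, Created: StartTime}" \
#   --output table
```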
The NAT Gateway Trap
NAT Gateway was our biggest surprise. In us-east-1 each NAT gateway costs $0.045/hr (about $32/month) plus $0.045 per GB of data processed. If your private subnets pull container images, download packages, or sync with S3, you're paying for every byte that flows through NAT:
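# Check NAT Gateway data processing (this is the killer). Processing is billed in
# both directions: uploads show up in BytesOutToDestination, downloads such as
# image pulls in BytesInFromDestination. A 31-day period returns July as one datapoint.
for metric in BytesOutToDestination BytesInFromDestination; do
  aws cloudwatch get-metric-statistics \
    --namespace AWS/NATGateway \
    --metric-name "$metric" \
    --dimensions Name=NatGatewayId,Value=nat-abc123 \
    --start-time 2025-07-01T00:00:00Z \
    --end-time 2025-08-01T00:00:00Z \
    --period 2678400 \
    --statistics Sum
done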
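Fix it: use gateway VPC endpoints for S3 and DynamoDB (no charge), and interface endpoints for ECR, CloudWatch, and other AWS services. Interface endpoints cost $0.01/hr per AZ plus $0.01 per GB processed, so at high traffic volumes they are much cheaper than NAT's $0.045/GB. For ECR, image layers are served from S3, so the S3 gateway endpoint matters there too.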
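Whether an interface endpoint pays for itself depends on traffic volume. A minimal sketch, assuming us-east-1 list prices ($0.045/GB NAT processing; $0.01/hr per AZ plus $0.01/GB for an interface endpoint) and that the NAT gateway stays in place for other traffic, so only its per-GB charge goes away. `endpoint_breakeven_gb` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: monthly GB above which one interface endpoint is cheaper
# than sending the same traffic through NAT.
# Assumed us-east-1 prices: NAT $0.045/GB; endpoint $0.01/hr per AZ + $0.01/GB.
# Usage: endpoint_breakeven_gb AZ_COUNT
endpoint_breakeven_gb() {
  awk -v azs="$1" 'BEGIN {
    fixed = azs * 0.01 * 730        # endpoint hourly charge over a ~730-hour month
    saved_per_gb = 0.045 - 0.01     # NAT processing avoided minus endpoint processing
    printf "%.1f\n", fixed / saved_per_gb
  }'
}

endpoint_breakeven_gb 3   # endpoint in 3 AZs: about 625.7 GB/month
```

Gateway endpoints for S3 and DynamoDB have no hourly or per-GB charge, so moving that traffic off NAT saves money at any volume.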
Cost Allocation Tags
Tags let you see exactly which team, project, or environment is spending what:
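# Tag resources consistently
aws ec2 create-tags --resources i-abc123 \
  --tags Key=Environment,Value=production \
         Key=Team,Value=backend \
         Key=Project,Value=api-service \
         Key=CostCenter,Value=engineering
# Activate the tag keys as cost allocation tags (Billing console, or the CLI below);
# activated tags can take up to 24 hours to appear
aws ce update-cost-allocation-tags-status \
  --cost-allocation-tags-status TagKey=Team,Status=Active TagKey=Environment,Status=Active
# Then break costs down by tag
aws ce get-cost-and-usage \
  --time-period Start=2025-07-01,End=2025-08-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Team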
Enforce tagging with an SCP that denies resource creation when required tags are missing, and use the AWS Config `required-tags` managed rule to flag existing resources that lack them.
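Before enforcement is in place, it helps to audit what's already running. Here is a minimal sketch with a hypothetical `missing_tags` function that compares a resource's tag keys against a required list. The required keys in the example are the four used in the tagging command above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each required tag key that a resource lacks.
# Reads the resource's tag keys from stdin, one per line.
# Usage: <tag keys> | missing_tags REQUIRED_KEY...
missing_tags() {
  local present key
  present=$(cat)
  for key in "$@"; do
    # -x: whole-line match, -F: fixed string, -q: no output
    printf '%s\n' "$present" | grep -qxF -- "$key" || printf '%s\n' "$key"
  done
}

# Example against a real instance:
# aws ec2 describe-tags --filters Name=resource-id,Values=i-abc123 \
#   --query 'Tags[].Key' --output text | tr '\t' '\n' \
#   | missing_tags Environment Team Project CostCenter
printf 'Environment\nTeam\n' | missing_tags Environment Team Project CostCenter
```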
AWS Trusted Advisor — Free Recommendations
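Trusted Advisor scans your account for cost optimization opportunities. The full set of cost optimization checks, and the AWS Support API used below, require a Business, Enterprise On-Ramp, or Enterprise Support plan; Basic and Developer plans get only a limited set of core checks: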
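# List all Trusted Advisor cost optimization checks
# (the AWS Support API endpoint is in us-east-1)
aws support describe-trusted-advisor-checks \
  --region us-east-1 \
  --language en \
  --query "checks[?category=='cost_optimizing'].{Name: name, ID: id}" \
  --output table
# Refresh a specific check (e.g., Low Utilization EC2 Instances), then read its results
aws support refresh-trusted-advisor-check \
  --region us-east-1 \
  --check-id Qch7DwouX1
aws support describe-trusted-advisor-check-result \
  --region us-east-1 \
  --check-id Qch7DwouX1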
Trusted Advisor commonly flags: low-utilization EC2 instances, idle RDS instances, underutilized EBS volumes, unassociated Elastic IPs, and idle load balancers.
The Real Results
Here's what the $14,000-to-$5,600 breakdown looked like:
| Change | Monthly Savings |
|---|---|
| Compute Savings Plan (1yr, no upfront) | $2,100 |
| Right-sized 6 EC2 instances | $900 |
| S3 lifecycle policies on 4 TB of logs | $1,400 |
| Replaced NAT Gateway with VPC Endpoints | $2,200 |
| Deleted unused resources | $214 |
| Moved CI/CD runners to Spot | $580 |
| Switched dev RDS to single-AZ + smaller class | $1,006 |
| Total | $8,400 |
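A quick check that the table adds up to the headline number. `summarize_savings` is a hypothetical helper that sums the savings column and compares it with the $14,000 starting bill.

```bash
#!/usr/bin/env bash
# Hypothetical helper: sum per-change monthly savings (one number per line)
# and compare the total against the starting monthly bill.
# Usage: <savings, one per line> | summarize_savings STARTING_BILL
summarize_savings() {
  awk -v before="$1" '
    { total += $1 }
    END {
      printf "saved=%d after=%d reduction=%.1f%%\n", total, before - total, 100 * total / before
    }'
}

# The "Monthly Savings" column from the table above
printf '%s\n' 2100 900 1400 2200 214 580 1006 | summarize_savings 14000
# prints: saved=8400 after=5600 reduction=60.0%
```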
The key insight: most of this wasn't clever engineering. It was just looking at the bill, understanding what each line item means, and making obvious changes. The hardest part was getting started.
What's Next
Cost optimization is an ongoing process, not a one-time project. In the next post, we'll explore Transit Gateway and multi-account networking — which also has significant cost implications when you're connecting multiple VPCs and accounts across your organization.
