AWS Cost Optimization — We Cut Our Bill by 60%, Here's How

Goel Academy
DevOps & Cloud Learning Hub

A team I worked with was paying $14,000/month on AWS. After three weeks of analysis and changes, we brought it down to $5,600. No services were removed, no performance was sacrificed. The problem was never that AWS is expensive — it's that the defaults are expensive, and nobody was watching.

Start With Visibility — Cost Explorer and Budgets

You can't optimize what you can't see. Cost Explorer is free to use in the console once you enable it from the Billing console; API requests like the one below cost $0.01 each. The first step is understanding where the money goes:

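# Query last month's cost, grouped by service
# (Cost Explorer must already be enabled in the Billing console)
aws ce get-cost-and-usage \
  --time-period Start=2025-07-01,End=2025-08-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE

# Summarized output, cost per service (the raw response is JSON, and NAT Gateway
# charges are reported under "EC2 - Other"; group by USAGE_TYPE to isolate them):
# Amazon EC2: $4,200
# Amazon RDS: $3,100
# NAT Gateway: $2,800 <-- surprise!
# Amazon S3: $1,400
# Data Transfer: $1,200
# Other: $1,300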
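
To see which line items dominate, it can help to turn the per-service totals into percentages. The following is a minimal sketch that hard-codes the illustrative figures above rather than fetching them from the API; the function name `breakdown` is hypothetical.

```shell
#!/usr/bin/env bash
# Print each service's share of the monthly bill.
# The figures are the illustrative per-service totals from this post,
# hard-coded here instead of being parsed from the Cost Explorer response.
breakdown() {
  awk -F',' '
    { name[NR] = $1; cost[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%-14s $%5d  %4.1f%%\n", name[i], cost[i], 100 * cost[i] / total
      printf "%-14s $%5d\n", "Total", total
    }' <<'EOF'
Amazon EC2,4200
Amazon RDS,3100
NAT Gateway,2800
Amazon S3,1400
Data Transfer,1200
Other,1300
EOF
}

breakdown
```

With these numbers, NAT Gateway alone accounts for a fifth of the bill, which is what made it stand out.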

Set up budgets to get alerts before things spiral:

# Create a monthly budget with alert at 80%
aws budgets create-budget \
  --account-id 123456789012 \
  --budget '{
    "BudgetName": "monthly-infra",
    "BudgetLimit": {"Amount": "10000", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST"
  }' \
  --notifications-with-subscribers '[{
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [{
      "SubscriptionType": "EMAIL",
      "Address": "team@company.com"
    }]
  }]'

Reserved Instances vs Savings Plans

If you're running EC2 instances 24/7 and paying on-demand, you're overpaying by 40-72%. Here's the decision:

| Option | Discount | Flexibility | Commitment | Best For |
| --- | --- | --- | --- | --- |
| On-Demand | 0% | Full | None | Unpredictable workloads |
| Reserved Instances | Up to 72% | Instance type/region locked | 1 or 3 years | Stable, predictable workloads |
| Savings Plans (Compute) | Up to 66% | Any instance family/region/OS | 1 or 3 years | Flexible steady-state usage |
| Savings Plans (EC2) | Up to 72% | Locked to instance family/region | 1 or 3 years | Known instance family |
| Spot Instances | Up to 90% | Can be interrupted | None | Fault-tolerant batch jobs |

Savings Plans are almost always the better choice over RIs now because Compute Savings Plans apply across instance families and regions, and they also cover Fargate and Lambda:

# Check your Savings Plans recommendations
aws ce get-savings-plans-purchase-recommendation \
  --savings-plans-type COMPUTE_SP \
  --term-in-years ONE_YEAR \
  --payment-option NO_UPFRONT \
  --lookback-period-in-days SIXTY_DAYS

# See current utilization of existing plans
aws ce get-savings-plans-utilization \
  --time-period Start=2025-07-01,End=2025-08-01
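
One way to sanity-check a commitment is its break-even utilization. A commitment bills for every hour of the term, while on-demand bills only for the hours you actually run, so at a discount of d% the commitment only wins when the workload runs more than (100 - d)% of the time. A rough sketch follows; the function name `break_even_pct` is hypothetical, and the 66% and 72% inputs are the "up to" figures from the table, so real discounts will usually be lower.

```shell
#!/usr/bin/env bash
# Break-even utilization for a commitment, as a percentage of hours in the term.
# $1 = discount versus on-demand, as a whole-number percentage (e.g. 66)
break_even_pct() {
  echo $((100 - $1))
}

for d in 30 66 72; do
  echo "discount ${d}% -> break-even at $(break_even_pct "$d")% utilization"
done
```

This is why commitments suit instances that run around the clock and are a poor fit for workloads that only run a few hours a day.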

Spot Instances for Batch Workloads

Spot Instances give you 60-90% discounts, but AWS can reclaim them with two minutes' notice. They're a good fit for batch processing, CI/CD runners, data analysis, and any other workload that can handle interruptions:

# Launch a spot fleet for batch processing
aws ec2 request-spot-fleet \
  --spot-fleet-request-config '{
    "IamFleetRole": "arn:aws:iam::123456789012:role/spot-fleet-role",
    "TargetCapacity": 10,
    "SpotPrice": "0.05",
    "LaunchSpecifications": [{
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "c5.xlarge",
      "SubnetId": "subnet-abc123"
    }, {
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "c5a.xlarge",
      "SubnetId": "subnet-abc123"
    }, {
      "ImageId": "ami-0abcdef1234567890",
      "InstanceType": "c6i.xlarge",
      "SubnetId": "subnet-abc123"
    }],
    "AllocationStrategy": "lowestPrice",
    "Type": "request"
  }'

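Pro tip: Specify multiple instance types. Spot availability varies by type, so diversifying reduces interruption rates. AWS also recommends the priceCapacityOptimized allocation strategy over lowestPrice, because the cheapest pools tend to be the ones that get interrupted most often.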
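
A job running on Spot can watch for the interruption notice through the instance metadata service. The sketch below assumes IMDSv2 and polls the `spot/instance-action` path, which returns 404 until an interruption is scheduled. The `drain` function is a hypothetical placeholder for your own checkpoint logic, and the 5-second polling interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: poll the instance metadata service (IMDSv2) for a Spot interruption notice.
IMDS="http://169.254.169.254/latest"

# The instance-action path returns 404 until an interruption is scheduled, then 200.
is_interruption_notice() {
  [ "$1" = "200" ]
}

# Fetch the HTTP status of the instance-action path using a short-lived IMDSv2 token.
imds_status() {
  local token
  token=$(curl -s --max-time 2 -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
  curl -s --max-time 2 -o /dev/null -w '%{http_code}' \
    -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/spot/instance-action"
}

# Hypothetical hook: replace with whatever lets your job checkpoint and exit cleanly.
drain() {
  echo "Interruption notice received, checkpointing and draining"
}

# Loop until a notice appears, then drain once and return.
watch_for_interruption() {
  while true; do
    if is_interruption_notice "$(imds_status)"; then
      drain
      return 0
    fi
    sleep 5
  done
}
```

Starting `watch_for_interruption &` at the beginning of a batch job would leave most of the two-minute window for checkpointing.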

Right-Sizing With Compute Optimizer

AWS Compute Optimizer analyzes your actual CPU and network usage over the past 14 days (plus memory, if the CloudWatch agent is installed) and recommends cheaper instance types:

# Enable Compute Optimizer
aws compute-optimizer update-enrollment-status \
  --status Active

# Get EC2 recommendations with the estimated monthly savings of the top option
aws compute-optimizer get-ec2-instance-recommendations \
  --query 'instanceRecommendations[*].{
    Instance: instanceArn,
    Current: currentInstanceType,
    Recommended: recommendationOptions[0].instanceType,
    MonthlySavings: recommendationOptions[0].savingsOpportunity.estimatedMonthlySavings.value
  }' --output table

# Common finding: m5.2xlarge running at 15% CPU
# Recommendation: m5.large (saves ~$150/month per instance)

We found 6 instances running m5.2xlarge with average CPU under 20%. Switching to m5.large saved $900/month with zero performance impact.

S3 Lifecycle Policies

Most teams throw data into S3 Standard and forget about it. A simple lifecycle policy can cut storage costs by 60-80%:

# Apply lifecycle policy to transition and expire objects
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-app-logs \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "log-lifecycle",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 730},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }]
  }'

| Storage Class | Cost/GB/month | Retrieval Cost | Best For |
| --- | --- | --- | --- |
| Standard | $0.023 | None | Frequently accessed |
| Standard-IA | $0.0125 | $0.01/GB | Monthly access |
| Glacier Instant | $0.004 | $0.03/GB | Quarterly access |
| Glacier Flexible | $0.0036 | $0.01/GB (3-5 hrs) | Annual access |
| Deep Archive | $0.00099 | $0.02/GB (12 hrs) | Compliance/archival |
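
To make the per-GB prices concrete, the sketch below multiplies them out for a given volume of data. It uses only the storage prices from the table above and ignores retrieval, request, and transition fees, so treat the result as a lower bound. The function name `storage_cost` is hypothetical.

```shell
#!/usr/bin/env bash
# Monthly storage cost for a given number of GB in each storage class,
# using the per-GB prices quoted in this post.
# Retrieval, request, and lifecycle transition fees are not included.
# $1 = amount of data in GB
storage_cost() {
  awk -v gb="$1" 'BEGIN {
    n = split("Standard:0.023 Standard-IA:0.0125 Glacier-Instant:0.004 Glacier-Flexible:0.0036 Deep-Archive:0.00099", rows, " ")
    for (i = 1; i <= n; i++) {
      split(rows[i], f, ":")
      printf "%-17s $%.2f\n", f[1], gb * f[2]
    }
  }'
}

storage_cost 1000
```

For 1,000 GB this works out to roughly $23 per month in Standard against about $1 in Deep Archive, which is why aging log data out of Standard pays off.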

Kill Unused Resources

This is where the easiest money hides. Every AWS account accumulates waste:

# Find unattached Elastic IPs (each costs $0.005/hr = $3.60/month)
aws ec2 describe-addresses \
  --query 'Addresses[?AssociationId==null].{IP: PublicIp, AllocationId: AllocationId}' \
  --output table

# Release them
aws ec2 release-address --allocation-id eipalloc-abc123

# Find old EBS snapshots (older than 90 days)
aws ec2 describe-snapshots --owner-ids self \
  --query 'Snapshots[?StartTime<`2025-05-01`].{ID: SnapshotId, Size: VolumeSize, Created: StartTime}' \
  --output table

# Find unattached EBS volumes
aws ec2 describe-volumes \
  --filters Name=status,Values=available \
  --query 'Volumes[*].{ID: VolumeId, Size: Size, Type: VolumeType}' \
  --output table

# Check target health for one target group; a load balancer whose
# target groups have zero healthy targets is probably idle
aws elbv2 describe-target-health \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123

In one audit we found: 12 unattached EIPs ($43/month), 340 GB of orphaned EBS volumes ($34/month), 2.1 TB of old snapshots ($105/month), and 2 idle ALBs ($32/month). Small numbers individually, but $214/month adds up to $2,568/year.

The NAT Gateway Trap

NAT Gateway was our biggest surprise. It charges $0.045/hr ($32/month) plus $0.045/GB of data processed. If your private subnets are pulling container images, downloading packages, or syncing with S3, you're paying for every byte through NAT:

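# Check NAT Gateway data processing (this is the killer).
# Traffic is billed in both directions, so repeat with BytesInFromDestination
# to capture downloads such as image pulls. The period covers all 31 days of July.
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-abc123 \
  --start-time 2025-07-01T00:00:00Z \
  --end-time 2025-08-01T00:00:00Z \
  --period 2678400 \
  --statistics Sum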

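Fix it: Use gateway VPC endpoints for S3 and DynamoDB, which are free, and interface endpoints for ECR, CloudWatch, and other AWS services. Interface endpoints cost about $0.01/hr per Availability Zone plus a per-GB data charge, which is still much cheaper than NAT's $0.045/GB for high traffic.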
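
A quick comparison shows how large the gap can get. This sketch uses the NAT rates quoted above and assumes a 720-hour month, an interface endpoint data charge of $0.01/GB, and endpoints deployed in two Availability Zones; check the current pricing for your region before relying on the numbers. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Rough monthly cost comparison for traffic to AWS services.
# NAT rates are the figures quoted in this post; the $0.01/GB interface
# endpoint data charge and the 720-hour month are assumptions.

# $1 = GB processed per month
nat_monthly_cost() {
  awk -v gb="$1" 'BEGIN { printf "%.2f\n", 0.045 * 720 + 0.045 * gb }'
}

# $1 = GB processed per month, $2 = number of Availability Zones
interface_endpoint_monthly_cost() {
  awk -v gb="$1" -v azs="$2" 'BEGIN { printf "%.2f\n", 0.01 * 720 * azs + 0.01 * gb }'
}

echo "NAT Gateway:        \$$(nat_monthly_cost 10000)"
echo "Interface endpoint: \$$(interface_endpoint_monthly_cost 10000 2)"
```

Traffic that can use the S3 or DynamoDB gateway endpoints avoids the per-GB charge entirely, so those are worth setting up first.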

Cost Allocation Tags

Tags let you see exactly which team, project, or environment is spending what:

# Tag resources consistently
aws ec2 create-tags --resources i-abc123 \
  --tags Key=Environment,Value=production \
         Key=Team,Value=backend \
         Key=Project,Value=api-service \
         Key=CostCenter,Value=engineering

# After activating Team as a cost allocation tag in the Billing console,
# group costs by that tag
aws ce get-cost-and-usage \
  --time-period Start=2025-07-01,End=2025-08-01 \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=TAG,Key=Team

Enforce tagging with an SCP, which can deny creation of resources that lack required tags, and use an AWS Config rule such as required-tags to detect resources that slipped through.
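
As an illustration of the SCP approach, the sketch below writes a policy that denies `ec2:RunInstances` when the request carries no `Team` tag. The file name, policy name, and choice of the `Team` tag key are assumptions for this example, and a real policy would likely cover more actions and tag keys.

```shell
#!/usr/bin/env bash
# Write an SCP that denies launching EC2 instances without a Team tag.
# The Null condition is true when the request does not include the tag at all.
cat > require-team-tag.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyRunInstancesWithoutTeamTag",
    "Effect": "Deny",
    "Action": "ec2:RunInstances",
    "Resource": "arn:aws:ec2:*:*:instance/*",
    "Condition": {
      "Null": {"aws:RequestTag/Team": "true"}
    }
  }]
}
EOF

# To create the policy in AWS Organizations (run from the management account):
# aws organizations create-policy \
#   --name require-team-tag \
#   --description "Deny EC2 launches without a Team tag" \
#   --type SERVICE_CONTROL_POLICY \
#   --content file://require-team-tag.json
```

The policy has no effect until it is attached to an organizational unit or account, so it can be tested on a sandbox account first.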

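AWS Trusted Advisor Recommendations

Trusted Advisor scans your account for cost optimization opportunities. The cost optimization checks, and the AWS Support API used below, require a Business support plan or higher; Basic and Developer plans only get the service quota checks and a few security checks: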

# List all Trusted Advisor cost optimization checks
aws support describe-trusted-advisor-checks \
  --language en \
  --query 'checks[?category==`cost_optimizing`].{Name: name, ID: id}' \
  --output table

# Refresh a specific check (e.g., Low Utilization EC2 Instances)
aws support refresh-trusted-advisor-check \
  --check-id Qch7DwouX1

Trusted Advisor commonly flags: low-utilization EC2 instances, idle RDS instances, underutilized EBS volumes, unassociated Elastic IPs, and idle load balancers.

The Real Results

Here's what the $14,000-to-$5,600 breakdown looked like:

| Change | Monthly Savings |
| --- | --- |
| Compute Savings Plan (1yr, no upfront) | $2,100 |
| Right-sized 6 EC2 instances | $900 |
| S3 lifecycle policies on 4 TB of logs | $1,400 |
| Replaced NAT Gateway with VPC Endpoints | $2,200 |
| Deleted unused resources | $214 |
| Moved CI/CD runners to Spot | $580 |
| Switched dev RDS to single-AZ + smaller class | $1,006 |
| Total | $8,400 |

The key insight: most of this wasn't clever engineering. It was just looking at the bill, understanding what each line item means, and making obvious changes. The hardest part was getting started.

What's Next

Cost optimization is an ongoing process, not a one-time project. In the next post, we'll explore Transit Gateway and multi-account networking — which also has significant cost implications when you're connecting multiple VPCs and accounts across your organization.