AWS Performance at Scale — ElastiCache, CloudFront, and Global Accelerator
Your application works fine with 100 users. At 10,000 users, pages load slowly. At 100,000 users, the database falls over. Performance at scale isn't about buying bigger instances — it's about putting data closer to users, caching aggressively, and removing bottlenecks at every layer. AWS has purpose-built services for each layer of the stack, and knowing when to use which one is what separates a system that scales gracefully from one that catches fire.
The Caching Layer Cake
Performance optimization follows a predictable pattern — cache at the edge first, then at the application layer, then at the database layer:
| Layer | Service | What It Caches | Latency Reduction |
|---|---|---|---|
| Edge (CDN) | CloudFront | Static assets, API responses | 200ms → 10ms |
| Application | ElastiCache | Session data, computed results | 50ms → <1ms |
| Database | DAX / Read Replicas | Query results | 5ms → <1ms |
| Network | Global Accelerator | TCP/UDP routing | Variable (30-50%) |
The goal: users should hit your origin servers as rarely as possible.
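The compounding effect of those layers is worth making concrete. A back-of-the-envelope model (the hit rates below are illustrative assumptions, not AWS figures) shows why even a modest edge hit rate dominates overall latency:

```python
def expected_latency_ms(layers, origin_ms):
    """Expected request latency through a chain of caches.

    layers: list of (hit_rate, latency_ms) from outermost (edge) inward.
    A request stops at the first layer that hits; misses fall through
    and ultimately pay the full origin cost.
    """
    total = 0.0
    reach_prob = 1.0  # probability the request gets this far
    for hit_rate, latency in layers:
        total += reach_prob * hit_rate * latency
        reach_prob *= (1 - hit_rate)
    return total + reach_prob * origin_ms

# Illustrative numbers: 80% edge hits at 10ms, then 90% app-cache hits at 1ms,
# with a 200ms origin round trip for the requests that miss everything
latency = expected_latency_ms([(0.8, 10), (0.9, 1)], origin_ms=200)
print(round(latency, 1))  # 12.2
```

With those assumed rates, average latency drops from 200ms to roughly 12ms, and only 2% of requests ever touch the origin.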
CloudFront — Edge Caching Done Right
CloudFront has hundreds of edge locations worldwide. When a user in Tokyo requests your page, it's served from a nearby edge location instead of crossing the Pacific to your us-east-1 origin.
# Create a CloudFront distribution for an S3 static site
# (uses a legacy Origin Access Identity; Origin Access Control is the newer recommended option)
aws cloudfront create-distribution \
--distribution-config '{
"CallerReference": "my-site-2024",
"Origins": {
"Quantity": 1,
"Items": [{
"Id": "S3Origin",
"DomainName": "my-bucket.s3.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": "origin-access-identity/cloudfront/E1234"
}
}]
},
"DefaultCacheBehavior": {
"TargetOriginId": "S3Origin",
"ViewerProtocolPolicy": "redirect-to-https",
"CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
"Compress": true
},
"Enabled": true,
"DefaultRootObject": "index.html"
}'
For dynamic APIs, use cache behaviors with shorter TTLs:
# Cache policy for API responses — 60-second TTL with query string forwarding
# (whitelisting Authorization puts it in the cache key, so responses are cached per user token)
aws cloudfront create-cache-policy \
--cache-policy-config '{
"Name": "API-CachePolicy",
"DefaultTTL": 60,
"MaxTTL": 300,
"MinTTL": 0,
"ParametersInCacheKeyAndForwardedToOrigin": {
"EnableAcceptEncodingGzip": true,
"HeadersConfig": {
"HeaderBehavior": "whitelist",
"Headers": {"Quantity": 1, "Items": ["Authorization"]}
},
"QueryStringsConfig": {
"QueryStringBehavior": "all"
},
"CookiesConfig": {
"CookieBehavior": "none"
}
}
}'
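Mechanically, that policy tells CloudFront which request parts become the cache key: the path, every query string, and the Authorization header (so each user's token gets its own cache entry, and users never see each other's responses). A simplified sketch of the idea — the real key derivation is internal to CloudFront, so this is only a mental model:

```python
from urllib.parse import urlencode

def cache_key(path, query_params, headers, key_headers=('authorization',)):
    """Simplified model of a CloudFront cache key: path + sorted query
    strings + whitelisted headers. Requests producing the same key can
    share a cached response; anything else is a separate entry."""
    qs = urlencode(sorted(query_params.items()))
    hdrs = '&'.join(f"{h}={headers.get(h, '')}" for h in sorted(key_headers))
    return f"{path}?{qs}|{hdrs}"

# Same path, query, and token -> same key -> cache hit
a = cache_key('/api/items', {'page': '2'}, {'authorization': 'Bearer t1'})
b = cache_key('/api/items', {'page': '2'}, {'authorization': 'Bearer t1'})
# Different token -> different key -> separate per-user cache entry
c = cache_key('/api/items', {'page': '2'}, {'authorization': 'Bearer t2'})
print(a == b, a == c)  # True False
```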
Lambda@Edge — Compute at the Edge
Lambda@Edge lets you run code at CloudFront edge locations. Use it for A/B testing, header manipulation, URL rewrites, or authentication (for lightweight header tweaks like the example below, the cheaper CloudFront Functions feature is also worth considering):
// Lambda@Edge: Add security headers to every response
exports.handler = async (event) => {
const response = event.Records[0].cf.response;
const headers = response.headers;
headers['strict-transport-security'] = [{
key: 'Strict-Transport-Security',
value: 'max-age=63072000; includeSubDomains; preload'
}];
headers['x-content-type-options'] = [{
key: 'X-Content-Type-Options',
value: 'nosniff'
}];
headers['x-frame-options'] = [{
key: 'X-Frame-Options',
value: 'DENY'
}];
return response;
};
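The event shape is the part worth internalizing: CloudFront passes a Records list with the response under event['Records'][0]['cf']['response'], and each header is a list of {key, value} dicts. Lambda@Edge also supports Python runtimes, so the same handler looks like this — and the mock event at the bottom lets you exercise it locally before deploying:

```python
def handler(event, context):
    # Same security-header injection as the Node.js version above
    response = event['Records'][0]['cf']['response']
    headers = response['headers']
    headers['strict-transport-security'] = [{
        'key': 'Strict-Transport-Security',
        'value': 'max-age=63072000; includeSubDomains; preload'}]
    headers['x-content-type-options'] = [{
        'key': 'X-Content-Type-Options', 'value': 'nosniff'}]
    headers['x-frame-options'] = [{'key': 'X-Frame-Options', 'value': 'DENY'}]
    return response

# Minimal mock of a CloudFront origin-response event for local testing
event = {'Records': [{'cf': {'response': {'status': '200', 'headers': {}}}}]}
out = handler(event, None)
print(out['headers']['x-frame-options'][0]['value'])  # DENY
```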
ElastiCache — Redis vs Memcached
ElastiCache puts a sub-millisecond cache between your application and your database. The first decision: Redis or Memcached?
| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, lists, sets, hashes, sorted sets, streams | Strings only |
| Persistence | Yes (AOF, RDB snapshots) | No |
| Replication | Yes (read replicas, Multi-AZ) | No |
| Clustering | Yes (up to 500 nodes) | Yes (up to 40 nodes) |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes | No |
| Multithreaded | Single-threaded (6.x+ has I/O threads) | Multithreaded |
| Max Item Size | 512 MB | 1 MB (default), 128 MB (max) |
| Use Case | Caching + data store + messaging | Pure caching only |
| Backup/Restore | Yes | No |
Use Redis in almost every case. The only reason to choose Memcached is if you need multithreaded performance for simple key-value caching and don't need any advanced features.
# Create an ElastiCache Redis cluster (cluster mode enabled)
aws elasticache create-replication-group \
--replication-group-id my-redis-cluster \
--replication-group-description "Production cache" \
--engine redis \
--engine-version 7.1 \
--cache-node-type cache.r7g.large \
--num-node-groups 3 \
--replicas-per-node-group 2 \
--automatic-failover-enabled \
--multi-az-enabled \
--at-rest-encryption-enabled \
--transit-encryption-enabled \
--cache-subnet-group-name my-subnet-group \
--security-group-ids sg-0abc123def456
Common caching patterns in application code:
import json
import redis

# Cluster mode is enabled above, so connect through the configuration
# endpoint with the cluster-aware client — a plain redis.Redis connection
# can't follow the MOVED redirects between shards
r = redis.RedisCluster(
    host='clustercfg.my-redis-cluster.abc123.use1.cache.amazonaws.com',
    port=6379, ssl=True)

def get_user_profile(user_id):
    # Cache-aside pattern
    cache_key = f"user:{user_id}:profile"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)  # Cache HIT — sub-millisecond
    # Cache MISS — query the database (db is your application's DB handle)
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)
    r.setex(cache_key, 3600, json.dumps(profile))  # Cache for 1 hour
    return profile

def invalidate_user_cache(user_id):
    # Invalidate on write so readers don't serve stale data for the full TTL
    r.delete(f"user:{user_id}:profile")
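One refinement worth adding to cache-aside: when a hot key expires, every concurrent request misses at once and stampedes the database. Two cheap mitigations are TTL jitter (so keys don't expire in lockstep) and a lock around the rebuild so only one caller pays the miss. A sketch with an in-memory stand-in for Redis — function names here are ours, not a Redis or ElastiCache API:

```python
import random
import threading
import time

_cache = {}           # stand-in for Redis: key -> (value, expires_at)
_rebuild_lock = threading.Lock()

def setex_jittered(key, base_ttl, value, jitter=0.1):
    # Spread expirations over +/-10% so hot keys don't all expire together
    ttl = base_ttl * (1 + random.uniform(-jitter, jitter))
    _cache[key] = (value, time.monotonic() + ttl)

def get_or_rebuild(key, rebuild, base_ttl=3600):
    entry = _cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                      # HIT
    with _rebuild_lock:                      # only one caller rebuilds
        entry = _cache.get(key)              # re-check after acquiring lock
        if entry and entry[1] > time.monotonic():
            return entry[0]
        value = rebuild()                    # MISS — hit the database once
        setex_jittered(key, base_ttl, value)
        return value

calls = []
profile = get_or_rebuild('user:1', lambda: calls.append(1) or {'name': 'Ada'})
profile = get_or_rebuild('user:1', lambda: calls.append(1) or {'name': 'Ada'})
print(len(calls))  # 1 — the second call was a cache hit
```

In a real multi-host deployment the lock would itself live in Redis (for example via SET NX with an expiry) rather than in-process, but the structure is the same.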
Global Accelerator — Network-Level Performance
Global Accelerator gives you two static anycast IP addresses that route traffic to the nearest AWS edge location, then uses the AWS backbone network (instead of the public internet) to reach your application:
# Create a Global Accelerator
aws globalaccelerator create-accelerator \
--name my-app-accelerator \
--ip-address-type IPV4 \
--enabled
# Add a listener
aws globalaccelerator create-listener \
--accelerator-arn arn:aws:globalaccelerator::123456789012:accelerator/abc-123 \
--port-ranges FromPort=443,ToPort=443 \
--protocol TCP
# Add endpoint groups (your ALBs in different regions)
aws globalaccelerator create-endpoint-group \
--listener-arn arn:aws:globalaccelerator::123456789012:accelerator/abc-123/listener/def-456 \
--endpoint-group-region us-east-1 \
--endpoint-configurations \
EndpointId=arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123,Weight=100
CloudFront vs Global Accelerator — they solve different problems:
- CloudFront: Caches content at the edge. Best for HTTP/HTTPS, static + dynamic content.
- Global Accelerator: Routes traffic over AWS backbone. Best for TCP/UDP, real-time apps, gaming, IoT, non-HTTP protocols.
Compute and Storage Performance
Enhanced Networking and Placement Groups
# Verify Enhanced Networking (ENA) is enabled
aws ec2 describe-instances \
--instance-ids i-0abc123 \
--query 'Reservations[].Instances[].EnaSupport'
# Placement groups for low-latency communication
aws ec2 create-placement-group \
--group-name high-perf-cluster \
--strategy cluster # All instances in same rack — lowest latency
# Other options: spread (max availability), partition (big data)
| Placement Strategy | Use Case | Max Instances |
|---|---|---|
| Cluster | HPC, low-latency apps | Limited by AZ |
| Spread | Critical instances, max fault isolation | 7 per AZ |
| Partition | Hadoop, Cassandra, Kafka | 7 partitions per AZ |
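The spread limit is easy to trip over in automation: a spread placement group holds at most seven running instances per AZ, so a launch that would be the eighth in an AZ fails. A toy model of that constraint (our own sketch, not an AWS API):

```python
MAX_SPREAD_PER_AZ = 7  # hard limit for spread placement groups

def place_spread(instances, azs):
    """Assign instances round-robin across AZs, enforcing the 7-per-AZ
    spread placement limit; returns (instance, az) pairs or raises."""
    counts = {az: 0 for az in azs}
    placements = []
    for i, instance in enumerate(instances):
        az = azs[i % len(azs)]
        if counts[az] >= MAX_SPREAD_PER_AZ:
            raise RuntimeError(f"spread group full in AZ {az}")
        counts[az] += 1
        placements.append((instance, az))
    return placements

# 21 instances across 3 AZs fit exactly (7 per AZ); a 22nd would fail
ok = place_spread([f"i-{n:02d}" for n in range(21)], ["a", "b", "c"])
print(len(ok))  # 21
```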
EBS Optimization
# io2 Block Express — highest performance EBS volume
# (--throughput applies only to gp3; io2 throughput scales with provisioned IOPS)
aws ec2 create-volume \
--volume-type io2 \
--size 500 \
--iops 64000 \
--availability-zone us-east-1a \
--encrypted
# Check current EBS performance metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EBS \
--metric-name VolumeReadOps \
--dimensions Name=VolumeId,Value=vol-0abc123 \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-02T00:00:00Z \
--period 3600 \
--statistics Average
| Volume Type | Max IOPS | Max Throughput | Use Case |
|---|---|---|---|
| gp3 | 16,000 | 1,000 MB/s | General purpose (most workloads) |
| io2 Block Express | 256,000 | 4,000 MB/s | Databases, latency-sensitive |
| st1 | 500 | 500 MB/s | Big data, log processing |
| sc1 | 250 | 250 MB/s | Cold storage, infrequent access |
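Choosing between gp3 and io2 usually comes down to whether the workload's IOPS and throughput fit inside gp3's ceiling. A helper reflecting the limits in the table above — our own sketch, so check current AWS quotas before relying on the numbers:

```python
GP3_MAX_IOPS = 16_000
GP3_MAX_THROUGHPUT_MBS = 1_000
IO2_MAX_IOPS = 256_000  # io2 Block Express

def pick_volume_type(iops_needed, throughput_mbs_needed):
    """Pick the cheapest volume type from the table above that satisfies
    the workload: gp3 covers most workloads, io2 covers the rest."""
    if (iops_needed <= GP3_MAX_IOPS
            and throughput_mbs_needed <= GP3_MAX_THROUGHPUT_MBS):
        return "gp3"
    if iops_needed <= IO2_MAX_IOPS:
        return "io2"
    raise ValueError("exceeds single-volume limits; consider striping volumes")

print(pick_volume_type(8_000, 250))     # gp3
print(pick_volume_type(64_000, 1_000))  # io2
```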
Database Performance
RDS Performance Insights
Performance Insights is like an X-ray for your database — it shows you exactly which queries are consuming resources:
# Enable Performance Insights on an RDS instance
aws rds modify-db-instance \
--db-instance-identifier my-database \
--enable-performance-insights \
--performance-insights-retention-period 731 \
--performance-insights-kms-key-id alias/aws/rds
# Query top SQL by load
aws pi get-resource-metrics \
--service-type RDS \
--identifier db-ABC123DEF456 \
--metric-queries '[{"Metric": "db.load.avg", "GroupBy": {"Group": "db.sql", "Limit": 10}}]' \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-01T01:00:00Z \
--period-in-seconds 3600
DynamoDB DAX — Microsecond Reads
DynamoDB Accelerator (DAX) is an in-memory cache that sits in front of DynamoDB. It's API-compatible — swap your DynamoDB client for a DAX client and reads go from single-digit milliseconds to microseconds:
# Without DAX — standard DynamoDB client (resource API, native Python types)
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserSessions')
response = table.get_item(Key={'session_id': 'abc123'})  # ~5ms
# With DAX — near drop-in replacement (low-level client API, hence the typed key)
import amazondax
dax = amazondax.AmazonDaxClient(endpoints=['my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'])
response = dax.get_item(TableName='UserSessions', Key={'session_id': {'S': 'abc123'}})  # cached reads in microseconds
Performance Testing Approach
Never optimize blindly. Measure, identify bottlenecks, optimize, and measure again:
# 1. Load test with a tool like k6
# install k6: https://k6.io/docs/get-started/installation/
k6 run --vus 100 --duration 5m load-test.js
# 2. Check CloudWatch for bottlenecks ('date -d' is GNU syntax; on macOS use date -v-1H)
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=my-db \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Average Maximum
# 3. Check ElastiCache hit rate
aws cloudwatch get-metric-statistics \
--namespace AWS/ElastiCache \
--metric-name CacheHitRate \
--dimensions Name=CacheClusterId,Value=my-redis-0001-001 \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 300 \
--statistics Average
Performance at scale is a layered problem. Start with CloudFront at the edge to eliminate unnecessary round trips. Add ElastiCache to take pressure off your database. Use Global Accelerator if your users are globally distributed and latency-sensitive. Then tune your compute (placement groups, Enhanced Networking) and storage (right EBS type, IOPS provisioning) for the workloads that remain. The key insight: the fastest request is the one that never reaches your origin server. Cache everything you can, as close to the user as you can.
