AWS Performance at Scale — ElastiCache, CloudFront, and Global Accelerator
Your application works fine with 100 users. At 10,000 users, pages load slowly. At 100,000 users, the database falls over. Performance at scale isn't about buying bigger instances — it's about putting data closer to users, caching aggressively, and removing bottlenecks at every layer. AWS has purpose-built services for each layer of the stack, and knowing when to use which one is what separates a system that scales gracefully from one that catches fire.
The Caching Layer Cake
Performance optimization follows a predictable pattern — cache at the edge first, then at the application layer, then at the database layer:
| Layer | Service | What It Caches | Latency Reduction |
|---|---|---|---|
| Edge (CDN) | CloudFront | Static assets, API responses | 200ms → 10ms |
| Application | ElastiCache | Session data, computed results | 50ms → <1ms |
| Database | DAX / Read Replicas | Query results | 5ms → <1ms |
| Network | Global Accelerator | TCP/UDP routing | Variable (30-50%) |
The goal: users should hit your origin servers as rarely as possible.
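The compounding effect of those layers is worth making concrete. A back-of-the-envelope model (the hit rates below are illustrative assumptions, not AWS figures) shows why even a modest edge hit rate dominates overall latency:

```python
def expected_latency_ms(layers, origin_ms):
    """Expected request latency through a chain of caches.

    layers: list of (hit_rate, latency_ms) from outermost (edge) inward.
    A request stops at the first layer that hits; misses fall through
    and ultimately pay the full origin cost.
    """
    total = 0.0
    reach_prob = 1.0  # probability the request gets this far
    for hit_rate, latency in layers:
        total += reach_prob * hit_rate * latency
        reach_prob *= (1 - hit_rate)
    return total + reach_prob * origin_ms

# Illustrative numbers: 80% edge hits at 10ms, then 90% app-cache hits at 1ms,
# with a 200ms origin round trip for the requests that miss everything
latency = expected_latency_ms([(0.8, 10), (0.9, 1)], origin_ms=200)
print(round(latency, 1))  # 12.2
```

With those assumed rates, average latency drops from 200ms to roughly 12ms, and only 2% of requests ever touch the origin.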
CloudFront — Edge Caching Done Right
CloudFront has hundreds of edge locations worldwide. When a user in Tokyo requests your page, it's served from a nearby edge location instead of crossing the Pacific to your us-east-1 origin.
# Create a CloudFront distribution for an S3 static site
# (uses a legacy Origin Access Identity; Origin Access Control is the newer recommended option)
aws cloudfront create-distribution \
--distribution-config '{
"CallerReference": "my-site-2024",
"Origins": {
"Quantity": 1,
"Items": [{
"Id": "S3Origin",
"DomainName": "my-bucket.s3.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": "origin-access-identity/cloudfront/E1234"
}
}]
},
"DefaultCacheBehavior": {
"TargetOriginId": "S3Origin",
"ViewerProtocolPolicy": "redirect-to-https",
"CachePolicyId": "658327ea-f89d-4fab-a63d-7e88639e58f6",
"Compress": true
},
"Enabled": true,
"DefaultRootObject": "index.html"
}'
For dynamic APIs, use cache behaviors with shorter TTLs:
# Cache policy for API responses — 60-second TTL with query string forwarding
# (whitelisting Authorization puts it in the cache key, so responses are cached per user token)
aws cloudfront create-cache-policy \
--cache-policy-config '{
"Name": "API-CachePolicy",
"DefaultTTL": 60,
"MaxTTL": 300,
"MinTTL": 0,
"ParametersInCacheKeyAndForwardedToOrigin": {
"EnableAcceptEncodingGzip": true,
"HeadersConfig": {
"HeaderBehavior": "whitelist",
"Headers": {"Quantity": 1, "Items": ["Authorization"]}
},
"QueryStringsConfig": {
"QueryStringBehavior": "all"
},
"CookiesConfig": {
"CookieBehavior": "none"
}
}
}'
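Mechanically, that policy tells CloudFront which request parts become the cache key: the path, every query string, and the Authorization header (so each user's token gets its own cache entry, and users never see each other's responses). A simplified sketch of the idea — the real key derivation is internal to CloudFront, so this is only a mental model:

```python
from urllib.parse import urlencode

def cache_key(path, query_params, headers, key_headers=('authorization',)):
    """Simplified model of a CloudFront cache key: path + sorted query
    strings + whitelisted headers. Requests producing the same key can
    share a cached response; anything else is a separate entry."""
    qs = urlencode(sorted(query_params.items()))
    hdrs = '&'.join(f"{h}={headers.get(h, '')}" for h in sorted(key_headers))
    return f"{path}?{qs}|{hdrs}"

# Same path, query, and token -> same key -> cache hit
a = cache_key('/api/items', {'page': '2'}, {'authorization': 'Bearer t1'})
b = cache_key('/api/items', {'page': '2'}, {'authorization': 'Bearer t1'})
# Different token -> different key -> separate per-user cache entry
c = cache_key('/api/items', {'page': '2'}, {'authorization': 'Bearer t2'})
print(a == b, a == c)  # True False
```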
Lambda@Edge — Compute at the Edge
Lambda@Edge lets you run code at CloudFront edge locations. Use it for A/B testing, header manipulation, URL rewrites, or authentication (for lightweight header tweaks like the example below, the cheaper CloudFront Functions feature is also worth considering):
// Lambda@Edge: Add security headers to every response
exports.handler = async (event) => {
const response = event.Records[0].cf.response;
const headers = response.headers;
headers['strict-transport-security'] = [{
key: 'Strict-Transport-Security',
value: 'max-age=63072000; includeSubDomains; preload'
}];
headers['x-content-type-options'] = [{
key: 'X-Content-Type-Options',
value: 'nosniff'
}];
headers['x-frame-options'] = [{
key: 'X-Frame-Options',
value: 'DENY'
}];
return response;
};
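The event shape is the part worth internalizing: CloudFront passes a Records list with the response under event['Records'][0]['cf']['response'], and each header is a list of {key, value} dicts. Lambda@Edge also supports Python runtimes, so the same handler looks like this — and the mock event at the bottom lets you exercise it locally before deploying:

```python
def handler(event, context):
    # Same security-header injection as the Node.js version above
    response = event['Records'][0]['cf']['response']
    headers = response['headers']
    headers['strict-transport-security'] = [{
        'key': 'Strict-Transport-Security',
        'value': 'max-age=63072000; includeSubDomains; preload'}]
    headers['x-content-type-options'] = [{
        'key': 'X-Content-Type-Options', 'value': 'nosniff'}]
    headers['x-frame-options'] = [{'key': 'X-Frame-Options', 'value': 'DENY'}]
    return response

# Minimal mock of a CloudFront origin-response event for local testing
event = {'Records': [{'cf': {'response': {'status': '200', 'headers': {}}}}]}
out = handler(event, None)
print(out['headers']['x-frame-options'][0]['value'])  # DENY
```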
ElastiCache — Redis vs Memcached
ElastiCache puts a sub-millisecond cache between your application and your database. The first decision: Redis or Memcached?
| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Strings, lists, sets, hashes, sorted sets, streams | Strings only |
| Persistence | Yes (AOF, RDB snapshots) | No |
| Replication | Yes (read replicas, Multi-AZ) | No |
| Clustering | Yes (up to 500 nodes) | Yes (up to 40 nodes) |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes | No |
| Multithreaded | Single-threaded (6.x+ has I/O threads) | Multithreaded |
| Max Item Size | 512 MB | 1 MB (default), 128 MB (max) |
| Use Case | Caching + data store + messaging | Pure caching only |
| Backup/Restore | Yes | No |
Use Redis in almost every case. The only reason to choose Memcached is if you need multithreaded performance for simple key-value caching and don't need any advanced features.
# Create an ElastiCache Redis cluster (cluster mode enabled)
aws elasticache create-replication-group \
--replication-group-id my-redis-cluster \
--replication-group-description "Production cache" \
--engine redis \
--engine-version 7.1 \
--cache-node-type cache.r7g.large \
--num-node-groups 3 \
--replicas-per-node-group 2 \
--automatic-failover-enabled \
--multi-az-enabled \
--at-rest-encryption-enabled \
--transit-encryption-enabled \
--cache-subnet-group-name my-subnet-group \
--security-group-ids sg-0abc123def456
Common caching patterns in application code:
import json
import redis

# Cluster mode is enabled above, so connect through the configuration
# endpoint with the cluster-aware client — a plain redis.Redis connection
# can't follow the MOVED redirects between shards
r = redis.RedisCluster(
    host='clustercfg.my-redis-cluster.abc123.use1.cache.amazonaws.com',
    port=6379, ssl=True)

def get_user_profile(user_id):
    # Cache-aside pattern
    cache_key = f"user:{user_id}:profile"
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)  # Cache HIT — sub-millisecond
    # Cache MISS — query the database (db is your application's DB handle)
    profile = db.query("SELECT * FROM users WHERE id = %s", user_id)
    r.setex(cache_key, 3600, json.dumps(profile))  # Cache for 1 hour
    return profile

def invalidate_user_cache(user_id):
    # Invalidate on write so readers don't serve stale data for the full TTL
    r.delete(f"user:{user_id}:profile")
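One refinement worth adding to cache-aside: when a hot key expires, every concurrent request misses at once and stampedes the database. Two cheap mitigations are TTL jitter (so keys don't expire in lockstep) and a lock around the rebuild so only one caller pays the miss. A sketch with an in-memory stand-in for Redis — function names here are ours, not a Redis or ElastiCache API:

```python
import random
import threading
import time

_cache = {}           # stand-in for Redis: key -> (value, expires_at)
_rebuild_lock = threading.Lock()

def setex_jittered(key, base_ttl, value, jitter=0.1):
    # Spread expirations over +/-10% so hot keys don't all expire together
    ttl = base_ttl * (1 + random.uniform(-jitter, jitter))
    _cache[key] = (value, time.monotonic() + ttl)

def get_or_rebuild(key, rebuild, base_ttl=3600):
    entry = _cache.get(key)
    if entry and entry[1] > time.monotonic():
        return entry[0]                      # HIT
    with _rebuild_lock:                      # only one caller rebuilds
        entry = _cache.get(key)              # re-check after acquiring lock
        if entry and entry[1] > time.monotonic():
            return entry[0]
        value = rebuild()                    # MISS — hit the database once
        setex_jittered(key, base_ttl, value)
        return value

calls = []
profile = get_or_rebuild('user:1', lambda: calls.append(1) or {'name': 'Ada'})
profile = get_or_rebuild('user:1', lambda: calls.append(1) or {'name': 'Ada'})
print(len(calls))  # 1 — the second call was a cache hit
```

In a real multi-host deployment the lock would itself live in Redis (for example via SET NX with an expiry) rather than in-process, but the structure is the same.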
Global Accelerator — Network-Level Performance
Global Accelerator gives you two static anycast IP addresses that route traffic to the nearest AWS edge location, then uses the AWS backbone network (instead of the public internet) to reach your application:
# Create a Global Accelerator
aws globalaccelerator create-accelerator \
--name my-app-accelerator \
--ip-address-type IPV4 \
--enabled
# Add a listener
aws globalaccelerator create-listener \
--accelerator-arn arn:aws:globalaccelerator::123456789012:accelerator/abc-123 \
--port-ranges FromPort=443,ToPort=443 \
--protocol TCP
# Add endpoint groups (your ALBs in different regions)
aws globalaccelerator create-endpoint-group \
--listener-arn arn:aws:globalaccelerator::123456789012:accelerator/abc-123/listener/def-456 \
--endpoint-group-region us-east-1 \
--endpoint-configurations \
EndpointId=arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123,Weight=100
CloudFront vs Global Accelerator — they solve different problems:
- CloudFront: Caches content at the edge. Best for HTTP/HTTPS, static + dynamic content.
- Global Accelerator: Routes traffic over AWS backbone. Best for TCP/UDP, real-time apps, gaming, IoT, non-HTTP protocols.
Compute and Storage Performance
Enhanced Networking and Placement Groups
# Verify Enhanced Networking (ENA) is enabled
aws ec2 describe-instances \
--instance-ids i-0abc123 \
--query 'Reservations[].Instances[].EnaSupport'
# Placement groups for low-latency communication
aws ec2 create-placement-group \
--group-name high-perf-cluster \
--strategy cluster # All instances in same rack — lowest latency
# Other options: spread (max availability), partition (big data)
| Placement Strategy | Use Case | Max Instances |
|---|---|---|
| Cluster | HPC, low-latency apps | Limited by AZ |
| Spread | Critical instances, max fault isolation | 7 per AZ |
| Partition | Hadoop, Cassandra, Kafka | 7 partitions per AZ |
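The spread limit is easy to trip over in automation: a spread placement group holds at most seven running instances per AZ, so a launch that would be the eighth in an AZ fails. A toy model of that constraint (our own sketch, not an AWS API):

```python
MAX_SPREAD_PER_AZ = 7  # hard limit for spread placement groups

def place_spread(instances, azs):
    """Assign instances round-robin across AZs, enforcing the 7-per-AZ
    spread placement limit; returns (instance, az) pairs or raises."""
    counts = {az: 0 for az in azs}
    placements = []
    for i, instance in enumerate(instances):
        az = azs[i % len(azs)]
        if counts[az] >= MAX_SPREAD_PER_AZ:
            raise RuntimeError(f"spread group full in AZ {az}")
        counts[az] += 1
        placements.append((instance, az))
    return placements

# 21 instances across 3 AZs fit exactly (7 per AZ); a 22nd would fail
ok = place_spread([f"i-{n:02d}" for n in range(21)], ["a", "b", "c"])
print(len(ok))  # 21
```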
EBS Optimization
# io2 Block Express — highest performance EBS volume
# (--throughput applies only to gp3; io2 throughput scales with provisioned IOPS)
aws ec2 create-volume \
--volume-type io2 \
--size 500 \
--iops 64000 \
--availability-zone us-east-1a \
--encrypted
# Check current EBS performance metrics
aws cloudwatch get-metric-statistics \
--namespace AWS/EBS \
--metric-name VolumeReadOps \
--dimensions Name=VolumeId,Value=vol-0abc123 \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-02T00:00:00Z \
--period 3600 \
--statistics Average
| Volume Type | Max IOPS | Max Throughput | Use Case |
|---|---|---|---|
| gp3 | 16,000 | 1,000 MB/s | General purpose (most workloads) |
| io2 Block Express | 256,000 | 4,000 MB/s | Databases, latency-sensitive |
| st1 | 500 | 500 MB/s | Big data, log processing |
| sc1 | 250 | 250 MB/s | Cold storage, infrequent access |
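Choosing between gp3 and io2 usually comes down to whether the workload's IOPS and throughput fit inside gp3's ceiling. A helper reflecting the limits in the table above — our own sketch, so check current AWS quotas before relying on the numbers:

```python
GP3_MAX_IOPS = 16_000
GP3_MAX_THROUGHPUT_MBS = 1_000
IO2_MAX_IOPS = 256_000  # io2 Block Express

def pick_volume_type(iops_needed, throughput_mbs_needed):
    """Pick the cheapest volume type from the table above that satisfies
    the workload: gp3 covers most workloads, io2 covers the rest."""
    if (iops_needed <= GP3_MAX_IOPS
            and throughput_mbs_needed <= GP3_MAX_THROUGHPUT_MBS):
        return "gp3"
    if iops_needed <= IO2_MAX_IOPS:
        return "io2"
    raise ValueError("exceeds single-volume limits; consider striping volumes")

print(pick_volume_type(8_000, 250))     # gp3
print(pick_volume_type(64_000, 1_000))  # io2
```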
Database Performance
RDS Performance Insights
Performance Insights is like an X-ray for your database — it shows you exactly which queries are consuming resources:
# Enable Performance Insights on an RDS instance
aws rds modify-db-instance \
--db-instance-identifier my-database \
--enable-performance-insights \
--performance-insights-retention-period 731 \
--performance-insights-kms-key-id alias/aws/rds
# Query top SQL by load
aws pi get-resource-metrics \
--service-type RDS \
--identifier db-ABC123DEF456 \
--metric-queries '[{"Metric": "db.load.avg", "GroupBy": {"Group": "db.sql", "Limit": 10}}]' \
--start-time 2024-01-01T00:00:00Z \
--end-time 2024-01-01T01:00:00Z \
--period-in-seconds 3600
DynamoDB DAX — Microsecond Reads
DynamoDB Accelerator (DAX) is an in-memory cache that sits in front of DynamoDB. It's API-compatible — swap your DynamoDB client for a DAX client and reads go from single-digit milliseconds to microseconds:
# Without DAX — standard DynamoDB client (resource API, native Python types)
import boto3
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('UserSessions')
response = table.get_item(Key={'session_id': 'abc123'})  # ~5ms
# With DAX — near drop-in replacement (low-level client API, hence the typed key)
import amazondax
dax = amazondax.AmazonDaxClient(endpoints=['my-dax-cluster.abc123.dax-clusters.us-east-1.amazonaws.com:8111'])
response = dax.get_item(TableName='UserSessions', Key={'session_id': {'S': 'abc123'}})  # cached reads in microseconds
Performance Testing Approach
Never optimize blindly. Measure, identify bottlenecks, optimize, and measure again:
# 1. Load test with a tool like k6
# install k6: https://k6.io/docs/get-started/installation/
k6 run --vus 100 --duration 5m load-test.js
# 2. Check CloudWatch for bottlenecks ('date -d' is GNU syntax; on macOS use date -v-1H)
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name CPUUtilization \
--dimensions Name=DBInstanceIdentifier,Value=my-db \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Average Maximum
# 3. Check ElastiCache hit rate
aws cloudwatch get-metric-statistics \
--namespace AWS/ElastiCache \
--metric-name CacheHitRate \
--dimensions Name=CacheClusterId,Value=my-redis-0001-001 \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 300 \
--statistics Average
Performance at scale is a layered problem. Start with CloudFront at the edge to eliminate unnecessary round trips. Add ElastiCache to take pressure off your database. Use Global Accelerator if your users are globally distributed and latency-sensitive. Then tune your compute (placement groups, Enhanced Networking) and storage (right EBS type, IOPS provisioning) for the workloads that remain. The key insight: the fastest request is the one that never reaches your origin server. Cache everything you can, as close to the user as you can.
