Linux Performance Tuning — CPU, Memory, and I/O Optimization
Your server handles 1,000 requests per second — here's how to push it to 10,000. Performance tuning isn't about guessing; it's about measuring, identifying bottlenecks, and making targeted changes to CPU scheduling, memory management, and disk I/O.
Baseline: Know Before You Tune
Before changing anything, capture your baseline. You cannot improve what you haven't measured.
# Capture a 10-second system snapshot
vmstat 1 10
The key columns to watch: r (run queue — processes waiting for CPU), b (blocked on I/O), si/so (swap in/out — should be zero on healthy systems), and wa (I/O wait percentage).
# CPU utilization per core — critical for spotting single-core bottlenecks
mpstat -P ALL 1 5
If one core is at 100% while others are idle, your workload isn't parallelized — no amount of kernel tuning will fix that.
# Memory overview — the real picture, not just 'free'
grep -E "MemTotal|MemFree|MemAvailable|Buffers|Cached|SwapTotal|SwapFree|Dirty|Writeback" /proc/meminfo
MemAvailable is the number you care about — it accounts for reclaimable cache. If MemFree is low but MemAvailable is healthy, your system is fine.
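Those meminfo fields can be reduced to one health number. Here is a minimal sketch (the helper name is my own, not a standard tool) that reports MemAvailable as an integer percentage of MemTotal; it reads meminfo-format text on stdin, so it works on a saved snapshot as well as the live file:

```shell
# mem_available_pct: print MemAvailable as an integer percentage of MemTotal.
# Feed it /proc/meminfo (or any saved copy of it) on stdin.
mem_available_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2} END {printf "%d\n", a * 100 / t}'
}

# Usage:
#   mem_available_pct < /proc/meminfo
```

As a rough rule of thumb, a value that stays in the single digits under normal load is worth investigating before it turns into swap activity.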
CPU Performance Tuning
CPU Frequency Governors
Modern CPUs dynamically adjust their clock speed. The governor policy determines how aggressively this happens.
# Check current governor for all CPUs
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# List available governors
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors
# Set performance governor (max frequency, no ramp-up latency)
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo "performance" | sudo tee "$cpu"
done
| Governor | Behavior | Use Case |
|---|---|---|
| performance | Always max frequency | Latency-sensitive workloads, databases |
| powersave | Always min frequency | Idle servers, cost optimization |
| ondemand | Ramps up on load | General purpose (legacy) |
| schedutil | Kernel scheduler-driven | Modern default, good balance |
For production databases and real-time systems, performance is non-negotiable. The microseconds saved on frequency ramp-up matter at scale.
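It is worth verifying that every core actually accepted the governor, since cpufreq drivers differ in which governors they expose. A small sketch (the function name is my own) that reads one governor per line on stdin and succeeds only if all of them match:

```shell
# all_cores_using: exit 0 only if every input line equals the wanted governor.
all_cores_using() {
  want="$1"
  awk -v want="$want" '$0 != want { bad++ } END { exit bad > 0 }'
}

# Usage:
#   cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor \
#     | all_cores_using performance && echo "all cores on performance"
```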
CPU Affinity and Isolation
Pin critical processes to specific cores and keep the kernel off them.
# Isolate CPUs 2-7 from the kernel scheduler (add to GRUB_CMDLINE_LINUX)
# isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7
# Pin a process to CPUs 2-3
taskset -c 2-3 ./your-application
# Check a process's CPU affinity mask (pidof can return several PIDs; -s takes one)
taskset -p "$(pidof -s nginx)"
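Note that taskset -p reports the allowed-CPU mask, not the core a process is on right now. For the latter, read the PSR column from ps ($$ below is this shell's PID, a stand-in for your real process):

```shell
# PSR is the processor the task last ran on.
ps -o pid,psr,comm -p $$
```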
Memory Optimization
Swappiness and Dirty Ratios
The kernel's default memory behavior is tuned for desktop workloads. Production servers need different settings.
# Check current values
sysctl vm.swappiness vm.dirty_ratio vm.dirty_background_ratio vm.dirty_expire_centisecs
# Production tuning for database servers
sudo sysctl -w vm.swappiness=10 # Prefer keeping pages in RAM (default: 60)
sudo sysctl -w vm.dirty_ratio=40 # Max dirty pages before synchronous flush (default: 20)
sudo sysctl -w vm.dirty_background_ratio=10 # Start background flush at 10% (default: 10)
sudo sysctl -w vm.dirty_expire_centisecs=3000 # Flush dirty pages after 30 seconds (default: 3000)
| Parameter | Default | Database Server | Web Server |
|---|---|---|---|
| vm.swappiness | 60 | 10 | 10 |
| vm.dirty_ratio | 20 | 40 | 20 |
| vm.dirty_background_ratio | 10 | 10 | 5 |
| vm.overcommit_memory | 0 | 2 | 0 |
For Redis specifically, set vm.overcommit_memory=1 — Redis forks for background saves and needs the kernel to allow overcommit.
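Keep in mind that sysctl -w changes are runtime-only and vanish on reboot. To make them stick, drop them into /etc/sysctl.d (the filename below is my own convention; the values mirror the database-server column above):

```shell
sudo tee /etc/sysctl.d/99-db-tuning.conf >/dev/null <<'EOF'
vm.swappiness = 10
vm.dirty_ratio = 40
vm.dirty_background_ratio = 10
EOF
sudo sysctl --system   # reload every sysctl drop-in, including the new file
```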
Transparent Huge Pages (THP)
THP can cause latency spikes in database workloads due to compaction overhead.
# Check THP status
cat /sys/kernel/mm/transparent_hugepage/enabled
# Disable THP for database servers (Redis, MongoDB, PostgreSQL all recommend this)
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
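The THP setting also resets on reboot. One common way to persist it is a oneshot systemd unit (a sketch; the unit name and paths are assumptions, adapt to your distro):

```shell
sudo tee /etc/systemd/system/disable-thp.service >/dev/null <<'EOF'
[Unit]
Description=Disable Transparent Huge Pages
After=sysinit.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/enabled'
ExecStart=/bin/sh -c 'echo never > /sys/kernel/mm/transparent_hugepage/defrag'

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload && sudo systemctl enable --now disable-thp.service
```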
NUMA Awareness
On multi-socket servers, memory access latency depends on which socket owns the memory. Ignoring NUMA can halve your throughput.
# Check NUMA topology
numactl --hardware
# Run a process on NUMA node 0 (local memory only)
numactl --cpunodebind=0 --membind=0 ./your-application
# Check per-node memory stats
numastat -c
If numastat shows high other_node hits, your processes are accessing remote memory — bind them to the correct NUMA node.
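To quantify this, the kernel also exports per-node counters in /sys/devices/system/node/nodeN/numastat. A small sketch (the helper name is my own) that turns them into a remote-allocation percentage; it reads name/value pairs on stdin so it can be run against a saved copy:

```shell
# numa_miss_pct: numa_miss as an integer percentage of all allocations.
numa_miss_pct() {
  awk '$1 == "numa_hit" {h=$2} $1 == "numa_miss" {m=$2} END {printf "%d\n", m * 100 / (h + m)}'
}

# Usage:
#   numa_miss_pct < /sys/devices/system/node/node0/numastat
```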
I/O Optimization
I/O Schedulers
The I/O scheduler determines how disk requests are ordered and merged.
# Check current scheduler per device
cat /sys/block/sda/queue/scheduler
# Available schedulers on modern kernels
# [mq-deadline] kyber bfq none
# Set scheduler for NVMe (none is best — the device has its own scheduler)
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
# Set scheduler for spinning disks
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
| Scheduler | Best For | Why |
|---|---|---|
| none | NVMe SSDs | No overhead, device handles scheduling |
| mq-deadline | SATA SSDs, HDDs | Enforces request deadlines, prevents starvation |
| bfq | Desktop, interactive workloads | Fair bandwidth distribution |
| kyber | Fast SSDs, low-latency targets | Lightweight, latency-focused |
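The table above collapses into a tiny device-name rule. A sketch (the function name is my own; it assumes NVMe namespaces are named nvme*) for picking and applying a scheduler per device:

```shell
# pick_scheduler: map a block device name to a scheduler per the table above.
pick_scheduler() {
  case "$1" in
    nvme*) echo none ;;        # NVMe: the device schedules for itself
    *)     echo mq-deadline ;; # SATA SSDs and HDDs
  esac
}

# Apply to every block device (needs root):
#   for q in /sys/block/*/queue/scheduler; do
#     dev=${q#/sys/block/}; dev=${dev%%/*}
#     echo "$(pick_scheduler "$dev")" | sudo tee "$q" >/dev/null
#   done
```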
Read-Ahead and Queue Depth
# Check current read-ahead (in 512-byte sectors)
blockdev --getra /dev/sda
# Increase read-ahead for sequential workloads (e.g., Kafka, log processing)
sudo blockdev --setra 4096 /dev/sda # 2MB read-ahead
# Check and set queue depth for NVMe
cat /sys/block/nvme0n1/queue/nr_requests
echo 1024 | sudo tee /sys/block/nvme0n1/queue/nr_requests
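Because blockdev --setra takes 512-byte sectors, the read-ahead arithmetic above is easy to get wrong. A one-line helper (my own, not part of blockdev) that converts KiB to sectors:

```shell
# kib_to_sectors: convert KiB to 512-byte sectors for blockdev --setra.
kib_to_sectors() { echo $(( $1 * 1024 / 512 )); }

# 2 MB read-ahead, equivalent to --setra 4096:
#   sudo blockdev --setra "$(kib_to_sectors 2048)" /dev/sda
```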
Monitoring I/O in Real Time
# Per-device I/O stats with 1-second intervals
iostat -xz 1 5
Key columns: %util (device saturation — above 80% is a problem), await (average request time in ms), r_await/w_await (read/write latency), and aqu-sz (average queue length).
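Eyeballing those columns across many devices gets tedious, so a quick filter helps. A sketch (the function name is my own; it assumes %util is the last field, which holds for recent sysstat versions but should be verified against your build):

```shell
# flag_saturated: print "device %util" for device lines whose last field
# exceeds the given threshold. Skips headers and the CPU-stats lines.
flag_saturated() {
  awk -v max="$1" 'NF > 2 && $1 ~ /^[a-z]/ && $1 != "Device" && $NF + 0 > max { print $1, $NF }'
}

# Usage:
#   iostat -xz 1 5 | flag_saturated 80
```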
Putting It All Together: A Production Tuning Script
#!/bin/bash
# production-tune.sh — Apply performance tuning for a web/API server
# Writes to /sys and sysctl, so it must run as root.
set -u
shopt -s nullglob   # skip loops cleanly when a device class is absent
if [ "$(id -u)" -ne 0 ]; then
    echo "error: run as root" >&2
    exit 1
fi
# CPU: performance governor
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
echo "performance" > "$gov" 2>/dev/null
done
# Memory
sysctl -w vm.swappiness=10
sysctl -w vm.dirty_ratio=20
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.overcommit_memory=0
# Disable THP
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# I/O: none for NVMe, mq-deadline for SATA
for dev in /sys/block/nvme*; do
echo none > "$dev/queue/scheduler" 2>/dev/null
done
for dev in /sys/block/sd*; do
echo mq-deadline > "$dev/queue/scheduler" 2>/dev/null
done
# Network: increase socket buffers
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
echo "Performance tuning applied."
Validating Your Changes
After tuning, run a real benchmark — not a synthetic one.
# Use your actual workload, but here's a quick stress test
# Install stress-ng for controlled testing
stress-ng --cpu 4 --vm 2 --vm-bytes 1G --io 4 --timeout 60s --metrics-brief
Watch vmstat 1, iostat -xz 1, and mpstat -P ALL 1 in separate terminals during the test. The numbers tell you whether your tuning had the intended effect.
Next up: we dive into the specific kernel parameters (sysctl) that transform a default Linux install into a production-grade server — covering network stack tuning, connection tracking, and file descriptor limits.
