Linux Log Analysis — Find Problems Before They Find You
The server acted strangely for three days before it crashed — the logs told the story all along. Logs are the black box of your servers. If you can read them effectively, you can prevent outages instead of just reacting to them.
The /var/log Directory — Your Starting Point
Every Linux system stores logs in /var/log. Here's what each file tells you.
# See what's in the log directory
ls -lh /var/log/
# Key log files you should know:
# /var/log/syslog — General system log (Debian/Ubuntu)
# /var/log/messages — General system log (RHEL/CentOS)
# /var/log/auth.log — Authentication attempts (SSH, sudo)
# /var/log/kern.log — Kernel messages
# /var/log/dmesg — Boot-time hardware messages
# /var/log/apt/ — Package manager logs (Debian)
# /var/log/yum.log — Package manager logs (RHEL)
# /var/log/nginx/ — Web server logs
# /var/log/cron — Cron job execution logs
| Log File | What It Captures | When to Check |
|---|---|---|
| syslog / messages | System-wide events | General troubleshooting |
| auth.log / secure | Login attempts, sudo usage | Security investigations |
| kern.log | Kernel events, driver issues | Hardware problems |
| dmesg | Boot messages, device detection | After hardware changes |
| cron | Cron job output | Failed scheduled tasks |
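Which general log a box uses depends on the distro family, as the table shows. A tiny helper can settle it programmatically — `find_main_log` is a hypothetical name for this sketch, checking the table's paths in order:

```shell
#!/bin/sh
# find_main_log — print the first general system log present under a
# directory (default /var/log). Paths follow the table above.
find_main_log() {
    dir="${1:-/var/log}"
    for f in "$dir/syslog" "$dir/messages"; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Demo against a throwaway directory that mimics a RHEL box
demo=$(mktemp -d)
touch "$demo/messages"
find_main_log "$demo"    # prints .../messages
rm -rf "$demo"
```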
Real-Time Log Monitoring with tail
When troubleshooting a live issue, watching logs in real-time is essential.
# Follow a single log file
tail -f /var/log/syslog
# Follow multiple log files simultaneously
tail -f /var/log/syslog /var/log/auth.log
# Follow and filter — only show lines containing "error"
tail -f /var/log/syslog | grep -i "error"
# Follow the last 100 lines and continue watching
tail -n 100 -f /var/log/nginx/error.log
# Use multitail for a split-screen view (install: apt install multitail)
multitail /var/log/syslog /var/log/auth.log
Journalctl — The Modern Way
If your system uses systemd (most do now), journalctl is the most powerful log tool you have. It queries the systemd journal, which captures structured log data with metadata.
# View all logs (most recent last)
journalctl -e
# Logs for a specific service
journalctl -u nginx.service --no-pager
# Follow mode — real-time log streaming
journalctl -u myapp.service -f
# Logs since a specific time
journalctl --since "2025-06-07 08:00:00" --until "2025-06-07 12:00:00"
# Only errors and above (critical, alert, emergency)
journalctl -p err
# Kernel messages only (replaces dmesg for recent boots)
journalctl -k
# Logs from the previous boot (great after a crash)
journalctl -b -1
# Show logs with extra metadata fields
journalctl -u sshd.service -o verbose
# Disk usage of the journal
journalctl --disk-usage
# Clean up old journal entries (keep only last 2 weeks)
sudo journalctl --vacuum-time=2weeks
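The journal's JSON output (`journalctl -o json`, one object per line) is handy for ad-hoc aggregation beyond what the flags offer. Here is a sketch that tallies error entries per unit — on a live system you would pipe in `journalctl -p err -o json`; the sample lines below (hypothetical units and messages) stand in so the pipeline runs anywhere:

```shell
#!/bin/sh
# Tally error entries per systemd unit from journal JSON output.
count_errors_per_unit() {
    grep -o '"_SYSTEMD_UNIT":"[^"]*"' \
        | cut -d'"' -f4 \
        | sort | uniq -c | sort -rn
}

cat <<'EOF' | count_errors_per_unit
{"PRIORITY":"3","_SYSTEMD_UNIT":"nginx.service","MESSAGE":"upstream timed out"}
{"PRIORITY":"3","_SYSTEMD_UNIT":"myapp.service","MESSAGE":"db connection refused"}
{"PRIORITY":"3","_SYSTEMD_UNIT":"nginx.service","MESSAGE":"worker exited"}
EOF
# nginx.service is counted twice, myapp.service once
```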
Grep Patterns for Log Analysis
Grep is your primary tool for extracting patterns from log files. Here are patterns every DevOps engineer should know.
# Find all SSH failed login attempts
grep "Failed password" /var/log/auth.log
# Count failed logins per IP address
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -10
# Find all errors in the last hour of syslog (note: syslog's day-of-month
# is space-padded, so use %e, not %d; this lexical comparison also only
# works within a single month)
awk -v d="$(date -d '1 hour ago' '+%b %e %H')" '$0 >= d' /var/log/syslog | grep -i error
# Count the unique client IP addresses in an Nginx access log
awk '{print $1}' /var/log/nginx/access.log | sort -u | wc -l
# Find HTTP 500 errors in Nginx
awk '$9 == 500' /var/log/nginx/access.log
# Show the top 10 most requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
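To see the failed-login pipeline in action without a real server, here it is run against a tiny hand-made auth.log sample (documentation IP addresses, made-up hostnames):

```shell
#!/bin/sh
# The "failed logins per IP" pipeline from above, on sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jun  7 08:12:01 web1 sshd[1201]: Failed password for root from 203.0.113.5 port 42422 ssh2
Jun  7 08:12:04 web1 sshd[1202]: Failed password for invalid user admin from 203.0.113.5 port 42430 ssh2
Jun  7 08:13:09 web1 sshd[1210]: Failed password for root from 198.51.100.7 port 55100 ssh2
EOF

grep "Failed password" "$sample" \
    | awk '{print $(NF-3)}' \
    | sort | uniq -c | sort -rn
# 203.0.113.5 is counted twice, 198.51.100.7 once — $(NF-3) lands on
# the IP whether or not the line says "invalid user"
rm -f "$sample"
```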
Practical Log Analysis Scenarios
Scenario 1: Who's Brute-Forcing SSH?
# Find the top offending IP addresses
grep "Failed password" /var/log/auth.log \
    | awk '{print $(NF-3)}' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -20
# Check if any succeeded after failing (match "from $ip " so substrings
# don't false-positive — e.g. 1.2.3.4 matching inside 11.2.3.45)
for ip in $(grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort -u); do
    success=$(grep "Accepted" /var/log/auth.log | grep -c "from $ip ")
    if [ "$success" -gt 0 ]; then
        echo "WARNING: $ip had failed attempts AND successful login!"
    fi
done
Scenario 2: Why Is the Server Slow?
# Check for OOM (Out of Memory) kills
dmesg | grep -i "oom-killer"
journalctl -k | grep -i "out of memory"
# Check for disk I/O errors
dmesg | grep -i "error\|fail\|i/o"
# Look for services restarting repeatedly (strip the timestamp fields
# first so identical messages actually group under uniq -c)
journalctl --since "1 hour ago" --no-pager | grep -iE "started|stopped|failed" | awk '{$1=$2=$3=""; print}' | sort | uniq -c | sort -rn
# Check load average over time from syslog
grep "load average" /var/log/syslog | tail -20
Scenario 3: Application Crash Investigation
# Find the timeline of events around a crash
journalctl -u myapp.service --since "10 minutes ago" --no-pager
# Check for segfaults or core dumps
journalctl -k | grep -i "segfault\|core dump"
# Find related system events during the same window
journalctl --since "2025-06-07 03:55:00" --until "2025-06-07 04:05:00" -p warning
AWK for Log Processing
AWK is incredibly powerful for log analysis when grep isn't enough.
# Calculate average response time from Nginx (assuming $NF is response time)
awk '{sum += $NF; count++} END {print "Average:", sum/count, "seconds"}' /var/log/nginx/access.log
# Show the busiest minutes by request count (timestamp in field $4;
# strip the leading "[" so the output is clean)
awk '{split($4, a, ":"); sub(/^\[/, "", a[1]); print a[1]":"a[2]":"a[3]}' /var/log/nginx/access.log \
    | sort | uniq -c | sort -rn | head -20
# Find requests that took longer than 5 seconds
awk '$NF > 5 {print $0}' /var/log/nginx/access.log
# Show traffic by HTTP status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
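The average-response-time one-liner is easy to verify against a made-up access log. Note the assumption: your nginx `log_format` must put `$request_time` in the last field — the stock "combined" format does not include it:

```shell
#!/bin/sh
# The averaging one-liner from above, checked on sample data where the
# last field is the request time in seconds.
sample=$(mktemp)
cat > "$sample" <<'EOF'
203.0.113.5 - - [07/Jun/2025:08:15:30 +0000] "GET / HTTP/1.1" 200 512 0.100
203.0.113.6 - - [07/Jun/2025:08:15:31 +0000] "GET /api HTTP/1.1" 200 204 0.300
203.0.113.7 - - [07/Jun/2025:08:15:32 +0000] "GET /slow HTTP/1.1" 500 98 0.800
EOF

awk '{sum += $NF; count++} END {printf "Average: %.1f seconds\n", sum/count}' "$sample"
# prints: Average: 0.4 seconds
rm -f "$sample"
```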
Logrotate — Managing Log File Size
Without log rotation, log files grow until they fill the disk. Logrotate compresses and archives old logs automatically.
# View the main logrotate config
cat /etc/logrotate.conf
# See application-specific configs
ls /etc/logrotate.d/
# Create a custom logrotate config for your app
sudo tee /etc/logrotate.d/myapp << 'EOF'
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 0640 appuser appuser
    postrotate
        systemctl reload myapp.service > /dev/null 2>&1 || true
    endscript
}
EOF
# Test your logrotate config (dry run — shows what would happen)
sudo logrotate -d /etc/logrotate.d/myapp
# Force rotation right now (useful for testing)
sudo logrotate -f /etc/logrotate.d/myapp
Key logrotate directives:
| Directive | Purpose |
|---|---|
| daily / weekly / monthly | Rotation frequency |
| rotate 14 | Keep 14 rotated files |
| compress | Gzip old logs |
| delaycompress | Don't compress the most recent rotated file |
| missingok | Don't error if log file is missing |
| notifempty | Skip rotation if file is empty |
| copytruncate | Truncate in place (for apps that don't reopen files) |
| postrotate | Run command after rotation (e.g., reload service) |
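To make the `rotate N` and `create` semantics concrete, here is a minimal hand-rolled equivalent — a sketch only, not a logrotate replacement (no compression, no locking, no config parsing):

```shell
#!/bin/sh
# rotate_log FILE KEEP — shift FILE.1 -> FILE.2 ... up to KEEP copies,
# move FILE to FILE.1, then recreate FILE empty (like "create").
rotate_log() {
    log="$1"; keep="$2"
    i="$keep"
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$i"
        i="$prev"
    done
    [ -f "$log" ] && mv "$log" "$log.1"
    : > "$log"    # recreate the live log empty
}

# Demo in a throwaway directory
tmp=$(mktemp -d)
echo "day one" > "$tmp/app.log"
rotate_log "$tmp/app.log" 3
echo "day two" > "$tmp/app.log"
rotate_log "$tmp/app.log" 3
ls "$tmp"    # app.log  app.log.1  app.log.2
rm -rf "$tmp"
```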
Building a Quick Log Monitoring Script
Here's a practical script you can use to get a daily log summary.
#!/bin/bash
# daily-log-summary.sh — Run via cron or systemd timer
echo "=== Daily Log Summary — $(date) ==="
echo ""
echo "--- Failed SSH Logins ---"
grep "Failed password" /var/log/auth.log 2>/dev/null | wc -l
echo ""
echo "--- OOM Events ---"
journalctl -k --since yesterday | grep -c "oom-killer" || true   # grep -c already prints 0 on no match
echo ""
echo "--- Failed Services ---"
systemctl --failed --no-legend
echo ""
echo "--- Disk Usage ---"
df -h | awk '$5+0 > 80 {print "WARNING:", $0}'
echo ""
echo "--- Top Error Messages ---"
journalctl -p err --since yesterday --no-pager \
    | awk '{$1=$2=$3=""; print}' \
    | sort | uniq -c | sort -rn | head -10
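To run the summary automatically each morning, a crontab entry like this works (the install path, schedule, and output file are examples — adjust to taste):

```shell
# m h dom mon dow  command — run the summary at 06:00 every day
0 6 * * * /usr/local/bin/daily-log-summary.sh >> /var/log/daily-summary.log 2>&1
```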
Centralized Logging — The Next Step
Once you manage more than a handful of servers, reading logs on each machine doesn't scale. Here's where centralized logging comes in.
| Solution | Type | Best For |
|---|---|---|
| rsyslog forwarding | Built-in | Simple setups, forwarding to a central server |
| ELK Stack | Self-hosted | Full-text search, dashboards, alerting |
| Grafana Loki | Self-hosted | Lightweight, pairs with Grafana |
| CloudWatch Logs | Managed (AWS) | AWS-native workloads |
| Datadog | SaaS | Enterprise with budget |
The simplest starting point is forwarding syslog to a central server:
# On the client — add to /etc/rsyslog.conf
# Forward all logs to central server over TCP
*.* @@logserver.internal:514
# Restart rsyslog
sudo systemctl restart rsyslog
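On the receiving side, rsyslog needs TCP input enabled via its imtcp module. A minimal drop-in does it — the file name here is an assumption; any `/etc/rsyslog.d/*.conf` is read:

```shell
# On the central server — enable TCP reception on port 514
sudo tee /etc/rsyslog.d/10-remote.conf << 'EOF'
module(load="imtcp")
input(type="imtcp" port="514")
EOF
sudo systemctl restart rsyslog
```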
Next in our Linux series: Linux Firewalls — learn iptables, nftables, and UFW to lock down your servers and control network traffic like a pro.
