Linux Log Analysis — Find Problems Before They Find You
The server acted strangely for three days before it crashed — the logs told the story all along. Logs are the black box of your servers. If you can read them effectively, you can prevent outages instead of just reacting to them.
The /var/log Directory — Your Starting Point
Every Linux system stores logs in /var/log. Here's what each file tells you.
# See what's in the log directory
ls -lh /var/log/
# Key log files you should know:
# /var/log/syslog — General system log (Debian/Ubuntu)
# /var/log/messages — General system log (RHEL/CentOS)
# /var/log/auth.log — Authentication attempts (SSH, sudo)
# /var/log/kern.log — Kernel messages
# /var/log/dmesg — Boot-time hardware messages
# /var/log/apt/ — Package manager logs (Debian)
# /var/log/yum.log — Package manager logs (RHEL)
# /var/log/nginx/ — Web server logs
# /var/log/cron — Cron job execution logs
| Log File | What It Captures | When to Check |
|---|---|---|
| syslog / messages | System-wide events | General troubleshooting |
| auth.log / secure | Login attempts, sudo usage | Security investigations |
| kern.log | Kernel events, driver issues | Hardware problems |
| dmesg | Boot messages, device detection | After hardware changes |
| cron | Cron job output | Failed scheduled tasks |
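Which general log a box uses depends on the distro family, as the table shows. A tiny helper can settle it programmatically — `find_main_log` is a hypothetical name for this sketch, checking the table's paths in order:

```shell
#!/bin/sh
# find_main_log — print the first general system log present under a
# directory (default /var/log). Paths follow the table above.
find_main_log() {
    dir="${1:-/var/log}"
    for f in "$dir/syslog" "$dir/messages"; do
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Demo against a throwaway directory that mimics a RHEL box
demo=$(mktemp -d)
touch "$demo/messages"
find_main_log "$demo"    # prints .../messages
rm -rf "$demo"
```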
Real-Time Log Monitoring with tail
When troubleshooting a live issue, watching logs in real-time is essential.
# Follow a single log file
tail -f /var/log/syslog
# Follow multiple log files simultaneously
tail -f /var/log/syslog /var/log/auth.log
# Follow and filter — only show lines containing "error"
tail -f /var/log/syslog | grep -i "error"
# Follow the last 100 lines and continue watching
tail -n 100 -f /var/log/nginx/error.log
# Use multitail for a split-screen view (install: apt install multitail)
multitail /var/log/syslog /var/log/auth.log
Journalctl — The Modern Way
If your system uses systemd (most do now), journalctl is the most powerful log tool you have. It queries the systemd journal, which captures structured log data with metadata.
# View all logs (most recent last)
journalctl -e
# Logs for a specific service
journalctl -u nginx.service --no-pager
# Follow mode — real-time log streaming
journalctl -u myapp.service -f
# Logs since a specific time
journalctl --since "2025-06-07 08:00:00" --until "2025-06-07 12:00:00"
# Only errors and above (critical, alert, emergency)
journalctl -p err
# Kernel messages only (replaces dmesg for recent boots)
journalctl -k
# Logs from the previous boot (great after a crash)
journalctl -b -1
# Show logs with extra metadata fields
journalctl -u sshd.service -o verbose
# Disk usage of the journal
journalctl --disk-usage
# Clean up old journal entries (keep only last 2 weeks)
sudo journalctl --vacuum-time=2weeks
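The journal's JSON output (`journalctl -o json`, one object per line) is handy for ad-hoc aggregation beyond what the flags offer. Here is a sketch that tallies error entries per unit — on a live system you would pipe in `journalctl -p err -o json`; the sample lines below (hypothetical units and messages) stand in so the pipeline runs anywhere:

```shell
#!/bin/sh
# Tally error entries per systemd unit from journal JSON output.
count_errors_per_unit() {
    grep -o '"_SYSTEMD_UNIT":"[^"]*"' \
        | cut -d'"' -f4 \
        | sort | uniq -c | sort -rn
}

cat <<'EOF' | count_errors_per_unit
{"PRIORITY":"3","_SYSTEMD_UNIT":"nginx.service","MESSAGE":"upstream timed out"}
{"PRIORITY":"3","_SYSTEMD_UNIT":"myapp.service","MESSAGE":"db connection refused"}
{"PRIORITY":"3","_SYSTEMD_UNIT":"nginx.service","MESSAGE":"worker exited"}
EOF
# nginx.service is counted twice, myapp.service once
```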
Grep Patterns for Log Analysis
Grep is your primary tool for extracting patterns from log files. Here are patterns every DevOps engineer should know.
# Find all SSH failed login attempts
grep "Failed password" /var/log/auth.log
# Count failed logins per IP address
grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head -10
# Find all errors in the last hour of syslog (note: syslog's day-of-month
# is space-padded, so use %e, not %d; this lexical comparison also only
# works within a single month)
awk -v d="$(date -d '1 hour ago' '+%b %e %H')" '$0 >= d' /var/log/syslog | grep -i error
# Count the unique client IP addresses in an Nginx access log
awk '{print $1}' /var/log/nginx/access.log | sort -u | wc -l
# Find HTTP 500 errors in Nginx
awk '$9 == 500' /var/log/nginx/access.log
# Show the top 10 most requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10
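To see the failed-login pipeline in action without a real server, here it is run against a tiny hand-made auth.log sample (documentation IP addresses, made-up hostnames):

```shell
#!/bin/sh
# The "failed logins per IP" pipeline from above, on sample data.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jun  7 08:12:01 web1 sshd[1201]: Failed password for root from 203.0.113.5 port 42422 ssh2
Jun  7 08:12:04 web1 sshd[1202]: Failed password for invalid user admin from 203.0.113.5 port 42430 ssh2
Jun  7 08:13:09 web1 sshd[1210]: Failed password for root from 198.51.100.7 port 55100 ssh2
EOF

grep "Failed password" "$sample" \
    | awk '{print $(NF-3)}' \
    | sort | uniq -c | sort -rn
# 203.0.113.5 is counted twice, 198.51.100.7 once — $(NF-3) lands on
# the IP whether or not the line says "invalid user"
rm -f "$sample"
```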
Practical Log Analysis Scenarios
Scenario 1: Who's Brute-Forcing SSH?
# Find the top offending IP addresses
grep "Failed password" /var/log/auth.log \
    | awk '{print $(NF-3)}' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -20
# Check if any succeeded after failing (match "from $ip " so substrings
# don't false-positive — e.g. 1.2.3.4 matching inside 11.2.3.45)
for ip in $(grep "Failed password" /var/log/auth.log | awk '{print $(NF-3)}' | sort -u); do
    success=$(grep "Accepted" /var/log/auth.log | grep -c "from $ip ")
    if [ "$success" -gt 0 ]; then
        echo "WARNING: $ip had failed attempts AND successful login!"
    fi
done
Scenario 2: Why Is the Server Slow?
# Check for OOM (Out of Memory) kills
dmesg | grep -i "oom-killer"
journalctl -k | grep -i "out of memory"
# Check for disk I/O errors
dmesg | grep -i "error\|fail\|i/o"
# Look for services restarting repeatedly (strip the timestamp fields
# first so identical messages actually group under uniq -c)
journalctl --since "1 hour ago" --no-pager | grep -iE "started|stopped|failed" | awk '{$1=$2=$3=""; print}' | sort | uniq -c | sort -rn
# Check load average over time from syslog
grep "load average" /var/log/syslog | tail -20
Scenario 3: Application Crash Investigation
# Find the timeline of events around a crash
journalctl -u myapp.service --since "10 minutes ago" --no-pager
# Check for segfaults or core dumps
journalctl -k | grep -i "segfault\|core dump"
# Find related system events during the same window
journalctl --since "2025-06-07 03:55:00" --until "2025-06-07 04:05:00" -p warning
AWK for Log Processing
AWK is incredibly powerful for log analysis when grep isn't enough.
# Calculate average response time from Nginx (assuming $NF is response time)
awk '{sum += $NF; count++} END {print "Average:", sum/count, "seconds"}' /var/log/nginx/access.log
# Show the busiest minutes by request count (timestamp in field $4;
# strip the leading "[" so the output is clean)
awk '{split($4, a, ":"); sub(/^\[/, "", a[1]); print a[1]":"a[2]":"a[3]}' /var/log/nginx/access.log \
    | sort | uniq -c | sort -rn | head -20
# Find requests that took longer than 5 seconds
awk '$NF > 5 {print $0}' /var/log/nginx/access.log
# Show traffic by HTTP status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn
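The average-response-time one-liner is easy to verify against a made-up access log. Note the assumption: your nginx `log_format` must put `$request_time` in the last field — the stock "combined" format does not include it:

```shell
#!/bin/sh
# The averaging one-liner from above, checked on sample data where the
# last field is the request time in seconds.
sample=$(mktemp)
cat > "$sample" <<'EOF'
203.0.113.5 - - [07/Jun/2025:08:15:30 +0000] "GET / HTTP/1.1" 200 512 0.100
203.0.113.6 - - [07/Jun/2025:08:15:31 +0000] "GET /api HTTP/1.1" 200 204 0.300
203.0.113.7 - - [07/Jun/2025:08:15:32 +0000] "GET /slow HTTP/1.1" 500 98 0.800
EOF

awk '{sum += $NF; count++} END {printf "Average: %.1f seconds\n", sum/count}' "$sample"
# prints: Average: 0.4 seconds
rm -f "$sample"
```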
Logrotate — Managing Log File Size
Without log rotation, log files grow until they fill the disk. Logrotate compresses and archives old logs automatically.
# View the main logrotate config
cat /etc/logrotate.conf
# See application-specific configs
ls /etc/logrotate.d/
# Create a custom logrotate config for your app
sudo tee /etc/logrotate.d/myapp << 'EOF'
/var/log/myapp/*.log {
    daily
    rotate 14
    compress
    delaycompress
    missingok
    notifempty
    create 0640 appuser appuser
    postrotate
        systemctl reload myapp.service > /dev/null 2>&1 || true
    endscript
}
EOF
# Test your logrotate config (dry run — shows what would happen)
sudo logrotate -d /etc/logrotate.d/myapp
# Force rotation right now (useful for testing)
sudo logrotate -f /etc/logrotate.d/myapp
Key logrotate directives:
| Directive | Purpose |
|---|---|
| daily / weekly / monthly | Rotation frequency |
| rotate 14 | Keep 14 rotated files |
| compress | Gzip old logs |
| delaycompress | Don't compress the most recent rotated file |
| missingok | Don't error if log file is missing |
| notifempty | Skip rotation if file is empty |
| copytruncate | Truncate in place (for apps that don't reopen files) |
| postrotate | Run command after rotation (e.g., reload service) |
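To make the `rotate N` and `create` semantics concrete, here is a minimal hand-rolled equivalent — a sketch only, not a logrotate replacement (no compression, no locking, no config parsing):

```shell
#!/bin/sh
# rotate_log FILE KEEP — shift FILE.1 -> FILE.2 ... up to KEEP copies,
# move FILE to FILE.1, then recreate FILE empty (like "create").
rotate_log() {
    log="$1"; keep="$2"
    i="$keep"
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$i"
        i="$prev"
    done
    [ -f "$log" ] && mv "$log" "$log.1"
    : > "$log"    # recreate the live log empty
}

# Demo in a throwaway directory
tmp=$(mktemp -d)
echo "day one" > "$tmp/app.log"
rotate_log "$tmp/app.log" 3
echo "day two" > "$tmp/app.log"
rotate_log "$tmp/app.log" 3
ls "$tmp"    # app.log  app.log.1  app.log.2
rm -rf "$tmp"
```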
Building a Quick Log Monitoring Script
Here's a practical script you can use to get a daily log summary.
#!/bin/bash
# daily-log-summary.sh — Run via cron or systemd timer
echo "=== Daily Log Summary — $(date) ==="
echo ""
echo "--- Failed SSH Logins ---"
grep "Failed password" /var/log/auth.log 2>/dev/null | wc -l
echo ""
echo "--- OOM Events ---"
journalctl -k --since yesterday | grep -c "oom-killer" || true   # grep -c already prints 0 on no match
echo ""
echo "--- Failed Services ---"
systemctl --failed --no-legend
echo ""
echo "--- Disk Usage ---"
df -h | awk '$5+0 > 80 {print "WARNING:", $0}'
echo ""
echo "--- Top Error Messages ---"
journalctl -p err --since yesterday --no-pager \
    | awk '{$1=$2=$3=""; print}' \
    | sort | uniq -c | sort -rn | head -10
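To run the summary automatically each morning, a crontab entry like this works (the install path, schedule, and output file are examples — adjust to taste):

```shell
# m h dom mon dow  command — run the summary at 06:00 every day
0 6 * * * /usr/local/bin/daily-log-summary.sh >> /var/log/daily-summary.log 2>&1
```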
Centralized Logging — The Next Step
Once you manage more than a handful of servers, reading logs on each machine doesn't scale. Here's where centralized logging comes in.
| Solution | Type | Best For |
|---|---|---|
| rsyslog forwarding | Built-in | Simple setups, forwarding to a central server |
| ELK Stack | Self-hosted | Full-text search, dashboards, alerting |
| Grafana Loki | Self-hosted | Lightweight, pairs with Grafana |
| CloudWatch Logs | Managed (AWS) | AWS-native workloads |
| Datadog | SaaS | Enterprise with budget |
The simplest starting point is forwarding syslog to a central server:
# On the client — add to /etc/rsyslog.conf
# Forward all logs to central server over TCP
*.* @@logserver.internal:514
# Restart rsyslog
sudo systemctl restart rsyslog
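On the receiving side, rsyslog needs TCP input enabled via its imtcp module. A minimal drop-in does it — the file name here is an assumption; any `/etc/rsyslog.d/*.conf` is read:

```shell
# On the central server — enable TCP reception on port 514
sudo tee /etc/rsyslog.d/10-remote.conf << 'EOF'
module(load="imtcp")
input(type="imtcp" port="514")
EOF
sudo systemctl restart rsyslog
```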
Next in our Linux series: Linux Firewalls — learn iptables, nftables, and UFW to lock down your servers and control network traffic like a pro.
