Linux Process Management — ps, top, kill and Beyond

March 8, 2025 · 7 min read

DevOps & Cloud Learning Hub

It's 3 AM. Your pager goes off. The production server is crawling. CPU is at 100%. Memory is gone. Something is eating your server alive, and you need to find it and stop it — fast. Knowing how to manage Linux processes isn't optional for a DevOps engineer; it's survival.

Understanding Processes — PIDs, Parents, and States

Every running program in Linux is a process with a unique PID (Process ID). Processes form a tree: every user-space process descends from PID 1, the init process, which is systemd on most modern distributions.

# See the full process tree
pstree -p
# systemd(1)─┬─sshd(1234)───sshd(5678)───bash(5680)───vim(5700)
#             ├─nginx(2000)─┬─nginx(2001)
#             │             └─nginx(2002)
#             └─dockerd(3000)───containerd(3001)

# What's PID 1 on your system?
ps -p 1 -o comm=
# systemd
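The parent links that pstree draws can also be followed by hand. Here is a minimal sketch that walks from a PID up to the top of the tree using the PPid field in /proc/<PID>/status. The name ancestry is a hypothetical helper, not a standard command, and the sketch assumes a Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# ancestry (hypothetical helper): print a process and each of its ancestors,
# one per line, as name(pid). Only the first word of the name is shown.
ancestry() {
    local pid=$1 name
    while [ -n "$pid" ] && [ "$pid" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        name=$(awk '/^Name:/ {print $2}' "/proc/$pid/status")
        printf '%s(%s)\n' "$name" "$pid"
        # PPid is 0 for PID 1 (and at a container boundary), which ends the loop
        pid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
    done
}

# $$ is the PID of the current shell
chain=$(ancestry "$$")
printf '%s\n' "$chain"
```

Reading the output bottom to top gives the same path pstree shows for that one process.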

Processes live in different states:

State	Code	Meaning
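Running	`R`	Running or runnable (on a CPU or waiting in the run queue)
Sleeping	`S`	Interruptible sleep, waiting for an event or I/O
Disk Sleep	`D`	Uninterruptible sleep, usually waiting on I/O; signals are not handled until it wakes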
Stopped	`T`	Paused (Ctrl+Z or debugger)
Zombie	`Z`	Finished but parent hasn't collected exit status
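To watch a state change happen, here is a small sketch that reads the state code straight from /proc. It starts a throwaway sleep process, so nothing real is touched. proc_state is a hypothetical helper name, and the short pauses are an assumption about how long the process needs to start and stop.

```shell
#!/usr/bin/env bash
# proc_state (hypothetical helper): print the one-letter state code of a PID.
# The line in /proc/<PID>/status looks like "State:  S (sleeping)".
proc_state() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

sleep 30 >/dev/null 2>&1 &
demo_pid=$!
sleep 0.3                               # let the command start

state_before=$(proc_state "$demo_pid")  # waiting on a timer
kill -STOP "$demo_pid"
sleep 0.3
state_after=$(proc_state "$demo_pid")   # paused by SIGSTOP

echo "before STOP: $state_before, after STOP: $state_after"

# Clean up the demo process
kill -KILL "$demo_pid"
wait "$demo_pid" 2>/dev/null || true
```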

ps — Snapshot of Running Processes

The ps command is your first tool for investigation. There are two major syntax styles: BSD (options without a dash, as in ps aux) and UNIX (options with a dash, as in ps -ef). Most of the examples below use BSD style.

# The classic: show all processes with details
ps aux
# USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
# root     1  0.0  0.1 169432 13256 ?    Ss   Feb24   0:05 /usr/lib/systemd/systemd
# www-data 2100 85.3 12.4 1024000 512000 ? R  03:01  42:15 /usr/bin/php-fpm

# Find a specific process (the [n] keeps grep from matching its own command line)
ps aux | grep '[n]ginx'
# root   2000  0.0  0.1  65432  5432 ?  Ss  Feb24  0:00 nginx: master
# www    2001  0.2  0.5  72000 20480 ?  S   Feb24  1:12 nginx: worker

# Show processes in tree format
ps auxf

# Show specific columns only
ps -eo pid,ppid,user,%cpu,%mem,stat,start,time,comm --sort=-%cpu | head -20

# Find the top 5 memory consumers
ps aux --sort=-%mem | head -6

That --sort=-%cpu flag is gold when you're hunting a CPU hog. The minus sign means descending order.

top and htop — Real-Time Monitoring

While ps gives you a snapshot, top gives you a live view.

# Basic top — press 'q' to quit
top

# Useful top shortcuts:
# P — Sort by CPU usage
# M — Sort by memory usage
# k — Kill a process (enter PID)
# c — Toggle full command path
# 1 — Show individual CPU cores
# H — Show threads

# Run top in batch mode for scripting
top -b -n 1 | head -20

# Filter top to show only one user's processes
top -u www-data

For a much better experience, use htop:

# Install htop
sudo apt install htop    # Debian/Ubuntu
sudo dnf install htop    # RHEL/Fedora

# Launch htop
htop

# htop advantages over top:
# - Color-coded CPU/memory bars
# - Mouse support (click to sort, select)
# - Tree view (F5)
# - Search processes (F3)
# - Filter processes (F4)
# - Kill with signal selection (F9)

Kill Signals — Asking vs Telling vs Forcing

Killing a process isn't just kill -9. There are different signals for different situations. The numbers below are the x86 and ARM values; run kill -l to see the list on your system:

Signal	Number	Meaning	Use When
`SIGHUP`	1	Hangup / reload config	Reload Nginx, Apache without restart
`SIGINT`	2	Interrupt (Ctrl+C)	Graceful stop from terminal
`SIGQUIT`	3	Quit with core dump	Debugging crashes
`SIGTERM`	15	Terminate gracefully	Default kill — try this first
`SIGKILL`	9	Force kill immediately	Last resort only
`SIGSTOP`	19	Pause process	Freeze a runaway process temporarily
`SIGCONT`	18	Resume paused process	Continue after SIGSTOP

# Always try graceful termination first
kill 2100          # Sends SIGTERM (default)
kill -15 2100      # Same thing, explicit

# Wait a few seconds. Still running?
kill -9 2100       # Nuclear option — SIGKILL

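# Kill by name instead of PID (preview the matches with pgrep first)
pgrep -af "php-fpm"   # -f matches the full command line, -a prints it
pkill -f "php-fpm"
killall nginx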

# Reload config without restarting (Nginx, Apache, HAProxy)
kill -HUP $(cat /var/run/nginx.pid)
# or
sudo systemctl reload nginx

# Pause a runaway process while you investigate
kill -STOP 2100    # Freeze it
# ... investigate the issue ...
kill -CONT 2100    # Resume it
# or
kill -9 2100       # Kill it

Rule of thumb: Always try SIGTERM (15) first. Give the process 5-10 seconds to clean up. Only use SIGKILL (9) if it refuses to stop. Why? SIGKILL can't be caught, blocked, or ignored, so the process dies immediately without cleaning up temp files, closing database connections, or flushing buffers.
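That rule of thumb is easy to script. The sketch below sends SIGTERM, polls for up to a timeout, and only then escalates to SIGKILL. Both function names are hypothetical helpers, the timeouts are arbitrary, and the demo only targets throwaway sleep processes.

```shell
#!/usr/bin/env bash
# is_running (hypothetical helper): true if the PID exists and is not a zombie.
is_running() {
    local state
    state=$(awk '/^State:/ {print $2}' "/proc/$1/status" 2>/dev/null)
    [ -n "$state" ] && [ "$state" != "Z" ]
}

# stop_gracefully (hypothetical helper): SIGTERM first, SIGKILL after a timeout.
# Usage: stop_gracefully PID [TIMEOUT_SECONDS]; prints "graceful" or "forced".
stop_gracefully() {
    local pid=$1 timeout=${2:-10} waited=0
    kill -TERM "$pid" 2>/dev/null || true
    while is_running "$pid"; do
        if [ "$waited" -ge "$timeout" ]; then
            kill -KILL "$pid" 2>/dev/null || true
            echo "forced"
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "graceful"
}

# Demo 1: a process that honours SIGTERM
sleep 60 >/dev/null 2>&1 &
polite_pid=$!
sleep 0.5
polite=$(stop_gracefully "$polite_pid" 5)

# Demo 2: a process that ignores SIGTERM, so the helper has to escalate
sh -c 'trap "" TERM; exec sleep 60' >/dev/null 2>&1 &
stubborn_pid=$!
sleep 0.5
stubborn=$(stop_gracefully "$stubborn_pid" 2)

echo "polite: $polite, stubborn: $stubborn"
wait 2>/dev/null || true
```

The zombie check matters because a child that has exited but not yet been reaped still answers to kill -0, which would make the loop wait for a process that is already dead.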

nice and renice — Process Priority

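Linux uses nice values from -20 (highest priority) to 19 (lowest priority). The default is 0. Any user can raise a nice value (lower the priority) of their own processes, but lowering it normally requires root.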

# Start a CPU-heavy backup with low priority
nice -n 19 tar -czf /backup/full-backup.tar.gz /var/www/

# Check a process's nice value
ps -o pid,ni,comm -p 2100

# Change priority of a running process
sudo renice -n 10 -p 2100      # Lower priority
sudo renice -n -5 -p 2100      # Higher priority (needs root)

# Renice all processes of a user
sudo renice -n 15 -u jenkins

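This is especially useful during business hours. Need to run a big backup or log analysis? Set it to nice 19 so it competes less with production traffic for CPU. Keep in mind that nice only affects CPU scheduling; for disk-heavy jobs, pair it with ionice (for example, ionice -c 3 for the idle I/O class, where the I/O scheduler supports it).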
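One detail worth seeing in action: nice -n adds to the nice value of the shell that launches the command, it does not set an absolute value. The sketch below checks this by reading the nice value from /proc/<PID>/stat. nice_of is a hypothetical helper, and the demo uses a throwaway sleep process.

```shell
#!/usr/bin/env bash
# nice_of (hypothetical helper): read a PID's nice value from /proc/<PID>/stat.
# The command name sits in parentheses and may contain spaces, so everything up
# to the last ") " is stripped first; nice is then the 17th remaining field.
nice_of() {
    sed 's/^.*) //' "/proc/$1/stat" | awk '{print $17}'
}

shell_nice=$(nice_of "$$")

# Start a child 10 steps "nicer" than the current shell
nice -n 10 sleep 30 >/dev/null 2>&1 &
child_pid=$!
sleep 0.3    # give the child time to start

child_nice=$(nice_of "$child_pid")
echo "shell nice: $shell_nice, child nice: $child_nice"

kill "$child_pid"
```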

Background Jobs and nohup

You SSH into a server, start a long-running task, and your connection drops. The shell gets a hangup signal (SIGHUP), passes it on to the task, and the process dies. Here's how to prevent that.

# Run in background with &
./long-migration.sh &
# [1] 12345

# But it still dies when you disconnect! Use nohup:
nohup ./long-migration.sh > migration.log 2>&1 &

# Or use disown to detach an already-running job
./long-migration.sh
# Ctrl+Z to pause
bg                  # Resume in background
disown -h %1        # Detach from terminal

# Job control commands
jobs                # List background jobs
fg %1               # Bring job 1 to foreground
bg %1               # Send job 1 to background
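To see what nohup actually changes, you can simulate the dropped connection yourself. This sketch starts two identical throwaway jobs, one plain and one under nohup, then sends SIGHUP to both. The variable names are illustrative and the half-second pauses are an assumption about startup time.

```shell
#!/usr/bin/env bash
# Two identical background jobs: one plain, one started under nohup
sleep 30 >/dev/null 2>&1 &
plain_pid=$!
nohup sleep 30 >/dev/null 2>&1 &
nohup_pid=$!
sleep 0.5    # let both commands start

# Simulate a dropped connection by sending SIGHUP to each job
kill -HUP "$plain_pid" "$nohup_pid"

# wait reports 128 + signal number for a job killed by a signal (SIGHUP is 1)
plain_exit=0
wait "$plain_pid" || plain_exit=$?

sleep 0.5
if kill -0 "$nohup_pid" 2>/dev/null; then
    nohup_status="still running"
else
    nohup_status="gone"
fi

echo "plain job exit status: $plain_exit"
echo "nohup job: $nohup_status"

# Clean up the surviving job
kill -TERM "$nohup_pid"
```

The nohup job survives because nohup sets SIGHUP to ignored before it runs the command, and ignored signals stay ignored in the new program.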

For production, use screen or tmux instead:

# Start a tmux session
tmux new -s migration

# Run your command
./long-migration.sh

# Detach: Ctrl+B, then D
# Reconnect later:
tmux attach -t migration

Zombie Processes — The Undead

A zombie process has finished executing, but its parent hasn't called wait() to collect its exit status. Zombies use no CPU and almost no memory (just a process-table entry), but each one still occupies a PID.

# Find zombie processes
ps aux | awk '$8 ~ /^Z/'

# Count zombies
ps aux | awk '$8 ~ /^Z/' | wc -l

# Find the parent of each zombie (PPID column); also works when there are none
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'

# You can't kill a zombie; it's already dead!
# Only its parent can reap it
kill -CHLD <parent_pid>    # Nudge the parent to reap (helps only if it handles SIGCHLD)
kill <parent_pid>          # Otherwise stop the parent; init adopts and reaps the zombies

A handful of zombies is normal. Hundreds of zombies means the parent process has a bug — it's not handling child process exits properly.
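If you have never seen a zombie, you can make one safely. The sketch below builds a careless parent on purpose and then lists its zombie children by scanning /proc. zombie_children is a hypothetical helper, and the demo assumes a Linux /proc filesystem and only uses throwaway sleep processes.

```shell
#!/usr/bin/env bash
# zombie_children (hypothetical helper): print PIDs of zombies whose parent is $1
zombie_children() {
    local parent=$1 status_file
    for status_file in /proc/[0-9]*/status; do
        # State, Pid and PPid appear in that order in /proc/<PID>/status
        awk -v parent="$parent" '
            /^State:/ { state = $2 }
            /^Pid:/   { pid = $2 }
            /^PPid:/  { if (state == "Z" && $2 == parent) print pid }
        ' "$status_file" 2>/dev/null || true
    done
}

# A careless parent: the shell starts a short-lived child, then replaces itself
# with "sleep 3", which never calls wait() for that child.
sh -c 'sleep 0.1 & exec sleep 3' >/dev/null 2>&1 &
parent_pid=$!
sleep 1

zombies=$(zombie_children "$parent_pid")
zombie_count=$(printf '%s' "$zombies" | grep -c . || true)
echo "zombie children of $parent_pid: $zombie_count"

# Terminating the parent hands the zombie to init, which normally reaps it
kill "$parent_pid"
```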

The /proc Filesystem — Process X-Ray

Every process gets a directory under /proc/<PID>/ with detailed info. This is where ps and top actually get their data.

# Pick a process to investigate
PID=2100

# What command started it?
tr '\0' ' ' < /proc/$PID/cmdline

# What environment variables does it see? (needs the same user or root)
tr '\0' '\n' < /proc/$PID/environ | sort

# What files does it have open?
ls -la /proc/$PID/fd/ | head -20

# How much memory is it really using?
grep -i '^Vm' /proc/$PID/status

# What's its current working directory?
ls -la /proc/$PID/cwd

# What binary is it running?
ls -la /proc/$PID/exe

# System-wide info
cat /proc/loadavg       # Load averages
cat /proc/meminfo        # Memory details
cat /proc/cpuinfo        # CPU details
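When you run these lookups often, it helps to bundle them. Here is a minimal sketch of a one-shot summary built only from /proc files. proc_summary is a hypothetical helper name, and the demo inspects a throwaway sleep process rather than a real service.

```shell
#!/usr/bin/env bash
# proc_summary (hypothetical helper): print a short /proc-based report for a PID
proc_summary() {
    local pid=$1
    # cmdline is NUL-separated, so turn the separators into spaces
    printf 'cmdline:  %s\n' "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    # exe and cwd are symlinks to the binary and the working directory
    printf 'exe:      %s\n' "$(readlink "/proc/$pid/exe")"
    printf 'cwd:      %s\n' "$(readlink "/proc/$pid/cwd")"
    # one entry in fd/ per open file descriptor
    printf 'open fds: %s\n' "$(ls "/proc/$pid/fd" | wc -l)"
    grep -E '^(VmSize|VmRSS):' "/proc/$pid/status" \
        || echo "(no Vm lines: kernel thread or zombie)"
}

sleep 30 >/dev/null 2>&1 &
target_pid=$!
sleep 0.3    # let the command start

summary=$(proc_summary "$target_pid")
printf '%s\n' "$summary"

kill "$target_pid"
```

Inspecting another user's process this way usually needs root, since exe, cwd and fd are only readable by the owner.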

Real-World Scenario: Hunt the CPU Hog

Here's the complete workflow for that 3 AM incident:

# Step 1: Quick overview — what's the load?
uptime
# 03:15:00 up 45 days, load average: 12.50, 8.30, 3.10
# Load is 12.5 on a 4-core machine — that's bad

# Step 2: Find the top CPU consumers
ps aux --sort=-%cpu | head -5

# Step 3: Get more details on the offending process
PID=2100
ls -la /proc/$PID/exe
cat /proc/$PID/cmdline | tr '\0' ' '

# Step 4: Check how long it's been running
ps -o pid,etime,pcpu,pmem,comm -p $PID

# Step 5: Freeze it while you investigate
kill -STOP $PID

# Step 6: Decide whether to restart gracefully or force kill
kill -TERM $PID       # Try graceful first
kill -CONT $PID       # A stopped process can't act on SIGTERM until it resumes
sleep 5
ps -p $PID > /dev/null 2>&1 && kill -9 $PID  # Force if still alive

# Step 7: Verify system recovered
uptime
free -h

Next up: apt vs yum vs dnf — Linux Package Managers Demystified — which one should you use? Here's the real difference.
