
sed, awk, grep — The Holy Trinity of Linux Text Processing

8 min read
Goel Academy
DevOps & Cloud Learning Hub

Parse a 10GB log file in seconds — no Python needed. These three commands — grep, sed, and awk — are the workhorses of Linux text processing. Master them and you'll handle log analysis, data transformation, and configuration management without reaching for a heavier scripting language.

grep — Finding Needles in Haystacks

grep searches for patterns in text. It's fast, it's everywhere, and it supports basic, extended (-E), and Perl-compatible (-P) regular expressions.

Basic grep

# Search for a string in a file
grep "error" /var/log/syslog

# Case-insensitive search
grep -i "error" /var/log/syslog

# Show line numbers
grep -n "error" /var/log/syslog

# Show 3 lines before and after each match (context)
grep -C 3 "error" /var/log/syslog

# Search recursively in a directory
grep -r "password" /etc/ 2>/dev/null

# Only show filenames (not matching lines)
grep -rl "TODO" /opt/myapp/src/

# Invert match — show lines that DON'T contain the pattern
grep -v "DEBUG" /var/log/app.log

# Count matches
grep -c "404" /var/log/nginx/access.log
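On big source trees, GNU grep can also filter which files the recursive search visits with --include and --exclude-dir. A quick sketch using a throwaway directory (the paths here are just for illustration):

```shell
# Build a tiny throwaway tree to search (paths are illustrative)
mkdir -p /tmp/grepdemo/src /tmp/grepdemo/node_modules
echo 'TODO: fix auth' > /tmp/grepdemo/src/app.py
echo 'TODO: vendored' > /tmp/grepdemo/node_modules/lib.js

# Only look inside *.py files and skip vendored directories entirely
grep -rn --include="*.py" --exclude-dir="node_modules" "TODO" /tmp/grepdemo/
```

This keeps recursive searches fast and free of noise from vendored or generated code.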

grep with Regular Expressions

# Extended regex (-E) — match IP addresses
grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /var/log/auth.log

# Match lines starting with a date
grep -E "^2025-" /var/log/app.log

# Match lines ending with "failed"
grep -E "failed$" /var/log/auth.log

# Match either "error" or "warning"
grep -E "error|warning" /var/log/syslog

# Perl-compatible regex (-P) — match email addresses
grep -P "[\w.]+@[\w.]+" /etc/aliases

# Match a word boundary (whole words only)
grep -w "root" /etc/passwd

# Multiple patterns from a file
echo -e "error\nfailed\ntimeout" > /tmp/patterns.txt
grep -f /tmp/patterns.txt /var/log/syslog
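One more flag worth knowing: -o prints only the matched text instead of the whole line, which turns grep into a quick field extractor. A sketch on inline sample data:

```shell
# -o prints each match on its own line instead of the whole matching line
printf 'login from 10.0.0.5 ok\nlogin from 10.0.0.9 failed\n' > /tmp/demo.log
grep -oE "[0-9]{1,3}(\.[0-9]{1,3}){3}" /tmp/demo.log
```

Piped into sort | uniq -c, this is the fastest way to tally every IP, email, or ID that appears anywhere in a log.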

Practical grep Pipelines

# Find the top 10 IP addresses hitting your server
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# Find all unique error messages
grep -i "error" /var/log/app.log | sort -u | head -20

# Find processes using the most memory (%MEM is column 4)
ps aux | sort -k4 -rn | head -10

# Search compressed log files
zgrep "error" /var/log/syslog.*.gz

sed — Stream Editor for Transformations

sed processes text line by line — it can find, replace, delete, and insert text. Think of it as a programmable find-and-replace.

Basic Substitution

# Replace first occurrence on each line
sed 's/old/new/' file.txt

# Replace ALL occurrences on each line (global flag)
sed 's/old/new/g' file.txt

# Case-insensitive replacement
sed 's/error/WARNING/gi' file.txt

# Edit file in-place (modifies the actual file)
sed -i 's/old-server/new-server/g' config.yml

# Edit in-place with backup
sed -i.bak 's/old-server/new-server/g' config.yml
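One portability note: the -i behavior above is GNU sed. On BSD/macOS sed the backup suffix is a separate argument. A quick sketch on a throwaway file:

```shell
printf 'server: old-host\n' > /tmp/config.yml

# GNU sed: the backup suffix is glued directly onto -i
sed -i.bak 's/old-host/new-host/' /tmp/config.yml

# BSD/macOS sed needs the suffix as a separate argument; '' means no backup:
#   sed -i '' 's/old-host/new-host/' /tmp/config.yml

cat /tmp/config.yml # server: new-host
```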

Advanced sed Operations

# Delete lines matching a pattern
sed '/^#/d' config.conf # Remove comment lines
sed '/^$/d' config.conf # Remove empty lines
sed '/^#/d; /^$/d' config.conf # Remove both

# Delete a specific line number
sed '5d' file.txt # Delete line 5
sed '10,20d' file.txt # Delete lines 10-20

# Print only matching lines (like grep)
sed -n '/error/p' /var/log/app.log

# Insert text before/after a line
sed '/\[database\]/a host = db-server.internal' config.ini # After
sed '/\[database\]/i # Database Configuration' config.ini # Before

# Replace between line ranges
sed '10,20s/foo/bar/g' file.txt

# Multiple operations
sed -e 's/foo/bar/g' -e 's/baz/qux/g' -e '/^#/d' file.txt

# Extract text between two patterns
sed -n '/START/,/END/p' logfile.txt
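sed can also capture parts of a match and reuse them with \1, \2 backreferences, which is handy for reordering fields. A minimal sketch on inline sample data:

```shell
# Swap "Last, First" into "First Last" with capture groups (-E = extended regex)
printf 'Doe, Jane\nSmith, John\n' > /tmp/names.txt
sed -E 's/([A-Za-z]+), ([A-Za-z]+)/\2 \1/' /tmp/names.txt
```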

Practical sed Examples

# Update a config value in a properties file
sed -i 's/^database.host=.*/database.host=new-db.internal/' app.properties

# Add a line after a pattern (e.g., add env variable to a script)
sed -i '/^#!/a export PATH="/opt/myapp/bin:$PATH"' deploy.sh

# Comment out a line
sed -i 's/^dangerous_setting/#dangerous_setting/' config.conf

# Remove trailing whitespace from all lines
sed -i 's/[[:space:]]*$//' script.sh

# Convert Windows line endings (CRLF) to Unix (LF)
sed -i 's/\r$//' script.sh

# Extract version number from a file
version=$(sed -n 's/.*version.*"\([0-9.]*\)".*/\1/p' package.json)
echo "Version: $version"

awk — The Programmable Text Processor

awk is a full programming language designed for text processing. It splits each line into fields and lets you process them with patterns and actions.

awk Basics

# Print specific fields (awk splits on whitespace by default)
awk '{print $1}' file.txt # First field
awk '{print $1, $3}' file.txt # First and third fields
awk '{print $NF}' file.txt # Last field
awk '{print $(NF-1)}' file.txt # Second-to-last field

# Custom field separator
awk -F: '{print $1, $7}' /etc/passwd # Username and shell
awk -F, '{print $2}' data.csv # Second column of CSV

# Print with custom formatting
awk -F: '{printf "%-20s %s\n", $1, $7}' /etc/passwd
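awk also ships built-in string functions such as substr, toupper, and gsub, which cover many jobs you might otherwise shell out to sed for. A small sketch on inline sample data:

```shell
# substr/toupper capitalize the username; gsub rewrites text inside a field
printf 'alice:admin\nbob:dev\n' > /tmp/users.txt
awk -F: '{
    gsub(/dev/, "developer", $2)
    print toupper(substr($1, 1, 1)) substr($1, 2), "-", $2
}' /tmp/users.txt
```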

awk Patterns and Conditions

# Only process lines matching a pattern
awk '/error/ {print $0}' /var/log/app.log

# Conditional logic
awk -F: '$3 >= 1000 {print $1, "UID="$3}' /etc/passwd # Regular users only

# Multiple conditions
awk -F: '$3 >= 1000 && $7 != "/usr/sbin/nologin" {print $1}' /etc/passwd

# NR = line number, use it to skip headers
awk 'NR > 1 {print $0}' data.csv

# Process only lines 10-20
awk 'NR >= 10 && NR <= 20' file.txt
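To compare fields against a shell variable, pass it in with -v instead of fighting quoting. A sketch with an inline sample file (the threshold variable name is just for illustration):

```shell
# -v copies a shell variable into an awk variable; +0 forces numeric comparison
printf '200\n503\n200\n404\n' > /tmp/status.txt
threshold=400
awk -v min="$threshold" '$1 >= min+0' /tmp/status.txt
```

This prints only the 503 and 404 lines, and the threshold can come from a script argument or environment variable.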

awk BEGIN and END Blocks

BEGIN runs before processing any input. END runs after all lines are processed.

# Calculate average response time from a log
awk '{sum += $NF; count++} END {print "Average:", sum/count, "ms"}' response-times.log

# Sum a column in a CSV
awk -F, 'NR > 1 {sum += $3} END {print "Total: $"sum}' sales.csv

# Count occurrences of each HTTP status code
awk '{count[$9]++} END {for (code in count) print code, count[code]}' /var/log/nginx/access.log | sort -k2 -rn

# Formatted report with header and footer
awk -F: 'BEGIN {
    printf "%-20s %-8s %s\n", "USERNAME", "UID", "SHELL"
    printf "%-20s %-8s %s\n", "--------", "---", "-----"
}
$3 >= 1000 && $3 < 65534 {
    printf "%-20s %-8s %s\n", $1, $3, $7
    count++
}
END {
    print ""
    print "Total users:", count
}' /etc/passwd

Supporting Cast — cut, sort, uniq, tr

These smaller tools combine beautifully with grep, sed, and awk.

# cut — extract specific columns
cut -d: -f1,7 /etc/passwd # Fields 1 and 7, colon-delimited
cut -c1-10 file.txt # First 10 characters of each line

# sort — sort lines
sort file.txt # Alphabetical
sort -n file.txt # Numeric
sort -k3 -rn file.txt # Sort by 3rd column, numeric, descending
sort -t: -k3 -n /etc/passwd # Sort passwd by UID

# uniq — remove consecutive duplicates (ALWAYS sort first)
sort file.txt | uniq # Remove duplicates
sort file.txt | uniq -c # Count occurrences
sort file.txt | uniq -d # Show only duplicates

# tr — translate/delete characters
echo "HELLO" | tr 'A-Z' 'a-z' # Lowercase
echo "hello world" | tr ' ' '_' # Replace spaces with underscores
echo "extra spaces" | tr -s ' ' # Squeeze multiple spaces
tr -d '\r' < file.txt # Remove carriage returns
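GNU sort also understands human-readable sizes with -h, which pairs naturally with du -h output. A sketch on inline sample data (the live du command at the end is the typical use):

```shell
# -h compares human-readable sizes correctly (12K < 450M < 3.1G)
printf '12K\tlogs\n3.1G\tdata\n450M\tcache\n' > /tmp/sizes.txt
sort -h /tmp/sizes.txt

# Typical live use:
#   du -sh /var/* 2>/dev/null | sort -h
```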

Real-World Pipeline Examples

Here's where the magic happens — combining these tools into powerful pipelines.

# Pipeline 1: Top 10 IPs with the most 404 errors
awk '$9 == 404 {print $1}' /var/log/nginx/access.log \
| sort | uniq -c | sort -rn | head -10

# Pipeline 2: Disk usage report — only filesystems over 50%
df -h | awk 'NR>1 && $5+0 > 50 {printf "%-30s %s used (%s available)\n", $6, $5, $4}'

# Pipeline 3: Find config files changed in the last 24 hours
find /etc -name "*.conf" -mtime -1 -exec ls -la {} \; 2>/dev/null \
| awk '{print $6, $7, $8, $9}' | sort

# Pipeline 4: Extract and count error types from a Java log
grep "Exception" /var/log/app/app.log \
| sed 's/.*[^A-Za-z]\([A-Za-z]*Exception\).*/\1/' \
| sort | uniq -c | sort -rn

# Pipeline 5: Generate a CSV from /etc/passwd
awk -F: 'BEGIN {print "username,uid,gid,home,shell"}
$3 >= 1000 && $3 < 65534 {
    print $1","$3","$4","$6","$7
}' /etc/passwd > users.csv

# Pipeline 6: Monitor a log file and alert on error rate
tail -f /var/log/app.log | awk '
/ERROR/ {errors++}
{total++}
total % 100 == 0 {
    rate = (errors/total) * 100
    if (rate > 5) printf "[ALERT] Error rate: %.1f%% (%d/%d)\n", rate, errors, total
}'
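One more pipeline in the same spirit: requests per hour from an access log. This assumes the nginx combined log format, with the timestamp like [10/Oct/2025:13:55:36 +0000] in field 4; adjust the substr offsets for your own format. Sketched on inline sample data:

```shell
# Pipeline 7: Requests per hour (field 4 holds [10/Oct/2025:13:55:36 +0000])
printf '%s\n' \
'1.2.3.4 - - [10/Oct/2025:13:55:36 +0000] "GET / HTTP/1.1" 200 512' \
'1.2.3.4 - - [10/Oct/2025:13:59:01 +0000] "GET /a HTTP/1.1" 200 128' \
'1.2.3.4 - - [10/Oct/2025:14:02:11 +0000] "GET /b HTTP/1.1" 404 64' > /tmp/access.log

# substr($4, 2, 14) keeps "10/Oct/2025:13": the day plus the hour
awk '{print substr($4, 2, 14)}' /tmp/access.log | sort | uniq -c | sort -rn
```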

Quick Reference Table

Task                      Command
----                      -------
Find a pattern            grep "pattern" file
Find and replace          sed 's/old/new/g' file
Extract a column          awk '{print $3}' file
Count occurrences         grep -c "pattern" file
Remove duplicates         sort file | uniq
Top N items               sort | uniq -c | sort -rn | head -N
Sum a column              awk '{sum+=$1} END {print sum}'
Delete matching lines     sed '/pattern/d' file
Replace field separator   tr ',' '\t'
In-place file edit        sed -i 's/old/new/g' file

With text processing mastered, you now have the full Linux toolkit for DevOps work. Check out our other posts on Linux Disk Management, Systemd Services, and the complete Shell Scripting Series to round out your Linux skills.