Shell Scripting Series Part 3 — Error Handling, Logging, and Production Scripts
Your scripts work on your machine — here's how to make them production-ready. In Parts 1 and 2, we learned the fundamentals and built real scripts. Now we'll add the guardrails that separate "works on my laptop" from "safe to run in production at 3 AM with nobody watching."
The set Command — Your Safety Net
Every production script should start with set -euo pipefail. Here's what each flag does and why it matters.
#!/bin/bash
set -euo pipefail
# -e : Exit immediately if any command fails
# -u : Treat unset variables as errors (catches typos)
# -o pipefail : A pipeline fails if ANY command in it fails, not just the last one
Let's see why each flag matters.
#!/bin/bash
# WITHOUT set -e: this script continues after the failed command
cd /nonexistent/directory # Prints an error, but the script keeps going
rm -rf * # Runs in the WRONG directory — disaster!
# WITH set -e: script stops at the first failure
set -e
cd /nonexistent/directory # Script exits here
rm -rf * # Never runs — crisis averted
The pipefail flag catches hidden failures in pipes:
#!/bin/bash
# WITHOUT pipefail:
grep "pattern" huge-file.log | sort | head -5
# If grep fails (file not found), the pipeline exit code is from 'head' (success!)
# WITH pipefail:
set -o pipefail
grep "pattern" huge-file.log | sort | head -5
# Now the pipeline correctly reports grep's failure
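When you need to know exactly which stage failed, bash also records every command's exit code in the PIPESTATUS array (a bash-specific feature, not POSIX):

```shell
#!/bin/bash
# The first stage fails (false -> 1); sort and head succeed.
false | sort | head -1
# Copy PIPESTATUS immediately -- the very next command overwrites it.
codes=("${PIPESTATUS[@]}")
echo "exit codes per stage: ${codes[*]}"   # 1 0 0
```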
| Flag | Without It | With It |
|---|---|---|
| -e | Script continues after errors | Script exits on first error |
| -u | Unset variables expand to empty string | Unset variables cause immediate exit |
| -o pipefail | Pipeline exit code = last command | Pipeline exit code = first failure |
Trap — Cleaning Up After Yourself
trap lets you run cleanup code when your script exits, regardless of whether it succeeded or failed.
#!/bin/bash
set -euo pipefail
# Create a temp directory for working files
WORK_DIR=$(mktemp -d)
LOG_FILE="/var/log/myscript.log"
# Cleanup function — runs no matter how the script exits
cleanup() {
local exit_code=$?
echo "[$(date)] Script exiting with code: $exit_code" >> "$LOG_FILE"
# Remove temp files
if [[ -d "$WORK_DIR" ]]; then
rm -rf "$WORK_DIR"
echo "[$(date)] Cleaned up temp dir: $WORK_DIR" >> "$LOG_FILE"
fi
# Release the lock file, if this script created one
[[ -f /tmp/myscript.lock ]] && rm -f /tmp/myscript.lock
exit "$exit_code"
}
# Register the cleanup trap. In bash, the EXIT trap fires on every exit path:
# normal completion, a set -e failure, or death by INT (Ctrl+C) or TERM (kill)
trap cleanup EXIT
# Now use WORK_DIR safely — it will ALWAYS be cleaned up
echo "Working in: $WORK_DIR"
cp /etc/important-config "$WORK_DIR/"
# ... do processing ...
# Even if the script crashes here, cleanup runs
Trap Signals Reference
| Signal | Triggered By | Common Use |
|---|---|---|
| EXIT | Script exits (any reason) | Cleanup temp files |
| ERR | A command fails (same conditions as set -e) | Log the failure |
| INT | User presses Ctrl+C | Graceful shutdown |
| TERM | kill command | Graceful shutdown |
| HUP | Terminal closes | Reload config |
#!/bin/bash
set -euo pipefail
# Different traps for different situations
trap 'echo "Error on line $LINENO. Command: $BASH_COMMAND"' ERR
trap 'echo "Script interrupted by user"; exit 130' INT
trap 'echo "Script terminated"; exit 143' TERM
echo "Running... press Ctrl+C to test INT trap"
sleep 60
Exit Codes — Communicating Success and Failure
Exit codes are how scripts talk to each other. Zero means success, anything else means failure.
#!/bin/bash
set -euo pipefail
# Define meaningful exit codes
readonly EXIT_SUCCESS=0
readonly EXIT_GENERAL_ERROR=1
readonly EXIT_INVALID_ARGS=2
readonly EXIT_DEPENDENCY_MISSING=3
readonly EXIT_PERMISSION_DENIED=4
readonly EXIT_TIMEOUT=5
check_dependencies() {
local missing=()
for cmd in curl jq aws; do
if ! command -v "$cmd" > /dev/null 2>&1; then
missing+=("$cmd")
fi
done
if [[ ${#missing[@]} -gt 0 ]]; then
echo "ERROR: Missing dependencies: ${missing[*]}"
echo "Install with: sudo apt install ${missing[*]}"
exit $EXIT_DEPENDENCY_MISSING
fi
}
validate_args() {
if [[ $# -lt 2 ]]; then
echo "Usage: $0 <environment> <action>"
echo " environment: dev, staging, production"
echo " action: deploy, rollback, status"
exit $EXIT_INVALID_ARGS
fi
}
validate_args "$@"
check_dependencies
echo "All checks passed"
exit $EXIT_SUCCESS
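Meaningful exit codes only pay off when the caller checks them. Here's a sketch of a wrapper that branches on the code, assuming the script above is saved as deploy-check.sh (the name is illustrative):

```shell
#!/bin/bash
set -euo pipefail
# '|| rc=$?' captures the child's exit code without tripping set -e.
rc=0
./deploy-check.sh "$@" || rc=$?
case "$rc" in
    0) echo "Checks passed -- proceeding" ;;
    2) echo "Bad arguments -- see the usage message" >&2 ;;
    3) echo "Missing dependencies -- install them first" >&2 ;;
    *) echo "Unexpected failure (code $rc)" >&2 ;;
esac
```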
Logging — Structured Output for Production
Replace scattered echo statements with a proper logging function.
#!/bin/bash
set -euo pipefail
# Logging configuration
LOG_FILE="/var/log/myapp/deploy.log"
LOG_LEVEL="${LOG_LEVEL:-INFO}"
SCRIPT_NAME=$(basename "$0")
# Ensure log directory exists
mkdir -p "$(dirname "$LOG_FILE")"
# Log function with levels and timestamps
log() {
local level="$1"
shift
local message="$*"
local timestamp
timestamp=$(date '+%Y-%m-%d %H:%M:%S')
# Log level filtering
declare -A levels=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3 [FATAL]=4)
local current_level=${levels[$LOG_LEVEL]:-1}
local msg_level=${levels[$level]:-1}
[[ $msg_level -lt $current_level ]] && return 0
local log_line="[$timestamp] [$level] [$SCRIPT_NAME:$$] $message"
# Write to file
echo "$log_line" >> "$LOG_FILE"
# Also write to stderr for terminal visibility
case "$level" in
ERROR|FATAL) echo -e "\033[0;31m$log_line\033[0m" >&2 ;;
WARN) echo -e "\033[0;33m$log_line\033[0m" >&2 ;;
INFO) echo "$log_line" >&2 ;;
DEBUG) echo -e "\033[0;36m$log_line\033[0m" >&2 ;;
esac
}
# Usage
log INFO "Deployment starting"
log DEBUG "Environment: production"
log WARN "Disk usage is at 78%"
log ERROR "Connection to database failed"
log INFO "Deployment complete"
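Because LOG_LEVEL is read from the environment with a default, callers can tune verbosity without editing the script (deploy.sh here stands in for whatever you name yours):

```shell
# Default (INFO): DEBUG messages are filtered out.
./deploy.sh

# Troubleshooting: show everything.
LOG_LEVEL=DEBUG ./deploy.sh

# From cron: only surface real problems on stderr.
LOG_LEVEL=ERROR ./deploy.sh
```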
File Locking — Prevent Duplicate Runs
When a script runs via cron, you need to ensure only one instance runs at a time. Without locking, overlapping runs can corrupt data.
#!/bin/bash
set -euo pipefail
LOCK_FILE="/tmp/$(basename "$0" .sh).lock"
acquire_lock() {
if [[ -f "$LOCK_FILE" ]]; then
local lock_pid
lock_pid=$(cat "$LOCK_FILE")
# Check if the process that created the lock is still running
if kill -0 "$lock_pid" 2>/dev/null; then
echo "ERROR: Another instance is already running (PID: $lock_pid)"
exit 1
else
echo "WARN: Removing stale lock file (PID $lock_pid is dead)"
rm -f "$LOCK_FILE"
fi
fi
# Create lock with our PID
echo $$ > "$LOCK_FILE"
}
release_lock() {
rm -f "$LOCK_FILE"
}
# Acquire lock and ensure it's released on exit
acquire_lock
trap release_lock EXIT
echo "Running with lock (PID: $$)..."
# Your actual script logic here
sleep 30 # Simulating long-running work
echo "Done!"
A more robust approach uses flock, which handles locking at the kernel level:
#!/bin/bash
set -euo pipefail
LOCK_FILE="/tmp/$(basename "$0" .sh).lock"
# flock -n: non-blocking — fail immediately if the lock can't be acquired
# Open file descriptor 200 on the lock file; the kernel ties the lock to this FD
exec 200>"$LOCK_FILE"
if ! flock -n 200; then
echo "Another instance is already running. Exiting."
exit 0
fi
# Lock is held for the duration of the script
echo "Running with flock (PID: $$)..."
# Your script logic here
sleep 30
echo "Done!"
# Lock is automatically released when the script exits
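flock can also wrap a command directly, which is the easiest way to serialize a cron job without touching the script itself (the paths below are illustrative):

```shell
# flock opens the lock file, runs the command, and releases the lock
# when the command exits. -n: if the lock is held, skip instead of waiting.
flock -n /tmp/backup.lock /usr/local/bin/backup.sh

# The same pattern as a crontab entry (every 5 minutes):
# */5 * * * * flock -n /tmp/backup.lock /usr/local/bin/backup.sh
```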
The Production-Grade Template
Here is a complete template that combines everything we've covered. Use this as the starting point for every production script.
#!/bin/bash
#
# Script: production-template.sh
# Purpose: [Describe what this script does]
# Usage: ./production-template.sh <arg1> <arg2>
# Author: Goel Academy
#
set -euo pipefail
IFS=$'\n\t'
# --- Configuration ---
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_NAME="$(basename "$0")"
readonly LOG_FILE="/var/log/${SCRIPT_NAME%.sh}.log"
readonly LOCK_FILE="/tmp/${SCRIPT_NAME%.sh}.lock"
readonly WORK_DIR=$(mktemp -d -t "${SCRIPT_NAME%.sh}-XXXXXX")
# --- Logging ---
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$1] $2" | tee -a "$LOG_FILE"; }
info() { log "INFO" "$1"; }
warn() { log "WARN" "$1" >&2; }
error() { log "ERROR" "$1" >&2; }
fatal() { log "FATAL" "$1" >&2; exit 1; }
# --- Cleanup ---
cleanup() {
local exit_code=$?
[[ -d "$WORK_DIR" ]] && rm -rf "$WORK_DIR"
[[ -f "$LOCK_FILE" ]] && rm -f "$LOCK_FILE"
if [[ $exit_code -eq 0 ]]; then
info "Script completed successfully"
else
error "Script failed with exit code: $exit_code"
fi
}
trap cleanup EXIT
trap 'error "Error on line $LINENO: $BASH_COMMAND"; exit 1' ERR
trap 'warn "Interrupted by user"; exit 130' INT
trap 'warn "Terminated"; exit 143' TERM
# --- Locking ---
acquire_lock() {
if [[ -f "$LOCK_FILE" ]]; then
local pid
pid=$(cat "$LOCK_FILE")
if kill -0 "$pid" 2>/dev/null; then
fatal "Another instance running (PID: $pid)"
fi
rm -f "$LOCK_FILE"
fi
echo $$ > "$LOCK_FILE"
}
# --- Validation ---
check_root() {
[[ $EUID -eq 0 ]] || fatal "This script must be run as root"
}
check_dependencies() {
local deps=("$@")
for dep in "${deps[@]}"; do
command -v "$dep" > /dev/null 2>&1 || fatal "Missing dependency: $dep"
done
}
usage() {
cat << EOF
Usage: $SCRIPT_NAME [options] <argument>
Options:
-h, --help Show this help
-v, --verbose Enable verbose output
-d, --dry-run Show what would be done without doing it
Arguments:
argument Description of the argument
EOF
exit 0
}
# --- Main ---
main() {
local verbose=false
local dry_run=false
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help) usage ;;
-v|--verbose) verbose=true; shift ;;
-d|--dry-run) dry_run=true; shift ;;
-*) fatal "Unknown option: $1" ;;
*) break ;;
esac
done
acquire_lock
check_dependencies "curl" "jq"
info "Starting $SCRIPT_NAME"
info "Working directory: $WORK_DIR"
info "Verbose: $verbose | Dry run: $dry_run"
# === Your script logic goes here ===
# ===================================
info "All done!"
}
main "$@"
Common Patterns Cheat Sheet
| Pattern | Code |
|---|---|
| Exit on error | set -euo pipefail |
| Cleanup on exit | trap cleanup EXIT |
| Log with timestamp | echo "[$(date)] message" |
| Prevent duplicate runs | flock -n 200 or PID file |
| Require root | [[ $EUID -eq 0 ]] || exit 1 |
| Check dependency | command -v curl > /dev/null 2>&1 |
| Default value | ${VAR:-default} |
| Script directory | $(cd "$(dirname "$0")" && pwd) |
| Temp directory | mktemp -d -t name-XXXXXX |
| Error line number | trap 'echo "line $LINENO"' ERR |
This wraps up our Shell Scripting series. Go back to Part 1 — Variables, Loops, and Functions or Part 2 — Real-World Automation Scripts to review the fundamentals.
