Top 50 Linux Interview Questions for DevOps & SRE Roles
These are the actual questions asked at Amazon, Google, and top startups. Not "what is Linux?" fluff -- real questions that test whether you can operate production systems under pressure. Each answer is concise enough to give in an interview, with the exact command or concept you need.
Beginner Level (1-15)
1. How do you find all files larger than 100MB on a system?
find / -path /proc -prune -o -type f -size +100M -exec ls -lh {} + 2>/dev/null # skip /proc (e.g. /proc/kcore)
2. What's the difference between a hard link and a soft link?
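A hard link is another directory entry pointing to the same inode -- deleting the original name doesn't affect it, because the data is freed only when the last link is removed. Hard links can't cross filesystems or point to directories. A soft (symbolic) link is a separate file that stores a path -- if the target is deleted or moved, the symlink breaks (dangles).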
ln file.txt hardlink.txt # hard link (same inode)
ln -s file.txt softlink.txt # symbolic link (points to path)
ls -li file.txt hardlink.txt # same inode number
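To see the difference in practice, here is a small POSIX shell sketch that builds both link types in a throwaway directory (created with mktemp; the file names are illustrative), deletes the original, and checks what survives. It assumes GNU or BusyBox `stat` for the `-c %h` link-count format.

```shell
#!/bin/sh
# Build a file, a hard link, and a symlink in a scratch directory.
dir=$(mktemp -d)
cd "$dir" || exit 1

echo "payload" > file.txt
ln file.txt hardlink.txt        # second name for the same inode
ln -s file.txt softlink.txt     # separate inode that stores the path "file.txt"

links_before=$(stat -c %h file.txt)     # link count of the shared inode: 2

rm file.txt                     # removes one name; the inode and data survive

hard_content=$(cat hardlink.txt)        # still "payload"
links_after=$(stat -c %h hardlink.txt)  # link count drops to 1
# -L: the symlink itself still exists; -e follows it and finds nothing
if [ -L softlink.txt ] && [ ! -e softlink.txt ]; then soft_state=dangling; else soft_state=ok; fi

echo "links before rm: $links_before, after: $links_after"
echo "hard link reads: $hard_content"
echo "symlink is: $soft_state"

cd / && rm -rf "$dir"
```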
3. How do you check disk usage by directory?
du -sh /var/* | sort -rh | head -10
df -h # filesystem-level usage
4. Explain Linux file permissions: what does chmod 755 mean?
755 = owner (rwx = 4+2+1 = 7), group (r-x = 4+1 = 5), others (r-x = 5). Owner can read, write, and execute; group and others can read and execute but not write.
chmod 755 script.sh # rwxr-xr-x
chmod u+x script.sh # add execute for owner only
stat -c "%a %n" * # show numeric permissions
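Each octal digit is just the sum of r=4, w=2, x=1 for one triplet. A minimal sketch of that arithmetic as a shell function; the name `perm_to_octal` is made up for illustration, and it deliberately ignores the setuid/setgid/sticky letters (s, S, t, T).

```shell
#!/bin/sh
# Convert a 9-character permission string (e.g. rwxr-xr-x) to octal,
# adding r=4, w=2, x=1 within each owner/group/others triplet.
perm_to_octal() {
  perms=$1
  result=""
  for start in 1 4 7; do
    triplet=$(printf '%s' "$perms" | cut -c "$start-$((start + 2))")
    digit=0
    case $triplet in r??) digit=$((digit + 4)) ;; esac
    case $triplet in ?w?) digit=$((digit + 2)) ;; esac
    case $triplet in ??x) digit=$((digit + 1)) ;; esac
    result="$result$digit"
  done
  printf '%s\n' "$result"
}

perm_to_octal rwxr-xr-x   # 755
perm_to_octal rw-r--r--   # 644
perm_to_octal rwx------   # 700
```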
5. How do you find which process is using a specific port?
sudo ss -tlnp | grep :8080 # sudo needed to see processes owned by other users
sudo lsof -i :8080
sudo netstat -tlnp | grep :8080
6. What's the difference between > and >> in bash?
> truncates (overwrites) the file. >> appends to it. Both create the file if it doesn't exist.
echo "first" > file.txt # file contains "first"
echo "second" > file.txt # file contains "second" (overwritten)
echo "third" >> file.txt # file contains "second\nthird"
7. How do you check system memory usage?
free -h # quick overview
cat /proc/meminfo # detailed breakdown
vmstat 1 5 # memory stats every 1 second, 5 times
8. What does the /etc/fstab file do?
It defines filesystems to mount automatically at boot. Each line specifies: device, mount point, filesystem type, mount options, dump flag, and fsck order.
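To make the six fields concrete, this sketch splits one fstab-style line with the shell's `read`, which separates on whitespace the same way fstab does. The UUID and `/data` mount point are invented examples, not values from a real system.

```shell
#!/bin/sh
# Split a sample /etc/fstab entry into its six fields.
# The UUID and mount point are made-up examples.
entry='UUID=1234-abcd  /data  ext4  defaults,noatime  0  2'

# read splits on whitespace (IFS), matching how fstab separates fields.
read -r device mountpoint fstype options dump pass <<EOF
$entry
EOF

echo "device:      $device"
echo "mount point: $mountpoint"
echo "type:        $fstype"
echo "options:     $options"
echo "dump:        $dump    (0 = not backed up by dump)"
echo "fsck pass:   $pass    (1 = root fs, 2 = other fs, 0 = skip)"
```

Before rebooting with an edited fstab, `sudo findmnt --verify` (util-linux) checks the real file for mistakes, and `sudo mount -a` mounts the entries so errors surface while you can still fix them.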
9. How do you schedule a task to run every day at 2:30 AM?
crontab -e
# Add: 30 2 * * * /path/to/script.sh
# Format: minute hour day-of-month month day-of-week command
10. How do you find a string in all files recursively?
grep -rn "search_term" /var/log/
grep -rl "ERROR" /var/log/ --include="*.log"
11. What is the difference between kill, kill -9, and kill -15?
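kill with no signal option sends SIGTERM (15), so kill and kill -15 are identical: they ask the process to terminate, and it can catch the signal to clean up (flush files, finish in-flight requests) before exiting. kill -9 (SIGKILL) forces immediate termination -- the process cannot catch or ignore it, so no cleanup runs. Always try SIGTERM first and escalate to SIGKILL only if the process doesn't exit.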
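A quick way to see why SIGTERM comes first: a process can trap it and clean up, but it never gets that chance with SIGKILL. This sketch starts a hypothetical worker (a child shell that traps TERM), signals it both ways, and records the exit statuses; it assumes a `sleep` that accepts fractional seconds, as GNU and BusyBox do.

```shell
#!/bin/sh
# Hypothetical worker: traps SIGTERM, prints a cleanup message, exits 0.
start_worker() {
  sh -c 'trap "echo worker: cleaning up; exit 0" TERM
         while :; do sleep 0.1; done' &
  worker_pid=$!
  sleep 0.5   # give the worker time to install its trap
}

start_worker
kill -15 "$worker_pid"            # SIGTERM: the trap runs
wait "$worker_pid"; term_status=$?

start_worker
kill -9 "$worker_pid"             # SIGKILL: the trap never gets a chance
wait "$worker_pid"; kill_status=$?

echo "after SIGTERM: exit $term_status"   # 0, clean exit from the trap
echo "after SIGKILL: exit $kill_status"   # 137 = 128 + 9
```

The 137 follows the shell convention of 128 + signal number, which is why an exit code of 137 in container logs usually means the process was SIGKILLed.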
12. How do you check which services are running?
systemctl list-units --type=service --state=running
systemctl status nginx
13. What is an inode?
An inode is a data structure that stores metadata about a file (permissions, ownership, timestamps, block locations) -- everything except the filename and content. The filename is stored in the directory entry which maps the name to an inode number.
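Because the name lives in the directory entry, renaming a file within one filesystem only rewrites that entry and leaves the inode alone. This sketch, using a scratch directory from mktemp and GNU `stat` format sequences, checks that the inode number survives an `mv`.

```shell
#!/bin/sh
# The filename lives in the directory entry, not in the inode.
dir=$(mktemp -d)
echo "data" > "$dir/report.txt"

inode_before=$(stat -c %i "$dir/report.txt")
mv "$dir/report.txt" "$dir/renamed.txt"      # rewrites only the directory entry
inode_after=$(stat -c %i "$dir/renamed.txt")

# Metadata that is stored in the inode itself:
stat -c 'inode=%i links=%h mode=%A uid=%u size=%s mtime=%y' "$dir/renamed.txt"
echo "inode before rename: $inode_before, after: $inode_after"

rm -rf "$dir"
```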
14. How do you add a user to a group without removing existing groups?
sudo usermod -aG docker username # -a = append, -G = supplementary group
id username # verify groups
15. What does top show and what are the key columns?
top shows real-time system resource usage. Key columns: PID (process ID), USER, %CPU, %MEM, VIRT (virtual memory), RES (resident/physical memory), COMMAND. Press 1 to see per-CPU stats, M to sort by memory.
Intermediate Level (16-35)
16. How do you troubleshoot a server running out of disk space?
df -h # which filesystem is full
du -sh /* 2>/dev/null | sort -rh | head # largest directories
lsof +L1 # deleted files still held open
find /var/log -name "*.log" -size +1G # large log files
sudo journalctl --vacuum-size=100M # trim systemd journal
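`lsof +L1` matters because deleting a file only removes its name; the space is freed when the last open file descriptor closes. This sketch reproduces that on a small scratch file, assuming Linux /proc, by holding it open on fd 3 in the current shell.

```shell
#!/bin/sh
# A deleted file keeps its disk space while any process still has it open.
dir=$(mktemp -d)
log="$dir/app.log"

exec 3> "$log"                 # this shell now holds the file open on fd 3
echo "still writing" >&3
rm "$log"                      # the name is gone from the directory...

# ...but the inode is alive; /proc marks the open descriptor "(deleted)".
target=$(readlink "/proc/$$/fd/3")
echo "fd 3 -> $target"

exec 3>&-                      # closing the last descriptor frees the space
rm -rf "$dir"
```

On a real server the usual fix is restarting the process or making it reopen its logs; as a stopgap, `: > /proc/<PID>/fd/<N>` truncates the deleted file in place and releases its space.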
17. Explain the boot process of a Linux system.
BIOS/UEFI -> bootloader (GRUB) -> kernel loads and unpacks the initramfs as a temporary root -> the initramfs loads needed drivers, mounts the real root filesystem, and switches to it -> systemd starts as PID 1 -> systemd starts services to reach the default target (e.g., multi-user.target) -> login prompt.
18. What is the difference between a process and a thread?
A process has its own memory space, file descriptors, and PID. A thread shares memory and resources within the same process. Threads are lighter to create and switch between but share state (which can cause race conditions).
19. How do you trace system calls made by a process?
strace -p <PID> # attach to running process
strace -f -e trace=network ./app # trace network calls including child processes
strace -c ./app # summary of syscall counts and times
20. What is swap and when is it used?
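Swap is disk space (a swap partition or swap file) that backs memory pages when RAM is under pressure. The kernel moves inactive anonymous pages to swap (swapping out) to free RAM for active processes, and may start doing so before RAM is completely full, depending on vm.swappiness.
swapon --show # current swap usage
cat /proc/sys/vm/swappiness # how aggressively the kernel swaps (0-100; up to 200 on kernel 5.8+)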
sudo sysctl vm.swappiness=10 # reduce swapping (prefer RAM)
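`swapon` shows total usage but not which processes were swapped out. This sketch reads SwapTotal/SwapFree from /proc/meminfo and each process's VmSwap from /proc/<PID>/status (all values in kB); processes that exit mid-scan or can't be read are skipped.

```shell
#!/bin/sh
# Overall swap use, then the processes holding the most swapped-out memory.
swap_total=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)
swap_used=$((swap_total - swap_free))
echo "swap used: ${swap_used} kB of ${swap_total} kB"

echo "top swap users (kB pid name):"
for status in /proc/[0-9]*/status; do
  # Kernel threads have no VmSwap line, so they never print.
  awk '/^Name:/ {name = $2} /^Pid:/ {pid = $2} /^VmSwap:/ {swap = $2}
       END {if (swap > 0) print swap, pid, name}' "$status" 2>/dev/null
done | sort -rn | head -5
```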
21. How do you find zombie processes and clean them up?
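ps aux | awk '$8 ~ /^Z/ {print}'   # STAT can be Z, Z+, Zs...
# Zombies have already exited and can't be killed -- deal with the PARENT instead
ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'   # PPID column identifies the parent
kill <parent_pid>   # orphaned zombies are adopted and reaped by init (PID 1)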
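To practice this safely, this sketch manufactures a zombie: an inner shell starts a short-lived child and then execs into `sleep`, which never calls wait(). It reads the child's state from /proc/<PID>/status instead of ps, so it assumes only Linux /proc and a `sleep` that accepts fractional seconds.

```shell
#!/bin/sh
# Create a zombie on purpose, inspect it, then clear it via its parent.
pidfile=$(mktemp)

# The inner shell forks a short-lived child, records its PID, then execs
# `sleep 30`. sleep never reaps children, so the finished child stays a zombie.
sh -c 'sleep 0.1 & echo $! > "$1"; exec sleep 30' sh "$pidfile" &
parent_pid=$!
sleep 1

child_pid=$(cat "$pidfile")
zombie_state=$(awk '/^State:/ {print $2}' "/proc/$child_pid/status")
echo "child $child_pid (parent $parent_pid) state: $zombie_state"   # Z

# Signalling the zombie itself does nothing; killing the parent hands it
# to init (or a subreaper), which reaps it.
kill "$parent_pid"
wait "$parent_pid" 2>/dev/null
rm -f "$pidfile"
```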
22. Explain TCP three-way handshake.
Client sends SYN -> Server responds SYN-ACK -> Client sends ACK. Connection is now ESTABLISHED. Termination uses FIN -> ACK -> FIN -> ACK (four-way).
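The handshake and teardown states are visible on a live system. This sketch tallies sockets by state from /proc/net/tcp and /proc/net/tcp6, whose fourth column is a hex state code; the helper `tcp_state_name` is made up here and maps codes to names following the kernel's TCP state numbering. A pile of SYN_RECV usually means handshakes are not completing.

```shell
#!/bin/sh
# Map the hex "st" column of /proc/net/tcp to a state name.
tcp_state_name() {
  case $1 in
    01) echo ESTABLISHED ;;  02) echo SYN_SENT ;;    03) echo SYN_RECV ;;
    04) echo FIN_WAIT1 ;;    05) echo FIN_WAIT2 ;;   06) echo TIME_WAIT ;;
    07) echo CLOSE ;;        08) echo CLOSE_WAIT ;;  09) echo LAST_ACK ;;
    0A) echo LISTEN ;;       0B) echo CLOSING ;;     *) echo "UNKNOWN($1)" ;;
  esac
}

# Skip each file's header line, then count sockets per state.
awk 'FNR > 1 {print $4}' /proc/net/tcp /proc/net/tcp6 2>/dev/null | sort | uniq -c |
while read -r count code; do
  printf '%-12s %s\n' "$(tcp_state_name "$code")" "$count"
done
```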
23. How do you check network connectivity step by step?
ping 8.8.8.8 # Layer 3 (IP connectivity)
ping google.com # DNS resolution works?
traceroute google.com # where does the path break?
curl -v https://api.example.com # Layer 7 (HTTP connectivity)
ss -tlnp # is the service listening?
24. What is the difference between systemctl and service?
systemctl is the systemd-native tool with full control (enable, mask, show dependencies, journal integration). service is a legacy SysV init wrapper. Modern systems should use systemctl.
25. How do you analyze a core dump?
# Enable core dumps
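ulimit -c unlimited # applies to the current shell and its children only
echo "/tmp/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern # %e = executable, %p = PID (on systemd-coredump hosts, use coredumpctl list instead)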
# Analyze with gdb
gdb /usr/bin/myapp /tmp/core.myapp.12345
# In gdb: bt (backtrace), info threads, thread apply all bt
26. What is the difference between /dev/null, /dev/zero, and /dev/urandom?
/dev/null discards all input (black hole). /dev/zero produces infinite null bytes (used for zeroing disks). /dev/urandom produces random bytes (used for encryption keys, passwords).
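This sketch makes the three devices' behavior visible by counting and dumping bytes with head, wc, and od; the 16-byte token at the end only illustrates generating random data for a secret.

```shell
#!/bin/sh
echo "hello" > /dev/null                        # writes are silently discarded
null_bytes=$(wc -c < /dev/null)                 # reads hit EOF immediately
zero_hex=$(head -c 8 /dev/zero | od -An -tx1 | tr -d ' \n')
rand_bytes=$(head -c 16 /dev/urandom | wc -c)

echo "bytes read from /dev/null: $null_bytes"   # 0
echo "8 bytes from /dev/zero:    $zero_hex"     # 0000000000000000
echo "bytes from /dev/urandom:   $rand_bytes"   # 16

# Example use: a 32-character hex token from 16 random bytes.
token=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
echo "token: $token"
```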
27. How do you check and repair a filesystem?
# MUST unmount first (or use in rescue/single-user mode)
sudo umount /dev/sdb1
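sudo fsck -y /dev/sdb1 # generic front end; runs the filesystem-specific checker
sudo e2fsck -f /dev/sdb1 # ext2/3/4 directly; -f forces a check even if marked clean (XFS uses xfs_repair)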
28. How do you set up passwordless SSH?
ssh-keygen -t ed25519 -C "user@host"
ssh-copy-id user@remote-server
ssh user@remote-server # should not prompt for password
29. What is the OOM killer and how do you configure it?
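When the system runs out of memory, the kernel's OOM (Out of Memory) killer selects and terminates a process to free RAM. It picks the process with the highest oom_score, which is based mainly on the process's memory footprint (resident memory, swap, page tables) and then shifted by its oom_score_adj value.
# Check OOM score of a process
cat /proc/<PID>/oom_score
# Protect a critical process from OOM killing (-1000 to 1000)
echo -1000 | sudo tee /proc/<PID>/oom_score_adj # -1000 exempts it; lowering needs root
# For a systemd service, set OOMScoreAdjust=-1000 in the unit instead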
# Check if OOM killer has acted
dmesg | grep -i "oom\|out of memory"
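The adjustment is inherited by child processes, and an unprivileged process may raise its own value (volunteer to be killed first) but not push it below its allowed floor. This sketch demonstrates both points in a subshell, assuming a Linux /proc where oom_score_adj is writable (some sandboxes restrict it).

```shell
#!/bin/sh
# Raising oom_score_adj needs no privileges; lowering below the floor does.
child_adj=$(
  # Command substitution runs in a subshell; /proc/self is that subshell.
  echo 500 > /proc/self/oom_score_adj
  # cat is forked from the subshell and inherits the new value.
  cat /proc/self/oom_score_adj
)
parent_adj=$(cat /proc/self/oom_score_adj)   # the script itself is unchanged

echo "oom_score_adj in the adjusted subshell: $child_adj"   # 500
echo "oom_score_adj in the parent script:     $parent_adj"
```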
30. How do you capture and analyze network traffic?
sudo tcpdump -i eth0 port 443 -w capture.pcap
sudo tcpdump -i any host 10.0.1.5 -n
sudo tcpdump -i eth0 'tcp[tcpflags] & (tcp-syn) != 0' -c 100
31. What is the sticky bit?
The sticky bit on a directory means a file inside it can be deleted or renamed only by the file's owner, the directory's owner, or root, even if others have write permission on the directory. /tmp has the sticky bit set.
ls -ld /tmp # drwxrwxrwt -- the "t" is the sticky bit
chmod +t /shared
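A minimal sketch that sets and detects the sticky bit on a scratch directory from mktemp; mode 1777 is the same mode /tmp uses. It assumes GNU `stat` for the %A/%a format sequences.

```shell
#!/bin/sh
shared=$(mktemp -d)
chmod 1777 "$shared"                   # leading 1 = sticky bit; 777 = rwx for all

mode_symbolic=$(stat -c %A "$shared")  # drwxrwxrwt
mode_octal=$(stat -c %a "$shared")     # 1777

# test -k is true when the sticky bit is set
if [ -k "$shared" ]; then sticky=yes; else sticky=no; fi
echo "$mode_symbolic ($mode_octal) sticky=$sticky"

rmdir "$shared"
```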
32. Explain the difference between nice and renice.
nice launches a process with a modified niceness (-20 = highest priority, 19 = lowest, default 0). renice changes the niceness of an already running process. Regular users can only increase niceness; lowering it (raising priority) requires root.
nice -n 10 ./heavy-job.sh # start with lower priority
sudo renice -n -5 -p <PID> # increase priority of running process
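Run with no command, coreutils `nice` prints the niceness it is running at, which makes the increment easy to observe. A minimal sketch; the expected value assumes the shell itself starts at niceness 0.

```shell
#!/bin/sh
base=$(nice)                 # niceness of this shell (usually 0)
lowered=$(nice -n 10 nice)   # inner nice runs 10 steps lower priority, capped at 19
echo "current niceness: $base"
echo "under nice -n 10: $lowered"
# A negative increment (nice -n -5 ...) needs root; as a regular user the
# priority is not raised.
```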
33. How do you mount an NFS share?
sudo apt install nfs-common
sudo mount -t nfs server:/export/data /mnt/nfs
# Permanent mount in fstab:
# server:/export/data /mnt/nfs nfs defaults,_netdev 0 0
34. What does ulimit control?
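ulimit sets per-process resource limits that the shell applies to itself and every process it starts: open files, max processes, core dump size, stack size, virtual memory. Each has a soft limit (what is enforced) and a hard limit (the ceiling a non-root user can raise the soft limit to). Critical for production services.
ulimit -a # show all soft limits (ulimit -Ha for hard limits)
ulimit -n 65535 # raise open file limit for the current shell (fails above the hard limit unless root)
# Permanent for login sessions: edit /etc/security/limits.conf (applied by pam_limits)
# systemd services don't read limits.conf; set LimitNOFILE= and similar in the unit file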
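Because limits are per process and inherited, a change in a subshell never leaks back to its parent. This sketch lowers the soft open-files limit inside a command-substitution subshell and shows the parent is untouched; 256 is an arbitrary example value and assumes the hard limit is at least that high.

```shell
#!/bin/sh
hard=$(ulimit -Hn)
parent_soft=$(ulimit -Sn)

# Command substitution runs in a subshell, so only the subshell is lowered.
sub_soft=$(ulimit -Sn 256 && ulimit -Sn)

echo "hard limit:                       $hard"
echo "soft limit in parent:             $(ulimit -Sn)"   # unchanged
echo "soft limit in lowered subshell:   $sub_soft"       # 256
# Raising the soft limit above the hard limit fails for non-root users.
```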
35. How do you view and manage systemd logs?
journalctl -u nginx --since "1 hour ago"
journalctl -u nginx -f # follow (like tail -f)
journalctl -p err --since today # only errors
journalctl --disk-usage # how much space logs use
sudo journalctl --vacuum-size=500M # trim to 500MB
Advanced Level (36-50)
36. How do you troubleshoot high CPU usage on a Linux server?
top -bn1 | head -20 # identify top CPU consumers
pidstat -u 1 5 # per-process CPU over 5 seconds
perf top # real-time profiling (kernel + userspace)
strace -cp <PID> # what syscalls are burning CPU
cat /proc/<PID>/status | grep -i thread # thread count
37. What is the difference between iptables and nftables?
nftables is the modern replacement for iptables. It has a cleaner syntax, better performance, atomic rule updates, and combines ip/ip6/arp/bridge filtering into one framework. Most distributions are migrating to nftables.
38. Explain Linux namespaces and how containers use them.
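Namespaces isolate system resources per process: PID (process IDs), NET (networking), MNT (mount points), UTS (hostname), IPC (inter-process communication), USER (user/group IDs), CGROUP (view of the cgroup hierarchy), and TIME (some system clocks, kernel 5.6+). Containers are essentially processes with their own set of namespaces, combined with cgroups for resource limits.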
# List namespaces of a process
ls -la /proc/<PID>/ns/
# Create a new network namespace
sudo ip netns add test
sudo ip netns exec test ip addr
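Each namespace is identified by an inode in the kernel's nsfs, shown as links of the form `net:[<inode>]` under /proc/<PID>/ns. A minimal sketch that prints the current shell's namespaces and confirms a child shares them; nothing here calls unshare, so no privileges are needed.

```shell
#!/bin/sh
# Print this shell's namespaces; equal links mean a shared namespace.
for ns in pid net mnt uts ipc user; do
  printf '%-5s %s\n' "$ns" "$(readlink "/proc/$$/ns/$ns")"
done

# A child inherits every namespace of its parent unless it is created with
# clone()/unshare() namespace flags, so these two links should match.
parent_net=$(readlink "/proc/$$/ns/net")
child_net=$(sh -c 'readlink /proc/self/ns/net')
same=no
[ "$parent_net" = "$child_net" ] && same=yes
echo "child shares the network namespace: $same"
```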
39. How do you find and fix a memory leak?
# Monitor process memory over time
while true; do ps -o pid,rss,vsz,comm -p <PID>; sleep 5; done
# Detailed memory map
pmap -x <PID>
cat /proc/<PID>/smaps_rollup
# Use valgrind (if you can restart the process)
valgrind --leak-check=full ./myapp
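The ps loop can also be written against /proc directly, which avoids parsing ps output. This sketch samples VmRSS (resident memory, in kB) for a PID and reports growth over the run; the function names `rss_kb` and `watch_rss` are made up, and the demo watches the script's own shell.

```shell
#!/bin/sh
# Resident memory of a PID in kB, from /proc/<PID>/status.
rss_kb() {
  awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Take <samples> readings <interval> seconds apart and report total growth.
watch_rss() {
  pid=$1 samples=$2 interval=$3
  first=$(rss_kb "$pid")
  last=$first
  i=0
  while [ "$i" -lt "$samples" ]; do
    last=$(rss_kb "$pid")
    echo "$(date +%T) pid=$pid rss=${last}kB"
    i=$((i + 1))
    [ "$i" -lt "$samples" ] && sleep "$interval"
  done
  echo "growth over run: $((last - first)) kB"
}

watch_rss $$ 3 1   # demo: this shell, 3 samples, 1 second apart
```

For a real investigation, sample every few minutes for hours under steady load: a single jump is often a cache warming up, while RSS that keeps climbing is the leak signal.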
40. What is cgroup and how is it used?
cgroups (control groups) limit and account for resource usage (CPU, memory, block I/O, number of processes) for groups of processes. systemd places every service in its own cgroup. Containers use cgroups for resource isolation.
# View cgroup hierarchy
systemd-cgls
# Check resource usage of a service
systemctl show nginx --property=MemoryCurrent
# Limit a service to 512MB RAM
sudo systemctl set-property nginx MemoryMax=512M
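Under the hood, every process's cgroup membership is listed in /proc/<PID>/cgroup, and on a cgroup v2 host the controller files sit under /sys/fs/cgroup. This sketch assumes the unified v2 hierarchy mounted at /sys/fs/cgroup and prints a fallback message otherwise.

```shell
#!/bin/sh
# Show this process's cgroup membership (v2 hosts print one "0::<path>" line).
cat /proc/self/cgroup

# On cgroup v2 the group's files live under /sys/fs/cgroup/<path>.
cg_path=$(awk -F: '$1 == "0" {print $3}' /proc/self/cgroup)
mem_file="/sys/fs/cgroup${cg_path}/memory.current"
if [ -n "$cg_path" ] && [ -r "$mem_file" ]; then
  echo "memory.current for ${cg_path}: $(cat "$mem_file") bytes"
else
  echo "no readable cgroup v2 memory.current here (v1 host, root group, or controller off)"
fi
```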
41. How do you handle a "too many open files" error?
# Check current limits
ulimit -n
cat /proc/<PID>/limits | grep "open files"
# Check how many files a process has open
ls /proc/<PID>/fd | wc -l
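# Per-process limit (usually the one being hit): raise the soft limit with ulimit -n,
# or for a systemd service set LimitNOFILE=65535 under [Service] and restart it
# Kernel-wide ceiling on open file handles (check usage in /proc/sys/fs/file-nr)
echo "fs.file-max = 2097152" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p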
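In practice the limit that trips is usually the per-process soft limit, so it helps to see how close a process is to it. A sketch of a hypothetical helper, `fd_usage`, comparing the open descriptors in /proc/<PID>/fd with the soft "Max open files" value from /proc/<PID>/limits; reading another user's process requires root.

```shell
#!/bin/sh
# Compare open file descriptors with the soft "Max open files" limit.
fd_usage() {
  pid=$1
  used=$(ls "/proc/$pid/fd" | wc -l)
  # /proc/<PID>/limits columns: limit name, soft limit, hard limit, units
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "pid $pid: $used of $soft descriptors in use ($((used * 100 / soft))%)"
}

fd_usage $$   # demo on this shell
```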
42. What is SELinux and how do you troubleshoot it?
SELinux enforces mandatory access control policies. When an application fails with "permission denied" despite correct file permissions, SELinux is often the cause.
getenforce # check current mode
sudo setenforce 0 # temporarily (until reboot) switch to permissive to confirm SELinux is the cause
sudo ausearch -m avc -ts recent # check denial logs
sudo sealert -a /var/log/audit/audit.log # human-readable analysis (from setroubleshoot)
43. How do you tune kernel parameters for a high-traffic web server?
sudo tee -a /etc/sysctl.conf << 'EOF'
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
vm.swappiness = 10
EOF
sudo sysctl -p
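Before applying tuning like this, it is worth recording the current values so you can compare or roll back. Each sysctl key maps to a file under /proc/sys with dots turned into slashes; this sketch uses a made-up helper, `sysctl_path`, for that translation and reads the six keys above.

```shell
#!/bin/sh
# Map a dotted sysctl key to its /proc/sys file (dots become slashes).
# Simplification: keys whose components contain dots themselves (e.g. VLAN
# interface names) are not handled here.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# Record the current values of the tuned keys before changing them.
for key in net.core.somaxconn net.ipv4.tcp_max_syn_backlog \
           net.core.netdev_max_backlog net.ipv4.tcp_fin_timeout \
           net.ipv4.tcp_tw_reuse vm.swappiness; do
  path=$(sysctl_path "$key")
  if [ -r "$path" ]; then
    printf '%-30s %s\n' "$key" "$(cat "$path")"
  else
    printf '%-30s %s\n' "$key" "(not readable here)"
  fi
done
```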
44. How do you set up LVM and extend a volume?
# Create physical volume, volume group, logical volume
sudo pvcreate /dev/sdb
sudo vgcreate data_vg /dev/sdb
sudo lvcreate -l 100%FREE -n data_lv data_vg
sudo mkfs.ext4 /dev/data_vg/data_lv
# Extend later with a new disk
sudo pvcreate /dev/sdc
sudo vgextend data_vg /dev/sdc
sudo lvextend -l +100%FREE /dev/data_vg/data_lv
sudo resize2fs /dev/data_vg/data_lv # ext4; use xfs_growfs for XFS (or lvextend -r to do both steps)
45. Explain the /proc filesystem.
/proc is a virtual filesystem that exposes kernel and process information as files. Every running process has a /proc/<PID>/ directory. Key files: cmdline, status, fd/, maps, cgroup, environ. System info: /proc/cpuinfo, /proc/meminfo, /proc/loadavg.
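A short sketch of reading those files directly, using the script's own shell (`$$`) as the example process; the field names Name, PPid, and Threads come from /proc/<PID>/status.

```shell
#!/bin/sh
pid=$$

name=$(awk '/^Name:/ {print $2}' "/proc/$pid/status")
ppid=$(awk '/^PPid:/ {print $2}' "/proc/$pid/status")
threads=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status")
cmdline=$(tr '\0' ' ' < "/proc/$pid/cmdline")   # arguments are NUL-separated
fd_count=$(ls "/proc/$pid/fd" | wc -l)

# /proc/loadavg: 1/5/15-minute load, running/total tasks, last PID used
read -r load1 load5 load15 tasks lastpid < /proc/loadavg

echo "pid=$pid name=$name ppid=$ppid threads=$threads fds=$fd_count"
echo "cmdline: $cmdline"
echo "load: $load1 $load5 $load15 on $(nproc) CPU(s), tasks: $tasks"
```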
46. How do you debug DNS resolution issues?
dig example.com # query DNS
dig @8.8.8.8 example.com # query specific DNS server
nslookup example.com # simple lookup
cat /etc/resolv.conf # check configured DNS servers
resolvectl status # systemd-resolved status
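One trap when debugging: dig (and typically nslookup) queries DNS servers directly and ignores /etc/hosts, while most applications resolve through the C library's NSS configuration (/etc/nsswitch.conf). `getent hosts` follows the same path as applications, so comparing it with dig separates DNS problems from local configuration. A minimal sketch, using localhost as the example name:

```shell
#!/bin/sh
name=localhost
echo "nsswitch hosts line: $(grep '^hosts:' /etc/nsswitch.conf 2>/dev/null)"
echo "nameservers: $(awk '/^nameserver/ {print $2}' /etc/resolv.conf 2>/dev/null | tr '\n' ' ')"

app_view=$(getent hosts "$name")   # same lookup path applications use
echo "what applications see: $app_view"
# dig +short "$name"               # DNS-only view; often empty for /etc/hosts names
```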
47. What is a D-state process and how do you handle it?
A D-state (uninterruptible sleep) process is waiting for I/O and cannot be killed -- not even with kill -9. It indicates a kernel-level I/O wait, often from a hung NFS mount or failing disk. Fix the underlying I/O issue (unmount NFS, replace disk) rather than trying to kill the process.
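To find such processes without ps, this sketch scans the State line of every /proc/<PID>/status for D; processes can exit mid-scan, so read errors are ignored. The helper name `list_d_state` is hypothetical.

```shell
#!/bin/sh
# List processes in state D (uninterruptible sleep) by scanning /proc.
list_d_state() {
  for status in /proc/[0-9]*/status; do
    # State appears before Pid in the file, so decide at END.
    awk '/^Name:/ {name = $2} /^State:/ {state = $2} /^Pid:/ {pid = $2}
         END {if (state == "D") print pid, name}' "$status" 2>/dev/null
  done
  return 0
}

d_procs=$(list_d_state)
if [ -n "$d_procs" ]; then
  echo "processes in D state (pid name):"
  echo "$d_procs"
else
  echo "no processes in D state right now"
fi
```

D-state tasks count toward the Linux load average, so a high load with mostly idle CPUs often points here; as root, /proc/<PID>/stack (when the kernel exposes it) shows where the task is blocked.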
48. How do you perform a live kernel upgrade without rebooting?
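Strictly, you can't: moving to a new kernel version always restarts every process. Two options get close. Kernel live patching (kpatch, Canonical Livepatch) applies security fixes to the running kernel with no reboot, but doesn't move you to a new version. kexec loads a new kernel and jumps straight into it, skipping the firmware (BIOS/UEFI) and bootloader, so the reboot takes seconds instead of minutes:
sudo apt install kexec-tools
sudo kexec -l /boot/vmlinuz-new --initrd=/boot/initrd-new --reuse-cmdline
sudo systemctl kexec # stops services cleanly, then boots the loaded kernel
# sudo kexec -e jumps immediately with no clean shutdown -- avoid on production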
49. How do you create a custom systemd service?
sudo tee /etc/systemd/system/myapp.service << 'EOF'
[Unit]
Description=My Application
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=appuser
WorkingDirectory=/opt/myapp
ExecStart=/opt/myapp/bin/start.sh
Restart=always
RestartSec=5
Environment=NODE_ENV=production
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now myapp
50. Describe your approach to troubleshooting a production Linux server that is unresponsive.
Systematic approach:
(1) Can you SSH in? If not, use out-of-band access (serial console, IPMI, cloud console).
(2) Check load average with uptime and compare it to the CPU count (nproc).
(3) Check memory with free -h and dmesg -T | grep -i oom.
(4) Check disk with df -h, df -i (inode exhaustion), and iostat -x 1.
(5) Check CPU with top and mpstat -P ALL 1.
(6) Check network with ss -s and dmesg | grep -i link.
(7) Check logs with journalctl -p err --since "1 hour ago".
(8) Check for runaway processes with ps aux --sort=-%cpu | head.
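The first few checks can be scripted as a read-only snapshot that needs only /proc and coreutils, which helps when the box is too sluggish for interactive tools. A sketch with a hypothetical `triage` function; it deliberately skips dmesg and journalctl because they may need root.

```shell
#!/bin/sh
# Read-only first-pass triage using only /proc and coreutils.
triage() {
  echo "== load average (1/5/15 min) and CPU count =="
  echo "$(cut -d' ' -f1-3 /proc/loadavg) on $(nproc) CPU(s)"

  echo "== memory (kB) =="
  grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo

  echo "== fullest filesystems (use% mount) =="
  df -P 2>/dev/null | awk 'NR > 1 {print $5, $6}' | sort -rn | head -5

  echo "== process count by state (R run, S sleep, D uninterruptible, Z zombie) =="
  grep -h '^State:' /proc/[0-9]*/status 2>/dev/null | awk '{print $2}' | sort | uniq -c
}

triage
```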
