How Containers Actually Work — Namespaces, Cgroups, and chroot

7 min read
Goel Academy
DevOps & Cloud Learning Hub

Docker isn't magic — here's how to build a container with just Linux commands. Containers are nothing more than regular Linux processes with three layers of isolation: namespaces (what a process can see), cgroups (what a process can use), and a changed root filesystem (where a process lives). Once you understand these primitives, Kubernetes networking, Docker storage drivers, and container security all start making sense.

The Three Pillars of Containers

| Primitive | Controls | Question It Answers |
| --- | --- | --- |
| Namespaces | Visibility | What can the process see? |
| Cgroups | Resources | How much CPU/memory can it use? |
| chroot / pivot_root | Filesystem | What filesystem does it see? |

Let's build each layer from the ground up.

Namespaces: Isolating What a Process Can See

Linux has 8 namespace types. Each one isolates a different aspect of the system.

| Namespace | Flag | Isolates |
| --- | --- | --- |
| PID | CLONE_NEWPID | Process IDs |
| NET | CLONE_NEWNET | Network stack |
| MNT | CLONE_NEWNS | Mount points |
| UTS | CLONE_NEWUTS | Hostname |
| IPC | CLONE_NEWIPC | Inter-process communication |
| USER | CLONE_NEWUSER | User/group IDs |
| Cgroup | CLONE_NEWCGROUP | Cgroup root |
| Time | CLONE_NEWTIME | System clocks (kernel 5.6+) |

The unshare command lets you create new namespaces from the command line.

PID Namespace: A New Process Tree

# Create a new PID namespace and run bash inside it
sudo unshare --pid --fork --mount-proc bash

# Inside the new namespace:
ps aux
# PID 1 is now your bash process — it can't see host processes

# Check from the host (in another terminal):
ps aux | grep unshare
# The namespaced process still has a real PID on the host
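You can confirm the isolation directly: every process exposes its namespace memberships as symlinks under /proc/&lt;pid&gt;/ns, and two processes share a namespace exactly when the inode numbers in those links match. A quick check:

```shell
# Print this shell's PID namespace identity
readlink /proc/self/ns/pid
# e.g. pid:[4026531836]

# Compare against PID 1 — inside the new namespace the inodes differ
# from the host's; on the host they match (reading PID 1's links may
# need root)
sudo readlink /proc/1/ns/pid
```

Docker's `docker inspect` PID trick later in this article builds on exactly this mechanism.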

UTS Namespace: Custom Hostname

# Create a new UTS namespace with a custom hostname
sudo unshare --uts bash

# Change hostname — only affects this namespace
hostname my-container
hostname
# Output: my-container

# Check from host — hostname is unchanged
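The USER namespace from the table above is worth a quick detour, because it is the one namespace you can try without sudo. A minimal sketch (requires a kernel that allows unprivileged user namespaces, which most modern distros do):

```shell
# Map your own UID to root inside a new user namespace — no sudo needed
unshare --user --map-root-user sh -c 'whoami; id -u'
# Inside: root / 0 — but only within the namespace. Kernel permission
# checks still use your real UID, so host files owned by other users
# remain off-limits.
```

This is the basis of rootless containers in Podman and rootless Docker.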

NET Namespace: Isolated Network Stack

This is the foundation of how Docker and Kubernetes networking works.

# Create a named network namespace
sudo ip netns add mycontainer

# List namespaces
ip netns list

# Run a command in the namespace
sudo ip netns exec mycontainer ip addr
# Only loopback, no external connectivity

# Create a veth pair (virtual ethernet cable)
sudo ip link add veth-host type veth peer name veth-container

# Move one end into the namespace
sudo ip link set veth-container netns mycontainer

# Configure IP addresses
sudo ip addr add 10.200.0.1/24 dev veth-host
sudo ip link set veth-host up

sudo ip netns exec mycontainer ip addr add 10.200.0.2/24 dev veth-container
sudo ip netns exec mycontainer ip link set veth-container up
sudo ip netns exec mycontainer ip link set lo up

# Test connectivity
sudo ip netns exec mycontainer ping -c 2 10.200.0.1

This is exactly what Docker does when it creates a container with bridge networking — veth pairs connecting the container namespace to a bridge on the host.
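To complete the picture, here is a sketch of the bridge half of that pattern, continuing from the veth setup above (`br0` is an illustrative name; Docker's default bridge is `docker0`). With a bridge, the gateway address lives on the bridge instead of on the veth end:

```shell
# Create the bridge and give it the gateway address
sudo ip link add br0 type bridge
sudo ip addr add 10.200.0.1/24 dev br0
sudo ip link set br0 up

# Plug the host end of the veth pair into the bridge
# (first undo the direct addressing from the earlier steps)
sudo ip addr del 10.200.0.1/24 dev veth-host
sudo ip link set veth-host master br0

# One NAT rule gives every namespace on br0 outbound access
sudo iptables -t nat -A POSTROUTING -s 10.200.0.0/24 ! -o br0 -j MASQUERADE
```

Now any number of namespaces can be plugged into `br0` with their own veth pairs and reach each other through 10.200.0.x addresses — which is the whole of Docker's default bridge networking.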

Cgroups: Limiting What a Process Can Use

Cgroups (control groups) enforce resource limits. Most modern distros use cgroups v2.

# Check if cgroups v2 is active
stat -fc %T /sys/fs/cgroup/
# Output: cgroup2fs (v2) or tmpfs (v1)

# View the cgroup hierarchy
ls /sys/fs/cgroup/

Creating a Cgroup and Setting Limits

# Create a new cgroup
sudo mkdir /sys/fs/cgroup/mycontainer

# Set memory limit to 256MB
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/mycontainer/memory.max

# Set CPU limit to 50% of one core (50000 out of 100000 microseconds)
echo "50000 100000" | sudo tee /sys/fs/cgroup/mycontainer/cpu.max

# Set CPU weight (relative priority, default is 100)
echo 50 | sudo tee /sys/fs/cgroup/mycontainer/cpu.weight

# Set PID limit (max number of processes)
echo 64 | sudo tee /sys/fs/cgroup/mycontainer/pids.max

# Add the current shell to this cgroup
echo $$ | sudo tee /sys/fs/cgroup/mycontainer/cgroup.procs

Verifying Cgroup Limits

# Check current memory usage of the cgroup
cat /sys/fs/cgroup/mycontainer/memory.current

# Check if any OOM kills have occurred
cat /sys/fs/cgroup/mycontainer/memory.events

# Check CPU usage statistics
cat /sys/fs/cgroup/mycontainer/cpu.stat

| Resource File | Controls | Example Value |
| --- | --- | --- |
| memory.max | Hard memory limit | 268435456 (256MB) |
| memory.high | Throttle threshold | 209715200 (200MB) |
| cpu.max | CPU bandwidth | 50000 100000 (50%) |
| cpu.weight | Relative CPU shares | 50 (half of default) |
| pids.max | Process count limit | 64 |
| io.max | Disk I/O bandwidth | 8:0 rbps=1048576 |
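To see the memory limit actually bite, try allocating past it from the shell you added to the cgroup above (this sketch assumes that cgroup still exists and uses python3 purely as a convenient allocator; disabling swap for the group first ensures an OOM kill rather than swapping):

```shell
# Disable swap for the group so memory.max is a hard ceiling
echo 0 | sudo tee /sys/fs/cgroup/mycontainer/memory.swap.max

# Allocate 512MB inside a 256MB-capped cgroup — the kernel OOM-kills it
python3 -c 'x = bytearray(512 * 1024 * 1024)'

# The kill shows up in the event counters
grep oom_kill /sys/fs/cgroup/mycontainer/memory.events
```

This is the same mechanism behind the `OOMKilled` status you see in `docker inspect` and Kubernetes pod events.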

chroot: Changing the Root Filesystem

The final piece — give the process its own filesystem view.

# Create a minimal root filesystem using Alpine
mkdir -p /tmp/mycontainer/rootfs
cd /tmp/mycontainer

# Download Alpine Linux mini root filesystem
curl -o alpine-rootfs.tar.gz https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz
tar xzf alpine-rootfs.tar.gz -C rootfs

# Enter the chroot
sudo chroot rootfs /bin/sh

# Inside the chroot:
cat /etc/os-release
# You're now running Alpine, isolated from the host filesystem

ls / # Only Alpine's files
whoami # root (within the chroot)
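One caveat: chroot only changes path resolution, and a root process inside can escape it. That is why the table at the top also lists pivot_root — real runtimes swap the root of the entire mount namespace instead. A rough sketch reusing the Alpine rootfs from above (run inside a fresh mount namespace, since pivot_root affects the whole namespace):

```shell
sudo unshare --mount sh -c '
  # pivot_root requires the new root to be a mount point
  mount --bind /tmp/mycontainer/rootfs /tmp/mycontainer/rootfs
  cd /tmp/mycontainer/rootfs
  mkdir -p old_root

  # "." becomes /, the old root is re-parented under /old_root
  pivot_root . old_root
  cd /

  # Detach the host filesystem entirely — nothing left to escape to
  umount -l /old_root
  rmdir /old_root
  exec /bin/sh
'
```

After the `umount`, the host filesystem is simply gone from this namespace — a materially stronger guarantee than chroot provides.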

Building a Container from Scratch

Let's combine all three primitives into an actual container.

#!/bin/bash
# build-container.sh — A container in ~20 lines of bash

ROOTFS="/tmp/mycontainer/rootfs"
CGROUP="/sys/fs/cgroup/scratch-container"

# Ensure rootfs exists (download Alpine if not present)
if [ ! -d "$ROOTFS/bin" ]; then
    mkdir -p "$ROOTFS"
    curl -sL https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz | tar xz -C "$ROOTFS"
fi

# Create cgroup with limits
sudo mkdir -p "$CGROUP"
echo $((128 * 1024 * 1024)) | sudo tee "$CGROUP/memory.max"   # 128MB RAM
echo "25000 100000" | sudo tee "$CGROUP/cpu.max"              # 25% CPU
echo 32 | sudo tee "$CGROUP/pids.max"                         # 32 processes

# Launch the container with PID, mount, UTS, and IPC namespaces
sudo unshare \
    --pid --fork \
    --mount --uts --ipc \
    --mount-proc="$ROOTFS/proc" \
    bash -c "
        # Set hostname (private to the new UTS namespace)
        hostname scratch-container

        # Mount essential filesystems inside the new mount namespace
        mount -t sysfs sysfs $ROOTFS/sys
        mount -t tmpfs tmpfs $ROOTFS/tmp

        # Add this process to the cgroup (\$\$ expands inside the container)
        echo \$\$ > $CGROUP/cgroup.procs

        # Switch into the new root
        exec chroot $ROOTFS /bin/sh
    "

# Cleanup after the container exits
sudo rmdir "$CGROUP" 2>/dev/null

Run it:

chmod +x build-container.sh
sudo ./build-container.sh

# Inside your container:
hostname # scratch-container
ps aux # Only your shell and ps — PID 1 is sh
cat /proc/cpuinfo # Can see host CPUs but can only use 25%
free -m # Shows host memory, but cgroup enforces 128MB
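The reason `free -m` shows host totals is that it reads /proc/meminfo, which is not cgroup-aware — the authoritative ceiling lives in the cgroup interface file. You can read it from the host while the container runs (path as created by the script above):

```shell
cat /sys/fs/cgroup/scratch-container/memory.max
# 134217728  (the 128MB limit: 128 * 1024 * 1024)
```

Production containers often mount lxcfs or similar shims so that tools like `free` report the cgroup limits instead.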

Overlay Filesystems: Copy-on-Write Layers

Docker images use overlay filesystems to stack read-only layers with a writable top layer. This is how images share common base layers efficiently.

# Create the layer structure
mkdir -p /tmp/overlay/{lower,upper,work,merged}

# Lower layer: read-only base (imagine this is your base image)
echo "from base image" > /tmp/overlay/lower/base-file.txt

# Mount the overlay
sudo mount -t overlay overlay \
-o lowerdir=/tmp/overlay/lower,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work \
/tmp/overlay/merged

# The merged view has the base file
cat /tmp/overlay/merged/base-file.txt
# Output: from base image

# Write a new file — goes to the upper layer only
echo "container data" > /tmp/overlay/merged/new-file.txt

# Verify: upper layer has the new file, lower is untouched
ls /tmp/overlay/upper/ # new-file.txt
ls /tmp/overlay/lower/ # base-file.txt (unchanged)

# Cleanup
sudo umount /tmp/overlay/merged

This is exactly how docker commit works — the upper layer becomes a new image layer.
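In spirit, the commit step is just archiving the upper directory. A sketch continuing from the overlay layout above (the filename is illustrative):

```shell
# Archive the writable layer — this tarball is your new image layer
tar -C /tmp/overlay/upper -czf /tmp/overlay/layer.tar.gz .

# Stacking it later means extracting it and listing it as an extra
# lowerdir; on conflicts, the leftmost lowerdir wins:
#   -o lowerdir=/tmp/overlay/layer2:/tmp/overlay/lower,...
```

OCI image layers are essentially these tarballs plus JSON metadata describing their order and digests.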

What Docker Actually Does

Now you can map every Docker concept to Linux primitives:

| Docker Concept | Linux Primitive |
| --- | --- |
| docker run --memory 256m | cgroups memory.max |
| docker run --cpus 0.5 | cgroups cpu.max |
| docker run --hostname foo | UTS namespace |
| docker run --network bridge | NET namespace + veth pair |
| docker run --pid host | Share host PID namespace |
| Image layers | OverlayFS lower directories |
| Container writable layer | OverlayFS upper directory |
| docker exec | nsenter into existing namespaces |

# See the namespaces of a running Docker container
docker inspect --format '{{.State.Pid}}' <container_id>
ls -la /proc/<pid>/ns/

# Enter a container's namespaces directly (what 'docker exec' does)
sudo nsenter -t <pid> -m -u -i -n -p bash

Cleanup

# Remove the network namespace
sudo ip netns del mycontainer

# Remove cgroup
sudo rmdir /sys/fs/cgroup/mycontainer

# Remove rootfs
sudo rm -rf /tmp/mycontainer /tmp/overlay

Now that you understand the kernel primitives underneath containers, next we'll shift focus to securing the Linux host itself — a 20-step hardening checklist that every server exposed to the internet needs.