How Containers Actually Work — Namespaces, Cgroups, and chroot

September 6, 2025 · 7 min read

DevOps & Cloud Learning Hub

Docker isn't magic — here's how to build a container with just Linux commands. Containers are nothing more than regular Linux processes with three layers of isolation: namespaces (what a process can see), cgroups (what a process can use), and a changed root filesystem (where a process lives). Once you understand these primitives, Kubernetes networking, Docker storage drivers, and container security all start making sense.

The Three Pillars of Containers

Primitive	Controls	Question It Answers
Namespaces	Visibility	What can the process see?
Cgroups	Resources	How much CPU/memory can it use?
chroot / pivot_root	Filesystem	What filesystem does it see?

Let's build each layer from the ground up.

Namespaces: Isolating What a Process Can See

Linux has 8 namespace types. Each one isolates a different aspect of the system.

Namespace	Flag	Isolates
PID	`CLONE_NEWPID`	Process IDs
NET	`CLONE_NEWNET`	Network stack
MNT	`CLONE_NEWNS`	Mount points
UTS	`CLONE_NEWUTS`	Hostname
IPC	`CLONE_NEWIPC`	Inter-process communication
USER	`CLONE_NEWUSER`	User/group IDs
Cgroup	`CLONE_NEWCGROUP`	Cgroup root
Time	`CLONE_NEWTIME`	System clocks (kernel 5.6+)

The unshare command lets you create new namespaces from the command line.
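To see which namespaces a process currently belongs to, you can read the symlinks under /proc/<pid>/ns. Two processes share a namespace when the inode numbers in those links match. The sketch below assumes a Linux /proc filesystem; the helper name ns_inode is made up for this example and is not part of any tool.

```shell
#!/bin/bash
# Hypothetical helper: extract the inode number from a namespace link
# target such as "pid:[4026531836]".
ns_inode() {
  local target="$1"
  target="${target#*\[}"   # drop everything up to and including "["
  echo "${target%\]}"      # drop the trailing "]"
}

# Print "<type> <inode>" for each namespace of the current shell.
for link in /proc/$$/ns/*; do
  [ -L "$link" ] || continue   # skip if /proc is unavailable
  echo "$(basename "$link") $(ns_inode "$(readlink "$link")")"
done
```

Running the same loop inside an unshare session and comparing the inode numbers with the host's output shows exactly which namespaces were replaced.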

PID Namespace: A New Process Tree

# Create a new PID namespace and run bash inside it
sudo unshare --pid --fork --mount-proc bash

# Inside the new namespace:
ps aux
# PID 1 is now your bash process — it can't see host processes

# Check from the host (in another terminal):
ps aux | grep unshare
# The namespaced process still has a real PID on the host
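The mapping between the host PID and the PID inside the namespace is exposed in the NSpid line of /proc/<pid>/status (kernel 4.1 and later), which lists the process's PID in each nested namespace from outermost to innermost. A minimal sketch, with a hypothetical helper name inner_pid and a placeholder PID in the usage comment:

```shell
#!/bin/bash
# Hypothetical helper: given the text of /proc/<pid>/status on stdin,
# print the PID as seen from the innermost PID namespace.
inner_pid() {
  awk '$1 == "NSpid:" { print $NF }'
}

# Usage from the host, where 12345 is a placeholder for the host PID
# of the namespaced bash:
#   inner_pid < /proc/12345/status

# For the current shell, which is not in a nested namespace from its
# own point of view, this prints the same value as $$:
if [ -r "/proc/$$/status" ]; then
  inner_pid < "/proc/$$/status"
fi
```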

UTS Namespace: Custom Hostname

# Create a new UTS namespace with a custom hostname
sudo unshare --uts bash

# Change hostname — only affects this namespace
hostname my-container
hostname
# Output: my-container

# Check from the host (in another terminal): the hostname is unchanged
hostname

NET Namespace: Isolated Network Stack

This is the foundation of how Docker and Kubernetes networking works.

# Create a named network namespace
sudo ip netns add mycontainer

# List namespaces
ip netns list

# Run a command in the namespace
sudo ip netns exec mycontainer ip addr
# Only loopback, no external connectivity

# Create a veth pair (virtual ethernet cable)
sudo ip link add veth-host type veth peer name veth-container

# Move one end into the namespace
sudo ip link set veth-container netns mycontainer

# Configure IP addresses
sudo ip addr add 10.200.0.1/24 dev veth-host
sudo ip link set veth-host up

sudo ip netns exec mycontainer ip addr add 10.200.0.2/24 dev veth-container
sudo ip netns exec mycontainer ip link set veth-container up
sudo ip netns exec mycontainer ip link set lo up

# Test connectivity
sudo ip netns exec mycontainer ping -c 2 10.200.0.1

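This is essentially what Docker does when it creates a container with bridge networking: it creates a veth pair, moves one end into the container's network namespace, and attaches the host end to a bridge (docker0 by default) instead of assigning it an address directly.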
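To illustrate the bridge variant, the sketch below prints (or, with DRY_RUN=0 and root privileges, runs) the ip commands that create a bridge and attach the host end of a veth pair to it. The names br-demo and veth-demo and the 10.201.0.1/24 subnet are invented for this example. Docker's own bridge driver does considerably more (NAT rules, address management), so treat this as an approximation.

```shell
#!/bin/bash
# Dry-run by default: print commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# attach_to_bridge <bridge> <gateway-cidr> <host-veth>
# Hypothetical helper: create a bridge, give it the gateway address,
# and attach the host end of an existing veth pair to it.
attach_to_bridge() {
  local bridge="$1" gateway="$2" host_if="$3"
  run ip link add name "$bridge" type bridge
  run ip addr add "$gateway" dev "$bridge"
  run ip link set "$bridge" up
  run ip link set "$host_if" master "$bridge"
  run ip link set "$host_if" up
}

attach_to_bridge br-demo 10.201.0.1/24 veth-demo
```

With a bridge in place, several containers can each plug their own veth pair into it and reach one another through the shared gateway address.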

Cgroups: Limiting What a Process Can Use

Cgroups (control groups) enforce resource limits. Most modern distros use cgroups v2.

# Check if cgroups v2 is active
stat -fc %T /sys/fs/cgroup/
# Output: cgroup2fs (v2) or tmpfs (v1)

# View the cgroup hierarchy
ls /sys/fs/cgroup/

Creating a Cgroup and Setting Limits

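# Create a new cgroup. The limit files below only appear if the memory, cpu,
# and pids controllers are listed in /sys/fs/cgroup/cgroup.subtree_control.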
sudo mkdir /sys/fs/cgroup/mycontainer

# Set memory limit to 256MB
echo $((256 * 1024 * 1024)) | sudo tee /sys/fs/cgroup/mycontainer/memory.max

# Set CPU limit to 50% of one core (50000 out of 100000 microseconds)
echo "50000 100000" | sudo tee /sys/fs/cgroup/mycontainer/cpu.max

# Set CPU weight (relative priority, default is 100)
echo 50 | sudo tee /sys/fs/cgroup/mycontainer/cpu.weight

# Set PID limit (max number of processes)
echo 64 | sudo tee /sys/fs/cgroup/mycontainer/pids.max

# Add the current shell to this cgroup
echo $$ | sudo tee /sys/fs/cgroup/mycontainer/cgroup.procs
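The values written to memory.max and cpu.max are plain arithmetic: a byte count for memory, and a quota/period pair in microseconds for CPU. A small sketch with hypothetical helper names (mib_to_bytes, cpu_max_for_percent), assuming the 100000 microsecond period used above:

```shell
#!/bin/bash
# Convert mebibytes to the byte count expected by memory.max.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Convert a CPU percentage (100 = one full core) to a cpu.max value
# of the form "<quota> <period>", both in microseconds.
cpu_max_for_percent() {
  local percent="$1" period=100000
  echo "$(( percent * period / 100 )) $period"
}

mib_to_bytes 256          # 268435456
cpu_max_for_percent 50    # 50000 100000
cpu_max_for_percent 150   # 150000 100000 (one and a half cores)
```

A quota larger than the period is valid and simply allows the cgroup to use more than one core's worth of CPU time per period.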

Verifying Cgroup Limits

# Check current memory usage of the cgroup
cat /sys/fs/cgroup/mycontainer/memory.current

# Check if any OOM kills have occurred
cat /sys/fs/cgroup/mycontainer/memory.events

# Check CPU usage statistics
cat /sys/fs/cgroup/mycontainer/cpu.stat
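memory.events and cpu.stat are flat key/value files, so they are easy to check from a script. The sketch below assumes the cgroup v2 key names oom_kill and nr_throttled; the helper name cgroup_stat and the sample values are illustrative only.

```shell
#!/bin/bash
# cgroup_stat <file> <key>: print the value for <key> in a flat
# "key value" cgroup file such as memory.events or cpu.stat.
cgroup_stat() {
  awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Demonstrate on a sample file with made-up values in the
# memory.events layout.
sample="$(mktemp)"
printf 'low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n' > "$sample"
cgroup_stat "$sample" oom_kill   # prints 1
rm -f "$sample"

# Against the real cgroup from this article:
#   cgroup_stat /sys/fs/cgroup/mycontainer/memory.events oom_kill
#   cgroup_stat /sys/fs/cgroup/mycontainer/cpu.stat nr_throttled
```

A non-zero oom_kill count means the memory limit killed at least one process, and a growing nr_throttled means the CPU quota is actively holding the workload back.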

Resource File	Controls	Example Value
`memory.max`	Hard memory limit	`268435456` (256MB)
`memory.high`	Throttle threshold	`209715200` (200MB)
`cpu.max`	CPU bandwidth	`50000 100000` (50%)
`cpu.weight`	Relative CPU shares	`50` (half of default)
`pids.max`	Process count limit	`64`
`io.max`	Disk I/O bandwidth	`8:0 rbps=1048576`

chroot: Changing the Root Filesystem

The final piece — give the process its own filesystem view.

# Create a minimal root filesystem using Alpine
mkdir -p /tmp/mycontainer/rootfs
cd /tmp/mycontainer

# Download Alpine Linux mini root filesystem
curl -o alpine-rootfs.tar.gz https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz
tar xzf alpine-rootfs.tar.gz -C rootfs

# Enter the chroot
sudo chroot rootfs /bin/sh

# Inside the chroot:
cat /etc/os-release
# You now see Alpine's filesystem instead of the host's (the kernel is still the host's)

ls /        # Only Alpine's files
whoami      # root (within the chroot)
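chroot fails with a confusing "No such file or directory" error if the shell is missing from the new root, so a quick pre-flight check can save time. A hedged sketch: rootfs_has_shell is a hypothetical helper, and it accepts a symlink because Alpine's /bin/sh is a symlink to BusyBox whose target may not resolve from outside the chroot.

```shell
#!/bin/bash
# rootfs_has_shell <rootfs>: succeed if <rootfs>/bin/sh exists as a
# file or as a symlink (a link that dangles from the host's point of
# view is still accepted).
rootfs_has_shell() {
  local sh="$1/bin/sh"
  [ -e "$sh" ] || [ -L "$sh" ]
}

if rootfs_has_shell /tmp/mycontainer/rootfs; then
  echo "rootfs looks usable"
else
  echo "no /bin/sh in rootfs; extract the tarball first"
fi
```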

Building a Container from Scratch

Let's combine all three primitives into an actual container.

#!/bin/bash
# build-container.sh: a minimal container in a short bash script

ROOTFS="/tmp/mycontainer/rootfs"
CGROUP="/sys/fs/cgroup/scratch-container"

# Ensure rootfs exists (download Alpine if not present)
if [ ! -d "$ROOTFS/bin" ]; then
  mkdir -p "$ROOTFS"
  curl -sL https://dl-cdn.alpinelinux.org/alpine/v3.18/releases/x86_64/alpine-minirootfs-3.18.4-x86_64.tar.gz | tar xz -C "$ROOTFS"
fi

# Create cgroup with limits
sudo mkdir -p "$CGROUP"
echo $((128 * 1024 * 1024)) | sudo tee "$CGROUP/memory.max"   # 128MB RAM
echo "25000 100000" | sudo tee "$CGROUP/cpu.max"                # 25% CPU
echo 32 | sudo tee "$CGROUP/pids.max"                           # 32 processes

# Launch the container in new PID, mount, UTS, and IPC namespaces
sudo unshare \
  --pid --fork \
  --mount --uts --ipc \
  --mount-proc="$ROOTFS/proc" \
  bash -c "
    # Set hostname
    hostname scratch-container

    # Mount essential filesystems
    mount -t sysfs sysfs $ROOTFS/sys
    mount -t tmpfs tmpfs $ROOTFS/tmp

    # Add this process to the cgroup
    echo \$\$ > $CGROUP/cgroup.procs

    # Change root into the new filesystem (chroot here; real runtimes use pivot_root)
    exec chroot $ROOTFS /bin/sh
  "

# Cleanup
sudo rmdir "$CGROUP" 2>/dev/null

# Run it
chmod +x build-container.sh
sudo ./build-container.sh

# Inside your container:
hostname          # scratch-container
ps aux            # Only your shell and ps — PID 1 is sh
cat /proc/cpuinfo # Can see all host CPUs, but is limited to 25% of one core
free -m           # Shows host memory, but cgroup enforces 128MB
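From the host, you can confirm which cgroup the container process landed in by reading /proc/<pid>/cgroup; on a cgroup v2 system it contains a line of the form 0::/<path>. A minimal sketch with a hypothetical helper, cgroup_v2_path, and a placeholder PID in the usage comment:

```shell
#!/bin/bash
# cgroup_v2_path: read /proc/<pid>/cgroup content on stdin and print
# the cgroup v2 path (the entry whose hierarchy ID is 0).
cgroup_v2_path() {
  awk '/^0::/ { sub(/^0::/, ""); print }'
}

# Usage on the host, with 12345 standing in for the container's host PID:
#   cgroup_v2_path < /proc/12345/cgroup    # expected: /scratch-container

# The current shell's own cgroup, if /proc is available:
if [ -r "/proc/$$/cgroup" ]; then
  cgroup_v2_path < "/proc/$$/cgroup"
fi
```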

Overlay Filesystems: Copy-on-Write Layers

Docker images use overlay filesystems to stack read-only layers with a writable top layer. This is how images share common base layers efficiently.

# Create the layer structure
mkdir -p /tmp/overlay/{lower,upper,work,merged}

# Lower layer: read-only base (imagine this is your base image)
echo "from base image" > /tmp/overlay/lower/base-file.txt

# Mount the overlay
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/overlay/lower,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work \
  /tmp/overlay/merged

# The merged view has the base file
cat /tmp/overlay/merged/base-file.txt
# Output: from base image

# Write a new file — goes to the upper layer only
echo "container data" > /tmp/overlay/merged/new-file.txt

# Verify: upper layer has the new file, lower is untouched
ls /tmp/overlay/upper/    # new-file.txt
ls /tmp/overlay/lower/    # base-file.txt (unchanged)

# Cleanup
sudo umount /tmp/overlay/merged

This is essentially how docker commit works with the overlay storage driver: the contents of the upper layer are packaged as a new image layer.
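Real images usually have several read-only layers. OverlayFS accepts them as a colon-separated lowerdir list in which the leftmost directory is the topmost layer. The sketch below builds the mount options from layers given bottom-to-top; overlay_opts and the example paths are hypothetical, and this is not necessarily how Docker assembles its own mount options.

```shell
#!/bin/bash
# overlay_opts <upperdir> <workdir> <layer>...
# Layers are listed bottom-to-top (base image first). OverlayFS wants
# lowerdir top-to-bottom, so the list is reversed while it is built.
overlay_opts() {
  local upper="$1" work="$2" lower="" layer
  shift 2
  for layer in "$@"; do
    lower="${layer}${lower:+:$lower}"
  done
  echo "lowerdir=${lower},upperdir=${upper},workdir=${work}"
}

overlay_opts /tmp/overlay/upper /tmp/overlay/work /tmp/overlay/base /tmp/overlay/app
# lowerdir=/tmp/overlay/app:/tmp/overlay/base,upperdir=/tmp/overlay/upper,workdir=/tmp/overlay/work
```

The resulting string can be passed to mount, for example: sudo mount -t overlay overlay -o "$(overlay_opts ...)" /tmp/overlay/merged.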

What Docker Actually Does

Now you can map every Docker concept to Linux primitives:

Docker Concept	Linux Primitive
`docker run --memory 256m`	cgroups `memory.max`
`docker run --cpus 0.5`	cgroups `cpu.max`
`docker run --hostname foo`	UTS namespace
`docker run --network bridge`	NET namespace + veth pair
`docker run --pid host`	Share host PID namespace
Image layers	OverlayFS lower directories
Container writable layer	OverlayFS upper directory
`docker exec`	`nsenter` into existing namespaces

# See the namespaces of a running Docker container
docker inspect --format '{{.State.Pid}}' <container_id>
sudo ls -la /proc/<pid>/ns/

# Enter a container's namespaces directly (what 'docker exec' does)
sudo nsenter -t <pid> -m -u -i -n -p bash

Cleanup

# Remove the network namespace
sudo ip netns del mycontainer

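# Remove the cgroup. rmdir fails while the cgroup still contains processes,
# so first move the current shell back to the root cgroup if you added it earlier.
echo $$ | sudo tee /sys/fs/cgroup/cgroup.procs
sudo rmdir /sys/fs/cgroup/mycontainer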

# Remove rootfs
sudo rm -rf /tmp/mycontainer /tmp/overlay

Now that you understand the kernel primitives underneath containers, next we'll shift focus to securing the Linux host itself — a 20-step hardening checklist that every server exposed to the internet needs.
