Docker Images Explained — Layers, Caching, and Why Size Matters
You ran docker build and it finished in 0.2 seconds. Then you changed one line of code and suddenly it takes 4 minutes. What happened? The answer lies in how Docker images are built — layer by layer — and understanding this will save you hours of wasted build time.
What Are Image Layers?
A Docker image is not a single blob. It is a stack of read-only filesystem layers, each produced by a single instruction in your Dockerfile. When Docker executes RUN, COPY, or ADD, it creates a new layer containing only the changes from the previous layer.
# Pull an image and see its layers
docker pull nginx:alpine
docker history nginx:alpine
The output shows each layer, its size, and the command that created it. The key insight: layers are shared across images. If ten images use alpine:3.19 as their base, that base layer is stored only once on disk.
How Docker Build Caching Works
Docker caches each layer. On the next build, it checks whether the instruction has changed — and, for COPY and ADD, whether the checksums of the copied files have changed. If not, Docker reuses the cached layer and skips the work entirely. One caveat: once a layer's cache is invalidated, every layer after it is rebuilt too.
Here is where most people go wrong:
# BAD: Cache busts on every code change
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]
Every time you change any file, the COPY . . layer is invalidated. That means npm install runs again from scratch even if your dependencies have not changed.
# GOOD: Dependencies cached separately
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
Now npm ci only runs when package.json or package-lock.json changes. Your code changes? Only the final COPY layer rebuilds.
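If you are building with BuildKit (the default builder in recent Docker releases), you can go a step further with a cache mount, which persists npm's download cache across builds even when the lock file changes. A hedged sketch — the cache path assumes npm's default location for the root user:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
# The cache mount survives between builds but is NOT baked into the image
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]
```

The mount exists only while the RUN instruction executes, so the final image stays just as small — you trade nothing for faster rebuilds after a dependency bump.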
The Golden Rule of Layer Ordering
Put instructions that change least frequently at the top and instructions that change most frequently at the bottom. This maximizes cache hits.
FROM python:3.12-slim
# 1. System deps (rarely change)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# 2. Python deps (change occasionally)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 3. Application code (changes frequently)
COPY . .
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8000"]
Inspecting Layers with docker history
You can inspect exactly what each layer contributes:
docker history --no-trunc --format "table {{.Size}}\t{{.CreatedBy}}" myapp:latest
This reveals which layers are bloated. A common culprit is leaving behind package manager caches or build artifacts.
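Because each RUN instruction produces its own immutable layer, deleting files in a later instruction does not reclaim space — the files still exist in the earlier layer and still ship with the image. A sketch of the difference (package names are illustrative):

```dockerfile
# BAD: /var/lib/apt/lists is still stored inside the first layer
RUN apt-get update && apt-get install -y build-essential
RUN rm -rf /var/lib/apt/lists/*

# GOOD: install and clean up within a single layer
RUN apt-get update && apt-get install -y build-essential \
    && rm -rf /var/lib/apt/lists/*
```

Running docker history on each variant makes the difference visible: in the BAD version, the install layer keeps its full size regardless of the later rm.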
Why Image Size Matters
A 1.2 GB image versus a 45 MB image — does it matter? Absolutely.
| Impact | Small Image | Large Image |
|---|---|---|
| Pull time (100 Mbps) | ~4 seconds | ~96 seconds |
| Storage per node (50 images) | ~2.2 GB | ~60 GB |
| Attack surface | Minimal | Hundreds of unused packages |
| Cold start (serverless) | Fast | Painful |
| CI/CD image pull per job | Seconds | Minutes added to every run |
In Kubernetes, large images mean slower pod scheduling, slower autoscaling, and higher storage costs across every node in your cluster.
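The pull-time figures above follow from simple arithmetic: transfer time is roughly the image size in megabits divided by the link speed. You can sanity-check them from the shell (this ignores compression, registry latency, and parallel layer pulls, so real pulls are usually somewhat faster):

```shell
# size_MB * 8 bits/byte / link_Mbps = seconds (single uncompressed stream)
awk 'BEGIN { printf "45 MB image:  %.1f s\n1.2 GB image: %.1f s\n", 45*8/100, 1200*8/100 }'
# 45 MB image:  3.6 s
# 1.2 GB image: 96.0 s
```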
Choosing the Right Base Image
Not all base images are equal. Here is a real comparison:
# Pull and compare sizes
docker pull ubuntu:22.04
docker pull debian:bookworm-slim
docker pull alpine:3.19
docker pull gcr.io/distroless/static-debian12
docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}" | grep -E "ubuntu|debian|alpine|distroless"
| Base Image | Size | Package Manager | Use Case |
|---|---|---|---|
| ubuntu:22.04 | ~77 MB | apt | When you need full Ubuntu compatibility |
| debian:bookworm-slim | ~74 MB | apt | Smaller Debian without docs/man pages |
| alpine:3.19 | ~7 MB | apk | Minimal Linux with musl libc |
| distroless/static | ~2 MB | None | Go/Rust static binaries — no shell at all |
Alpine is popular but beware: it uses musl libc instead of glibc. Some Python packages with C extensions or Java applications may behave differently. Test before committing.
Using .dockerignore
Without a .dockerignore, COPY . . sends your entire project directory to the Docker daemon — including node_modules, .git, test files, and local env files.
# .dockerignore
.git
.gitignore
node_modules
npm-debug.log
Dockerfile
docker-compose.yml
.env
.env.*
tests/
coverage/
*.md
.vscode/
.idea/
# Check your build context size
docker build -t test . 2>&1 | head -5
# Legacy builder: "Sending build context to Docker daemon 2.1MB" vs "847MB"
# BuildKit reports the same information as "transferring context"
The difference can be massive. A bloated build context slows every build even before the first instruction executes.
Multi-Stage Builds Preview
Multi-stage builds are the ultimate weapon for small images. The idea: use one stage to build your application, then copy only the compiled output into a minimal final image.
# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Production
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Your final image contains only nginx and your static files. No Node.js, no node_modules, no source code. What was 400 MB becomes 25 MB.
# Build and verify the size
docker build -t myapp:multi-stage .
docker images myapp:multi-stage
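The same pattern pairs naturally with the distroless images from the base-image table for compiled languages. A hedged sketch for a Go service — the module layout and binary name are assumptions, adjust to your project:

```dockerfile
# Stage 1: compile a static binary
FROM golang:1.22-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /bin/app .

# Stage 2: runtime with no shell and no package manager
FROM gcr.io/distroless/static-debian12
COPY --from=builder /bin/app /app
ENTRYPOINT ["/app"]
```

Disabling CGO produces a fully static binary, which is what lets it run on distroless/static with no libc at all.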
Quick Reference: Layer Optimization Checklist
| Practice | Impact |
|---|---|
| Copy dependency files before code | Caches dependency install step |
| Combine RUN commands with && | Fewer layers, smaller image |
| Clean up in the same RUN layer | Avoids storing deleted files in previous layers |
| Use .dockerignore | Faster build context transfer |
| Choose minimal base images | Smaller starting point |
| Use multi-stage builds | Final image contains only runtime artifacts |
Wrapping Up
Docker images are not magic black boxes — they are layered filesystems with predictable caching behavior. Once you understand layers, you can structure your Dockerfiles to build in seconds instead of minutes, ship images that are megabytes instead of gigabytes, and reduce your attack surface in production.
In the next post, we will tackle Docker Volumes — how to persist data that survives container restarts and why understanding storage is critical for running databases and stateful workloads in containers.
