
Docker Images Explained — Layers, Caching, and Why Size Matters

5 min read
Goel Academy
DevOps & Cloud Learning Hub

You ran docker build and it finished in 0.2 seconds. Then you changed one line of code and suddenly it takes 4 minutes. What happened? The answer lies in how Docker images are built — layer by layer — and understanding this will save you hours of wasted build time.

What Are Image Layers?

A Docker image is not a single blob. It is a stack of read-only filesystem layers, each produced by a single instruction in your Dockerfile. When Docker executes RUN, COPY, or ADD, it creates a new layer containing only the changes from the previous layer.

# Pull an image and see its layers
docker pull nginx:alpine
docker history nginx:alpine

The output shows each layer, its size, and the command that created it. The key insight: layers are shared across images. If ten images use alpine:3.19 as their base, that base layer is stored only once on disk.
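You can verify the sharing yourself: layer digests are content-addressed, so two images built on the identical base report identical sha256 digests for those layers. A quick check, assuming both images are already pulled and their base versions actually match:

```shell
# Print the content-addressed layer digests of each image.
# Shared base layers show up with identical sha256 digests.
docker inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' alpine:3.19
docker inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' nginx:alpine
```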

How Docker Build Caching Works

Docker caches each layer. On the next build, it checks — has this instruction changed? Has any file it depends on changed? If not, Docker reuses the cached layer and skips the work entirely.
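You can watch the cache at work by building twice in a row (a sketch, assuming a Dockerfile in the current directory):

```shell
# First build: every instruction executes
docker build -t demo .
# Second build, nothing changed: each step is reported as CACHED
docker build -t demo .
# Bypass the cache entirely when you need a clean rebuild
docker build --no-cache -t demo .
```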

Here is where most people go wrong:

# BAD: Cache busts on every code change
FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm install
CMD ["node", "server.js"]

Every time you change any file, the COPY . . layer is invalidated. That means npm install runs again from scratch even if your dependencies have not changed.

# GOOD: Dependencies cached separately
FROM node:20-alpine
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY . .
CMD ["node", "server.js"]

Now npm ci runs only when package.json or package-lock.json changes. Your code changes? Only the final COPY layer rebuilds.
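A quick experiment makes the difference concrete (assuming the Dockerfile above, with a server.js in your project):

```shell
# Change application code only: npm ci stays CACHED,
# just the final COPY layer re-executes
touch server.js
docker build -t myapp .

# Change a dependency manifest: the COPY package*.json layer
# and every layer after it rebuild
touch package-lock.json
docker build -t myapp .
```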

The Golden Rule of Layer Ordering

Put instructions that change least frequently at the top and instructions that change most frequently at the bottom. This maximizes cache hits.

FROM python:3.12-slim
# 1. System deps (rarely change)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc libpq-dev \
    && rm -rf /var/lib/apt/lists/*
# 2. Python deps (change occasionally)
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 3. Application code (changes frequently)
COPY . .
CMD ["gunicorn", "app:app", "--bind", "0.0.0.0:8000"]

Inspecting Layers with docker history

You can inspect exactly what each layer contributes:

docker history --no-trunc --format "table {{.Size}}\t{{.CreatedBy}}" myapp:latest

This reveals which layers are bloated. A common culprit is leaving behind package manager caches or build artifacts.
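The mechanics matter here: deleting files in a later instruction only masks them in a new layer; the bytes are still stored in the layer that created them. Cleanup has to happen in the same RUN. A sketch:

```dockerfile
# BAD: the apt cache is baked into the first layer forever;
# the second RUN merely hides it in a new layer on top
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# GOOD: install and clean up in one layer, so the cache
# never reaches the final image at all
RUN apt-get update && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*
```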

Why Image Size Matters

A 1.2 GB image versus a 45 MB image — does it matter? Absolutely.

| Impact | Small Image (~45 MB) | Large Image (~1.2 GB) |
| --- | --- | --- |
| Pull time (100 Mbps) | ~4 seconds | ~96 seconds |
| Storage per node (50 images) | ~2.2 GB | ~60 GB |
| Attack surface | Minimal | Hundreds of unused packages |
| Cold start (serverless) | Fast | Painful |
| CI/CD pipeline | Minutes | Much longer |

In Kubernetes, large images mean slower pod scheduling, slower autoscaling, and higher storage costs across every node in your cluster.

Choosing the Right Base Image

Not all base images are equal. Here is a real comparison:

# Pull and compare sizes
docker pull ubuntu:22.04
docker pull debian:bookworm-slim
docker pull alpine:3.19
docker pull gcr.io/distroless/static-debian12

docker images --format "table {{.Repository}}:{{.Tag}}\t{{.Size}}" | grep -E "ubuntu|debian|alpine|distroless"
| Base Image | Size | Package Manager | Use Case |
| --- | --- | --- | --- |
| ubuntu:22.04 | ~77 MB | apt | When you need full Ubuntu compatibility |
| debian:bookworm-slim | ~74 MB | apt | Smaller Debian without docs/man pages |
| alpine:3.19 | ~7 MB | apk | Minimal Linux with musl libc |
| distroless/static | ~2 MB | None | Go/Rust static binaries — no shell at all |

Alpine is popular but beware: it uses musl libc instead of glibc. Some Python packages with C extensions or Java applications may behave differently. Test before committing.
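Before standardizing on Alpine, smoke-test your heaviest native dependency there first (a sketch; the package name is a placeholder, so substitute whatever your app actually compiles against):

```shell
# If no musl-compatible wheel exists, pip falls back to building
# from source, which typically fails without gcc and headers.
# Better to find out now than in CI.
docker run --rm python:3.12-alpine pip install --no-cache-dir psycopg2-binary
```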

Using .dockerignore

Without a .dockerignore, COPY . . sends your entire project directory to the Docker daemon — including node_modules, .git, test files, and local env files.

# .dockerignore
.git
.gitignore
node_modules
npm-debug.log
Dockerfile
docker-compose.yml
.env
.env.*
tests/
coverage/
*.md
.vscode/
.idea/
# Check your build context size
docker build -t test . 2>&1 | head -5
# Legacy builder: "Sending build context to Docker daemon  2.1MB" vs "847MB"
# BuildKit reports it as "transferring context" in the build output

The difference can be massive. A bloated build context slows every build even before the first instruction executes.
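A quick way to estimate what COPY . . would drag in, before Docker tells you (assuming a typical Node project layout):

```shell
# Total size of the directory Docker would send as build context
du -sh .
# The usual offenders, if present
du -sh node_modules .git 2>/dev/null || true
```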

Multi-Stage Builds Preview

Multi-stage builds are the ultimate weapon for small images. The idea: use one stage to build your application, then copy only the compiled output into a minimal final image.

# Stage 1: Build
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: Production
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Your final image contains only nginx and your static files. No Node.js, no node_modules, no source code. What was 400 MB becomes 25 MB.

# Build and verify the size
docker build -t myapp:multi-stage .
docker images myapp:multi-stage

Quick Reference: Layer Optimization Checklist

| Practice | Impact |
| --- | --- |
| Copy dependency files before code | Caches the dependency install step |
| Combine RUN commands with && | Fewer layers, smaller image |
| Clean up in the same RUN layer | Avoids storing deleted files in previous layers |
| Use .dockerignore | Faster build context transfer |
| Choose minimal base images | Smaller starting point |
| Use multi-stage builds | Final image contains only runtime artifacts |

Wrapping Up

Docker images are not magic black boxes — they are layered filesystems with predictable caching behavior. Once you understand layers, you can structure your Dockerfiles to build in seconds instead of minutes, ship images that are megabytes instead of gigabytes, and reduce your attack surface in production.

In the next post, we will tackle Docker Volumes — how to persist data that survives container restarts and why understanding storage is critical for running databases and stateful workloads in containers.