Docker Image Optimization — Distroless, Scratch, and Alpine Compared

Goel Academy
DevOps & Cloud Learning Hub

Every megabyte in your Docker image is attack surface you do not need, bandwidth you pay for, and startup time your users wait through. A default Node.js image weighs over 1 GB. The same application built on a distroless base can drop to 120 MB. This post walks through every base image option, shows real size comparisons, and gives you language-specific recommendations for building the smallest, most secure images possible.

Why Image Size Matters

Smaller images are not just about saving disk space. They directly impact three things that matter in production.

Security: fewer packages mean fewer CVEs. A full Ubuntu image ships with hundreds of binaries, most of which your application never calls. Each one is a potential vulnerability. Distroless images ship no shell, no package manager, and no unnecessary binaries.

Speed: image pull time often dominates cold-start latency. On Kubernetes, when a pod is scheduled to a node that has not cached the image, the node must pull it before the container can start. A 50 MB image pulls in seconds. A 1 GB image can take minutes on slower networks. This directly affects scaling speed and recovery time.

Cost: registry storage, network transfer, and node disk space all cost money. At scale, the difference between 100 MB and 1 GB images across thousands of deployments adds up fast.
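To put rough numbers on the pull-time claim, here is a back-of-the-envelope sketch. The `pull_seconds` helper is hypothetical, and it assumes transfer time is simply image size divided by link bandwidth. It ignores registry latency, parallel layer downloads, and decompression, so treat the results as a lower bound.

```shell
#!/bin/sh
# Hypothetical helper: estimate the transfer time of an image pull.
# Usage: pull_seconds <size_in_MB> <bandwidth_in_Mbit_per_s>
pull_seconds() {
  # MB * 8 gives megabits; dividing by link speed gives seconds
  awk -v size="$1" -v bw="$2" 'BEGIN { printf "%.1f\n", size * 8 / bw }'
}

pull_seconds 50 100     # 50 MB image on a 100 Mbit/s link -> 4.0
pull_seconds 1000 100   # 1 GB image on the same link      -> 80.0
pull_seconds 1000 20    # 1 GB image on a 20 Mbit/s link   -> 400.0
```

Multiply by the number of nodes pulling at once during a scale-out event to see why the gap matters.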

Base Image Comparison

Here is what popular base images look like in terms of size and included packages.

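| Base Image | Compressed Size | Packages | Shell | Package Manager | CVEs (typical) |
| --- | --- | --- | --- | --- | --- |
| ubuntu:24.04 | ~78 MB | ~100+ | Yes | apt | 20-50 |
| debian:bookworm | ~52 MB | ~80+ | Yes | apt | 15-40 |
| debian:bookworm-slim | ~27 MB | ~40+ | Yes | apt | 5-15 |
| alpine:3.20 | ~3.5 MB | ~15 | Yes | apk | 0-5 |
| gcr.io/distroless/static | ~2 MB | 0 | No | No | 0-1 |
| gcr.io/distroless/base | ~20 MB | glibc only | No | No | 0-3 |
| scratch | 0 MB | 0 | No | No | 0 |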

The difference is dramatic. Scratch is literally nothing — zero bytes, zero files, zero filesystem. Distroless gives you just enough to run a compiled binary or a language runtime. Alpine gives you a minimal Linux with a shell and package manager.
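Published sizes drift between releases, so it is worth measuring the images you actually use. The sketch below is one way to do that. The helper names `bytes_to_mb` and `image_size_mb` are hypothetical. Note that `docker image inspect` reports the uncompressed size on disk, which is larger than the compressed size a registry shows.

```shell
#!/bin/sh
# Convert a byte count read from stdin to megabytes (1 MB = 1,000,000 bytes).
bytes_to_mb() {
  awk '{ printf "%.1f\n", $1 / 1000000 }'
}

# Hypothetical wrapper: print the local, uncompressed size of an image in MB.
# The image must already be present locally (run docker pull first).
image_size_mb() {
  docker image inspect --format '{{.Size}}' "$1" | bytes_to_mb
}

# Example (needs a Docker daemon):
#   image_size_mb alpine:3.20
#   image_size_mb debian:bookworm-slim
```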

Building From Scratch

Scratch is the empty image. It contains nothing at all. You can only use it with statically compiled binaries — Go and Rust are the prime candidates.

# Go application — build from scratch
FROM golang:1.23 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 ensures static linking
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server .

FROM scratch
# Copy CA certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Copy timezone data if your app needs it
COPY --from=builder /usr/share/zoneinfo /usr/share/zoneinfo
COPY --from=builder /server /server
EXPOSE 8080
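ENTRYPOINT ["/server"]

# Rust application: build from scratch
FROM rust:1.82 AS builder
# Install the musl toolchain first so every build below is statically linked
RUN apt-get update && apt-get install -y --no-install-recommends musl-tools \
    && rm -rf /var/lib/apt/lists/*
RUN rustup target add x86_64-unknown-linux-musl
WORKDIR /app
COPY Cargo.toml Cargo.lock ./
# Build a stub main.rs so dependencies are compiled and cached in their own layer
RUN mkdir src && echo "fn main() {}" > src/main.rs
RUN cargo build --release --target x86_64-unknown-linux-musl
COPY src ./src
# touch updates the timestamp so cargo rebuilds the real source over the stub
RUN touch src/main.rs && cargo build --release --target x86_64-unknown-linux-musl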

FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/myapp /myapp
EXPOSE 8080
ENTRYPOINT ["/myapp"]

The key requirement for scratch: your binary must be statically linked. If it dynamically links to glibc, it will fail at startup with a cryptic "no such file or directory" error, because the dynamic loader the binary asks for does not exist in the image.
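A quick check before copying a binary into scratch can catch this early. The sketch below classifies the one-line description printed by the `file` utility. The `classify_linkage` helper is hypothetical, and it assumes `file` uses its usual "statically linked", "static-pie linked", or "dynamically linked" wording.

```shell
#!/bin/sh
# Hypothetical helper: classify a binary from the text that `file` prints for it.
# Usage: classify_linkage "$(file -L ./server)"
classify_linkage() {
  case "$1" in
    *"statically linked"*|*"static-pie linked"*) echo static ;;
    *"dynamically linked"*) echo dynamic ;;
    *) echo unknown ;;
  esac
}

# Example (needs the file utility and a built binary):
#   [ "$(classify_linkage "$(file -L ./server)")" = "static" ] || echo "not safe for scratch"
```

Running this in the builder stage, or in CI right after the build, turns a runtime failure into a build failure.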

Google Distroless Images

Distroless images are maintained by Google and provide the minimum needed for each language runtime — no shell, no package manager, no coreutils. If an attacker breaks into your container, they cannot even run ls or cat.

# Node.js with distroless
FROM node:20-slim AS builder
WORKDIR /app
COPY package.json package-lock.json ./
# --omit=dev replaces the deprecated --production flag
RUN npm ci --omit=dev
COPY . .

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app .
EXPOSE 3000
CMD ["server.js"]

# Java with distroless
FROM eclipse-temurin:21-jdk AS builder
WORKDIR /app
COPY . .
RUN ./gradlew build --no-daemon

FROM gcr.io/distroless/java21-debian12
COPY --from=builder /app/build/libs/app.jar /app.jar
EXPOSE 8080
CMD ["/app.jar"]

Available distroless images include static (for Go/Rust), base (glibc only), cc (libstdc++), nodejs20, java21, and python3. Each also publishes a :debug tag that includes a busybox shell for troubleshooting.
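When a distroless container misbehaves, swapping in the debug image is usually the fastest way to look around. The sketch below maps an image reference to its debug counterpart. The `debug_ref` helper is hypothetical, and it assumes the reference is either untagged or uses the `latest` or `nonroot` tag.

```shell
#!/bin/sh
# Hypothetical helper: map a distroless image reference to its debug tag.
# Assumes the reference is untagged or uses the :latest or :nonroot tag.
debug_ref() {
  case "$1" in
    *:nonroot) echo "${1%:nonroot}:debug-nonroot" ;;
    *:latest)  echo "${1%:latest}:debug" ;;
    *:*)       echo "unsupported tag in: $1" >&2; return 1 ;;
    *)         echo "$1:debug" ;;
  esac
}

# Example (needs a Docker daemon): open a busybox shell in the debug image
#   docker run --rm -it --entrypoint=sh "$(debug_ref gcr.io/distroless/base-debian12)"
```

Keep the debug tag out of production deployments, since it brings back the shell that distroless was chosen to remove.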

Alpine Gotchas — musl vs glibc

Alpine uses musl libc instead of glibc. This saves space but creates compatibility issues that have caught many teams by surprise.

# Common symptoms of musl/glibc incompatibility:
# 1. DNS resolution behaves differently
# 2. Pre-compiled binaries may segfault
# 3. Python packages with C extensions may fail to install
# 4. Node.js native modules may need rebuilding

# Example: Python package that fails on Alpine
docker run -it python:3.12-alpine sh
# apk add --no-cache gcc musl-dev linux-headers
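# pip install psutil # Builds from source when no musl-compatible wheel is available, which is much slower than installing a prebuilt wheel on Debian

# Check which libc a binary is linked against
docker run --rm alpine:3.20 sh -c "
apk add --no-cache file binutils
# Check a binary (output shown for x86_64)
file /bin/busybox
# /bin/busybox: ELF 64-bit LSB pie executable, ..., dynamically linked, interpreter /lib/ld-musl-x86_64.so.1, stripped

ldd /bin/busybox 2>&1
# /lib/ld-musl-x86_64.so.1 (0x...)
# libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x...)
"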

# On glibc-based images:
docker run --rm debian:bookworm-slim sh -c "
ldd /bin/ls
# linux-vdso.so.1 => ...
# libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1
# libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6
"

When to use Alpine: Go applications (static binaries), simple shell scripts, tools you compile from source. When to avoid Alpine: Python with native extensions, Node.js with native modules, pre-compiled binaries expecting glibc.
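If you are not sure which libc a base image uses, the dynamic loader on disk gives it away. The sketch below checks a root filesystem for the musl or glibc loader. The `detect_libc` helper is hypothetical, and the loader paths it checks are assumptions based on common Alpine and Debian layouts.

```shell
#!/bin/sh
# Hypothetical helper: guess the libc of a root filesystem from its dynamic loader.
# Usage: detect_libc [rootfs_dir]   (defaults to the current root)
detect_libc() {
  root="${1:-}"
  # musl installs its loader as /lib/ld-musl-<arch>.so.1
  for f in "$root"/lib/ld-musl-*.so.1; do
    if [ -e "$f" ]; then echo musl; return 0; fi
  done
  # glibc loaders live in /lib, /lib64, or a multiarch subdirectory of /lib
  for f in "$root"/lib/ld-linux*.so.* "$root"/lib64/ld-linux*.so.* "$root"/lib/*/ld-linux*.so.*; do
    if [ -e "$f" ]; then echo glibc; return 0; fi
  done
  echo unknown
}
```

Run it against an exported root filesystem (for example one unpacked from `docker export`), or paste the function into a shell inside the container and call it with no arguments.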

Slim Variants and Their Place

Most official images provide -slim variants that strip development tools while keeping glibc.

# Size comparison for Node.js base images:
# node:20 → ~1.1 GB (full Debian + build tools)
# node:20-slim → ~200 MB (Debian slim, no build tools)
# node:20-alpine → ~130 MB (Alpine, musl libc)
# distroless/nodejs → ~120 MB (no shell, no pkg manager)

# Best practice: use slim for building, distroless for running
FROM node:20-slim AS builder
WORKDIR /app
COPY package.json package-lock.json ./
# Install build dependencies only in the builder stage; none of them reach the final image
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 make g++ \
    && rm -rf /var/lib/apt/lists/*
# --omit=dev replaces the deprecated --production flag
RUN npm ci --omit=dev
COPY . .

FROM gcr.io/distroless/nodejs20-debian12
WORKDIR /app
COPY --from=builder /app .
CMD ["server.js"]

Slim variants are the sweet spot for applications that need glibc compatibility but not a full OS. They cut 60-80% of the size compared to full images while avoiding musl headaches.

UPX Compression for Binaries

UPX (Ultimate Packer for eXecutables) compresses binaries by 50-70%, at the cost of slightly slower startup time due to decompression.

# Go binary with UPX compression
# Alpine-based builder: upx is available from Alpine's package repositories
# (it is not packaged in every Debian release)
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-s -w" -o /server .
# Install and run UPX
RUN apk add --no-cache upx
RUN upx --best --lzma /server
# Before UPX: ~15 MB
# After UPX: ~4 MB

FROM scratch
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=builder /server /server
ENTRYPOINT ["/server"]

Note: UPX can trigger false positives in some antivirus and container scanning tools. It also increases memory usage at startup since the binary must be decompressed into memory. For most production workloads, the -ldflags="-s -w" flags (strip debug info and symbol table) provide enough size reduction without UPX downsides.

Analyzing Layers With dive

The dive tool lets you inspect every layer of a Docker image to find wasted space.

# Install dive
# macOS
brew install dive

# Linux
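# Release assets include the version in the file name, so look up the latest tag first
DIVE_VERSION=$(curl -sL "https://api.github.com/repos/wagoodman/dive/releases/latest" | grep '"tag_name":' | sed -E 's/.*"v([^"]+)".*/\1/')
curl -fOL "https://github.com/wagoodman/dive/releases/download/v${DIVE_VERSION}/dive_${DIVE_VERSION}_linux_amd64.deb"
sudo dpkg -i "dive_${DIVE_VERSION}_linux_amd64.deb"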

# Analyze an image
dive myapp:latest

# CI mode — fail if image efficiency is below threshold
dive myapp:latest --ci --lowestEfficiency 0.95 --highestWastedBytes 50MB

# Example output:
# Image efficiency score: 97%
# Potentially wasted space: 12 MB
# Layer 1: 27 MB FROM debian:bookworm-slim
# Layer 2: 4 MB RUN apt-get update && apt-get install...
# Layer 3: 15 MB COPY --from=builder /app .

Common issues dive reveals: leftover package caches (/var/lib/apt/lists), temporary build files, duplicate files across layers, and unnecessary tools installed in the final image.
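Efficiency scores catch wasted space inside layers, but they do not stop an image from growing overall. A simple size budget in CI covers that gap. The sketch below is one way to do it. The `check_size_budget` helper and the 150 MB budget are hypothetical.

```shell
#!/bin/sh
# Hypothetical CI gate: fail when an image exceeds a size budget.
# Usage: check_size_budget <size_in_bytes> <budget_in_MB>
check_size_budget() {
  size_bytes="$1"
  budget_bytes=$(( $2 * 1000000 ))
  if [ "$size_bytes" -gt "$budget_bytes" ]; then
    echo "FAIL: image is $size_bytes bytes, budget is $budget_bytes bytes"
    return 1
  fi
  echo "OK: image is within the $2 MB budget"
}

# Example (needs a Docker daemon):
#   check_size_budget "$(docker image inspect --format '{{.Size}}' myapp:latest)" 150
```

A non-zero return fails the pipeline step, so an accidental switch back to a full base image is caught at review time.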

Real Before and After

Here is a practical example showing a Python API server optimized step by step.

# BEFORE: 1.2 GB
FROM python:3.12
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
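CMD ["python", "app.py"]

# AFTER: 145 MB (88% reduction)
# The builder uses Python 3.11 to match the interpreter in distroless/python3-debian12,
# so compiled extensions installed here load correctly in the final stage
FROM python:3.11-slim AS builder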
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --target=/deps -r requirements.txt

FROM gcr.io/distroless/python3-debian12
WORKDIR /app
COPY --from=builder /deps /deps
COPY . .
ENV PYTHONPATH=/deps
CMD ["app.py"]

| Optimization Step | Image Size | Reduction |
| --- | --- | --- |
| python:3.12 (full) | 1.2 GB | baseline |
| python:3.12-slim | 350 MB | -71% |
| Multi-stage with slim | 200 MB | -83% |
| Distroless final stage | 145 MB | -88% |
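The reduction column follows directly from the sizes: reduction = (1 - after / before) × 100. A minimal sketch for reproducing it with your own measurements, using a hypothetical `percent_reduction` helper and treating 1.2 GB as 1200 MB:

```shell
#!/bin/sh
# Hypothetical helper: percentage saved when going from <before> to <after>.
# Both sizes must use the same unit (MB here).
percent_reduction() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.0f\n", (1 - after / before) * 100 }'
}

percent_reduction 1200 350   # python:3.12-slim       -> 71
percent_reduction 1200 200   # multi-stage with slim  -> 83
percent_reduction 1200 145   # distroless final stage -> 88
```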

Choosing the Right Base for Your Language

| Language | Recommended Base | Why |
| --- | --- | --- |
| Go | scratch | Static binaries, zero dependencies |
| Rust | scratch or distroless/static | Static with musl; use distroless/cc when linking glibc dynamically |
| Java | distroless/java21 | Needs JVM, no shell needed |
| Node.js | distroless/nodejs20 | Needs the Node.js runtime (V8), no shell needed |
| Python | python:3.12-slim or distroless/python3 | Needs interpreter, C extensions need glibc |
| .NET | mcr.microsoft.com/dotnet/runtime-deps | Smallest official .NET base (for self-contained apps) |
| C/C++ | distroless/cc | Needs libstdc++ and glibc |

The pattern is consistent: compile or build in a full image, copy only the output into the smallest image that can run it. For interpreted languages, distroless or slim variants are your best options. For compiled languages, scratch is the ultimate target.

Wrapping Up

Image optimization is not a one-time task — it is a design decision that affects security posture, deployment speed, and infrastructure costs for the lifetime of your application. Start with the right base image for your language, use multi-stage builds to separate build-time and runtime dependencies, and validate with dive in your CI pipeline. The goal is not to reach zero bytes — it is to include exactly what your application needs and nothing more. Every unnecessary binary is an exploit waiting to happen, and every wasted megabyte is money walking out the door.