Container Security Is Important. Most Images Are Just Written Carelessly.


Container security has a reputation for being complex. In practice, most of the serious problems in production container environments come not from sophisticated attacks or exotic vulnerabilities but from straightforward bad habits baked into Dockerfiles that nobody has had time to revisit. This article walks through what those habits look like, what they cost, and how to fix them systematically.

The Anatomy of a Vulnerable Image

Start with a Dockerfile that looks, at first glance, reasonable:

FROM ubuntu:latest

RUN apt-get update && apt-get install -y \
    curl \
    wget \
    git \
    build-essential \
    python3 \
    nodejs \
    npm

COPY . .

RUN npm install

EXPOSE 3000
CMD ["node", "server.js"]

This is not a contrived example. Variations of this pattern appear constantly in codebases that have been running in production for years. Let's go through what is wrong with it.

Unpinned base image

FROM ubuntu:latest means the image you build today is not necessarily the image you build next month. latest resolves to whatever is current at build time. This breaks reproducibility and makes it genuinely difficult to reason about what is running in production. If a vulnerability is introduced into ubuntu:latest between your CI run and your production deploy, you may not catch it until a scan runs post-deployment, if ever.

Pin your base image to a specific digest:

FROM ubuntu:22.04@sha256:b492494d8e0113c4ad3fe4528a4b5ff89faa5331f7d52c5c138196f69ce176a6

Or at minimum a specific tag:

FROM ubuntu:22.04

The digest is stronger. Tags are mutable. Someone can push a new image to ubuntu:22.04 and your build will silently pick it up.
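If you only have a tag, you can resolve it to its current digest before pinning (output layout varies slightly across Docker versions):

```shell
# Query the registry for what the tag currently points at;
# the "Digest:" line is what you pin in the FROM instruction
docker buildx imagetools inspect ubuntu:22.04
```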

Build tools in the production image

build-essential, git, curl, wget — none of these belong in a running production container. They exist to compile code and fetch dependencies. Once the build is done, they are attack surface. An attacker with code execution in your container can use curl to exfiltrate data, wget to pull a second-stage payload, git to clone tooling. A container without these tools is not invulnerable, but it is considerably harder to operate inside.

The solution is multi-stage builds:

# Build stage — use whatever tools you need
FROM node:20.11-alpine3.19 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage — nothing that was not explicitly copied in
FROM node:20.11-alpine3.19 AS runtime
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
EXPOSE 3000
CMD ["node", "dist/server.js"]

The runtime stage receives only the compiled output and production dependencies. The compiler, the test runner, the build scripts — none of it makes it into the image that runs in production.

Unpinned dependencies

npm install without a lockfile or with --no-package-lock allows dependency versions to float. The same applies to pip install requests without a pinned version, or apt-get install -y python3 without a specific version constraint. The result is that two builds from the same Dockerfile can produce images with different software, different behavior, and different vulnerability profiles.

Use npm ci instead of npm install. It requires a lockfile and treats it as authoritative. For Python, use pip install -r requirements.txt where requirements.txt contains pinned versions, ideally generated by pip-compile from a requirements.in source of truth.
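The pip-tools workflow looks like this (the package in requirements.in is illustrative):

```shell
pip install pip-tools

# requirements.in is the human-edited source of truth
echo "requests" > requirements.in

# pip-compile resolves the full transitive tree and writes
# requirements.txt with every package pinned to ==version
pip-compile requirements.in

# Install exactly what was pinned
pip install -r requirements.txt
```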

Running as root

# No USER directive = runs as root
CMD ["node", "server.js"]

Most containers run as root by default. If your application process is compromised, the attacker has root inside the container. Depending on your runtime configuration, this may translate into capabilities on the host. Even without host escape, root inside the container can read any file, write to any path, and manipulate the process table.

Add a non-root user. On Alpine:

RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

Or on Debian-based images:

RUN groupadd --system appgroup && useradd --system --gid appgroup appuser
USER appuser

Your application should not need root at runtime. If it does, that is a separate problem worth solving.
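A quick sanity check after rebuilding (image name is illustrative):

```shell
# Should print a non-zero UID, not uid=0(root)
docker run --rm myapp:latest id
```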

Arbitrary shell commands and secrets in layers

RUN curl https://example.com/setup.sh | bash
RUN echo $API_KEY >> /app/config

curl | bash is an obvious problem. You are executing arbitrary remote code at build time with no verification. If that URL is ever compromised or the response changes, your build is compromised.

Secrets passed via RUN or ENV at build time are baked into the image layer and are visible in docker history. Even if you RUN unset API_KEY in a subsequent layer, the value is recoverable from the image manifest. Use Docker BuildKit secrets for build-time credentials:

# syntax=docker/dockerfile:1
RUN --mount=type=secret,id=api_key \
    API_KEY=$(cat /run/secrets/api_key) && ./configure.sh

And pass the secret at build time:

docker build --secret id=api_key,src=./api_key.txt .

The secret is available during that RUN step and does not persist in any layer.
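You can verify nothing leaked into the layers (image name is illustrative):

```shell
# Inspect every layer's creating command; the secret value
# should not appear anywhere in the history output
docker history --no-trunc myapp:latest | grep -i api_key \
  || echo "no secret in layer history"
```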

What This Actually Costs

The size and vulnerability numbers for common base images are not abstract.

A node:latest image (Debian-based) is currently around 360MB compressed. Running trivy image node:latest on a recent pull returns somewhere between 180 and 250 vulnerabilities depending on the day, including a reliable handful of HIGH and CRITICAL severity findings in base system libraries.

A node:20-alpine image comes in at around 55MB compressed. The same scan returns significantly fewer findings because Alpine uses musl libc instead of glibc and ships a much smaller set of packages.

A distroless Node image from Google (gcr.io/distroless/nodejs20-debian12) is around 100MB compressed but contains no package manager, no shell, no curl, no wget — almost nothing that does not directly support running a Node process. The attack surface is dramatically smaller.

The delta between a careless ubuntu:latest base with build tools included and a properly constructed distroless image can be 400MB of image size and an order of magnitude difference in vulnerability count. That is not a theoretical security improvement. It is a concrete reduction in the number of things an attacker can use.

The Consequences of Ignoring This

A container with a shell, curl, and a high-severity CVE in an included library is a much more useful foothold than a distroless container running a patched binary. The realistic attack chain looks like this:

A vulnerability in your application — an RCE in a dependency, a deserialization flaw, an unvalidated input — gives an attacker code execution in your container. If the container has a shell and network utilities, they can enumerate the environment, extract credentials from environment variables or mounted secrets, reach internal services on the cluster network, and pull additional tooling. If the container is running as root, the radius expands further.

None of this requires a zero-day. It requires a known CVE in an unpatched library and a container environment that makes the attacker's job easy.

Building Minimal Images

Multi-stage builds are the foundation, but the choice of base image for the runtime stage matters significantly.

Alpine is a reasonable default for many use cases. Small, well-maintained, and widely understood:

FROM node:20.11-alpine3.19 AS runtime

Distroless images go further. They contain only the runtime and its dependencies — no shell, no package manager, no utilities:

FROM gcr.io/distroless/nodejs20-debian12
COPY --from=builder /app/dist /app/dist
COPY --from=builder /app/node_modules /app/node_modules
WORKDIR /app
CMD ["dist/server.js"]

Note that distroless images have no shell, which means docker exec for debugging does not work the way you expect. For debugging, Google ships :debug variants that include busybox. Use them in development; use the standard image in production.
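To poke around a distroless container locally, swap in the :debug tag, which adds a busybox shell:

```shell
# The shell exists only in the :debug variant; the standard
# image has no sh to exec into
docker run -it --entrypoint=sh gcr.io/distroless/nodejs20-debian12:debug
```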

Scratch is appropriate for statically compiled binaries, typically Go or Rust:

FROM golang:1.22-alpine3.19 AS builder
WORKDIR /app
COPY go.* ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-w -s" -o server .

FROM scratch
COPY --from=builder /app/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]

The resulting image contains exactly one file. The compressed size is typically under 10MB for a modest Go service. There is nothing to exploit beyond the application binary itself.

Nix for Reproducible Containers

Multi-stage Docker builds improve on naive Dockerfiles significantly, but they still depend on package registries being available, package versions being consistent, and the base image behavior being stable. Nix takes a different approach.

In Nix, every package is identified by a cryptographic hash of its inputs — its source, its build dependencies, and its build instructions. Two Nix builds from the same expression, on any machine, at any time, produce identical output. This is the property you want for a container build.

pkgs.dockerTools.buildLayeredImage builds a Docker image entirely within the Nix build system:

# container.nix
{ pkgs ? import (fetchTarball {
    url = "https://github.com/NixOS/nixpkgs/archive/nixos-23.11.tar.gz";
    sha256 = "1ndiv385w1qyb3b18vw13991fzb9wg4cl21wglk89grsfsnra41k";
  }) {}
}:

pkgs.dockerTools.buildLayeredImage {
  name = "my-node-app";
  tag = "latest";

  contents = [
    pkgs.nodejs_20
    pkgs.nodePackages.npm
  ];

  config = {
    Cmd = [ "${pkgs.nodejs_20}/bin/node" "dist/server.js" ];
    WorkingDir = "/app";
    User = "nobody:nobody";
    ExposedPorts = { "3000/tcp" = {}; };
  };
}

Build it with:

nix-build container.nix
docker load < result

The resulting image has no package manager inside it, no shell unless you explicitly included one, and is fully reproducible. The same expression built six months from now, on a different machine, produces a bit-for-bit identical image, provided you pin nixpkgs, which the example above does by fetching a fixed nixpkgs tarball and verifying its sha256.

For more complex applications with their own Nix derivations, mkDerivation handles the build. One caveat: the Nix sandbox blocks network access, so a bare npm ci in buildPhase only works with the sandbox relaxed; pkgs.buildNpmPackage with a pinned npmDepsHash is the idiomatic, fully sandboxed alternative:

let
  app = pkgs.stdenv.mkDerivation {
    name = "my-app";
    src = ./.;
    buildInputs = [ pkgs.nodejs_20 ];
    # npm ci fetches from the network, which the Nix sandbox forbids
    # by default; see pkgs.buildNpmPackage for a fully sandboxed setup
    buildPhase = "npm ci && npm run build";
    installPhase = "cp -r dist $out";
  };
in
pkgs.dockerTools.buildLayeredImage {
  name = "my-app";
  contents = [ pkgs.nodejs_20 app ];
  config.Cmd = [ "node" "${app}/server.js" ];
}

Nix has a steep learning curve. It is not the right tool for every team. But if you need hard guarantees about reproducibility and minimal runtime content, it is the only approach that actually delivers them.

Scanning with Grype and Trivy

Minimal images reduce your vulnerability surface. Scanning tells you what is still there.

Trivy from Aqua Security scans images, filesystems, Git repositories, and more. It checks OS packages, language-specific packages, and misconfigurations:

# Scan an image
trivy image node:20-alpine

# Exit with non-zero if HIGH or CRITICAL vulns are found — useful in CI
trivy image --exit-code 1 --severity HIGH,CRITICAL node:20-alpine

# Output as JSON for downstream processing
trivy image --format json --output results.json myapp:latest

# Scan a local filesystem
trivy fs ./

# Scan a Dockerfile for misconfigurations
trivy config ./Dockerfile

Grype from Anchore takes a similar approach and is particularly strong at language ecosystem coverage:

# Install (note: this is itself curl | sh — pin a release version in CI)
curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin

# Scan an image
grype myapp:latest

# Only fail on HIGH and CRITICAL
grype myapp:latest --fail-on high

# Scan from an SBOM (faster, and works offline)
grype sbom:./sbom.spdx.json

In CI, both tools can act as a gate. A reasonable GitHub Actions step:

- name: Scan image for vulnerabilities
  run: |
    trivy image \
      --exit-code 1 \
      --severity CRITICAL \
      --ignore-unfixed \
      myapp:${{ github.sha }}

--ignore-unfixed is worth noting. Many CVE scanners will flag vulnerabilities for which no fix currently exists. Failing a build for an unfixed vulnerability in a transitive dependency you cannot yet remove is counterproductive. Filter for fixable findings and address those first.
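Trivy also supports a .trivyignore file for findings you have triaged and accepted; list CVE IDs one per line (the CVE ID shown is a placeholder):

```shell
# A .trivyignore file in the working directory is picked up automatically
cat > .trivyignore <<'EOF'
# Accepted: no fix available, not reachable in our usage
CVE-2023-12345
EOF
trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:latest
```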

SBOM Generation and Analysis

A Software Bill of Materials is a structured inventory of everything in your container: every OS package, every language dependency, every shared library, with versions and license information. It is the foundation for any serious supply chain security posture.

Syft from Anchore generates SBOMs from images, directories, and archives:

# Install
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

# Generate an SPDX JSON SBOM
syft myapp:latest -o spdx-json > sbom.spdx.json

# CycloneDX format, also widely supported
syft myapp:latest -o cyclonedx-json > sbom.cdx.json

# Scan the SBOM with Grype
grype sbom:./sbom.spdx.json

Separating SBOM generation from scanning means you can store the SBOM as an artifact and rescan it later against updated vulnerability databases without rebuilding the image. A vulnerability published six months after your last build is still detectable:

# Six months later, rescan the stored SBOM
grype sbom:./sbom.spdx.json --fail-on high

Trivy can also generate and consume SBOMs:

# Generate
trivy image --format spdx-json --output sbom.spdx.json myapp:latest

# Scan from SBOM
trivy sbom ./sbom.spdx.json

For regulated environments or mature supply chain requirements, attach the SBOM to the image in the registry using cosign:

cosign attach sbom --sbom sbom.spdx.json myapp:latest

This makes the SBOM discoverable by any tooling that understands OCI artifacts and ties it to the specific image digest it describes.
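Downstream consumers can then retrieve the SBOM without pulling the image itself (assuming cosign is installed; image name is illustrative):

```shell
# Fetch the SBOM attached to the image reference
cosign download sbom myapp:latest > sbom.spdx.json
```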

A Practical Baseline

If you are starting from a messy Dockerfile and want a defensible baseline quickly, the sequence is:

  1. Pin the base image to a specific tag or digest
  2. Split the build into stages; keep only runtime artifacts in the final stage
  3. Add a non-root user and set USER before CMD
  4. Run trivy image or grype on the output and address CRITICAL findings
  5. Generate an SBOM with syft and store it alongside the image
  6. Add the scan as a CI gate (grype --fail-on critical, or trivy --exit-code 1 --severity CRITICAL)
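Steps 4 through 6 can be wired together in a few CI lines (the image reference and variable names are illustrative):

```shell
#!/bin/sh
set -eu
IMAGE="myapp:${GIT_SHA:-latest}"   # illustrative image reference

# Step 5: generate and store an SBOM alongside the image
syft "$IMAGE" -o spdx-json > sbom.spdx.json

# Steps 4 and 6: scan the SBOM and gate the pipeline on severity
grype "sbom:./sbom.spdx.json" --fail-on critical
```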

This does not require Nix, distroless images, or any fundamental rearchitecting. It requires revisiting a Dockerfile with some discipline and twenty minutes of tooling setup. The result is an image that is smaller, easier to reason about, and significantly harder to use as a foothold.

The more sophisticated approaches — distroless runtimes, Nix-built images, continuous SBOM rescanning — are worth pursuing as practices mature. But most teams running containers in production today would benefit more from getting the basics right than from investing in advanced tooling on top of a leaky foundation.
