A Production Node.js Dockerfile That Is Not 1.2GB

I inherited a Next.js service last quarter whose image was 1.24GB. Every deploy pulled that blob onto fresh nodes, cold starts crawled, and the registry bill had a line item I could see from space. The Dockerfile was four lines: FROM node, COPY . ., RUN npm install, CMD npm start. It worked. It also shipped the entire build toolchain, every dev dependency, the .git directory, and a root process that ignored SIGTERM so Kubernetes had to SIGKILL it on every rollout.

Twenty minutes of work took that image to 180MB and cut the rebuild time on a one-line source change from 90 seconds to under 4. None of it was clever. It was the boring set of things every production Node image needs and almost none have. Here is the whole thing, layer by layer, accurate for Node 24 LTS in 2026.

Start with the base image, because it sets the ceiling

The base image is the single biggest lever you have, and the default node tag is the worst common choice. It is Debian with a full build environment baked in — Python, make, gcc, the works — so you can compile native addons. You almost never need that at runtime.

Here is what the realistic options cost you:

Base image	Compressed size	libc	Shell / package manager	Use when
`node:24`	~380MB	glibc	yes (apt)	almost never
`node:24-slim`	~75MB	glibc	yes (apt)	default choice
`node:24-alpine`	~55MB	musl	yes (apk)	size-critical, no glibc deps
`gcr.io/distroless/nodejs24`	~70MB	glibc	none	hardened runtime

My default is node:24-slim. It is glibc, so native modules that ship glibc prebuilt binaries (sharp, better-sqlite3, @node-rs/*) just work. Alpine uses musl, which means those modules either fall back to a slow path or fail to find a prebuilt binary and try to compile from source — and now you are debugging apk add for build tools you were trying to avoid. I have lost afternoons to musl-vs-glibc segfaults in image processing libraries. Alpine's 20MB saving is not worth that risk on most services.
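When a native module refuses to load, the first thing I want to know is which libc the container is actually running. A minimal sketch, assuming the Node diagnostic report exposes glibcVersionRuntime in its header on glibc systems and lists the musl loader among sharedObjects on Alpine. The detectLibc helper and the ReportLike type are names I made up for illustration.

```typescript
// Sketch: classify the C library from a Node diagnostic report.
// `detectLibc` and `ReportLike` are hypothetical names, not a library API.
type LibcFamily = "glibc" | "musl" | "unknown";

interface ReportLike {
  header?: { glibcVersionRuntime?: string };
  sharedObjects?: string[];
}

function detectLibc(report: ReportLike): LibcFamily {
  // glibc builds report the runtime glibc version in the header.
  if (report.header?.glibcVersionRuntime) return "glibc";
  // On musl the loader or libc shows up among the loaded shared objects.
  const objects = report.sharedObjects ?? [];
  if (objects.some((p) => p.includes("libc.musl-") || p.includes("ld-musl-"))) {
    return "musl";
  }
  return "unknown";
}

// Inspect the process this script runs in.
const raw: unknown = process.report?.getReport();
console.log(`libc: ${detectLibc((raw ?? {}) as ReportLike)}`);
```

Run it inside the candidate image before committing to a base. If it prints musl and one of your dependencies only publishes glibc binaries, you have found the afternoon you were about to lose.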

Distroless is the security play: no shell, no package manager, no busybox, so an attacker who gets code execution in your app has very little to work with. The cost is that you cannot docker exec into a shell to debug; for that you need the :debug tag, which adds a busybox shell. I reach for distroless on internet-facing services where the threat model justifies the operational friction. For most internal services, slim is the right tradeoff.

Multi-stage build: keep the toolchain out of the runtime

The core trick is that the things you need to build the app — dev dependencies, TypeScript, the bundler — are not the things you need to run it. A multi-stage build lets you do the heavy work in throwaway stages and copy only the artifacts into a clean final image. Everything in earlier stages is discarded.

Three stages: deps installs dependencies, build compiles, runner is the lean final image.

# syntax=docker/dockerfile:1.7
 
# ---- deps: install all dependencies (incl. dev) ----
FROM node:24-slim AS deps
WORKDIR /app
# Copy only manifests so this layer caches independently of source.
COPY package.json package-lock.json ./
# BuildKit cache mount keeps the npm cache warm across builds.
RUN --mount=type=cache,target=/root/.npm \
    npm ci
 
# ---- build: compile the app ----
FROM node:24-slim AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ENV NEXT_TELEMETRY_DISABLED=1
RUN npm run build
# Drop dev dependencies for a clean production node_modules.
RUN --mount=type=cache,target=/root/.npm \
    npm prune --omit=dev
 
# ---- runner: minimal runtime ----
FROM node:24-slim AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV PORT=3000
 
# tini for correct PID 1 signal handling (see below).
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
 
# Run as an unprivileged user. node:24-slim ships a `node` user (uid 1000).
USER node
 
COPY --chown=node:node --from=build /app/node_modules ./node_modules
COPY --chown=node:node --from=build /app/.next ./.next
COPY --chown=node:node --from=build /app/public ./public
COPY --chown=node:node --from=build /app/package.json ./package.json
 
EXPOSE 3000
 
HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
    CMD node -e "fetch('http://127.0.0.1:'+process.env.PORT+'/api/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"
 
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["npm", "start"]

That is a complete, working image. Now the parts that matter.

Layer caching: copy manifests before source

This is the change that took my rebuilds from 90 seconds to 4. Docker caches each layer and reuses it if the inputs have not changed. If your first instruction is COPY . ., then any source change — a typo fix in a component — invalidates that layer and every layer after it, including the expensive npm ci.

So copy the manifests first, install, and only then copy the rest:

COPY package.json package-lock.json ./
RUN npm ci          # cached until your dependencies change
COPY . .            # changes on every commit, but cheap

Now npm ci only re-runs when package.json or package-lock.json changes. Your daily edits hit the cache. The --mount=type=cache,target=/root/.npm BuildKit cache mount goes further: even when the lockfile does change, npm pulls unchanged packages from a persistent local cache instead of the network. On a CI runner with a warm cache, a dependency bump that used to cost a full cold install drops to seconds.
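If the invalidation rule feels abstract, a toy model helps. The sketch below treats a layer's cache key as a hash of its parent's key, the instruction text, and a digest of the files the instruction reads. That is a deliberate simplification for illustration, not Docker's actual algorithm, and the digests ("lock-v1", "src-a") are made-up placeholders.

```typescript
import { createHash } from "node:crypto";

// Toy model of layer caching. Not Docker's real implementation.
interface Step {
  instruction: string;
  inputDigest: string; // hypothetical digest of the files this step reads
}

// Each key chains on the previous one, so a change early on alters every later key.
function cacheKeys(steps: Step[]): string[] {
  let parent = "base";
  return steps.map((step) => {
    parent = createHash("sha256")
      .update(`${parent}\n${step.instruction}\n${step.inputDigest}`)
      .digest("hex");
    return parent;
  });
}

// Count how many leading layers two builds have in common.
function reusableLayers(before: Step[], after: Step[]): number {
  const a = cacheKeys(before);
  const b = cacheKeys(after);
  let n = 0;
  while (n < a.length && n < b.length && a[n] === b[n]) n++;
  return n;
}

const manifestsFirst = (src: string): Step[] => [
  { instruction: "COPY package.json package-lock.json ./", inputDigest: "lock-v1" },
  { instruction: "RUN npm ci", inputDigest: "" },
  { instruction: "COPY . .", inputDigest: src },
];

const sourceFirst = (src: string): Step[] => [
  { instruction: "COPY . .", inputDigest: src },
  { instruction: "RUN npm ci", inputDigest: "" },
];

// Same lockfile, different source between the two builds.
console.log(reusableLayers(manifestsFirst("src-a"), manifestsFirst("src-b"))); // 2
console.log(reusableLayers(sourceFirst("src-a"), sourceFirst("src-b"))); // 0
```

With manifests first, the manifest copy and npm ci survive a source edit. With COPY . . first, nothing does.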

Use npm ci, not npm install. npm ci installs exactly what the lockfile pins, fails if package.json and the lockfile disagree, and deletes node_modules first for a reproducible result. npm install can rewrite your lockfile, which is the last thing you want in a build.

Run as a non-root user

By default everything in a container runs as root. If an attacker finds an RCE in your app, they get root inside the container — and root in a container is a meaningfully better starting position for escaping to the host than an unprivileged user. There is no reason your Node process needs to be root.

node:24-slim already ships a node user with uid 1000. Switch to it with USER node, and crucially, use COPY --chown=node:node so the copied files are owned by that user rather than root. A common gotcha: if you write logs or a cache to a directory the node user cannot write to, the app crashes on boot. Create and chown those directories explicitly before dropping privileges, or write only to /tmp.
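One way to turn that boot crash into a readable error is a small preflight at startup. This is a sketch under assumptions: unwritableDirs is a hypothetical helper, and the directory list is a placeholder for whatever paths your app really writes to.

```typescript
import { accessSync, constants } from "node:fs";
import { tmpdir } from "node:os";

// Return the directories the current user cannot write to.
function unwritableDirs(dirs: string[]): string[] {
  return dirs.filter((dir) => {
    try {
      accessSync(dir, constants.W_OK);
      return false;
    } catch {
      return true; // missing, or not writable by this uid
    }
  });
}

// Placeholder list: substitute your own log and cache directories.
const required = [tmpdir()];
const blocked = unwritableDirs(required);
if (blocked.length > 0) {
  console.error(`Not writable by uid ${process.getuid?.()}: ${blocked.join(", ")}`);
  process.exitCode = 1;
}
```

The check is cheap, and a named directory in the first log line beats an EACCES stack trace from deep inside a logging library.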

Signal handling: tini, because Node as PID 1 is a trap

When your container's main process is PID 1, the kernel treats it specially: signals that have no explicit handler are not delivered with their default action. Node does not install a SIGTERM handler by default, so as PID 1 it ignores SIGTERM entirely. Kubernetes sends SIGTERM to ask for a graceful shutdown, waits out terminationGracePeriodSeconds (30s by default), then SIGKILLs you. Every single rollout pays a 30-second tax, and in-flight requests get dropped.

tini is a tiny init that runs as PID 1, forwards signals correctly to your process, and reaps zombie children. Set it as the ENTRYPOINT and your SIGTERM reaches Node:

ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["npm", "start"]

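dumb-init does the same job; pick either. One caveat: npm start puts npm, and usually a shell, between tini and Node, so the signal can be swallowed on the way down; invoking node directly keeps the chain short. Beyond that, handle SIGTERM in your app to drain connections before exiting. tini delivers the signal, but your code decides what graceful means: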

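// `app` is an Express-style app whose listen() returns the HTTP server.
const server = app.listen(process.env.PORT);

process.on("SIGTERM", () => {
  // Stop accepting new connections; exit once in-flight requests finish.
  server.close(() => process.exit(0));
});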
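A drain that never finishes is its own problem, because the kubelet will SIGKILL you at the end of the grace period anyway. Here is a slightly fuller sketch with a hard deadline, using only node:http. The installGracefulShutdown name, the injectable exit function, and the 25-second figure are my own choices for illustration; pick a deadline that sits under your grace period.

```typescript
import { createServer, type Server } from "node:http";

// On SIGTERM: stop accepting connections, let in-flight requests finish,
// and force a non-zero exit if draining takes longer than `deadlineMs`.
function installGracefulShutdown(
  server: Server,
  deadlineMs: number,
  exit: (code: number) => void = (code) => process.exit(code),
): () => void {
  const onSigterm = () => {
    const timer = setTimeout(() => exit(1), deadlineMs);
    server.close(() => {
      clearTimeout(timer);
      exit(0);
    });
    // Drop idle keep-alive sockets so they do not hold the drain open.
    server.closeIdleConnections();
  };
  process.once("SIGTERM", onSigterm);
  return onSigterm;
}

const server = createServer((_req, res) => res.end("ok"));
installGracefulShutdown(server, 25_000);
// In a real entrypoint, follow with: server.listen(Number(process.env.PORT));
```

The exit parameter exists only so the handler can be exercised in a test without killing the test process.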

A tight .dockerignore

COPY . . copies whatever is in your build context. Without a .dockerignore, that includes node_modules from your laptop (wrong platform, possibly), .git (can be hundreds of MB), .env files (a secret leak straight into an image layer), and build output. A good .dockerignore shrinks the context, speeds up the upload to the daemon, and prevents accidents:

node_modules
npm-debug.log
.next
.git
.gitignore
.env
.env.*
.dockerignore
Dockerfile
README.md
coverage
.vscode
.idea
*.local

.env and .env.* are the important lines. Secrets baked into an image layer are extractable by anyone who can pull it, forever, even if you delete the file in a later layer.
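Because a missing .env line fails silently, it can be worth a CI check. A rough sketch: missingIgnorePatterns is a hypothetical helper, the required list is my assumption about what must never reach the build context, and it only compares lines literally. It does not evaluate globs or ! negations the way Docker does.

```typescript
// Report which must-have patterns a .dockerignore file lacks.
// Exact line matching only; globs and `!` negations are not evaluated.
function missingIgnorePatterns(
  dockerignore: string,
  required: string[] = [".git", "node_modules", ".env", ".env.*"],
): string[] {
  const present = new Set(
    dockerignore
      .split(/\r?\n/)
      .map((line) => line.trim())
      .filter((line) => line !== "" && !line.startsWith("#")),
  );
  return required.filter((pattern) => !present.has(pattern));
}

console.log(missingIgnorePatterns("node_modules\n.git\n# secrets\n.env\n")); // [ '.env.*' ]
```

In CI you would read the real file with fs.readFileSync and fail the job when the returned list is not empty.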

The Next.js standalone variant that actually shrinks things

For Next.js specifically, the biggest win is output: "standalone". Next traces exactly which files your server needs and emits a self-contained .next/standalone directory with a minimal node_modules — only the packages actually imported at runtime, not your entire dependency tree. You stop copying node_modules at all.

Turn it on in next.config.ts:

import type { NextConfig } from "next";
 
const nextConfig: NextConfig = {
  output: "standalone",
};
 
export default nextConfig;

Then the runner stage copies three small things instead of a fat node_modules:

FROM node:24-slim AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV PORT=3000 HOSTNAME=0.0.0.0
 
RUN apt-get update && apt-get install -y --no-install-recommends tini \
    && rm -rf /var/lib/apt/lists/*
USER node
 
# standalone bundles its own minimal node_modules and server.js
COPY --chown=node:node --from=build /app/.next/standalone ./
COPY --chown=node:node --from=build /app/.next/static ./.next/static
COPY --chown=node:node --from=build /app/public ./public
 
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
    CMD node -e "fetch('http://127.0.0.1:'+process.env.PORT+'/api/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"
 
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["node", "server.js"]

Note server.js instead of npm start — standalone generates its own server entrypoint, so you do not need npm at runtime at all. The static assets and public/ are copied separately because Next does not bundle them into standalone; serve them from there or, better, from a CDN.

The numbers

Here is the actual progression on that Next.js service:

Build	Final image	Cold rebuild	One-line rebuild
Single-stage `FROM node`	1.24GB	95s	90s
Multi-stage on `node:24-slim`	410MB	70s	4s
Standalone + slim	180MB	68s	4s
Standalone + distroless	165MB	68s	4s

The standalone output is where the runtime size collapses, because you stop shipping a full node_modules. The layer caching is where the developer-experience win lives — that 90s-to-4s on the inner loop is what your team actually feels twenty times a day.
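For anyone who wants the relative numbers, the arithmetic is below. The figures are copied from the table; treating 1.24GB as 1240MB is my assumption (decimal units).

```typescript
// Figures from the table; 1.24GB is treated as 1240MB (decimal units assumed).
const builds = [
  { name: "single-stage", imageMb: 1240, oneLineRebuildS: 90 },
  { name: "multi-stage slim", imageMb: 410, oneLineRebuildS: 4 },
  { name: "standalone + slim", imageMb: 180, oneLineRebuildS: 4 },
  { name: "standalone + distroless", imageMb: 165, oneLineRebuildS: 4 },
];

// Size reduction as a percentage, rounded to one decimal place.
function percentSmaller(fromMb: number, toMb: number): number {
  return Math.round(((fromMb - toMb) / fromMb) * 1000) / 10;
}

const baseline = builds[0];
for (const b of builds.slice(1)) {
  const size = percentSmaller(baseline.imageMb, b.imageMb);
  const speedup = baseline.oneLineRebuildS / b.oneLineRebuildS;
  console.log(`${b.name}: ${size}% smaller, ${speedup}x faster one-line rebuild`);
}
```

That works out to roughly 67% smaller for multi-stage alone, about 85% for standalone on slim, and a 22.5x faster inner loop in every multi-stage variant.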

Checklist

Before you call a Node Dockerfile production-ready, confirm:

Base image is slim or distroless, not the default node tag.
Multi-stage build so dev dependencies and the toolchain never reach the runtime.
Manifests copied before source, with npm ci in its own layer, plus a BuildKit npm cache mount.
npm prune --omit=dev (or standalone output) so the runtime node_modules is production-only.
USER node with COPY --chown=node:node — nothing runs as root.
tini or dumb-init as ENTRYPOINT, and a SIGTERM handler in your app for graceful drains.
A .dockerignore that excludes .git, node_modules, and every .env file.
A HEALTHCHECK hitting a real readiness endpoint (Kubernetes ignores the Dockerfile HEALTHCHECK, so mirror it in a readiness probe there).
output: "standalone" if it is Next.js.

None of this is advanced. It is the difference between an image you are proud of and a 1.2GB liability that wakes you up at 3am because a rollout SIGKILLed a pod mid-request. Spend the twenty minutes.