Secrets Management: Stop Shipping API Keys in .env

I have lost count of how many times I have run git log -p on a client repo and watched a live Stripe key, an AWS access key, and a database URL with the password inline scroll past in plaintext. Usually they are still valid. Usually nobody knows they are there. The pattern is always the same: someone committed .env "just once" during setup, deleted it in a later commit, and assumed that was the end of it. Git remembers forever. That key is in the history, in every clone, in every fork, and increasingly in someone's training scraper.

A .env file is a fine local-dev convenience and a terrible production strategy. Let me walk through why, and the actual maturity ladder you climb to get off it.

Why .env is a dev-only stopgap

The .env file solved a real problem: keeping config out of source code so you could follow the Twelve-Factor App advice to store config in the environment. That part is good. The problem is everything around the file.

Accidental commits. One missing .gitignore line and the whole thing is in history. The OWASP Top 10 has carried "Security Misconfiguration" and "Cryptographic Failures" categories in some form for years, and leaked credentials are the most boring, most common version of both.
Sprawl. The same secret lives on your laptop, your coworker's laptop, the CI runner, a Slack DM from when you onboarded someone, and a 1Password note nobody updated. There is no single source of truth, so there is no way to answer "who has this key."
No rotation. Static files do not rotate. When an employee leaves or a key leaks, you are doing archaeology across machines to find every copy.
No audit trail. You cannot answer "what read this secret, and when." There is nothing to read. A file does not log access.

None of these are theoretical. They are the post-incident findings on basically every credential-leak retro I have sat in.

The maturity spectrum

Think of this as four rungs. You do not skip rungs for fun, but you should know which one you are standing on and why.

Stage	Mechanism	Rotation	Audit	Good for
0	`.env` file	Manual	None	Local dev only
1	Platform env vars (Vercel, Fly, etc.)	Manual	Partial	Small teams, single platform
2	Secret manager (AWS Secrets Manager, Vault, Doppler)	Automated	Full	Real production, multiple services
3	Workload identity / OIDC	No long-lived secret exists	Full	Cloud-to-cloud, CI

Stage 0: .env, done correctly

If you are going to use .env locally — and you should — commit a template, never the real thing. Two files, one ignored.

# .gitignore
.env
.env.local
.env.*.local
!.env.example

# .env.example — committed, fake values, documents required keys
DATABASE_URL="postgresql://user:password@localhost:5432/app"
STRIPE_SECRET_KEY="sk_test_replace_me"
JWT_SIGNING_KEY="generate_with_openssl_rand_-hex_32"
SENTRY_DSN=""

The .env.example is the contract. A new developer copies it to .env, fills in real values from your secret manager, and runs. New required key? You add it to the example in the same PR that needs it, so review catches a missing variable instead of a 3 a.m. crash loop.
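One way to make that contract enforceable is to compare the committed template against the real environment at startup. The following is a minimal sketch, not a library API: parseExampleKeys and missingKeys are hypothetical helper names, and it assumes that a key set to an empty string (like SENTRY_DSN above) counts as present.

```typescript
/** Extract variable names from the text of a .env.example file. */
function parseExampleKeys(exampleText: string): string[] {
  return exampleText
    .split(/\r?\n/)
    .map((line) => line.trim())
    // Skip blank lines and comments.
    .filter((line) => line !== "" && !line.startsWith("#"))
    .map((line) => {
      const eq = line.indexOf("=");
      const left = eq === -1 ? line : line.slice(0, eq);
      // Tolerate an optional leading "export ".
      return left.replace(/^export\s+/, "").trim();
    })
    // Keep only things that look like valid variable names.
    .filter((key) => /^[A-Za-z_][A-Za-z0-9_]*$/.test(key));
}

/** Return keys declared in the example but not set at all in env. */
function missingKeys(
  exampleText: string,
  env: Record<string, string | undefined>,
): string[] {
  return parseExampleKeys(exampleText).filter((key) => env[key] === undefined);
}
```

At boot you would read .env.example from disk, call missingKeys(text, process.env), and exit with the list if it is non-empty, so a missing variable fails loudly on the first run instead of on the first request that needs it.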

Stage 1: platform environment variables

Once you deploy, the secret should live where the workload runs, not in a file you upload. On Vercel that is project environment variables, scoped per environment:

# Set a production-only secret, never written to disk in the repo
vercel env add STRIPE_SECRET_KEY production
 
# Pull non-production values into a gitignored local file for dev
vercel env pull .env.local

This gets you off shared files and gives you per-environment scoping (preview keys cannot touch production). It is a real step up. The ceiling: rotation is still manual, the audit trail is whatever your platform happens to log, and if you run on three platforms you now have three sources of truth.

Stage 2: a real secret manager

This is where most production systems should live. A secret manager is a service whose entire job is to store secrets, control who reads them, rotate them, and log every access. You fetch the secret at startup over an authenticated API instead of baking it into the environment.

Here is the pattern with AWS Secrets Manager and the AWS SDK for JavaScript v3, fetching at boot and caching in memory so you are not hitting the API on every request:

import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from "@aws-sdk/client-secrets-manager";
 
const client = new SecretsManagerClient({ region: "eu-central-1" });
let cache: Record<string, string> | null = null;
 
export async function loadSecrets(): Promise<Record<string, string>> {
  if (cache) return cache;
 
  const res = await client.send(
    new GetSecretValueCommand({ SecretId: "prod/app/config" }),
  );
 
  if (!res.SecretString) {
    throw new Error("SecretString missing — is the secret binary?");
  }
 
  cache = JSON.parse(res.SecretString) as Record<string, string>;
  return cache;
}

Notice what is missing: there is no AWS access key in this code. The client picks up credentials from the runtime's role (more on that below). The only thing this process knows how to do is ask for prod/app/config, and IAM decides whether it is allowed.
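One caveat with the loader above: it caches forever, so a rotated secret is only picked up on restart. A time-to-live on the cache is one way to reconcile in-memory caching with rotation. This is a sketch under assumptions: createSecretCache, ttlMs, and the injectable clock are hypothetical names of mine, and the fetcher is passed in so the same wrapper works around loadSecrets or any other source.

```typescript
type Secrets = Record<string, string>;
type Fetcher = () => Promise<Secrets>;

/**
 * Wrap a fetcher so results are reused for ttlMs milliseconds.
 * `now` is injectable so the expiry logic can be tested without waiting.
 */
function createSecretCache(
  fetchSecrets: Fetcher,
  ttlMs: number,
  now: () => number = Date.now,
): () => Promise<Secrets> {
  let cached: Secrets | null = null;
  let expiresAt = 0;
  let inFlight: Promise<Secrets> | null = null;

  return async function getSecrets(): Promise<Secrets> {
    // Serve from memory while the entry is still fresh.
    if (cached && now() < expiresAt) return cached;

    // Share one refresh between concurrent callers.
    if (!inFlight) {
      inFlight = fetchSecrets()
        .then((fresh) => {
          cached = fresh;
          expiresAt = now() + ttlMs;
          return fresh;
        })
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  };
}
```

In this sketch a failed refresh surfaces as a rejected promise rather than silently serving the expired value. Whether to fall back to the stale copy instead is a judgment call that depends on how your rotation invalidates old credentials.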

If you want secrets that live in git but stay encrypted — useful for GitOps and Kubernetes — SOPS (Mozilla's sops, now a CNCF project) encrypts values in place with a KMS key or age key. The file is committed, the values are ciphertext, and only a workload holding the decryption key can read them:

# secrets.enc.yaml — safe to commit; values are KMS-encrypted
stripe_secret_key: ENC[AES256_GCM,data:9KpL...==,type:str]
database_url: ENC[AES256_GCM,data:7Hq2...==,type:str]
sops:
  kms:
    - arn: arn:aws:kms:eu-central-1:111122223333:key/abcd-1234

The honest tradeoff: managers add a network dependency and a small cold-start cost. As a rough guide, expect single-digit milliseconds once the client and connection are warm, and roughly 50–200ms for the first uncached fetch, depending on region and network. For the vast majority of services that is invisible. SOPS keeps the git-native workflow but you own key distribution. Doppler and HashiCorp Vault sit in the same tier: Vault if you want dynamic, short-lived database credentials generated on demand; Doppler if you want a managed sync layer with less operational overhead.

Stage 3: stop having a secret at all

The best long-lived secret is the one that does not exist. This is the part most teams have not adopted yet, and it is the single biggest leverage move available.

Workload identity / OIDC replaces a static credential with a short-lived token your platform mints and your cloud verifies. GitHub Actions can present an OIDC token (an RFC 7519 JWT) that AWS, GCP, or Azure trusts. The cloud hands back credentials valid for an hour. Nothing long-lived is ever stored.

Compare the two CI approaches honestly:

	Static access key	OIDC / workload identity
Stored in CI	Yes — `AWS_SECRET_ACCESS_KEY`	No — token minted per run
Lifetime	Until you rotate (often: never)	~1 hour
Leak blast radius	Full account access, indefinitely	One run, expires fast
Rotation	Manual, easy to forget	Automatic, nothing to rotate
Setup cost	Two secrets, five minutes	One OIDC identity provider + one IAM role with a trust policy

Here is the GitHub Actions job. There is no access key anywhere — permissions: id-token: write is what lets the runner request the OIDC token:

name: deploy
on:
  push:
    branches: [main]
 
permissions:
  id-token: write   # required to mint the OIDC token
  contents: read
 
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4   # in real pipelines, pin actions to a full commit SHA, not a tag
 
      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::111122223333:role/gha-deploy
          aws-region: eu-central-1
          # No aws-access-key-id, no aws-secret-access-key.
 
      - run: aws s3 sync ./dist s3://my-bucket --delete

On the AWS side, the IAM role's trust policy pins it to your exact repo and branch, so a fork or a feature branch cannot assume it. Scope the sub claim — repo:my-org/my-repo:ref:refs/heads/main — never a wildcard. The same idea applies in your running infra: an EC2 instance role, an ECS task role, or a Kubernetes service account via IRSA means your app code holds zero credentials and the SDK example above just works.
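The trust policy described above could look roughly like this. I have written it as a TypeScript object so it can be serialized for the IAM API or an infrastructure-as-code tool. The account ID, org, and repo are the same placeholders used in this article, githubTrustPolicy is a hypothetical helper name, and the sketch assumes GitHub's OIDC identity provider (token.actions.githubusercontent.com) is already registered in the AWS account.

```typescript
const GITHUB_OIDC_HOST = "token.actions.githubusercontent.com";

/** Build a role trust policy pinned to one repo and one branch. */
function githubTrustPolicy(accountId: string, repo: string, branch: string) {
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // Trust tokens issued by GitHub's OIDC provider in this account.
        Principal: {
          Federated: `arn:aws:iam::${accountId}:oidc-provider/${GITHUB_OIDC_HOST}`,
        },
        Action: "sts:AssumeRoleWithWebIdentity",
        Condition: {
          // StringEquals, not StringLike: exact match, no wildcards.
          StringEquals: {
            [`${GITHUB_OIDC_HOST}:aud`]: "sts.amazonaws.com",
            [`${GITHUB_OIDC_HOST}:sub`]: `repo:${repo}:ref:refs/heads/${branch}`,
          },
        },
      },
    ],
  };
}

// Placeholder values; substitute your own account, org, and repo.
const policyJson = JSON.stringify(
  githubTrustPolicy("111122223333", "my-org/my-repo", "main"),
  null,
  2,
);
```

The design choice worth copying is the exact-match condition on sub. Loosening it to a pattern like repo:my-org/* would let any repository in the org assume the deploy role.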

The non-negotiables, regardless of stage

These apply at every rung, and skipping them is how "we use a secret manager" still turns into an incident.

Least privilege and scoping. A service gets read access to its own secrets and nothing else. The deploy role cannot read the database password unless deploy actually needs it. Separate prod and non-prod paths (prod/app/* vs staging/app/*) so a staging compromise stays in staging.
Never log secrets. This is where leaks hide in plain sight. Redact in your logger, and never dump process.env into an error report. A surprising number of "secret manager" setups leak the secret straight into Datadog because someone logged the config object on boot.

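// Matches key names that usually hold credentials (case-insensitive).
const REDACT = /(secret|token|password|key|dsn)/i;

// Masks values by key name. Top-level keys only: nested objects pass through
// unchanged, so flatten or recurse if your config nests.
function safe(obj: Record<string, unknown>) {
  return Object.fromEntries(
    Object.entries(obj).map(([k, v]) => [k, REDACT.test(k) ? "***" : v]),
  );
}

// `logger` and `config` are whatever your app already uses.
logger.info("config loaded", safe(config)); // values masked by key name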

Rotation on a schedule and on every offboard. If rotation is hard, it never happens. Managers automate it; OIDC sidesteps it entirely. When someone leaves, rotation should be a button, not an investigation.
Build-time vs runtime injection. Know the difference. Anything inlined at build time — a NEXT_PUBLIC_* var, a VITE_* var — ships to the browser in plaintext. It is not a secret; it is public config. Real secrets must be injected at runtime, server-side only. I have seen a "private" API key bundled into client JS because someone prefixed it NEXT_PUBLIC_ to silence a build warning. Assume anything in the client bundle is published.
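The build-time trap in the last point can be caught mechanically. Here is a rough sketch of a check you might run in CI before a build: it flags variables that carry a public prefix but are named like credentials. suspiciousPublicVars and the SECRET_HINT pattern are my own hypothetical choices, and since the check only looks at names it is a heuristic, not a guarantee.

```typescript
// Prefixes that Next.js and Vite inline into the client bundle.
const PUBLIC_PREFIXES = ["NEXT_PUBLIC_", "VITE_"];

// Name fragments that suggest a credential (assumed list; tune for your codebase).
const SECRET_HINT = /(secret|token|password|private|api_?key)/i;

/** Return public-prefixed variable names that look like secrets, sorted. */
function suspiciousPublicVars(
  env: Record<string, string | undefined>,
): string[] {
  return Object.keys(env)
    .filter((key) => PUBLIC_PREFIXES.some((prefix) => key.startsWith(prefix)))
    .filter((key) => SECRET_HINT.test(key))
    .sort();
}
```

Failing the build when this list is non-empty turns "someone prefixed it to silence a warning" into a review conversation instead of a published key. Legitimately public values such as a publishable key would need an allowlist.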

Leak detection: assume you will slip

You will eventually commit something you should not. Catch it fast.

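gitleaks as a pre-commit hook and a CI gate. It scans staged changes or the full commit history (narrow the range with --log-opts) using provider-specific regex rules and entropy checks. The commands below use the protect and detect subcommands; recent gitleaks releases fold these into gitleaks git, so check gitleaks --help for your version: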

# Pre-commit: block the commit if a secret is staged
gitleaks protect --staged --redact --verbose
 
# CI / audit: scan the entire history
gitleaks detect --source . --redact

Provider push protection. Turn on GitHub Secret Scanning push protection at the org level. It blocks a push containing a recognized credential pattern before it ever reaches the remote — the cheapest possible save.
When it does leak, rotate first, scrub second. The instinct is to rewrite history with git filter-repo. Do that, but only after you have revoked the key. A scrubbed-but-still-valid key in someone's existing clone is still a live key. Revocation is the fix; history rewriting is cleanup.
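To make "provider patterns and entropy" concrete, here is a toy version of the two signals scanners combine. It is an illustration of the idea, not how gitleaks is implemented: the function names, the length cutoff, and the 4.0 bits-per-character threshold are assumptions of mine, and the only provider rule included is the well-known AKIA prefix of AWS access key IDs.

```typescript
/** Shannon entropy of a string, in bits per character. */
function shannonEntropy(text: string): number {
  const chars = Array.from(text);
  if (chars.length === 0) return 0;

  // Count how often each character occurs.
  const counts = new Map<string, number>();
  chars.forEach((ch) => counts.set(ch, (counts.get(ch) ?? 0) + 1));

  let entropy = 0;
  counts.forEach((count) => {
    const p = count / chars.length;
    entropy -= p * Math.log2(p);
  });
  return entropy;
}

// Provider rule: AWS access key IDs start with AKIA plus 16 uppercase alphanumerics.
const AWS_ACCESS_KEY_ID = /\bAKIA[0-9A-Z]{16}\b/;

/** Flag a token if it matches a provider rule or is long and random-looking. */
function looksLikeSecret(token: string, entropyThreshold = 4.0): boolean {
  if (AWS_ACCESS_KEY_ID.test(token)) return true;
  return token.length >= 20 && shannonEntropy(token) >= entropyThreshold;
}
```

Provider rules catch known formats with few false positives, while the entropy check is the noisy fallback for credentials that have no recognizable prefix. That is why real scanners ship allowlists alongside their rules.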

This ties directly into supply-chain and CI hardening. Your CI runner is the juiciest target you own — it has, by design, the credentials to deploy to production. Short-lived OIDC tokens, pinned action SHAs, scoped roles, and minimal permissions: blocks are the same discipline as secrets management, applied one layer out.

The decision framework

Run down this list whenever you stand up a new service:

Local dev only? .env plus a committed .env.example, with .gitignore correct on day one.
Deployed on one platform? Move secrets into platform env vars, scoped per environment.
Production, multiple services, or any compliance need? Adopt a secret manager (AWS Secrets Manager, Vault, Doppler) or SOPS if you want git-native encrypted secrets. Fetch at startup, cache in memory.
Cloud-to-cloud or CI auth? Use OIDC / workload identity so no long-lived credential exists at all. This should be the default for new CI, not an upgrade you get to "later."
Always, at every stage: least privilege, no secrets in logs, rotation on a schedule and on offboarding, gitleaks in CI, and push protection on.

The goal is not perfect security. It is making the lazy path the safe path — a .gitignore that is right by default, a CI pipeline with no key to steal, and a manager that rotates so you never have to remember to. Get those in place and "we leaked a key" stops being a question of when.