Audit Logs for SaaS: What to Capture and How to Store It

The first time a customer asks "who deleted our production API key on Tuesday?", you find out whether you actually have audit logs or just a console.log graveyard. I've been on both sides of that question. The team with a real audit trail answered in four minutes with a tenant-scoped query. The team without one spent two days grepping CloudWatch and still couldn't prove the deletion wasn't an insider.

Audit logs are not application logs. Application logs are for you — debugging, latency, stack traces, ephemeral. Audit logs are for someone else — your security team during an incident, your auditor during SOC 2 Type II fieldwork, your enterprise customer during a renewal where their CISO wants to see access history. They have different consumers, different retention, different integrity guarantees, and they should live in a different place. Conflating them is the most common mistake I see, and it quietly fails you exactly when the stakes are highest.

What an audit event actually needs

An audit event answers one sentence: who did what to which thing, when, from where, in what context, and what changed. If a field doesn't help answer that sentence, it doesn't belong in the audit record — push it to application logs.

That maps to a fixed set of columns:

actor — the user or service principal that performed the action. Store the stable ID, not just the email, because emails change.
action — a controlled vocabulary like user.invite, apikey.revoke, billing.plan.change. Free-text actions are useless for filtering and alerting.
target — the object acted on (type + id), e.g. api_key:ak_9f2c.
tenant — the organization/workspace. This is non-negotiable in multi-tenant SaaS; it scopes every query and every customer-facing view.
timestamp — UTC, with timezone, server-assigned. Never trust a client clock.
source IP and user agent — for geo-anomaly detection and "was this us or them?"
request_id — to correlate back to your application logs without polluting the audit record with debug noise.
before / after state — the diff. For a permission change, "before: member, after: admin" is the whole point.

A reasonable Postgres schema, partitioned by month so retention and pruning stay cheap:

CREATE TABLE audit_log (
    id            BIGINT GENERATED ALWAYS AS IDENTITY,
    tenant_id     UUID         NOT NULL,
    actor_id      UUID,                    -- nullable: system/cron actions
    actor_type    TEXT         NOT NULL,   -- 'user' | 'service' | 'system'
    action        TEXT         NOT NULL,   -- controlled vocabulary
    target_type   TEXT,
    target_id     TEXT,
    source_ip     INET,
    user_agent    TEXT,
    request_id    TEXT,
    before        JSONB,
    after         JSONB,
    metadata      JSONB        NOT NULL DEFAULT '{}',
    prev_hash     BYTEA,                   -- hash chain, see below
    row_hash      BYTEA        NOT NULL,
    created_at    TIMESTAMPTZ  NOT NULL DEFAULT now(),
    PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);
 
CREATE TABLE audit_log_2026_03 PARTITION OF audit_log
    FOR VALUES FROM ('2026-03-01') TO ('2026-04-01');
 
CREATE INDEX ON audit_log_2026_03 (tenant_id, created_at DESC);
CREATE INDEX ON audit_log_2026_03 (tenant_id, action, created_at DESC);

Partitioning by created_at range means dropping a month of expired data is a DROP TABLE on one partition — a metadata operation, not a DELETE that bloats your table and fights autovacuum for an hour.

Append-only and immutable, enforced by the database

"Append-only" you promise in a design doc. Immutability you enforce in the engine, because the application is exactly the thing an attacker compromises. Postgres lets you revoke UPDATE and DELETE from the writing role entirely:

-- The app connects as this role. It can read and append, nothing else.
REVOKE UPDATE, DELETE, TRUNCATE ON audit_log FROM app_writer;
GRANT  INSERT, SELECT             ON audit_log TO app_writer;
 
-- Belt and suspenders: block mutation even if a grant slips through.
CREATE OR REPLACE FUNCTION audit_log_immutable()
RETURNS TRIGGER AS $$
BEGIN
    RAISE EXCEPTION 'audit_log is append-only';
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER no_mutate_audit
    BEFORE UPDATE OR DELETE ON audit_log
    FOR EACH ROW EXECUTE FUNCTION audit_log_immutable();

Retention deletion then happens only at the partition level, by a privileged migration role on a schedule — never row-by-row from application code. This split (append from the app, prune from ops) is what an auditor actually wants to see: the service that creates records physically cannot alter them.

Tamper-evidence with a hash chain

Immutability stops casual edits. It does not prove to a third party that nobody — including a DBA with superuser — touched the data. For that you want tamper-evidence: a hash chain, the same idea behind Git commits and Certificate Transparency logs (RFC 6962). Each row hashes its own content plus the previous row's hash. Change any historical row and every subsequent hash breaks, so verification is O(n) and forgery requires rewriting the entire tail.

import { createHash } from "node:crypto";
 
type AuditInput = {
  tenantId: string;
  actorId: string | null;
  actorType: "user" | "service" | "system";
  action: string;
  targetType?: string;
  targetId?: string;
  sourceIp?: string;
  userAgent?: string;
  requestId?: string;
  before?: unknown;
  after?: unknown;
  metadata?: Record<string, unknown>;
};
 
function rowHash(input: AuditInput, createdAt: string, prevHash: Buffer): Buffer {
  // Canonical, order-stable serialization. Any drift here breaks verification,
  // so freeze this function and version it if you ever change it.
  const payload = JSON.stringify({
    tenantId: input.tenantId,
    actorId: input.actorId,
    actorType: input.actorType,
    action: input.action,
    targetType: input.targetType ?? null,
    targetId: input.targetId ?? null,
    sourceIp: input.sourceIp ?? null,
    before: input.before ?? null,
    after: input.after ?? null,
    metadata: input.metadata ?? {},
    createdAt,
  });
  return createHash("sha256")
    .update(prevHash)
    .update(payload)
    .digest();
}

The insert helper chains each row off the previous one for that tenant. Per-tenant chains keep verification scoped and let you hand one customer their own verifiable slice without leaking another tenant's hashes:

import { sql } from "./db"; // your pg pool wrapper
 
export async function recordAudit(input: AuditInput): Promise<void> {
  await sql.tx(async (tx) => {
    // Serialize the chain head per tenant to avoid hash races.
    const [{ prev_hash }] = await tx`
      SELECT prev_hash FROM audit_log
      WHERE tenant_id = ${input.tenantId}
      ORDER BY created_at DESC, id DESC
      LIMIT 1
      FOR UPDATE
    ` as Array<{ prev_hash: Buffer | null }>;
 
    const prev = prev_hash ?? Buffer.alloc(32); // genesis = 32 zero bytes
    const createdAt = new Date().toISOString();
    const hash = rowHash(input, createdAt, prev);
 
    await tx`
      INSERT INTO audit_log (
        tenant_id, actor_id, actor_type, action, target_type, target_id,
        source_ip, user_agent, request_id, before, after, metadata,
        prev_hash, row_hash, created_at
      ) VALUES (
        ${input.tenantId}, ${input.actorId}, ${input.actorType}, ${input.action},
        ${input.targetType ?? null}, ${input.targetId ?? null},
        ${input.sourceIp ?? null}, ${input.userAgent ?? null}, ${input.requestId ?? null},
        ${input.before ?? null}, ${input.after ?? null}, ${input.metadata ?? {}},
        ${prev}, ${hash}, ${createdAt}
      )
    `;
  });
}

Verification later is a fold: re-derive each hash from the previous one and confirm it matches row_hash. For real third-party guarantees, periodically publish the latest hash per tenant somewhere you don't control — sign it and email it to the customer, or anchor it in a transparency log. The point isn't cryptographic perfection; it's being able to say "here is the chain head we committed to on March 1, prove we altered anything since."

Don't pay the audit cost on the request hot path

The synchronous insert above is correct but it adds a transaction, a FOR UPDATE lock, and a hash to every mutating request. Under load, that per-tenant lock serializes writes for busy tenants. There are two ways out, and the right one depends on your durability needs.

Approach	Latency added to request	Durability if app crashes	Ordering guarantee	Complexity
Synchronous insert	High (lock + hash)	Strong — committed with the action	Total per tenant	Low
Async queue/worker	~0	Weak — buffered events can be lost	Best-effort	Medium
Transactional outbox	Low (one insert, no hash)	Strong — same DB transaction	Total per tenant	Medium

For anything compliance-relevant, I default to the outbox pattern. You write a plain audit_outbox row inside the same database transaction as the business change, so the audit record and the action commit or roll back together — no "we did the thing but forgot to log it" gap. A separate worker drains the outbox, computes the hash chain, and writes the final audit_log row. The expensive serialized part moves off the request path while durability stays atomic with the action.

// Inside the same tx that revokes the API key:
await tx`
  INSERT INTO audit_outbox (tenant_id, payload)
  VALUES (${tenantId}, ${JSON.stringify(auditInput)})
`;
// Action + audit intent commit together. Worker handles chaining + delivery.

A pure async queue (push to SQS/Redis, return immediately) is fine for security telemetry you can tolerate losing a few of. It is not fine for the record that proves an admin changed someone's role. Know which one you're building.

Postgres table vs. dedicated log store

You don't have to choose forever, but you do have to choose first.

Partitioned Postgres table — start here. Transactional consistency with your app data, you already operate it, joins to user/tenant tables are trivial, and the outbox pattern is natural. It comfortably handles millions of events per tenant per month. The ceiling is when audit write volume rivals your application write volume.
Dedicated store (ClickHouse, an immutable object-storage log, or a managed audit product) — reach for this when you're past tens of millions of events a day, need sub-second analytics across all tenants, or want WORM (write-once-read-many) storage that satisfies stricter regimes. The cost is a second system to operate and an eventual-consistency seam.

The mistake is jumping to ClickHouse on day one for a product doing 50k audit events a day. Postgres will serve you for years. Migrate when the numbers force you, not before.

Expose a scoped view — that's the trust feature

Internal audit logs satisfy SOC 2. Customer-visible audit logs win renewals. Enterprise buyers increasingly expect a self-serve "Activity log" in your dashboard, and exposing one signals you have nothing to hide. The hard rule: the customer view is strictly tenant-scoped and shows only fields safe to surface — never raw source_ip of other users, never internal metadata, never another tenant's anything.

This is where audit logs meet your permissions model. The customer-facing query must run under the same authorization that governs the rest of their data — ideally Postgres row-level security so a missing WHERE tenant_id = can never leak across tenants:

ALTER TABLE audit_log ENABLE ROW LEVEL SECURITY;
 
CREATE POLICY tenant_isolation ON audit_log
    FOR SELECT
    USING (tenant_id = current_setting('app.current_tenant')::uuid);

Gate which actions a given role can even see (a billing-only admin shouldn't read SSO config changes), and you've connected the audit trail to your RBAC model end to end.

A checklist before you ship it

Separate audit from application logs — different table, different retention, different consumers. Non-negotiable.
Capture the seven core fields plus before/after state. Use a controlled action vocabulary, not free text.
Enforce immutability in the database — revoke UPDATE/DELETE from the app role; prune only via partition drops.
Hash-chain per tenant for tamper-evidence, and periodically publish the chain head off-box.
Use the outbox pattern so the audit record commits atomically with the action but the hashing stays off the hot path.
Start in partitioned Postgres; migrate to a dedicated store only when volume forces it.
Ship a tenant-scoped customer view behind RLS and your RBAC model — it's a sales asset, not just compliance.
Write a retention policy down (e.g. 12 months hot, archive to object storage for 7 years) and check it against GDPR data-minimization and your customer contracts.

Audit logs feel like overhead right up until the moment they're the only thing standing between you and an unprovable accusation. Build the boring version now; you'll never wish you'd built it later.