SaaS Development · 8 min read

Building a SaaS That Scales: The Architecture Patterns Senior Engineers Rely On

Battle-tested SaaS architecture patterns from 17 years in production: multi-tenancy, modular monoliths, async work, caching, DB scaling, and safe rollouts.

In more than 17 years of building full-stack systems, I've learned that the hardest part of SaaS isn't writing code. Code is the easy part. The hard part is designing a system that scales predictably: one that grows from ten customers to ten thousand without forcing a panic rewrite at 2 a.m. while support tickets pile up.

Most SaaS failures I've seen weren't market failures. They were architecture failures. The product found traction, traffic doubled, and the system folded because nobody designed for the second order of magnitude. Below are the patterns I actually rely on — the ones that earn their complexity.

Start With Multi-Tenancy, Because You Can't Bolt It On Later

Tenancy is the one decision you cannot defer. Retrofitting tenant isolation onto a system that assumed a single customer is one of the most expensive migrations in software. Pick a model deliberately on day one.

| Model | Isolation | Cost / Ops | Best for |
| --- | --- | --- | --- |
| Shared DB + row-level security | Logical | Lowest | Most B2B SaaS, MVP → scale |
| Schema-per-tenant | Strong logical | Medium | Mid-market, per-tenant migrations |
| DB-per-tenant | Physical | Highest | Regulated, enterprise, data residency |

My default is shared database with Postgres row-level security (RLS). One schema, a tenant_id on every table, and the database itself enforces isolation — so a missing WHERE tenant_id = ... in application code can't leak another customer's data.

-- Every tenant-scoped table carries tenant_id
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
ALTER TABLE invoices FORCE ROW LEVEL SECURITY;
 
-- The app sets the current tenant per request/transaction:
--   SET LOCAL app.current_tenant = '...';
CREATE POLICY tenant_isolation ON invoices
  USING (tenant_id = current_setting('app.current_tenant')::uuid)
  WITH CHECK (tenant_id = current_setting('app.current_tenant')::uuid);

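FORCE ROW LEVEL SECURITY matters: without it, the table owner bypasses the policy and you lose the guarantee exactly where you think you're safe. Superusers and roles with the BYPASSRLS attribute skip policies even with FORCE, so the application should connect as an ordinary role that has neither.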
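On the application side, the per-request tenant setting can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: `withTenant` and the `Queryable` interface are hypothetical names, and the client is assumed to be a single connection held for the whole transaction. It uses Postgres's `set_config(name, value, is_local)` function, which behaves like SET LOCAL when `is_local` is true but accepts a bind parameter.

```typescript
// Minimal shape of a database client (assumption). A connection checked out
// from a pool for the duration of the request would satisfy it.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Hypothetical helper: run `work` inside a transaction scoped to one tenant.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("BEGIN");
  try {
    // is_local = true limits the setting to this transaction, like SET LOCAL.
    // Passing the tenant id as a parameter keeps it out of the SQL string.
    await client.query("SELECT set_config('app.current_tenant', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting disappears at COMMIT or ROLLBACK, a connection returned to the pool cannot carry one tenant's identity into the next request.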

Schema-per-tenant buys you cleaner per-tenant backups and migrations at the cost of running migrations N times and hitting connection-pool ceilings. DB-per-tenant is the answer when a contract or a regulator demands physical separation or data residency — not before. Don't pay for isolation you don't need.

The Modular Monolith Beats Premature Microservices

This is the first of the two most expensive mistakes I see: premature microservices. Teams split into fifteen services before product-market fit, then spend their runway debugging distributed transactions and network partitions instead of shipping features.

Start with a modular monolith: one deployable, but internally organized into modules (billing, identity, notifications) with explicit boundaries and no reaching into another module's tables. You get the scaling story of services — clean seams to extract later — without the operational tax of distributed systems on day one. When a module genuinely needs independent scaling or a separate team, the boundary is already there to cut along.

Extract a service when you have a concrete reason: a different scaling profile, a different runtime, or an org boundary. "It feels cleaner" is not a reason.
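To make "explicit boundaries" concrete, here is a rough sketch of two modules in one process. Every name in it (`BillingApi`, `createBillingModule`, `createNotificationsModule`) is illustrative, and the in-memory map only stands in for tables that the billing module would own.

```typescript
// billing module: only BillingApi is public. Storage stays private.
interface Invoice {
  id: string;
  tenantId: string;
  cents: number;
}

interface BillingApi {
  getInvoice(tenantId: string, invoiceId: string): Invoice | undefined;
}

function createBillingModule(): BillingApi & { seed(invoice: Invoice): void } {
  const invoices = new Map<string, Invoice>(); // stands in for billing-owned tables
  return {
    seed: (invoice) => void invoices.set(invoice.id, invoice),
    getInvoice: (tenantId, invoiceId) => {
      const invoice = invoices.get(invoiceId);
      // The tenant check lives inside the owning module, not in its callers.
      return invoice && invoice.tenantId === tenantId ? invoice : undefined;
    },
  };
}

// notifications module: depends on BillingApi, never on billing's storage.
function createNotificationsModule(billing: BillingApi) {
  return {
    invoiceReminder(tenantId: string, invoiceId: string): string | undefined {
      const invoice = billing.getInvoice(tenantId, invoiceId);
      if (!invoice) return undefined;
      return `Invoice ${invoice.id} is due: $${(invoice.cents / 100).toFixed(2)}`;
    },
  };
}
```

The interface is the seam: notifications can only ask billing questions that billing has chosen to answer, which is what keeps a later extraction tractable.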

Make Services Stateless So You Can Scale Horizontally

A scalable service holds no request state in memory. Sessions live in Redis or signed tokens, uploads go to object storage, background state lives in the database. Once any instance can serve any request, scaling is just running more instances behind a load balancer — and a crashed instance is a non-event, not an outage.

The corollary: never store anything in a local process that you'd cry about losing. Sticky sessions and in-memory caches that drift between instances are how "it works on one box" quietly becomes "it breaks on three."
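As an illustration of the signed-token option, the sketch below keeps the session entirely in the token, so any instance holding the shared secret can verify it without a lookup. It is a simplified example built on Node's `node:crypto` module; the `Session` shape and the function names are assumptions, and a production system would more likely use a vetted library and a standard token format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

interface Session {
  userId: string;
  tenantId: string;
  expiresAt: number; // Unix epoch milliseconds
}

function sign(payload: string, secret: string): string {
  return createHmac("sha256", secret).update(payload).digest("base64url");
}

function issueToken(session: Session, secret: string): string {
  const payload = Buffer.from(JSON.stringify(session)).toString("base64url");
  return `${payload}.${sign(payload, secret)}`;
}

function verifyToken(token: string, secret: string, now: number = Date.now()): Session | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [payload, signature] = parts;
  const expected = Buffer.from(sign(payload, secret));
  const actual = Buffer.from(signature);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) return null;
  // Parse only after the signature has been verified.
  const session = JSON.parse(Buffer.from(payload, "base64url").toString("utf8")) as Session;
  return session.expiresAt > now ? session : null;
}
```

The trade-off against Redis-backed sessions is revocation: a signed token stays valid until it expires unless you add a server-side deny list.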

Do Slow Work Asynchronously — Correctly

Anything slow or external — sending email, generating a PDF, calling a payment provider — belongs on a queue, not in the request path. Users get fast responses; spikes get absorbed instead of toppling you.

The two patterns that make async reliable are idempotency keys and the outbox pattern. Idempotency keys let a caller retry safely without double-charging. The outbox guarantees you never commit a database change but lose its side effect — you write the event in the same transaction as the data, then a relay publishes it.

// Idempotent charge + transactional outbox, in one DB transaction.
// Assumes the idempotency table has a UNIQUE constraint on the key.
async function chargeCustomer(db: DB, key: string, tenantId: string, cents: number) {
  return db.transaction(async (tx) => {
    // Idempotency: a retry of a completed request returns the prior result.
    const existing = await tx.findIdempotent(key);
    if (existing) return existing.result;

    const charge = await tx.insertCharge({ tenantId, cents, status: "pending" });

    // Outbox row committed atomically with the charge, so it is never lost.
    await tx.insertOutbox({
      topic: "payment.requested",
      payload: { chargeId: charge.id, tenantId, cents },
    });

    const result = { chargeId: charge.id };
    // If a concurrent request with the same key commits first, this insert
    // violates the UNIQUE constraint and the whole transaction rolls back,
    // so the charge and outbox row are not duplicated. First writer wins.
    await tx.saveIdempotent(key, result);
    return result;
  });
}
// A separate relay polls the outbox, publishes to the queue,
// and marks rows dispatched — at-least-once delivery, no lost events.

Consumers must be idempotent too, because at-least-once delivery means they will occasionally see a message twice. Design every handler to be safely replayable.
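The relay and an idempotent consumer can be sketched in memory to show why duplicates appear and how a handler absorbs them. This is only a toy model: the array stands in for the outbox table, `publish` stands in for the queue, and the names `relayOnce` and `createConsumer` are made up for the example.

```typescript
interface OutboxRow {
  id: number;
  topic: string;
  payload: { chargeId: string };
  dispatched: boolean;
}

type Publish = (row: OutboxRow) => void;

// Relay: publish undispatched rows, then mark them. A crash between the two
// steps means the row is published again on the next pass, which is why
// delivery is at-least-once rather than exactly-once.
function relayOnce(outbox: OutboxRow[], publish: Publish): number {
  let sent = 0;
  for (const row of outbox) {
    if (row.dispatched) continue;
    publish(row);
    row.dispatched = true;
    sent++;
  }
  return sent;
}

// Consumer: remember processed message ids so a redelivery is a no-op.
// In a real system this set would be a table with a unique constraint,
// written in the same transaction as the handler's effects.
function createConsumer() {
  const processed = new Set<number>();
  const charged: string[] = [];
  return {
    charged,
    handle(row: OutboxRow): void {
      if (processed.has(row.id)) return; // duplicate delivery, skip
      charged.push(row.payload.chargeId);
      processed.add(row.id);
    },
  };
}
```

Keying deduplication on the outbox row id works here because the relay never changes it between deliveries.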

Caching: A Read-Through Layer With Tag-Based Invalidation

Caching is leverage, and it's also where the second expensive mistake lives: premature optimization. Don't cache until you've measured a real hot path. When you do, reach for read-through caching with tag-based invalidation so you can blow away related entries in one shot instead of guessing key names.

// Assumes an ioredis-style client; the command signatures below match it.
import Redis from "ioredis";
const redis = new Redis();

async function getWithCache<T>(
  key: string,
  tags: string[],
  ttl: number, // seconds
  load: () => Promise<T>,
): Promise<T> {
  const hit = await redis.get(key);
  if (hit !== null) return JSON.parse(hit) as T; // null means a cache miss

  const value = await load();
  const tx = redis.multi();
  tx.set(key, JSON.stringify(value), "EX", ttl);
  for (const tag of tags) tx.sadd(`tag:${tag}`, key); // remember keys per tag
  await tx.exec();
  return value;
}

// Invalidate everything touching a tenant's products in one operation:
async function invalidateTag(tag: string) {
  const keys = await redis.smembers(`tag:${tag}`);
  if (keys.length) await redis.del(...keys, `tag:${tag}`);
}

Always set a TTL even with explicit invalidation — it's your safety net for the invalidation you'll inevitably forget to wire up.

Scale the Database Before It Becomes the Ceiling

The database is almost always the first thing to fall over, so plan for it.

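  • Connection pooling. Postgres connections are expensive because each one is backed by its own server process. Put PgBouncer in front in transaction mode so hundreds of app instances share a small pool. Session-level state does not carry across transactions in that mode, which is why the tenant setting above uses SET LOCAL. Skipping the pooler is how you hit max_connections and start seeing "too many clients" errors at the worst possible moment.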
  • Read replicas. Route reporting, dashboards, and other read-heavy traffic to replicas; keep writes on the primary. Be deliberate about replication lag for read-your-own-writes flows.
  • Partitioning. When a table crosses tens of millions of rows, partition it — by time for events, or by tenant for the largest accounts. Indexes stay smaller, and you can drop old partitions instead of running murderous DELETEs.
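One way to handle read-your-own-writes with replicas is to send a caller's reads to the primary for a short window after that caller writes. The sketch below assumes the timestamp of the last write travels with the session (for example in the token or in Redis) and that `maxLagMs` is a conservative bound on replication lag that you have measured. `ReadContext` and `chooseTarget` are hypothetical names.

```typescript
type Target = "primary" | "replica";

interface ReadContext {
  lastWriteAt?: number; // epoch ms of the caller's most recent write, if any
  needsFreshData?: boolean; // caller explicitly opts out of replicas
}

function chooseTarget(ctx: ReadContext, now: number, maxLagMs: number): Target {
  if (ctx.needsFreshData) return "primary";
  // A recent writer might not see their own change on a lagging replica.
  if (ctx.lastWriteAt !== undefined && now - ctx.lastWriteAt < maxLagMs) return "primary";
  return "replica";
}

// Example: a user saved a setting 200 ms ago; with a 1 s window the next
// read goes to the primary, and later reads fall back to a replica.
const justWrote = chooseTarget({ lastWriteAt: 9_800 }, 10_000, 1_000);
const longAgo = chooseTarget({ lastWriteAt: 5_000 }, 10_000, 1_000);
```

Keeping the decision a pure function of request context means it works unchanged across stateless instances.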

Protect Tenants From Each Other: Rate Limiting and Noisy Neighbors

In multi-tenancy, one tenant's runaway batch job can starve everyone else — the noisy-neighbor problem. Rate-limit per tenant, not just globally, so one customer can't consume the whole pool. For heavy async work, give large tenants their own queues or concurrency budgets so a flood from one doesn't delay everyone. Fairness isn't a nicety here; it's the difference between one unhappy customer and a platform-wide incident.
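Per-tenant rate limiting is often implemented as a token bucket keyed by tenant. The following is a minimal in-memory sketch with a hypothetical `TenantRateLimiter` class. Because the state lives in one process, it limits per instance; a limit shared across instances would need the buckets in a shared store such as Redis.

```typescript
interface Bucket {
  tokens: number;
  updatedAt: number; // epoch ms
}

class TenantRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly capacity: number;
  private readonly refillPerSec: number;

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity; // maximum burst size per tenant
    this.refillPerSec = refillPerSec; // sustained requests per second per tenant
  }

  // Returns true if the request may proceed, false if the tenant is over budget.
  tryAcquire(tenantId: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(tenantId) ?? { tokens: this.capacity, updatedAt: now };
    // Refill in proportion to elapsed time, capped at capacity.
    const elapsedSec = Math.max(0, now - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.updatedAt = now;
    this.buckets.set(tenantId, bucket);
    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Each tenant draws from its own bucket, so a tenant that exhausts its budget is throttled without affecting anyone else's.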

Observability: You Can't Scale What You Can't See

When something breaks at scale, logs alone won't save you. I instrument three things from the start:

  • Structured logs (JSON) with tenant_id, request_id, and trace_id on every line, so you can slice an incident by customer.
  • Distributed traces so you can see exactly where a request spent its 4 seconds.
  • The metrics that actually matter: request latency at p95/p99 (not the lying average), error rate, queue depth and age, and DB pool saturation. In my experience, these four give early warning of most outages.
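A small worked example shows why the average misleads. The latencies below are made up for illustration, and `percentile` uses the nearest-rank definition, which is one of several conventions in use.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  const clamped = Math.min(sorted.length, Math.max(1, rank));
  return sorted[clamped - 1];
}

function mean(samples: number[]): number {
  return samples.reduce((sum, x) => sum + x, 0) / samples.length;
}

// Made-up data: 90 requests at 40 ms and 10 that hit a 4-second slow path.
const latencies: number[] = [
  ...Array.from({ length: 90 }, () => 40),
  ...Array.from({ length: 10 }, () => 4000),
];

const summary = {
  mean: mean(latencies),
  p50: percentile(latencies, 50),
  p95: percentile(latencies, 95),
  p99: percentile(latencies, 99),
};
```

For this data the mean is 436 ms, the median is 40 ms, and both p95 and p99 are 4000 ms. The mean describes a request that never happened, while the percentiles show that one request in ten takes four seconds.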

Ship Safely: Feature Flags and Gradual Rollouts

Big-bang deploys are a bet you don't need to make. Put risky changes behind feature flags and roll them out gradually — internal users, then 1%, then 10%, then everyone — watching the metrics above at each step. When something's wrong, you flip a flag instead of scrambling a rollback. This single practice has prevented more outages for me than any amount of pre-deploy testing.
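A percentage rollout needs each tenant to land in the same bucket on every request, otherwise the feature flickers on and off. One common approach is to hash the flag name and tenant id into a bucket from 0 to 99. The sketch below is an illustration only: `isEnabled` is a hypothetical function, and the hash is an FNV-1a variant chosen because it is tiny and has no dependencies.

```typescript
// FNV-1a style 32-bit hash over UTF-16 code units: deterministic and
// dependency-free. Not suitable for anything security-sensitive.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

function isEnabled(flag: string, tenantId: string, rolloutPercent: number): boolean {
  // Including the flag name gives each flag an independent bucket assignment,
  // so the same tenants are not always first in line for every rollout.
  const bucket = fnv1a(`${flag}:${tenantId}`) % 100; // 0..99
  return bucket < rolloutPercent;
}
```

Raising the percentage only ever adds tenants: anyone enabled at 1% is still enabled at 10%, so a widening rollout never takes the feature away from early users.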

Start Here: A Maturity Ladder

You don't build all of this on day one. Match the architecture to the stage:

MVP

Modular monolith, single Postgres with RLS multi-tenancy, a basic job queue, structured logging. Ship and learn. Resist every urge to over-build.

Product-Market Fit

Add PgBouncer and read replicas, read-through caching on proven hot paths, per-tenant rate limiting, the outbox pattern for critical side effects, traces, and feature flags. This is where reliability becomes a feature.

Scale

Extract services along the seams the monolith already gave you, partition your biggest tables, dedicate queues for large tenants, and revisit your tenancy model only if enterprise or regulatory needs demand physical isolation.

The throughline across all 17 years: build the simplest thing that has a clear, low-cost path to the next order of magnitude. Avoid premature microservices and premature optimization, design clean boundaries early, and let real usage — not architecture-astronaut instinct — tell you when to add complexity. That's how you build a SaaS that scales without rewrites.