SaaS Development · 10 min read

Designing a Permissions System: RBAC, ABAC, and What Actually Ships

Authorization is where SaaS codebases rot. A practical guide to RBAC vs ABAC, multi-tenant permissions, and enforcing them without scattering checks.



Every authorization bug I've shipped to production came from the same root cause: a permission check written inline in one endpoint when it should have lived in none of them. Someone added a new "duplicate project" endpoint, copied the guard from "create project," and forgot that duplicating pulls in members from the source project across a tenant boundary. The check passed. The data leaked. Nobody noticed for three weeks.

Authentication answers "who are you." It's mostly a solved problem — buy it, don't build it. Authorization answers "what are you allowed to do," and it's the part nobody can buy for you because it's woven into your domain. It's also where SaaS codebases rot first, because the naive version works fine for the first six months and then becomes load-bearing spaghetti you're terrified to touch.

This is how I build authorization so it stays maintainable past the seed round.

The three models, and when each earns its place

There are exactly three authorization models worth knowing. Most teams need one. A few need two. Almost nobody needs all three on day one.

RBAC (role-based) maps users to roles, and roles to permissions. "Admins can delete projects." It's the default for a reason: it's trivial to reason about, trivial to put in a UI, and covers 90% of B2B SaaS. The failure mode is role explosion — when "admin who can also see billing but not invite users" becomes its own role, and then you have forty roles and a spreadsheet explaining them.

ABAC (attribute-based) evaluates conditions against attributes of the user, resource, and environment. "Editors can update a document if they're in the same region and it's not locked." This is where you reach when permissions depend on data, not just identity. The cost is that authorization decisions now depend on fetching resource state, which complicates caching and makes "why was I denied" harder to answer.
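To make the contrast with RBAC concrete, here is a minimal sketch of that exact rule as a plain predicate over user and resource attributes. The attribute names (`role`, `region`, `locked`) and the `UserAttrs`/`DocAttrs` shapes are illustrative assumptions, not a schema from this post; environment attributes (time, IP, and so on) would simply be extra parameters.

```typescript
// Hypothetical attribute shapes; a real system would load these from the
// session and from the document row.
type UserAttrs = { id: string; role: "viewer" | "editor" | "admin"; region: string };
type DocAttrs = { id: string; region: string; locked: boolean };

// ABAC rule: "Editors can update a document if they're in the same region
// and it's not locked." The decision reads resource state, not just identity.
function canUpdateDocument(user: UserAttrs, doc: DocAttrs): boolean {
  return user.role === "editor" && user.region === doc.region && !doc.locked;
}

const alice: UserAttrs = { id: "u1", role: "editor", region: "eu" };
canUpdateDocument(alice, { id: "d1", region: "eu", locked: false }); // true
canUpdateDocument(alice, { id: "d2", region: "us", locked: false }); // false: other region
canUpdateDocument(alice, { id: "d3", region: "eu", locked: true }); // false: locked
```

Answering the question requires the document's current `region` and `locked` values, which is exactly where the caching and "why was I denied" costs come from.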

ReBAC (relationship-based, the model from Google's Zanzibar paper, implemented in open source by projects such as OpenFGA and SpiceDB) expresses permissions as graph relationships. "You can view this folder if you're an editor of any parent folder." This is the right tool when your authorization is a graph of nested resources, sharing, and org hierarchies, as in Google Docs or Notion. It's also operationally heavy: you're running a separate authorization service with its own consistency model.
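A toy, in-memory sketch of the relationship idea: grants are stored as tuples and a check walks `parent` edges up the folder tree. The tuple shape is loosely modeled on Zanzibar-style tuples but is not OpenFGA's or SpiceDB's actual API; the `editor` and `parent` relation names are assumptions for illustration.

```typescript
// A relationship tuple: `user` has `relation` on `object`. For parent links,
// `user` holds the parent object instead of a user.
type Tuple = { object: string; relation: string; user: string };

const tuples: Tuple[] = [
  { object: "folder:root", relation: "editor", user: "user:alice" },
  { object: "folder:specs", relation: "parent", user: "folder:root" },
  { object: "folder:drafts", relation: "parent", user: "folder:specs" },
];

// "You can view a folder if you're an editor of it or of any ancestor folder."
function canView(user: string, folder: string, seen = new Set<string>()): boolean {
  if (seen.has(folder)) return false; // guard against cycles in bad data
  seen.add(folder);
  for (const t of tuples) {
    if (t.object !== folder) continue;
    if (t.relation === "editor" && t.user === user) return true;
    if (t.relation === "parent" && canView(user, t.user, seen)) return true;
  }
  return false;
}

canView("user:alice", "folder:drafts"); // true, via drafts -> specs -> root
canView("user:bob", "folder:drafts"); // false
```

Real systems index the tuples, bound traversal depth, and need consistent reads across writes; doing that at scale is the operational weight described above.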

| Model | Decision based on | Best fit | Main cost |
|-------|-------------------|----------|-----------|
| RBAC | User's roles | Standard B2B SaaS, admin panels | Role explosion |
| ABAC | Attributes + conditions | Data-dependent rules, compliance | Caching, debuggability |
| ReBAC | Relationship graph | Nested sharing, hierarchies | Operational complexity |

My rule: start with RBAC. Add ABAC conditions to specific permissions only when a real requirement forces it. Reach for ReBAC only when your product is fundamentally a sharing graph. Adopting OpenFGA on day one because you read the Zanzibar paper is how you spend your runway operating a distributed system to guard twelve endpoints.

Model it in Postgres

RBAC is four tables, plus a tenants table to scope them. Permissions are strings shaped as resource:action: flat, greppable, and easy to render in a UI. Critically, every grant is scoped to a tenant.

create table tenants (
  id   uuid primary key default gen_random_uuid(),
  slug text unique not null
);
 
create table roles (
  id        uuid primary key default gen_random_uuid(),
  tenant_id uuid not null references tenants(id) on delete cascade,
  name      text not null,
  unique (tenant_id, name),
  unique (id, tenant_id)      -- target for the composite FK in user_roles
);
 
create table permissions (
  id   uuid primary key default gen_random_uuid(),
  key  text unique not null   -- e.g. 'project:delete', 'billing:read'
);
 
create table role_permissions (
  role_id       uuid not null references roles(id) on delete cascade,
  permission_id uuid not null references permissions(id) on delete cascade,
  primary key (role_id, permission_id)
);
 
create table user_roles (
  user_id   uuid not null,
  role_id   uuid not null,
  tenant_id uuid not null references tenants(id) on delete cascade,
  -- tenant_id duplicates the role's tenant; the composite FK keeps the copy
  -- honest, so a grant can never point at another tenant's role.
  foreign key (role_id, tenant_id) references roles(id, tenant_id) on delete cascade,
  -- user_id and tenant_id lead the key, so "this user's grants in this
  -- tenant" is a single index range scan.
  primary key (user_id, tenant_id, role_id)
);

Two design decisions worth defending. First, permissions is global but roles are per-tenant. Permission keys are part of your code — defined once, shared by everyone. Roles are tenant-owned because tenant A's "Manager" is not tenant B's "Manager." Second, user_roles carries tenant_id redundantly even though it's derivable through the role. That redundancy is the foundation for row-level security later, and it makes the "all of this user's grants in this tenant" query a single index scan.

The resolved-permissions query is a single statement with two joins:

select distinct p.key
from user_roles ur
join role_permissions rp on rp.role_id = ur.role_id
join permissions p       on p.id = rp.permission_id
where ur.user_id = $1
  and ur.tenant_id = $2;
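If it helps to see what that join computes, here is the same resolution over in-memory arrays standing in for the three tables. The row shapes mirror the schema above; the function name and sample data are made up for illustration.

```typescript
// In-memory stand-ins for the user_roles, role_permissions and permissions tables.
type UserRole = { userId: string; roleId: string; tenantId: string };
type RolePermission = { roleId: string; permissionId: string };
type Permission = { id: string; key: string };

function resolvePermissionKeys(
  userId: string,
  tenantId: string,
  userRoles: UserRole[],
  rolePermissions: RolePermission[],
  permissions: Permission[],
): Set<string> {
  const keyById = new Map(permissions.map((p) => [p.id, p.key] as const));
  // where ur.user_id = $1 and ur.tenant_id = $2
  const roleIds = new Set(
    userRoles
      .filter((ur) => ur.userId === userId && ur.tenantId === tenantId)
      .map((ur) => ur.roleId),
  );
  const keys = new Set<string>(); // a Set plays the role of `select distinct`
  for (const rp of rolePermissions) {
    if (!roleIds.has(rp.roleId)) continue;
    const key = keyById.get(rp.permissionId);
    if (key !== undefined) keys.add(key);
  }
  return keys;
}

const perms = resolvePermissionKeys(
  "u1",
  "tenant-a",
  [
    { userId: "u1", roleId: "r-manager-a", tenantId: "tenant-a" },
    { userId: "u1", roleId: "r-viewer-b", tenantId: "tenant-b" },
  ],
  [
    { roleId: "r-manager-a", permissionId: "p1" },
    { roleId: "r-manager-a", permissionId: "p2" },
    { roleId: "r-viewer-b", permissionId: "p3" },
  ],
  [
    { id: "p1", key: "project:delete" },
    { id: "p2", key: "billing:read" },
    { id: "p3", key: "project:read" },
  ],
);
// perms holds "project:delete" and "billing:read"; the tenant-b grant is excluded.
```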

One can() to rule them all

Here's the part that actually keeps the system maintainable: every authorization decision goes through a single function. Not a base controller. Not a decorator you sometimes remember. One function, can(subject, action, resource), that is the only thing in your codebase allowed to say yes.

The reason is auditability. When there's exactly one chokepoint, you can log every decision, you can test the policy in isolation, and "where do we check permissions" has a one-word answer. Scattered if (user.role === 'admin') checks are unauditable by construction — you can never prove you found all of them.
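Because every decision flows through one function with the shape can(subject, action, resource), logging all of them takes only a thin wrapper. A minimal sketch; `withAudit`, `Decision`, and `AuditSink` are hypothetical names, and the sink could be a logger call or an insert into an audit table.

```typescript
// Minimal shapes matching the can() signature used in this post.
type Subject = { userId: string; tenantId: string; permissions: Set<string> };
type Resource = { type: string; tenantId: string };
type Decide = (s: Subject, action: string, r: Resource) => boolean;

type Decision = {
  userId: string;
  tenantId: string;
  action: string;
  resourceType: string;
  allowed: boolean;
  at: Date;
};
type AuditSink = (d: Decision) => void;

// Wraps a decision function so every call, allow or deny, is recorded.
function withAudit(decide: Decide, sink: AuditSink): Decide {
  return (s, action, r) => {
    const allowed = decide(s, action, r);
    sink({ userId: s.userId, tenantId: s.tenantId, action, resourceType: r.type, allowed, at: new Date() });
    return allowed;
  };
}

// Usage with a stand-in RBAC-only decision function.
const log: Decision[] = [];
const rbacOnly: Decide = (s, action, r) => s.tenantId === r.tenantId && s.permissions.has(action);
const audited = withAudit(rbacOnly, (d) => log.push(d));

const subject: Subject = { userId: "u1", tenantId: "t1", permissions: new Set(["project:read"]) };
audited(subject, "project:read", { type: "project", tenantId: "t1" }); // true
audited(subject, "project:delete", { type: "project", tenantId: "t1" }); // false
```

After those two calls, `log` holds one allowed and one denied entry, which is the raw material for answering "why was I blocked".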

type Subject = {
  userId: string;
  tenantId: string;
  permissions: Set<string>; // resolved once per request
};
 
type Resource = {
  type: string;
  tenantId: string;
  ownerId?: string;
  locked?: boolean;
};
 
// Optional ABAC conditions, keyed by permission. RBAC is the default;
// a condition only runs if one is registered for that permission.
type Condition = (s: Subject, r: Resource) => boolean;
 
const conditions: Record<string, Condition> = {
  // ABAC layered onto RBAC: you may update a doc only if it isn't locked,
  // unless you own it.
  "document:update": (s, r) => !r.locked || r.ownerId === s.userId,
};
 
export function can(s: Subject, action: string, r: Resource): boolean {
  // 1. Tenant isolation is non-negotiable and comes first.
  if (s.tenantId !== r.tenantId) return false;
 
  // 2. RBAC: does any role grant this permission?
  if (!s.permissions.has(action)) return false;
 
  // 3. ABAC: if a condition exists for this action, it must also pass.
  const cond = conditions[action];
  return cond ? cond(s, r) : true;
}

Notice the shape. Tenant check first, always — it's the one rule that can never be overridden by a role. Then RBAC as the base layer. Then ABAC conditions added on top of specific permissions, not as a parallel system. You get RBAC's simplicity everywhere and pay for ABAC's complexity only on the handful of permissions that genuinely need it. New permissions are pure RBAC by default. That's the whole trick to not over-engineering: ABAC is opt-in, per permission.
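Since can() is pure, its layering can be pinned down with a table of cases and no database. A sketch: the function is repeated from above so the block runs on its own, and the case list is illustrative rather than exhaustive.

```typescript
type Subject = { userId: string; tenantId: string; permissions: Set<string> };
type Resource = { type: string; tenantId: string; ownerId?: string; locked?: boolean };
type Condition = (s: Subject, r: Resource) => boolean;

// Copied from the can() above so this block is self-contained.
const conditions: Record<string, Condition> = {
  "document:update": (s, r) => !r.locked || r.ownerId === s.userId,
};
function can(s: Subject, action: string, r: Resource): boolean {
  if (s.tenantId !== r.tenantId) return false;
  if (!s.permissions.has(action)) return false;
  const cond = conditions[action];
  return cond ? cond(s, r) : true;
}

const editor: Subject = { userId: "u1", tenantId: "t1", permissions: new Set(["document:update"]) };

// Each row: description, action, resource, expected decision.
const cases: Array<[string, string, Resource, boolean]> = [
  ["other tenant is denied even with the permission", "document:update", { type: "document", tenantId: "t2" }, false],
  ["missing permission is denied", "document:delete", { type: "document", tenantId: "t1" }, false],
  ["unlocked doc passes the ABAC condition", "document:update", { type: "document", tenantId: "t1", locked: false }, true],
  ["locked doc owned by someone else is denied", "document:update", { type: "document", tenantId: "t1", locked: true, ownerId: "u2" }, false],
  ["owner may update their own locked doc", "document:update", { type: "document", tenantId: "t1", locked: true, ownerId: "u1" }, true],
];

for (const [name, action, resource, expected] of cases) {
  const got = can(editor, action, resource);
  if (got !== expected) throw new Error(`${name}: expected ${expected}, got ${got}`);
}
```

The first row is the important one: it proves the tenant check wins even when the role grants the permission.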

Resolve the subject once per request, behind a cache, so can() itself stays a synchronous, pure function you can call freely without worrying about N+1 lookups.
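One way to get "resolve once per request" is to memoize the resolution promise on a per-request context object, so every check in that request (including concurrent ones) shares a single lookup. A sketch; `RequestContext` and the injected `Resolver` are hypothetical names, and the resolver would be something like the cached `resolvePermissions` in the next section.

```typescript
type Subject = { userId: string; tenantId: string; permissions: Set<string> };
type Resolver = (userId: string, tenantId: string) => Promise<Set<string>>;

// Created once per incoming request, for example in auth middleware.
class RequestContext {
  private readonly userId: string;
  private readonly tenantId: string;
  private readonly resolve: Resolver;
  private subjectPromise: Promise<Subject> | undefined;

  constructor(userId: string, tenantId: string, resolve: Resolver) {
    this.userId = userId;
    this.tenantId = tenantId;
    this.resolve = resolve;
  }

  // The first call starts the lookup; later calls reuse the same promise,
  // so N permission checks in one request cost one resolution.
  subject(): Promise<Subject> {
    if (this.subjectPromise === undefined) {
      const { userId, tenantId } = this;
      this.subjectPromise = this.resolve(userId, tenantId).then((permissions) => ({
        userId,
        tenantId,
        permissions,
      }));
    }
    return this.subjectPromise;
  }
}
```

Once `await ctx.subject()` has returned, every can() call in the handler is a synchronous Set lookup. The context dies with the request, so a failed lookup is not cached beyond it.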

Cache the lookup, not the decision

Permission resolution (the SQL join above) is the expensive part and it changes rarely — only when an admin edits a role. The decision depends on the specific resource and must always be fresh. So cache the former, never the latter.

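import { redis } from "./redis"; // assumed: an ioredis client (matches the set(..., "EX", ttl) call below)
import { db } from "./db";       // assumed: a query helper that resolves to the row array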
 
const TTL = 300; // 5 minutes — short enough that revocations land fast
 
export async function resolvePermissions(
  userId: string,
  tenantId: string,
): Promise<Set<string>> {
  const key = `perms:${tenantId}:${userId}`;
  const cached = await redis.get(key);
  if (cached) return new Set(JSON.parse(cached));
 
  const rows = await db.query<{ key: string }>(
    `select distinct p.key
       from user_roles ur
       join role_permissions rp on rp.role_id = ur.role_id
       join permissions p       on p.id = rp.permission_id
      where ur.user_id = $1 and ur.tenant_id = $2`,
    [userId, tenantId],
  );
 
  const keys = rows.map((r) => r.key);
  await redis.set(key, JSON.stringify(keys), "EX", TTL);
  return new Set(keys);
}
 
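// On any role/permission change, bust the relevant keys immediately.
// KEYS walks the whole keyspace in one blocking call; SCAN (ioredis's
// scanStream) does it incrementally, which is safe in production.
export async function invalidateTenant(tenantId: string) {
  const stream = redis.scanStream({ match: `perms:${tenantId}:*`, count: 100 });
  for await (const keys of stream) {
    if (keys.length) await redis.del(...keys);
  }
}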

The 5-minute TTL plus explicit invalidation on write gives you fast reads and bounded staleness. The thing people get wrong is caching the decision (can user X delete project Y) — now a single role edit invalidates a combinatorial number of keys and you'll never get it right. Cache the role-derived permission set. Keep can() pure.
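To see the same shape without Redis, here is an in-memory stand-in: a TTL map keyed by `perms:{tenantId}:{userId}`, with tenant-wide invalidation by key prefix. It is a sketch for tests or a single process only, since each server would hold its own copy; the class name and the injectable clock are assumptions.

```typescript
type Loader = (userId: string, tenantId: string) => Promise<string[]>;
type Entry = { keys: string[]; expiresAt: number };

class PermissionSetCache {
  private readonly entries = new Map<string, Entry>();
  private readonly load: Loader;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(load: Loader, ttlMs = 300_000, now: () => number = Date.now) {
    this.load = load;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Caches the role-derived permission set, never an individual decision.
  async get(userId: string, tenantId: string): Promise<Set<string>> {
    const key = `perms:${tenantId}:${userId}`;
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) return new Set(hit.keys);

    const keys = await this.load(userId, tenantId);
    this.entries.set(key, { keys, expiresAt: this.now() + this.ttlMs });
    return new Set(keys);
  }

  // Call after any role or permission edit in the tenant.
  invalidateTenant(tenantId: string): void {
    const prefix = `perms:${tenantId}:`;
    for (const key of Array.from(this.entries.keys())) {
      if (key.startsWith(prefix)) this.entries.delete(key);
    }
  }
}
```

One role edit costs one prefix sweep, no matter how many resources that role touches; that is the payoff of caching the set rather than decisions.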

Enforce at the edge — and again at the database

The can() layer is your primary enforcement. A route guard puts the check on the route declaration itself, where a missing one stands out in review:

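import type { Request, Response, NextFunction } from "express";
import { Router } from "express";
import { can } from "./authz";              // the can() defined above
import { loadResource } from "./resources"; // app-specific: builds a Resource from the request
import { logger } from "./logger";          // any structured logger
 
// Assumes auth middleware has set req.subject and that Express's Request
// type is augmented with `subject?: Subject`.
export function requirePermission(action: string) {
  return async (req: Request, res: Response, next: NextFunction) => {
    try {
      const subject = req.subject;
      if (!subject) return res.status(401).json({ error: "unauthenticated" });
 
      const resource = await loadResource(req); // type, tenantId, owner, ... or null
      if (!resource) return res.status(404).json({ error: "not_found" });
 
      if (!can(subject, action, resource)) {
        // Log every denial: this is your audit trail and your alert source.
        logger.warn("authz.denied", {
          userId: subject.userId,
          tenantId: subject.tenantId,
          action,
          resourceType: resource.type,
        });
        return res.status(403).json({ error: "forbidden" });
      }
      next();
    } catch (err) {
      // Express 4 does not catch rejected promises from async middleware;
      // pass the error on so the request reaches the error handler.
      next(err);
    }
  };
}
 
// Usage: the permission lives next to the route, not buried in the handler.
const router = Router();
router.delete(
  "/projects/:id",
  requirePermission("project:delete"),
  deleteProjectHandler, // your route handler
);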

But application checks fail open when someone writes a raw query, a background job, or a new service that forgets the guard. So I add Postgres row-level security as defense in depth. Even if the app layer is bypassed, the database refuses to return another tenant's rows.

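alter table projects enable row level security;
-- Table owners bypass RLS unless it is forced. Superusers and roles with
-- BYPASSRLS always bypass it, so the app must not connect as either.
alter table projects force row level security;
 
-- missing_ok = true returns NULL instead of raising when the setting is
-- absent, and nullif() turns an empty value into NULL, so a connection
-- with no tenant set sees no rows.
create policy tenant_isolation on projects
  using (tenant_id = nullif(current_setting('app.tenant_id', true), '')::uuid);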

Then set the tenant context at the start of each request's transaction:

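// SET can't take bind parameters, so use set_config(); the third argument
// (is_local = true) scopes the value to this transaction, like SET LOCAL.
await db.query("select set_config('app.tenant_id', $1, true)", [subject.tenantId]);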
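A transaction-local setting disappears at commit, so it helps to wrap begin, tenant setup, the work, and commit in one helper and route tenant queries through it. A sketch against a hypothetical minimal `Queryable` interface (roughly the shape of node-postgres's `client.query(text, values)`); `withTenant` is an illustrative name. It must run on one dedicated connection, not a pool, because the setting lives on that connection's transaction.

```typescript
// Hypothetical minimal client interface.
interface Queryable {
  query(text: string, params?: unknown[]): Promise<unknown>;
}

// Runs `work` inside a transaction with app.tenant_id set for RLS policies.
async function withTenant<T>(
  client: Queryable,
  tenantId: string,
  work: (client: Queryable) => Promise<T>,
): Promise<T> {
  await client.query("begin");
  try {
    // is_local = true: the value is discarded at commit or rollback.
    await client.query("select set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await work(client);
    await client.query("commit");
    return result;
  } catch (err) {
    await client.query("rollback");
    throw err;
  }
}
```

Because the tenant is set inside the same transaction as the work, a pooled connection can never carry one request's tenant into the next.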

RLS is not a replacement for can(): it does coarse tenant isolation, not fine-grained "can this user delete." It's the seatbelt that catches the bug can() should have caught. Together they're belt and suspenders, and the cost is a few lines of SQL per table.

A maturity ladder

You don't build all of this at once. You climb it as the product demands.

mvp:
  - Hardcoded roles enum (owner, admin, member)
  - Single can() function, RBAC only
  - Permission strings checked in route guards
  - Tenant id on every query
 
growth:
  - Roles and permissions in Postgres, editable per tenant
  - Redis cache for resolved permissions with TTL + invalidation
  - Postgres RLS for tenant isolation as defense in depth
  - Structured audit log of every authz decision
 
scale:
  - ABAC conditions on the specific permissions that need them
  - Custom roles UI for tenant admins
  - Decision audit log shipped to a queryable store
  - Consider OpenFGA only if you have a true sharing graph
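For the mvp rung, "hardcoded roles" can be as small as a constant map from role to permission strings that yields the same `Set<string>` the database-backed resolver produces later, so can() and the route guards don't change when you climb. A sketch; the role names come from the ladder above, while the permission assignments and the `permissionsFor` name are made up. It uses a const object plus a union type rather than a TypeScript `enum`.

```typescript
// Hardcoded roles for the MVP rung: one place to read, no tables yet.
const ROLE_PERMISSIONS = {
  owner: ["project:create", "project:delete", "billing:read", "billing:write", "member:invite"],
  admin: ["project:create", "project:delete", "member:invite"],
  member: ["project:create"],
} as const;

type Role = keyof typeof ROLE_PERMISSIONS;

// Same output shape as the database-backed resolver, so can() is unchanged.
function permissionsFor(role: Role): Set<string> {
  return new Set<string>(ROLE_PERMISSIONS[role]);
}

permissionsFor("admin").has("project:delete"); // true
permissionsFor("member").has("billing:read"); // false
```

Moving to the growth rung then means replacing `permissionsFor` with a query, not rewriting every check.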

The mistake is jumping rungs. Teams adopt policy engines and graph-based authz services while they still have three roles and one tenant per customer, then spend a quarter operating infrastructure to solve a problem they don't have. The opposite mistake — scattering if checks until authorization becomes unprovable — is worse but slower-acting, so it's the one that ships.

Here's the checklist I run against any permissions system before I trust it:

  • One chokepoint. Can you grep for every place authorization is decided? If not, you have scattered checks. Fix that first.
  • Tenant-first. Is the tenant boundary checked before any role logic, in code and in the database? Cross-tenant leaks are the bugs that make the news.
  • RBAC by default, ABAC by exception. Are conditions opt-in per permission, or did you build a policy engine before you needed one?
  • Cache the set, not the decision. Is resolution cached with explicit invalidation on write, and is can() a pure function?
  • Every denial is logged. When a customer asks "why was I blocked," can you answer in one query?

Get those five right and authorization stops being the scariest file in the repo. Everything else is iteration.