All posts
Databases & Storage··9 min read

PostgreSQL Indexing Explained: Which Index Type to Use and Why

A practical guide to PostgreSQL index types — B-tree, GIN, GiST, BRIN, partial, composite, and covering indexes — and exactly when to reach for each one.

By

On this page

Indexing is the single biggest performance lever you have in a Postgres-backed app. It's also the one developers reach for last — usually after the query is slow in production, after the support tickets, after someone bumps the instance to a bigger box and the bill quietly doubles. I've watched teams spend a sprint on caching layers and read replicas to solve a problem that one well-placed index would have killed in an afternoon.

The catch is that "add an index" is not a single decision. Postgres ships five index access methods, and several modifiers on top of them. Reach for the wrong one and you get an index that's never used, bloats your writes, and lies to you in the planner. So let's go through what each type is actually for.

B-tree: the default, and usually correct

B-tree is what CREATE INDEX gives you when you don't ask for anything else. It handles equality and range queries — =, <, >, BETWEEN, IN, and ORDER BY — and that covers the overwhelming majority of real-world queries. Foreign keys, lookups by email, sorting by created_at: all B-tree.

CREATE INDEX idx_users_email ON users (email);
CREATE INDEX idx_orders_created_at ON orders (created_at);

If you're not sure what kind of index you need, you need a B-tree. Start here and only deviate when you have a concrete reason — a specific data type or query shape that B-tree can't serve.

Composite indexes and the column-order rule

A composite (multicolumn) index covers more than one column, and the order of those columns is the whole game. Postgres can use a composite index efficiently when your query filters on a leftmost prefix of its columns.

CREATE INDEX idx_orders_customer_status_date
  ON orders (customer_id, status, created_at);

This index serves WHERE customer_id = ?, and WHERE customer_id = ? AND status = ?, and all three together. It does not efficiently serve WHERE status = ? alone — status isn't a leftmost prefix, so the index is mostly dead weight for that query.

The rule of thumb: put equality columns first, then the column you range-scan or sort on last. A query like WHERE customer_id = 42 AND created_at > now() - interval '7 days' wants (customer_id, created_at) in exactly that order. Get this wrong and you've built an index the planner ignores.

Partial indexes: index only the rows that matter

A partial index covers a subset of rows defined by a WHERE clause. This is one of the most underused features in Postgres and one of the highest-leverage. If you constantly query a small slice of a big table, don't index the whole thing.

CREATE INDEX idx_orders_pending
  ON orders (created_at)
  WHERE status = 'pending';

Classic case: a jobs table with millions of completed rows and a few thousand pending ones. A partial index on the pending rows is tiny, stays hot in cache, and costs almost nothing to maintain because completed rows never touch it. Soft-deletes are the other big one — WHERE deleted_at IS NULL keeps your index focused on live data.

Covering indexes with INCLUDE

An index-only scan is when Postgres answers a query entirely from the index without visiting the table. You get there by making sure every column the query needs lives in the index. INCLUDE lets you bolt extra columns onto the index as payload — they're stored but not part of the searchable key.

CREATE INDEX idx_orders_customer_covering
  ON orders (customer_id)
  INCLUDE (total_amount, status);

Now SELECT total_amount, status FROM orders WHERE customer_id = ? never touches the heap. For hot read paths this is a real win, but don't reflexively INCLUDE everything — every added column makes the index bigger and slower to write.

Expression indexes

Index the result of an expression, not a raw column. The query has to use the exact same expression for the index to apply.

CREATE INDEX idx_users_lower_email ON users (lower(email));
-- matches: WHERE lower(email) = 'pavle@example.com'

This is the right fix for case-insensitive lookups, indexing a field extracted from JSON, or date-truncated grouping. Without it, the function call defeats a plain column index and you're back to a sequential scan.

GIN: jsonb, full-text, and arrays

GIN (Generalized Inverted Index) is built for columns that hold multiple values per row — jsonb, arrays, and tsvector full-text documents. It indexes each element so containment and membership queries stay fast.

CREATE INDEX idx_events_payload ON events USING gin (payload);
-- serves: WHERE payload @> '{"type": "signup"}'
 
CREATE INDEX idx_articles_search
  ON articles USING gin (to_tsvector('english', body));
-- serves: WHERE to_tsvector('english', body) @@ plainto_tsquery('postgres index')

If you're querying inside jsonb with the @> containment operator, or running full-text search, GIN is the answer. For jsonb where you only ever look up a couple of known keys, jsonb_path_ops makes a smaller, faster index at the cost of generality. GIN reads are excellent; just know writes are heavier than B-tree.

GiST is the extensible one — it powers PostGIS spatial queries, range-type overlap, nearest-neighbor search, and trigram similarity (pg_trgm) for fuzzy LIKE '%term%' matching. If you're doing "find rows near this point" or "find time ranges that overlap," GiST is your method. Most apps never need it, which is exactly why people forget it exists when they suddenly do need geospatial or fuzzy search.

BRIN: huge, append-only tables

BRIN (Block Range Index) stores the min/max value per block range instead of indexing every row. The index is microscopic — kilobytes where a B-tree would be gigabytes. The trade is that it only works when the column's physical order on disk correlates with its values.

CREATE INDEX idx_logs_created_brin
  ON logs USING brin (created_at);

This is purpose-built for time-series and event logs: rows arrive in timestamp order, the table is append-only, and you query by date range. On a billion-row logs table, BRIN gives you most of the benefit of a B-tree at a fraction of a percent of the size. On a table with random insertion order, it's useless.

Hash: rarely worth it

Hash indexes serve equality only — no ranges, no sorting. They became crash-safe and WAL-logged in Postgres 10, so they're no longer dangerous, but a B-tree handles equality nearly as well and does everything else. I reach for Hash maybe once a year. Default to B-tree.

Quick reference

Index typeUse it for
B-treeEquality, ranges, sorting — the default for ~90% of queries
HashEquality-only, when you've benchmarked a real win (rare)
GINjsonb containment, full-text search, array membership
GiSTGeometry/PostGIS, range overlap, fuzzy/trigram, nearest-neighbor
BRINMassive append-only tables with naturally ordered columns
PartialQuerying a small, well-defined slice of a large table
CompositeMulti-column filters — order by equality-then-range
Covering (INCLUDE)Hot read paths that want index-only scans
Expressionlower(), JSON extraction, computed lookups

Reading EXPLAIN ANALYZE

You don't guess whether an index helps — you measure it. EXPLAIN ANALYZE runs the query and reports the real plan with timing.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
 
-- Before the index:
-- Seq Scan on orders  (cost=0.00..18584.00 rows=120 width=84)
--   (actual time=0.412..142.380 rows=118 loops=1)
--   Filter: (customer_id = 42)
--   Rows Removed by Filter: 999882
 
-- After CREATE INDEX idx_orders_customer ON orders (customer_id):
-- Index Scan using idx_orders_customer on orders
--   (cost=0.42..8.61 rows=120 width=84)
--   (actual time=0.031..0.067 rows=118 loops=1)
--   Index Cond: (customer_id = 42)

Read it inside-out. A Seq Scan means Postgres read the entire table; the Rows Removed by Filter: 999882 line is the smell — it touched a million rows to return 118. An Index Scan with an Index Cond means it jumped straight to the matching rows. Watch actual time over cost (cost is the planner's estimate; actual time is reality), and check that estimated rows is in the same ballpark as actual rows — a big gap means stale statistics and a possible ANALYZE.

A subtle one: just because an index exists doesn't mean it's used. If the planner expects a query to return a large fraction of the table, a seq scan is genuinely faster, and it'll skip your index on purpose. That's correct behavior, not a bug.

The cost of over-indexing

Indexes aren't free, and this is where enthusiasm becomes a liability. Every index has to be updated on every INSERT, UPDATE, and DELETE that touches its columns. Five indexes on a hot table means five extra write operations per row change — write amplification that shows up as slower writes, more WAL, more vacuum pressure, and more disk. I've seen tables with a dozen overlapping indexes where half of them had never registered a single scan.

Find the dead weight with pg_stat_user_indexes — any index with idx_scan = 0 after a real workload is a candidate for deletion. Indexes are a read optimization paid for in write cost. Spend deliberately.

A checklist for deciding what to index

  • Start from the query, not the table. Profile your slow queries first; index the access patterns that actually hurt.
  • Index foreign keys. Postgres doesn't do this automatically, and unindexed FKs make joins and cascading deletes crawl.
  • Default to B-tree. Only deviate for a concrete data type or query shape it can't serve.
  • Order composite columns equality-first, range/sort-last, matching your real WHERE clauses.
  • Use partial indexes when you query a small, fixed slice of a big table.
  • Reach for GIN for jsonb/full-text/arrays, GiST for geo/range/fuzzy, BRIN for huge append-only tables.
  • Verify with EXPLAIN ANALYZE — confirm the index is actually used and the plan flipped to an index scan.
  • Audit pg_stat_user_indexes periodically and drop indexes with zero scans.
  • Re-evaluate after schema changes. Yesterday's perfect index is tomorrow's dead weight.

Indexing rewards the engineer who looks at the query plan instead of the engineer who adds indexes by vibe. Measure, index the access pattern, verify the plan changed, and delete what you don't use. Do that and you'll solve performance problems that other teams throw hardware at.