Postgres Full-Text Search vs a Dedicated Search Engine

A team I worked with last year added Elasticsearch to power a single search box over roughly 400,000 product rows. Eighteen months later they had a flaky CDC pipeline, a second on-call rotation, a $1,800/month cluster, and a recurring class of bugs where the search index disagreed with the database. Their actual query volume was about 12 searches per second at peak. Postgres on the box they already owned would have served that with a GIN index and never noticed.

That is the most common search mistake I see: reaching for a dedicated engine before checking whether the database you already run, back up, and monitor can do the job. Postgres full-text search (FTS) is not a toy. It ships in core, it's been hardened since version 8.3, and for a large fraction of "let users search our content" features it is the correct answer. Let me show you what it actually does, where it stops, and how to tell which side of that line you're on.

What Postgres FTS actually is

Two data types do the work. tsvector is a normalized, lexeme-sorted representation of a document. tsquery is a parsed search expression. The match operator @@ returns true when the query matches the vector.

SELECT to_tsvector('english', 'The cats are running quickly')
     @@ to_tsquery('english', 'cat & run');
-- t

Notice what happened. cats became the lexeme cat, running became run, and the stop words the and are were dropped entirely. That's the english text search configuration doing stemming and stop-word removal. This is the whole point of FTS over LIKE '%cat%': you match meaning, not substrings, so a search for run finds running, ran, and runs.

The configuration matters more than people expect. to_tsvector('simple', ...) does no stemming and keeps every token, which is what you want for usernames, SKUs, or tags. Pick the language config deliberately per column — it's baked into the stored lexemes, and changing it later means a full reindex.

A real searchable table

Don't compute to_tsvector at query time over a whole table. It can't use an index that way and you'll sequential-scan every row. Materialize it. Since Postgres 12 the clean way is a stored generated column.

CREATE TABLE articles (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  title       text NOT NULL,
  body        text NOT NULL,
  tsv         tsvector GENERATED ALWAYS AS (
                  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
                  setweight(to_tsvector('english', coalesce(body,  '')), 'B')
              ) STORED
);
 
CREATE INDEX articles_tsv_idx ON articles USING GIN (tsv);

Three things to internalize here.

setweight tags lexemes from title as weight A and from body as weight B. Postgres supports four weight labels (A, B, C, D) which later feed ranking — a title hit should count for more than a body hit, and this is how you express that.

The || concatenates the two weighted vectors into one column, so a single index covers both fields. You search one column; users feel like they searched everything.

The GIN index is what makes this fast. GIN stores a posting list per lexeme — effectively an inverted index living inside Postgres. That's the same data structure Lucene uses; the engine isn't magic, it's just a different host for the same idea.

On the generated column: it's recomputed automatically on every insert and update, transactionally, in the same write. There is no trigger to maintain, no tsvector_update_trigger boilerplate, and no window where the row and its index disagree. Before Postgres 12 you'd write a BEFORE INSERT OR UPDATE trigger to keep a plain tsvector column current; if you're on a modern version, prefer the generated column and delete the trigger.

Ranking results

Matching is binary. Ranking is what makes search feel good. ts_rank scores a vector against a query, and because we used setweight, it weights title matches above body matches.

SELECT id,
       title,
       ts_rank(tsv, websearch_to_tsquery('english', $1)) AS rank
FROM   articles
WHERE  tsv @@ websearch_to_tsquery('english', $1)
ORDER  BY rank DESC
LIMIT  20;

Use websearch_to_tsquery, not to_tsquery, for anything a human types. It accepts the Google-ish syntax users already know — quoted "exact phrases", or, and leading - to exclude — and it never throws a syntax error on weird input. to_tsquery('english', 'cat &') raises an error; websearch_to_tsquery('english', 'cat &') just searches for cat. That difference alone saves you a pile of input-sanitizing code.

If you want the default A/B/C/D weights changed, pass them explicitly. The fourth argument is a normalization flag — ts_rank(weights, tsv, query, 1) divides the rank by 1 + log(document length), which stops long documents from dominating purely because they contain more words. (Flag 32, by contrast, divides the rank by rank + 1 to scale it into the 0–1 range.)

ts_rank('{0.1, 0.2, 0.4, 1.0}', tsv, websearch_to_tsquery('english', $1), 1)

Run EXPLAIN (ANALYZE, BUFFERS) on that query. You want to see a Bitmap Index Scan on articles_tsv_idx feeding a Bitmap Heap Scan. If you see a Seq Scan, your index isn't being used — usually because you wrapped tsv in a function or the planner decided the table was too small to bother, which on a small table is fine.

Typo tolerance with pg_trgm

Stemming handles word forms. It does nothing for typos: search Postgers and FTS returns zero rows because there's no lexeme postger. The pg_trgm extension fills that gap with trigram similarity — it breaks strings into three-character chunks and measures overlap.

CREATE EXTENSION IF NOT EXISTS pg_trgm;
 
CREATE INDEX articles_title_trgm_idx
  ON articles USING GIN (title gin_trgm_ops);
 
SELECT id, title, similarity(title, $1) AS sim
FROM   articles
WHERE  title % $1            -- % is the similarity operator
ORDER  BY sim DESC
LIMIT  10;

The % operator returns true when similarity exceeds pg_trgm.similarity_threshold (default 0.3, tunable per session with SET pg_trgm.similarity_threshold). This is genuinely useful for short fields — titles, names, cities, product SKUs — and the GIN trigram index keeps it fast.

Be honest about its limits. Trigram similarity over long body text is slow and noisy, and it's a fuzzy-string tool, not a typo-correcting search ranker. A practical pattern is FTS first, and if it returns too few rows, fall back to a trigram query for "did you mean" suggestions. That covers most real typo needs without a dedicated engine.

Where Postgres FTS runs out of road

I run Postgres FTS in production happily. I also know its ceiling, and I've hit it. Here's where a dedicated engine starts earning its operational cost.

Relevance tuning. ts_rank is coarse. It has no notion of field boosts beyond four weight buckets, no per-query function scoring, no decay by recency, no learning-to-rank. If "relevance is the product" — a marketplace, a docs site where the right answer must be in the top three — engines like Elasticsearch and OpenSearch give you BM25 scoring, function score queries, and tunable analyzers that Postgres simply doesn't expose.

Real typo tolerance and instant search. Meilisearch and Typesense were built around forgiving, prefix-aware, sub-50ms typeahead with typo tolerance as a first-class feature, not a trigram bolt-on. If you're building Algolia-style search-as-you-type, they'll beat hand-rolled Postgres every time and take a day to set up.

Faceting and aggregations at scale. Faceted navigation — "show counts per brand, per price bucket, per color, filtered live" — is COUNT(*) ... GROUP BY in Postgres. That's fine at moderate scale and painful when facets multiply over millions of rows. Search engines maintain these aggregations natively.

Search analytics. Click-through, query logs, zero-result tracking, A/B testing ranking changes — the dedicated engines and their ecosystems ship this. In Postgres you build it yourself.

Scale and write amplification. GIN indexes are expensive to update. A write-heavy table with a large tsvector GIN index will feel index maintenance pressure — watch your fastupdate settings and gin_pending_list_limit. Past tens of millions of frequently-updated documents, offloading search to a system designed for it relieves your primary database.

Here's the comparison I actually use:

Need	Postgres FTS	Meilisearch / Typesense	Elasticsearch / OpenSearch
Setup & ops	Already running it	One service, simple	Cluster, JVM, real ops
Stemming / language	Built-in, many configs	Good	Excellent
Typo tolerance	`pg_trgm` (decent)	Excellent (core feature)	Good (configurable)
Relevance tuning	Coarse (`ts_rank`)	Good defaults	Deep (BM25, function score)
Faceting	`GROUP BY` (manual)	Built-in	Built-in, scales
Transactional consistency	Perfect (same DB)	Eventual (must sync)	Eventual (must sync)
Search analytics	DIY	Some	Rich ecosystem
Sweet spot	< ~10M docs, search is a feature	Instant typeahead UX	Search is the product, huge scale

The cost nobody budgets for

The decision isn't really "is the engine better at search." It usually is. The decision is whether you want a second system of record for the same data. That means a sync pipeline — logical replication, CDC via Debezium, or application-level dual writes — and every one of those introduces lag and a new failure mode where search results disagree with the database. It means a separate thing to back up, secure, upgrade, monitor, and wake someone up at 3am for.

When search lives inside Postgres, a row and its searchability commit in the same transaction. There is no drift, ever. That consistency is worth a lot, and it's the first thing you give up when you split search out.

A decision framework

Walk these in order. Stop at the first honest "yes that's us."

Is search a feature, or the product? If users will leave because results are slightly off, you're in dedicated-engine territory. If it's "let people find their stuff," start with Postgres.
Are you under ~10 million searchable rows with moderate write rates? Postgres FTS handles this on hardware you already pay for. Build the generated tsvector column, add the GIN index, ship it.
Do you need true typeahead with typo tolerance and sub-50ms latency? Reach for Typesense or Meilisearch. They're cheap to run and purpose-built.
Do you need deep relevance tuning, large-scale faceting, or search analytics? That's Elasticsearch or OpenSearch — and accept the operational cost honestly.
Default: Start with Postgres. Add the engine when you have a measured, named limitation — not a hypothetical one. Migrating later is a known, bounded project. Running an unnecessary distributed system from day one is a tax you pay forever.

The trap is treating "add a search engine" as the obvious first move. For most teams it's premature infrastructure. Build the GIN-indexed tsvector column, measure with EXPLAIN (ANALYZE, BUFFERS), and only move when the data tells you to. Postgres will carry you further than you think.