Postgres Full-Text Search vs a Dedicated Search Engine
Before you add Elasticsearch, check whether Postgres already does the job. Full-text search in Postgres, where it shines, and the signals it's time to move on.
On this page
A team I worked with last year added Elasticsearch to power a single search box over roughly 400,000 product rows. Eighteen months later they had a flaky CDC pipeline, a second on-call rotation, a $1,800/month cluster, and a recurring class of bugs where the search index disagreed with the database. Their actual query volume was about 12 searches per second at peak. Postgres on the box they already owned would have served that with a GIN index and never noticed.
That is the most common search mistake I see: reaching for a dedicated engine before checking whether the database you already run, back up, and monitor can do the job. Postgres full-text search (FTS) is not a toy. It ships in core, it's been hardened since version 8.3, and for a large fraction of "let users search our content" features it is the correct answer. Let me show you what it actually does, where it stops, and how to tell which side of that line you're on.
What Postgres FTS actually is
Two data types do the work. tsvector is a normalized, lexeme-sorted representation of a document. tsquery is a parsed search expression. The match operator @@ returns true when the query matches the vector.
SELECT to_tsvector('english', 'The cats are running quickly')
@@ to_tsquery('english', 'cat & run');
-- tNotice what happened. cats became the lexeme cat, running became run, and the stop words the and are were dropped entirely. That's the english text search configuration doing stemming and stop-word removal. This is the whole point of FTS over LIKE '%cat%': you match meaning, not substrings, so a search for run finds running, ran, and runs.
The configuration matters more than people expect. to_tsvector('simple', ...) does no stemming and keeps every token, which is what you want for usernames, SKUs, or tags. Pick the language config deliberately per column — it's baked into the stored lexemes, and changing it later means a full reindex.
A real searchable table
Don't compute to_tsvector at query time over a whole table. It can't use an index that way and you'll sequential-scan every row. Materialize it. Since Postgres 12 the clean way is a stored generated column.
CREATE TABLE articles (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
title text NOT NULL,
body text NOT NULL,
tsv tsvector GENERATED ALWAYS AS (
setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
setweight(to_tsvector('english', coalesce(body, '')), 'B')
) STORED
);
CREATE INDEX articles_tsv_idx ON articles USING GIN (tsv);Three things to internalize here.
setweight tags lexemes from title as weight A and from body as weight B. Postgres supports four weight labels (A, B, C, D) which later feed ranking — a title hit should count for more than a body hit, and this is how you express that.
The || concatenates the two weighted vectors into one column, so a single index covers both fields. You search one column; users feel like they searched everything.
The GIN index is what makes this fast. GIN stores a posting list per lexeme — effectively an inverted index living inside Postgres. That's the same data structure Lucene uses; the engine isn't magic, it's just a different host for the same idea.
On the generated column: it's recomputed automatically on every insert and update, transactionally, in the same write. There is no trigger to maintain, no tsvector_update_trigger boilerplate, and no window where the row and its index disagree. Before Postgres 12 you'd write a BEFORE INSERT OR UPDATE trigger to keep a plain tsvector column current; if you're on a modern version, prefer the generated column and delete the trigger.
Ranking results
Matching is binary. Ranking is what makes search feel good. ts_rank scores a vector against a query, and because we used setweight, it weights title matches above body matches.
SELECT id,
title,
ts_rank(tsv, websearch_to_tsquery('english', $1)) AS rank
FROM articles
WHERE tsv @@ websearch_to_tsquery('english', $1)
ORDER BY rank DESC
LIMIT 20;Use websearch_to_tsquery, not to_tsquery, for anything a human types. It accepts the Google-ish syntax users already know — quoted "exact phrases", or, and leading - to exclude — and it never throws a syntax error on weird input. to_tsquery('english', 'cat &') raises an error; websearch_to_tsquery('english', 'cat &') just searches for cat. That difference alone saves you a pile of input-sanitizing code.
If you want the default A/B/C/D weights changed, pass them explicitly. The fourth argument is a normalization flag — ts_rank(weights, tsv, query, 1) divides the rank by 1 + log(document length), which stops long documents from dominating purely because they contain more words. (Flag 32, by contrast, divides the rank by rank + 1 to scale it into the 0–1 range.)
ts_rank('{0.1, 0.2, 0.4, 1.0}', tsv, websearch_to_tsquery('english', $1), 1)Run EXPLAIN (ANALYZE, BUFFERS) on that query. You want to see a Bitmap Index Scan on articles_tsv_idx feeding a Bitmap Heap Scan. If you see a Seq Scan, your index isn't being used — usually because you wrapped tsv in a function or the planner decided the table was too small to bother, which on a small table is fine.
Typo tolerance with pg_trgm
Stemming handles word forms. It does nothing for typos: search Postgers and FTS returns zero rows because there's no lexeme postger. The pg_trgm extension fills that gap with trigram similarity — it breaks strings into three-character chunks and measures overlap.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX articles_title_trgm_idx
ON articles USING GIN (title gin_trgm_ops);
SELECT id, title, similarity(title, $1) AS sim
FROM articles
WHERE title % $1 -- % is the similarity operator
ORDER BY sim DESC
LIMIT 10;The % operator returns true when similarity exceeds pg_trgm.similarity_threshold (default 0.3, tunable per session with SET pg_trgm.similarity_threshold). This is genuinely useful for short fields — titles, names, cities, product SKUs — and the GIN trigram index keeps it fast.
Be honest about its limits. Trigram similarity over long body text is slow and noisy, and it's a fuzzy-string tool, not a typo-correcting search ranker. A practical pattern is FTS first, and if it returns too few rows, fall back to a trigram query for "did you mean" suggestions. That covers most real typo needs without a dedicated engine.
Where Postgres FTS runs out of road
I run Postgres FTS in production happily. I also know its ceiling, and I've hit it. Here's where a dedicated engine starts earning its operational cost.
Relevance tuning. ts_rank is coarse. It has no notion of field boosts beyond four weight buckets, no per-query function scoring, no decay by recency, no learning-to-rank. If "relevance is the product" — a marketplace, a docs site where the right answer must be in the top three — engines like Elasticsearch and OpenSearch give you BM25 scoring, function score queries, and tunable analyzers that Postgres simply doesn't expose.
Real typo tolerance and instant search. Meilisearch and Typesense were built around forgiving, prefix-aware, sub-50ms typeahead with typo tolerance as a first-class feature, not a trigram bolt-on. If you're building Algolia-style search-as-you-type, they'll beat hand-rolled Postgres every time and take a day to set up.
Faceting and aggregations at scale. Faceted navigation — "show counts per brand, per price bucket, per color, filtered live" — is COUNT(*) ... GROUP BY in Postgres. That's fine at moderate scale and painful when facets multiply over millions of rows. Search engines maintain these aggregations natively.
Search analytics. Click-through, query logs, zero-result tracking, A/B testing ranking changes — the dedicated engines and their ecosystems ship this. In Postgres you build it yourself.
Scale and write amplification. GIN indexes are expensive to update. A write-heavy table with a large tsvector GIN index will feel index maintenance pressure — watch your fastupdate settings and gin_pending_list_limit. Past tens of millions of frequently-updated documents, offloading search to a system designed for it relieves your primary database.
Here's the comparison I actually use:
| Need | Postgres FTS | Meilisearch / Typesense | Elasticsearch / OpenSearch |
|---|---|---|---|
| Setup & ops | Already running it | One service, simple | Cluster, JVM, real ops |
| Stemming / language | Built-in, many configs | Good | Excellent |
| Typo tolerance | pg_trgm (decent) | Excellent (core feature) | Good (configurable) |
| Relevance tuning | Coarse (ts_rank) | Good defaults | Deep (BM25, function score) |
| Faceting | GROUP BY (manual) | Built-in | Built-in, scales |
| Transactional consistency | Perfect (same DB) | Eventual (must sync) | Eventual (must sync) |
| Search analytics | DIY | Some | Rich ecosystem |
| Sweet spot | < ~10M docs, search is a feature | Instant typeahead UX | Search is the product, huge scale |
The cost nobody budgets for
The decision isn't really "is the engine better at search." It usually is. The decision is whether you want a second system of record for the same data. That means a sync pipeline — logical replication, CDC via Debezium, or application-level dual writes — and every one of those introduces lag and a new failure mode where search results disagree with the database. It means a separate thing to back up, secure, upgrade, monitor, and wake someone up at 3am for.
When search lives inside Postgres, a row and its searchability commit in the same transaction. There is no drift, ever. That consistency is worth a lot, and it's the first thing you give up when you split search out.
A decision framework
Walk these in order. Stop at the first honest "yes that's us."
- Is search a feature, or the product? If users will leave because results are slightly off, you're in dedicated-engine territory. If it's "let people find their stuff," start with Postgres.
- Are you under ~10 million searchable rows with moderate write rates? Postgres FTS handles this on hardware you already pay for. Build the generated
tsvectorcolumn, add the GIN index, ship it. - Do you need true typeahead with typo tolerance and sub-50ms latency? Reach for Typesense or Meilisearch. They're cheap to run and purpose-built.
- Do you need deep relevance tuning, large-scale faceting, or search analytics? That's Elasticsearch or OpenSearch — and accept the operational cost honestly.
- Default: Start with Postgres. Add the engine when you have a measured, named limitation — not a hypothetical one. Migrating later is a known, bounded project. Running an unnecessary distributed system from day one is a tax you pay forever.
The trap is treating "add a search engine" as the obvious first move. For most teams it's premature infrastructure. Build the GIN-indexed tsvector column, measure with EXPLAIN (ANALYZE, BUFFERS), and only move when the data tells you to. Postgres will carry you further than you think.
Further reading
- PostgreSQL documentation, "Full Text Search" chapter (the
textsearchdocs) and thepg_trgmextension — postgresql.org/docs - Meilisearch and Typesense official documentation for instant-search use cases
- Elasticsearch / OpenSearch documentation for BM25 and function score queries