
Choosing a Vector Database in 2026: pgvector, Pinecone, Qdrant, and When Postgres Is Enough

You probably don't need a dedicated vector database. A decision framework for pgvector vs Pinecone/Qdrant, index types, and the scale where Postgres stops being enough.


Every RAG tutorial reaches for a managed vector database in the first ten minutes, and most of them are wrong to do it. I've shipped semantic search and retrieval-augmented generation into production three times now, and in two of those the entire vector layer was a single column in a Postgres database I was already running. The third one genuinely needed Qdrant — and I'll tell you exactly why, because the difference is what this post is about.

The default question shouldn't be "which vector database?" It should be "do I need one at all?" For a surprising number of teams the answer is no, and reaching for Pinecone on day one buys you a second datastore to operate, a second consistency problem, and a network hop between your rows and your embeddings — all before you've validated that anyone wants the feature.

The "just use Postgres" default

If you already run Postgres, your starting position is pgvector. It's a Postgres extension that adds a vector type plus approximate-nearest-neighbour indexing, and as of pgvector 0.8.x it supports HNSW and IVFFlat indexes, half-precision (halfvec) storage, binary quantization, and iterative index scans, which largely address the old problem of filtered queries returning fewer rows than their LIMIT. It is not a toy. People run it at tens of millions of vectors.

The reason it's the right default is not that it's the fastest — it isn't — it's that it collapses three systems into one. Your relational data, your transactional guarantees, and your embeddings live in the same place. A retrieval query can join vectors against users, documents, tenants, and permissions in one round trip, inside one transaction, with one backup strategy and one access-control model. That single property eliminates an entire class of bugs: the embedding that exists but the row that was deleted, the tenant filter that was enforced in Postgres but not in the vector store, the eventual-consistency window where search returns a document the user just unshared.

Here's the whole setup. Extension, table, index, query.

-- Requires Postgres 14+ and pgvector 0.8.x installed on the host
CREATE EXTENSION IF NOT EXISTS vector;
 
CREATE TABLE documents (
  id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  tenant_id   bigint      NOT NULL,
  title       text        NOT NULL,
  body        text        NOT NULL,
  -- text-embedding-3-small is 1536 dims; store as halfvec to halve memory
  embedding   halfvec(1536) NOT NULL,
  created_at  timestamptz NOT NULL DEFAULT now()
);
 
-- HNSW index for cosine distance. Build params trade build time for recall.
CREATE INDEX ON documents
  USING hnsw (embedding halfvec_cosine_ops)
  WITH (m = 16, ef_construction = 64);
 
-- Helps the planner filter by tenant cheaply alongside the vector scan
CREATE INDEX ON documents (tenant_id);

And the query. Note the metadata filter living right next to the vector search — no application-side join, no second system.

-- Tune recall vs latency per query; higher ef_search = better recall, slower
SET hnsw.ef_search = 100;
 
SELECT id, title,
       embedding <=> $1 AS distance      -- <=> is cosine distance
FROM documents
WHERE tenant_id = $2
  AND created_at > now() - interval '90 days'
ORDER BY embedding <=> $1
LIMIT 10;

The <=> operator is cosine distance; pgvector also gives you <-> (L2) and <#> (negative inner product). Match the operator class on the index to the operator in the query or the planner won't use the index — that's the single most common pgvector mistake I see. If you build halfvec_cosine_ops you must query with <=>.
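To make the three operators concrete, here is a minimal TypeScript sketch of what each one computes on plain arrays. The function names are illustrative and not part of pgvector; in practice the database does this work for you.

```typescript
// Illustrative helpers (not part of pgvector): what each distance operator computes.

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// <-> : Euclidean (L2) distance
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// <#> : negative inner product (negated so that smaller still means closer)
function negativeInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b);
}

// <=> : cosine distance = 1 - cosine similarity
function cosineDistance(a: number[], b: number[]): number {
  const norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
  if (norms === 0) throw new Error("cosine distance is undefined for a zero vector");
  return 1 - dot(a, b) / norms;
}
```

All three are "smaller is closer", which is why the same ORDER BY ... LIMIT shape works regardless of which operator you pick.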

HNSW vs IVFFlat: the only index decision that matters

Both index types do approximate nearest neighbour, and they fail in opposite directions.

IVFFlat partitions vectors into clusters (the lists parameter) and searches only the nearest few (probes) at query time. It builds fast and uses little memory, but recall depends entirely on how well the clusters were trained, so you have to build the index after representative data is loaded or the centroids are garbage. If your data distribution shifts, recall quietly degrades until you rebuild.

HNSW builds a multi-layer navigable graph. It's slower to build, uses more memory, and is the right default in 2026 for almost everyone because its recall is high and stable, it doesn't need training data, and you can add rows incrementally without recall falling off a cliff. The cost is RAM: the graph wants to live in memory, and m = 16 with 1536-dim vectors is on the order of a few KB per vector of index overhead on top of the vectors themselves.
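As a rough sanity check on the RAM point, the vector payload alone is easy to estimate. The sketch below assumes the per-value storage sizes documented for pgvector (4 bytes per dimension for vector, 2 for halfvec, plus 8 bytes of header) and deliberately ignores graph links and page overhead, so treat it as a floor rather than a sizing tool. The function names are made up for illustration.

```typescript
// Back-of-envelope estimate of vector payload size. Assumes pgvector's documented
// per-value storage: (bytes per dimension * dims) + 8 bytes of header.
// Ignores HNSW neighbour links, page overhead, and the rest of the row.

type VectorType = "vector" | "halfvec";

const BYTES_PER_DIM: Record<VectorType, number> = { vector: 4, halfvec: 2 };
const HEADER_BYTES = 8;

function rawVectorBytes(rowCount: number, dims: number, type: VectorType): number {
  return rowCount * (dims * BYTES_PER_DIM[type] + HEADER_BYTES);
}

function toGiB(bytes: number): number {
  return bytes / 1024 ** 3;
}

// Example: 10M embeddings at 1536 dimensions, stored both ways.
const exampleRows = 10_000_000;
console.log("vector :", toGiB(rawVectorBytes(exampleRows, 1536, "vector")).toFixed(1), "GiB");
console.log("halfvec:", toGiB(rawVectorBytes(exampleRows, 1536, "halfvec")).toFixed(1), "GiB");
```

At 10M rows that is roughly 57 GiB as vector against roughly 29 GiB as halfvec before any graph overhead, which is why halfvec is usually the first lever to pull.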

The knobs you actually tune (three for HNSW, two for IVFFlat):

| Knob | Index | Effect | Trade-off |
|---|---|---|---|
| m | HNSW | graph connectivity | higher = better recall, more memory + slower build |
| ef_construction | HNSW | build-time search width | higher = better recall, slower build |
| ef_search | HNSW | query-time search width | higher = better recall, slower queries |
| lists | IVFFlat | number of clusters | ~rows/1000 to start; rebuild when data grows |
| probes | IVFFlat | clusters searched | higher = better recall, slower queries |
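If you do go with IVFFlat, the starting values can be computed rather than guessed. This sketch follows the rule of thumb in the pgvector README as I understand it (rows/1000 up to about a million rows, the square root of the row count beyond that, and probes near the square root of lists). The helper name is hypothetical and the output is a starting point to measure from, not a recommendation.

```typescript
// Hypothetical helper: starting IVFFlat parameters from the row count.
// Assumed rule of thumb: lists = rows / 1000 up to 1M rows, sqrt(rows) above that;
// probes = sqrt(lists). Always validate against measured recall.

function ivfflatStartingParams(rowCount: number): { lists: number; probes: number } {
  if (!Number.isFinite(rowCount) || rowCount < 1) {
    throw new Error("rowCount must be a positive number");
  }
  const rawLists = rowCount <= 1_000_000 ? rowCount / 1000 : Math.sqrt(rowCount);
  const lists = Math.max(1, Math.round(rawLists));
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}

console.log(ivfflatStartingParams(500_000));
console.log(ivfflatStartingParams(4_000_000));
```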

My default for HNSW is m = 16, ef_construction = 64, and then tune ef_search at runtime against a labelled query set until recall@10 sits above 0.95. Don't guess these — measure recall against an exact (brute-force) baseline on a sample, because "the results look fine" is how you ship 0.7 recall and never find out.
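Measuring recall is less work than it sounds. Here is a minimal sketch, assuming you can collect the IDs returned by the indexed query and compare them with an exact ranking over the same sample. The Doc type and function names are illustrative; in practice the exact list could also come from Postgres itself by running the same query with index scans disabled.

```typescript
// Illustrative recall@k measurement against a brute-force baseline.

type Doc = { id: number; embedding: number[] };

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact top-k by scanning every document: slow, but it is the ground truth.
function exactTopK(query: number[], docs: Doc[], k: number): number[] {
  return docs
    .map((d) => ({ id: d.id, dist: cosineDistance(query, d.embedding) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((d) => d.id);
}

// recall@k: fraction of the exact top-k that the ANN result also returned.
function recallAtK(annIds: number[], exactIds: number[], k: number): number {
  const truth = exactIds.slice(0, k);
  if (truth.length === 0) return 1;
  const returned = new Set(annIds.slice(0, k));
  const hits = truth.filter((id) => returned.has(id)).length;
  return hits / truth.length;
}

// Average over a labelled query set; this is the number to hold above 0.95.
function meanRecallAtK(runs: { annIds: number[]; exactIds: number[] }[], k: number): number {
  if (runs.length === 0) throw new Error("need at least one query");
  const total = runs.reduce((sum, r) => sum + recallAtK(r.annIds, r.exactIds, k), 0);
  return total / runs.length;
}
```

Run it over a few hundred representative queries, raise ef_search until the mean clears your target, and keep the script around so you can rerun it after every index rebuild.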

Where Postgres stops being enough

I am not religious about this. There are real walls, and you should move before you hit them, not after.

Scale. Past roughly 50–100M vectors in a single table, HNSW index builds get painful, the index no longer fits comfortably in RAM on a sane instance, and you start fighting maintenance_work_mem and build times measured in hours. pgvector has no native sharding for the index, so horizontal scale means application-level partitioning or something like Citus — at which point a purpose-built store is usually less work.

Filtered search at scale. This is the subtle one. When you filter by metadata and search by vector, the ANN index and the relational filter fight each other. pgvector 0.8 added iterative index scans that handle this far better than older versions, but a dedicated store with first-class filterable payloads (Qdrant, Pinecone, Weaviate, Milvus) still wins decisively when filters are highly selective over huge collections. If most of your queries are "find me similar docs but only in this 0.1% slice," that's a signal.
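The arithmetic behind that fight is simple enough to sketch. Assuming the filter is applied after the index returns its candidates, and that filter matches are spread independently of vector proximity (both simplifying assumptions), the expected number of surviving rows is just candidates times selectivity. The function names are illustrative.

```typescript
// Why selective filters hurt when filtering happens after the ANN scan.
// Assumes matches are independent of vector proximity, which real data often violates.

// Expected rows left after filtering a fixed-size candidate list.
function expectedPostFilterMatches(candidates: number, selectivity: number): number {
  if (selectivity < 0 || selectivity > 1) throw new Error("selectivity must be in [0, 1]");
  return candidates * selectivity;
}

// Candidates the scan must visit, on average, to fill `limit` rows.
function candidatesNeeded(limit: number, selectivity: number): number {
  if (selectivity <= 0 || selectivity > 1) throw new Error("selectivity must be in (0, 1]");
  return Math.ceil(limit / selectivity);
}

// ef_search = 100 with a filter matching 10% of rows: about 10 survivors.
console.log(expectedPostFilterMatches(100, 0.1));
// The same scan with a 0.1% filter: about 0.1 survivors, so most queries come back empty.
console.log(expectedPostFilterMatches(100, 0.001));
```

Filling a LIMIT 10 against a 0.1% slice takes on the order of 10,000 candidates, which is the work an iterative scan has to do by walking further into the index, and the work a store that filters during traversal can skip.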

Operational ergonomics. A managed store gives you replication, quantization, and zero-downtime reindexing as product features rather than DBA projects. If you have no one who wants to own a Postgres tuned for vector workloads, paying Pinecone to do it is a legitimate engineering decision.

Here's the honest comparison.

| | pgvector | Qdrant | Pinecone |
|---|---|---|---|
| Ops model | your Postgres | self-host or managed | fully managed only |
| Best scale (vectors) | up to ~50–100M | 100M–1B+ | 100M–1B+ |
| Metadata filtering | good (0.8 iterative scans) | excellent | excellent |
| Hybrid (dense+sparse) | manual (FTS + vector) | native | native (sparse-dense) |
| Transactional w/ your data | yes, same DB | no | no |
| Quantization | binary, halfvec | scalar/binary/product | managed |
| You operate | one system | two systems | two systems |

A dedicated store, concretely

When you do cross the line, the integration is not exotic. Here's Qdrant from a TypeScript service — create a collection with the right distance metric, upsert points with a payload, then do a filtered vector search.

import { QdrantClient } from "@qdrant/js-client-rest";
 
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL });
 
await qdrant.createCollection("documents", {
  // on_disk keeps the full-precision vectors out of RAM
  vectors: { size: 1536, distance: "Cosine", on_disk: true },
  // int8 scalar quantization (held in RAM) trades a little recall for memory
  quantization_config: { scalar: { type: "int8", always_ram: true } },
});
 
await qdrant.upsert("documents", {
  wait: true,
  points: [
    {
      id: 1,
      vector: embedding, // number[] of length 1536
      payload: { tenant_id: 42, title: "Onboarding guide", days_old: 12 },
    },
  ],
});
 
const results = await qdrant.search("documents", {
  vector: queryEmbedding,
  limit: 10,
  // payload filter runs as a first-class part of the ANN search
  filter: {
    must: [
      { key: "tenant_id", match: { value: 42 } },
      { key: "days_old", range: { lt: 90 } },
    ],
  },
});

The shape of the code is the same as the SQL — vector plus filter plus limit. What you're paying for is that Qdrant's filter is applied during graph traversal rather than as a post-filter, so selective filters over a billion points stay fast. That, and you're now running and monitoring a second stateful service.

Hybrid search, briefly

Pure vector search misses exact-match and keyword intent — product SKUs, error codes, proper nouns. Hybrid search fuses dense (semantic) and sparse (lexical, BM25-style) retrieval. In pgvector you do this by hand: run a full-text tsvector query and a vector query, then combine with Reciprocal Rank Fusion in SQL or application code. It works and I've shipped it, but it's manual. Qdrant, Weaviate, and Pinecone offer native sparse-dense hybrid, which is one of the better reasons to adopt them — if your retrieval quality genuinely depends on lexical signal, that's worth more than raw scale.
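The fusion step itself is small. Below is a minimal sketch of Reciprocal Rank Fusion in application code, assuming each retriever hands back an ordered list of document IDs. The constant k = 60 is the conventional default rather than anything tuned for your data, and the function name is illustrative.

```typescript
// Reciprocal Rank Fusion: score(doc) = sum over result lists of 1 / (k + rank),
// where rank starts at 1. Documents missing from a list contribute nothing for it.

function reciprocalRankFusion(
  rankings: number[][],
  k: number = 60,
): { id: number; score: number }[] {
  const scores = new Map<number, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    // Highest fused score first; break ties by id so output is deterministic.
    .sort((a, b) => b.score - a.score || a.id - b.id);
}

// Usage: IDs from the vector query and the full-text query, best first.
const vectorHits = [10, 20, 30];
const textHits = [20, 40, 10];
console.log(reciprocalRankFusion([vectorHits, textHits]).map((r) => r.id));
```

Because RRF only looks at ranks, you never have to reconcile cosine distances with ts_rank scores, which is most of why it is the usual choice for hand-rolled hybrid search.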

The decision framework

Walk it top to bottom and stop at the first "yes." Step 5 applies whichever branch you land on.

  1. Do you already run Postgres and have fewer than ~10M vectors? Use pgvector with an HNSW index. Don't add a system. This is most teams building their first RAG feature.
  2. 10M–50M vectors, mostly unfiltered or lightly filtered search? Still pgvector. Move vectors to halfvec, tune m/ef_search, give it RAM, measure recall.
  3. Highly selective metadata filters over large collections, or you need native hybrid search? Move to Qdrant or Weaviate. The filtering and fusion are first-class there and painful to retrofit.
  4. 100M+ vectors, or you have no one to operate a tuned Postgres? A managed store (Pinecone, or managed Qdrant/Weaviate) earns its cost. Pay for the ops you can't staff.
  5. Whatever you choose, build an exact-search recall baseline first. You cannot tune what you don't measure, and ANN that silently drops to 0.7 recall will quietly wreck your RAG quality.
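For teams that like their decision rules executable, the first four steps can be sketched as a function. The input fields and return labels are my own naming, the thresholds are copied from the list above, and anything the list does not address is returned as "not-covered" rather than guessed at.

```typescript
// A literal encoding of steps 1-4 above. Field names and labels are illustrative.

type StoreInputs = {
  runsPostgres: boolean;
  vectorCount: number;
  highlySelectiveFilters: boolean;
  needsNativeHybrid: boolean;
  canOperateTunedPostgres: boolean;
};

type StoreChoice = "pgvector" | "qdrant-or-weaviate" | "managed-store" | "not-covered";

function chooseVectorStore(i: StoreInputs): StoreChoice {
  // Step 1: already on Postgres with fewer than ~10M vectors.
  if (i.runsPostgres && i.vectorCount < 10_000_000) return "pgvector";

  // Step 2: 10M-50M vectors, mostly unfiltered or lightly filtered.
  if (i.runsPostgres && i.vectorCount <= 50_000_000 && !i.highlySelectiveFilters) {
    return "pgvector";
  }

  // Step 3: highly selective filters over large collections, or native hybrid search.
  if (i.highlySelectiveFilters || i.needsNativeHybrid) return "qdrant-or-weaviate";

  // Step 4: 100M+ vectors, or nobody to operate a tuned Postgres.
  if (i.vectorCount >= 100_000_000 || !i.canOperateTunedPostgres) return "managed-store";

  // The list is silent here (for example 50M-100M unfiltered); measure before deciding.
  return "not-covered";
}
```

Step 5 has no branch because it is not a choice: build the exact-search recall baseline whichever label comes back.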

The thread running through all of this is the same one from how you'd think about indexing any Postgres table: an index is a recall/latency/memory trade you tune against real queries, not a magic switch. Vector indexes are just that trade in higher dimensions. Start with the database you already have, prove the feature is worth keeping, and let measured recall and real scale — not a tutorial's defaults — tell you when to graduate.

Further reading

  • pgvector — github.com/pgvector/pgvector (README covers HNSW/IVFFlat tuning and halfvec)
  • Qdrant documentation — qdrant.tech/documentation
  • Pinecone documentation — docs.pinecone.io
  • OpenAI embeddings guide — platform.openai.com/docs (dimensions and model choices for text-embedding-3-*)