Choosing a Vector Database in 2026: pgvector, Pinecone, Qdrant, and When Postgres Is Enough
You probably don't need a dedicated vector database. A decision framework for pgvector vs Pinecone/Qdrant, index types, and the scale where Postgres stops being enough.
Every RAG tutorial reaches for a managed vector database in the first ten minutes, and most of them are wrong to do it. I've shipped semantic search and retrieval-augmented generation into production three times now, and in two of those the entire vector layer was a single column in a Postgres database I was already running. The third one genuinely needed Qdrant — and I'll tell you exactly why, because the difference is what this post is about.
The default question shouldn't be "which vector database?" It should be "do I need one at all?" For a surprising number of teams the answer is no, and reaching for Pinecone on day one buys you a second datastore to operate, a second consistency problem, and a network hop between your rows and your embeddings — all before you've validated that anyone wants the feature.
The "just use Postgres" default
If you already run Postgres, your starting position is pgvector. It's a Postgres extension that adds a vector type plus approximate-nearest-neighbour indexing, and as of pgvector 0.8.x it supports HNSW and IVFFlat indexes, half-precision (halfvec) storage, binary quantization, and iterative index scans that address much of the old filtered-search problem. It is not a toy. People run it at tens of millions of vectors.
The reason it's the right default is not that it's the fastest — it isn't — it's that it collapses three systems into one. Your relational data, your transactional guarantees, and your embeddings live in the same place. A retrieval query can join vectors against users, documents, tenants, and permissions in one round trip, inside one transaction, with one backup strategy and one access-control model. That single property eliminates an entire class of bugs: the embedding that exists but the row that was deleted, the tenant filter that was enforced in Postgres but not in the vector store, the eventual-consistency window where search returns a document the user just unshared.
Here's the whole setup. Extension, table, index, query.
-- Requires Postgres 14+ and pgvector 0.8.x installed on the host
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id bigint NOT NULL,
title text NOT NULL,
body text NOT NULL,
-- text-embedding-3-small is 1536 dims; store as halfvec to halve memory
embedding halfvec(1536) NOT NULL,
created_at timestamptz NOT NULL DEFAULT now()
);
-- HNSW index for cosine distance. Build params trade build time for recall.
CREATE INDEX ON documents
USING hnsw (embedding halfvec_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Helps the planner filter by tenant cheaply alongside the vector scan
CREATE INDEX ON documents (tenant_id);
And the query. Note the metadata filter living right next to the vector search: no application-side join, no second system.
-- Tune recall vs latency per query; higher ef_search = better recall, slower
SET hnsw.ef_search = 100;
SELECT id, title,
embedding <=> $1 AS distance -- <=> is cosine distance
FROM documents
WHERE tenant_id = $2
AND created_at > now() - interval '90 days'
ORDER BY embedding <=> $1
LIMIT 10;
The <=> operator is cosine distance; pgvector also gives you <-> (L2) and <#> (negative inner product). Match the operator class on the index to the operator in the query or the planner won't use the index. That's the single most common pgvector mistake I see. If you build halfvec_cosine_ops you must query with <=>.
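To make the three operators concrete, here is a minimal TypeScript sketch of what each one computes. pgvector does this inside Postgres; the function names and the lookup table are my own illustrative helpers, not part of any library. The operator class names are the halfvec ones pgvector ships.

```typescript
// Illustrative only: pgvector computes these inside Postgres.
// All names below are hypothetical helpers, not a library API.

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// <-> : Euclidean (L2) distance
function l2Distance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// <#> : negative inner product (negated so that smaller means more similar)
function negativeInnerProduct(a: number[], b: number[]): number {
  return -dot(a, b);
}

// <=> : cosine distance, i.e. 1 - cosine similarity
function cosineDistance(a: number[], b: number[]): number {
  return 1 - dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Which query operator each halfvec operator class serves.
const operatorForOpclass: Record<string, string> = {
  halfvec_l2_ops: "<->",
  halfvec_ip_ops: "<#>",
  halfvec_cosine_ops: "<=>",
};
```

If your embeddings are normalised to unit length, all three produce the same ranking, so the choice mostly comes down to keeping the index and the query consistent.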
HNSW vs IVFFlat: the only index decision that matters
Both index types do approximate nearest neighbour, and they fail in opposite directions.
IVFFlat partitions vectors into clusters (the lists parameter) and only searches the nearest few (the probes parameter) at query time. It builds fast and uses little memory, but recall depends entirely on how well your clusters were trained, and you have to build it after you have representative data loaded, or the centroids are garbage. If your data distribution shifts, recall quietly degrades until you rebuild.
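As a rough starting point for those two parameters, the pgvector README suggests lists of about rows / 1000 up to a million rows and sqrt(rows) beyond that, with probes starting near sqrt(lists). A small sketch of that rule of thumb follows; the function name is hypothetical, and the rounding and minimum of 1 are my own assumptions.

```typescript
// Hypothetical helper encoding the pgvector README's starting-point heuristics.
// These are starting values to measure against, not tuned settings.
function ivfflatStartingParams(rowCount: number): { lists: number; probes: number } {
  if (!isFinite(rowCount) || rowCount < 1) {
    throw new Error("rowCount must be a positive number");
  }
  // rows / 1000 up to 1M rows, sqrt(rows) above that
  const rawLists = rowCount <= 1000000 ? rowCount / 1000 : Math.sqrt(rowCount);
  const lists = Math.max(1, Math.round(rawLists));
  // probes: start around sqrt(lists)
  const probes = Math.max(1, Math.round(Math.sqrt(lists)));
  return { lists, probes };
}

// Example: half a million rows
const start = ivfflatStartingParams(500000);
console.log(`lists = ${start.lists}, probes = ${start.probes}`);
```

The lists value goes into the index definition (WITH (lists = ...)) and probes is set per session with SET ivfflat.probes, so only probes can be adjusted without a rebuild.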
HNSW builds a multi-layer navigable graph. It's slower to build, uses more memory, and is the right default in 2026 for almost everyone because its recall is high and stable, it doesn't need training data, and you can add rows incrementally without recall falling off a cliff. The cost is RAM: the graph wants to live in memory, and m = 16 with 1536-dim vectors is on the order of a few KB per vector of index overhead on top of the vectors themselves.
The five knobs you actually tune:
| Knob | Index | Effect | Trade-off |
|---|---|---|---|
| m | HNSW | graph connectivity | higher = better recall, more memory + slower build |
| ef_construction | HNSW | build-time search width | higher = better recall, slower build |
| ef_search | HNSW | query-time search width | higher = better recall, slower queries |
| lists | IVFFlat | number of clusters | ~rows/1000 to start; rebuild when data grows |
| probes | IVFFlat | clusters searched | higher = better recall, slower queries |
My default for HNSW is m = 16, ef_construction = 64, and then tune ef_search at runtime against a labelled query set until recall@10 sits above 0.95. Don't guess these — measure recall against an exact (brute-force) baseline on a sample, because "the results look fine" is how you ship 0.7 recall and never find out.
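Measuring recall is less work than it sounds. Here's a minimal sketch, assuming you can fetch the ANN result ids for each labelled query and compute an exact top-k over a sample in application code. The Doc type and all function names are hypothetical.

```typescript
// Hypothetical shapes and helpers for a recall@k check.
interface Doc {
  id: string;
  vector: number[];
}

function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact (brute-force) top-k: score every document, sort, take k.
function exactTopK(query: number[], corpus: Doc[], k: number): string[] {
  return corpus
    .map((d) => ({ id: d.id, dist: cosineDistance(query, d.vector) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((d) => d.id);
}

// recall@k: fraction of the exact top-k that the ANN result also returned.
function recallAtK(exactIds: string[], annIds: string[]): number {
  if (exactIds.length === 0) return 1;
  let hits = 0;
  for (const id of exactIds) {
    if (annIds.indexOf(id) !== -1) hits++;
  }
  return hits / exactIds.length;
}

// Average recall over a labelled query set.
function meanRecall(pairs: { exact: string[]; ann: string[] }[]): number {
  if (pairs.length === 0) throw new Error("no queries to evaluate");
  let total = 0;
  for (const p of pairs) total += recallAtK(p.exact, p.ann);
  return total / pairs.length;
}
```

You can also get the exact baseline from Postgres itself by running the same query in a transaction with SET LOCAL enable_indexscan = off, which forces a sequential scan. Either way, rerun the comparison whenever you change ef_search or the data grows substantially.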
Where Postgres stops being enough
I am not religious about this. There are real walls, and you should move before you hit them, not after.
Scale. Past roughly 50–100M vectors in a single table, HNSW index builds get painful, the index no longer fits comfortably in RAM on a sane instance, and you start fighting maintenance_work_mem and build times measured in hours. pgvector has no native sharding for the index, so horizontal scale means application-level partitioning or something like Citus — at which point a purpose-built store is usually less work.
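The RAM pressure is easy to sanity-check with arithmetic. This sketch estimates the raw vector payload only; it deliberately ignores HNSW neighbour links and Postgres tuple and page overhead, so treat it as a lower bound. The function name is hypothetical.

```typescript
// Lower-bound estimate: raw vector bytes only, no graph links or page overhead.
const GIB = 1024 * 1024 * 1024;

function rawVectorGiB(vectorCount: number, dims: number, bytesPerDim: 2 | 4): number {
  // halfvec stores 2 bytes per dimension, vector stores 4
  return (vectorCount * dims * bytesPerDim) / GIB;
}

const halfvec50M = rawVectorGiB(50000000, 1536, 2);
const vector50M = rawVectorGiB(50000000, 1536, 4);
console.log(`50M x 1536 halfvec: ~${halfvec50M.toFixed(0)} GiB`);
console.log(`50M x 1536 vector:  ~${vector50M.toFixed(0)} GiB`);
```

At 50M vectors of 1536 dimensions that is roughly 143 GiB as halfvec and twice that as full-precision vector, before any index overhead, which is why this range is where a single instance starts to hurt.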
Filtered search at scale. This is the subtle one. When you filter by metadata and search by vector, the ANN index and the relational filter fight each other. pgvector 0.8 added iterative index scans that handle this far better than older versions, but a dedicated store with first-class filterable payloads (Qdrant, Pinecone, Weaviate, Milvus) still wins decisively when filters are highly selective over huge collections. If most of your queries are "find me similar docs but only in this 0.1% slice," that's a signal.
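To see why selective filters hurt, here's a toy simulation. It uses exact distances rather than a real HNSW graph, and every name in it is hypothetical, but it shows the failure mode: if the index hands back its top k and the filter runs afterwards, a rare tenant gets far fewer than k results.

```typescript
// Toy model: each item already has a distance to the query.
interface Item {
  id: number;
  tenantId: number;
  distance: number;
}

// Post-filter: take the k nearest overall, then drop rows that fail the filter.
function postFilterTopK(items: Item[], k: number, tenantId: number): Item[] {
  return items
    .slice()
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k)
    .filter((it) => it.tenantId === tenantId);
}

// Filter-aware: only matching rows are candidates, then take the k nearest.
function filterAwareTopK(items: Item[], k: number, tenantId: number): Item[] {
  return items
    .filter((it) => it.tenantId === tenantId)
    .sort((a, b) => a.distance - b.distance)
    .slice(0, k);
}

// 1000 items; tenant 7 owns every 100th one (a 1% slice), tenant 1 owns the rest.
const items: Item[] = [];
for (let i = 0; i < 1000; i++) {
  items.push({ id: i, tenantId: i % 100 === 0 ? 7 : 1, distance: i / 1000 });
}

const postFiltered = postFilterTopK(items, 10, 7);
const filterAware = filterAwareTopK(items, 10, 7);
console.log(`post-filter: ${postFiltered.length} rows, filter-aware: ${filterAware.length} rows`);
```

pgvector's iterative scans (hnsw.iterative_scan) work around this by continuing to scan the index until enough rows pass the filter, bounded by hnsw.max_scan_tuples. That helps a lot, but the more selective the filter, the more of the index it has to walk.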
Operational ergonomics. A managed store gives you replication, quantization, and zero-downtime reindexing as product features rather than DBA projects. If you have no one who wants to own a Postgres tuned for vector workloads, paying Pinecone to do it is a legitimate engineering decision.
Here's the honest comparison.
| | pgvector | Qdrant | Pinecone |
|---|---|---|---|
| Ops model | your Postgres | self-host or managed | fully managed only |
| Best scale | up to ~50–100M | 100M–1B+ | 100M–1B+ |
| Metadata filtering | good (0.8 iterative scans) | excellent | excellent |
| Hybrid (dense+sparse) | manual (FTS + vector) | native | native (sparse-dense) |
| Transactional w/ your data | yes, same DB | no | no |
| Quantization | binary, halfvec | scalar/binary/product | managed |
| You operate | one system | two systems | two systems |
A dedicated store, concretely
When you do cross the line, the integration is not exotic. Here's Qdrant from a TypeScript service — create a collection with the right distance metric, upsert points with a payload, then do a filtered vector search.
import { QdrantClient } from "@qdrant/js-client-rest";
const qdrant = new QdrantClient({ url: process.env.QDRANT_URL });
await qdrant.createCollection("documents", {
vectors: { size: 1536, distance: "Cosine" },
// int8 scalar quantization trades a little recall for memory; always_ram keeps the quantized vectors in RAM
quantization_config: { scalar: { type: "int8", always_ram: true } },
});
await qdrant.upsert("documents", {
wait: true,
points: [
{
id: 1,
vector: embedding, // number[] of length 1536
payload: { tenant_id: 42, title: "Onboarding guide", days_old: 12 },
},
],
});
const results = await qdrant.search("documents", {
vector: queryEmbedding,
limit: 10,
// payload filter runs as a first-class part of the ANN search
filter: {
must: [
{ key: "tenant_id", match: { value: 42 } },
{ key: "days_old", range: { lt: 90 } },
],
},
});
The shape of the code is the same as the SQL: vector plus filter plus limit. What you're paying for is that Qdrant's filter is applied during graph traversal rather than as a post-filter, so selective filters over a billion points stay fast. That, and you're now running and monitoring a second stateful service.
Hybrid search, briefly
Pure vector search misses exact-match and keyword intent — product SKUs, error codes, proper nouns. Hybrid search fuses dense (semantic) and sparse (lexical, BM25-style) retrieval. In pgvector you do this by hand: run a full-text tsvector query and a vector query, then combine with Reciprocal Rank Fusion in SQL or application code. It works and I've shipped it, but it's manual. Qdrant, Weaviate, and Pinecone offer native sparse-dense hybrid, which is one of the better reasons to adopt them — if your retrieval quality genuinely depends on lexical signal, that's worth more than raw scale.
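The fusion step itself is small. Here's a minimal sketch of Reciprocal Rank Fusion in application code, assuming you already have two ranked id lists (one from the tsvector query, one from the vector query). The function name is hypothetical, and k = 60 is the conventional default constant rather than anything tuned.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank is the 1-based position of the id in each list.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60
): { id: string; score: number }[] {
  // Null-prototype object so ids like "constructor" cannot collide with built-ins.
  const scores: Record<string, number> = Object.create(null);
  for (const ranking of rankings) {
    for (let i = 0; i < ranking.length; i++) {
      const id = ranking[i];
      scores[id] = (scores[id] || 0) + 1 / (k + i + 1);
    }
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] }))
    .sort((a, b) => b.score - a.score);
}

// Example: lexical and semantic results disagree on order and membership.
const lexical = ["d1", "d2", "d3"];
const semantic = ["d3", "d1", "d4"];
const fused = reciprocalRankFusion([lexical, semantic]);
console.log(fused.map((r) => r.id).join(", "));
```

Because RRF only uses ranks, you never have to reconcile BM25-style scores with cosine distances, which is most of why it's the usual first choice for hand-rolled hybrid search.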
The decision framework
Walk it top to bottom and stop at the first "yes."
- Do you already run Postgres and have fewer than ~10M vectors? Use pgvector with an HNSW index. Don't add a system. This is most teams building their first RAG feature.
- 10M–50M vectors, mostly unfiltered or lightly filtered search? Still pgvector. Move vectors to halfvec, tune m/ef_search, give it RAM, measure recall.
- Highly selective metadata filters over large collections, or you need native hybrid search? Move to Qdrant or Weaviate. The filtering and fusion are first-class there and painful to retrofit.
- 100M+ vectors, or you have no one to operate a tuned Postgres? A managed store (Pinecone, or managed Qdrant/Weaviate) earns its cost. Pay for the ops you can't staff.
- Whatever you choose, build an exact-search recall baseline first. You cannot tune what you don't measure, and ANN that silently drops to 0.7 recall will quietly wreck your RAG quality.
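If it helps to see the checklist as code, here is one way to encode it. This is a sketch of my reading of the list above, not a rule engine: the Workload fields and the function name are hypothetical, I've assumed the second rule also requires that you already run Postgres, and the "unclear" fallback is my own addition for cases the list doesn't cover.

```typescript
// Hypothetical description of a workload; field names are illustrative.
interface Workload {
  runsPostgres: boolean;
  vectorCount: number;
  selectiveFiltersAtScale: boolean; // highly selective filters over large collections
  needsNativeHybrid: boolean;
  canOperatePostgres: boolean; // someone on the team will own a tuned Postgres
}

type Choice = "pgvector" | "qdrant-or-weaviate" | "managed-store" | "unclear";

// Rules are checked top to bottom; the first match wins.
function chooseVectorStore(w: Workload): Choice {
  // 1. Already on Postgres with fewer than ~10M vectors
  if (w.runsPostgres && w.vectorCount < 10000000) return "pgvector";
  // 2. 10M-50M vectors, mostly unfiltered or lightly filtered (assumes Postgres is in place)
  if (
    w.runsPostgres &&
    w.vectorCount >= 10000000 &&
    w.vectorCount <= 50000000 &&
    !w.selectiveFiltersAtScale
  ) {
    return "pgvector";
  }
  // 3. Selective filters over large collections, or native hybrid search
  if (w.selectiveFiltersAtScale || w.needsNativeHybrid) return "qdrant-or-weaviate";
  // 4. 100M+ vectors, or nobody to operate a tuned Postgres
  if (w.vectorCount >= 100000000 || !w.canOperatePostgres) return "managed-store";
  // Not covered by the checklist: measure recall and latency before deciding.
  return "unclear";
}
```

The thresholds are the rough ones from the list, so treat the output as a prompt for measurement rather than a verdict.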
The thread running through all of this is the same one from how you'd think about indexing any Postgres table: an index is a recall/latency/memory trade you tune against real queries, not a magic switch. Vector indexes are just that trade in higher dimensions. Start with the database you already have, prove the feature is worth keeping, and let measured recall and real scale — not a tutorial's defaults — tell you when to graduate.
Further reading
- pgvector: github.com/pgvector/pgvector (README covers HNSW/IVFFlat tuning and halfvec)
- Qdrant documentation: qdrant.tech/documentation
- Pinecone documentation: docs.pinecone.io
- OpenAI embeddings guide: platform.openai.com/docs (dimensions and model choices for text-embedding-3-*)