Vector Databases in 2026: Trends and Tooling for AI
Vector databases went from niche ML infrastructure to mainstream developer tooling in about eighteen months, driven largely by the LLM boom. If you are building anything that involves semantic search, recommendation, or retrieval-augmented generation (RAG), you need to store and query vector embeddings. The question is whether you need a dedicated vector database or whether your existing database can handle it.
This note covers the landscape, the tradeoffs, and the decision framework.
What Vector Databases Do
A vector database stores high-dimensional vectors (arrays of floating point numbers, typically 768-1536 dimensions) and supports efficient nearest-neighbor queries. Given a query vector, find the k vectors in the database that are most similar.
The similarity metric is usually cosine similarity, Euclidean distance, or dot product. The challenge is doing this efficiently at scale — brute-force comparison is O(n) per query, which does not work when n is millions or billions.
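To make the scaling problem concrete, here is a brute-force exact k-nearest-neighbor search in plain Python (the function names are illustrative). Every query scans all n vectors, which is exactly the O(n) cost that ANN indexes exist to avoid.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def knn_brute_force(query, vectors, k=3):
    """Exact k-NN: one similarity computation per stored vector, O(n) per query."""
    scored = ((cosine_similarity(query, v), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
top = knn_brute_force([1.0, 0.0], vectors, k=2)
print([i for _, i in top])  # → [0, 1]
```

At 1536 dimensions and millions of rows, that inner loop is millions of dot products per query, which is why ANN indexes are worth their accuracy tradeoff.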
Vector databases solve this with approximate nearest neighbor (ANN) algorithms: HNSW (hierarchical navigable small world), IVF (inverted file index), and product quantization. These trade a small amount of accuracy for dramatic speed improvements — query times of single-digit milliseconds even on datasets with millions of vectors.
The Major Options in 2026
Pinecone. Fully managed, serverless pricing, strong developer experience. Best for teams that want zero operational overhead and are willing to pay for the convenience. Pricing is per-vector-stored and per-query, which gets expensive at high volume.
Weaviate. Open source with a managed cloud option. Supports hybrid search (vector + keyword) out of the box. Good Python and JavaScript clients. The open-source option means you can self-host to control costs.
Qdrant. Open source, written in Rust, excellent performance characteristics. Strong filtering support (filter by metadata before or during vector search). Good for workloads that need both structured filtering and vector similarity.
Milvus. Open source, designed for very large-scale deployments. Supports billions of vectors with distributed architecture. More complex to operate than Pinecone or Qdrant but handles larger datasets.
Chroma. Open source, developer-friendly, optimized for prototyping and smaller workloads. Embeds directly into Python applications. Good for getting started, but scaling to production requires more work.
pgvector (PostgreSQL extension). Adds vector columns and similarity search to PostgreSQL. No new infrastructure. Not as fast as dedicated vector databases at large scale, but good enough for datasets under a few million vectors.
When You Need a Dedicated Vector Database
You need a dedicated vector database when:
- Your dataset exceeds 5-10 million vectors and query latency matters
- You need to update vectors frequently (streaming embeddings, real-time indexing)
- Your query patterns require complex filtering combined with similarity search
- You need multi-tenancy with strict isolation between vector stores
When pgvector Is Enough
For most applications, pgvector covers the need:
- RAG applications with fewer than a few million documents
- Semantic search over product catalogs, help articles, or internal knowledge bases
- Prototyping and early-stage products where adding infrastructure is premature
- Applications that already use PostgreSQL and benefit from keeping data in one place
The pgvector advantage is operational simplicity. One database, one backup strategy, one monitoring system. When you store embeddings alongside the source data in PostgreSQL, JOINs between vectors and structured data are trivial. In a dedicated vector database, that JOIN requires application-level orchestration.
-- pgvector: store and query embeddings alongside structured data
CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT,
  embedding vector(1536)
);
CREATE INDEX ON articles USING hnsw (embedding vector_cosine_ops);
-- Find similar articles
SELECT id, title, 1 - (embedding <=> $1) AS similarity
FROM articles
ORDER BY embedding <=> $1
LIMIT 10;
The RAG Architecture Pattern
The dominant use case for vector databases in 2026 is retrieval-augmented generation (RAG): an LLM generates responses using context retrieved from a vector store. The architecture:
- Ingest: split documents into chunks, generate embeddings with an embedding model, store in vector database
- Query: embed the user’s question, search the vector database for similar chunks, pass those chunks as context to the LLM
- Generate: the LLM produces an answer grounded in the retrieved context
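The three steps above can be sketched end to end. This is a minimal outline, not a production pipeline: `embed`, `vector_search`, and `llm_generate` are hypothetical stand-ins for a real embedding model, vector database client, and LLM API.

```python
def chunk_documents(docs, chunk_size=300):
    """Ingest step: split each document into fixed-size word chunks."""
    chunks = []
    for doc in docs:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

def answer(question, index, embed, vector_search, llm_generate, k=5):
    """Query + generate steps: retrieve similar chunks, then ground the LLM."""
    query_vec = embed(question)
    context_chunks = vector_search(index, query_vec, k=k)
    prompt = "Answer using only this context:\n" + "\n---\n".join(context_chunks)
    prompt += f"\n\nQuestion: {question}"
    return llm_generate(prompt)
```

The structure matters more than the details: ingestion and querying are separate code paths, and the retrieval step is swappable, so you can move from pgvector to a dedicated vector database without touching the rest of the pipeline.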
The vector database’s job in this pipeline is the query step — fast, accurate similarity search over the document embeddings. Quality here is measured by recall (did you find the relevant chunks?) and latency (how fast?).
Common mistakes in RAG architectures:
- Chunk size too large. Large chunks include irrelevant context that dilutes the LLM’s response. 200-500 tokens per chunk is a common starting point.
- Ignoring metadata filtering. If you know the user is asking about a specific product or time period, filter by metadata before searching by similarity. This improves both relevance and speed.
- Not evaluating retrieval quality separately from generation quality. If the LLM gives bad answers, the problem might be retrieval (wrong chunks) or generation (wrong interpretation). Evaluate each step independently.
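A simple way to evaluate retrieval on its own is recall@k over a small labeled set of queries. A minimal sketch (the chunk IDs and the `recall_at_k` name are illustrative):

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant chunks that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# 2 of the 3 relevant chunks were retrieved in the top 5
score = recall_at_k(["c1", "c7", "c3", "c9", "c2"], ["c1", "c2", "c4"], k=5)
print(round(score, 2))  # → 0.67
```

If recall@k is high but answers are still bad, the problem is in generation; if recall@k is low, no amount of prompt tuning will fix it.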
Cost Comparison
Rough cost comparison for storing 1 million 1536-dimensional vectors:
| Service | Monthly Cost (approx.) |
|---|---|
| Pinecone (serverless) | $50-100 |
| Weaviate Cloud | $25-75 |
| Qdrant Cloud | $30-65 |
| pgvector (self-managed) | Included in PostgreSQL hosting |
The managed vector database services charge for storage and queries. pgvector’s cost is whatever you are already paying for PostgreSQL. For teams already running PostgreSQL, the marginal cost of adding vector search is effectively zero.
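A back-of-envelope check on the storage side: 1 million 1536-dimensional float32 vectors take roughly 6 GB raw, before any index overhead (an HNSW graph adds more on top).

```python
# Back-of-envelope storage for 1 million 1536-dimensional float32 vectors.
n_vectors = 1_000_000
dims = 1536
bytes_per_float = 4  # float32

raw_gb = n_vectors * dims * bytes_per_float / 1e9
print(f"{raw_gb:.1f} GB raw")  # → 6.1 GB raw, before index overhead
```

That is well within what a modestly sized PostgreSQL instance holds comfortably, which is part of why pgvector stretches further than people expect.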
The Recommendation
Start with pgvector if you already use PostgreSQL and your dataset is under a few million vectors. Move to a dedicated vector database when pgvector becomes the performance bottleneck — not before. The same principle that governs blob storage patterns applies here: start simple, measure, and add complexity only when the measurements justify it.
If you know from the start that your dataset will be large (tens of millions of vectors or more), start with a dedicated vector database to avoid a migration later. Qdrant and Weaviate offer the best balance of performance, features, and operational flexibility for most production workloads.