Vector Databases Explained for Practitioners: What You Actually Need to Know
If you’ve built or read about a RAG system in the past two years, you’ve encountered vector databases. They’re the infrastructure layer that makes semantic search possible — the thing that turns “find documents related to this question” from a keyword-matching exercise into a search over meaning rather than surface wording.
But the vector database landscape has gotten noisy. There are at least a dozen products competing for attention, each claiming to be the best. For practitioners trying to build working systems, cutting through the marketing to understand what these tools actually do — and which one fits your use case — is more valuable than any benchmark comparison.
What Vectors Are and Why They Matter
A vector, in this context, is a list of numbers that represents the meaning of a piece of text. When you pass a sentence through an embedding model (like OpenAI’s text-embedding-3-large or an open-source model like BGE), the output is a high-dimensional vector — typically 768 to 3072 numbers — that captures the semantic content.
Texts with similar meanings produce vectors that are close together in this high-dimensional space. “How do I train a machine learning model?” and “What’s the process for building an ML system?” produce vectors that are near each other, even though they share few exact words.
This is the core value proposition. You can search by meaning rather than by keywords. Traditional search (BM25, TF-IDF) matches tokens. Vector search matches concepts.
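“Close together” here usually means high cosine similarity. A minimal sketch with toy 4-dimensional vectors (real embedding models emit 768–3072 dimensions; the vectors and names below are made up for illustration):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "embeddings": the first two represent similar ML questions,
# the third an unrelated topic.
ml_question   = np.array([0.9, 0.8, 0.1, 0.0])
ml_paraphrase = np.array([0.8, 0.9, 0.2, 0.1])
cooking_text  = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine_similarity(ml_question, ml_paraphrase))  # high, near 1
print(cosine_similarity(ml_question, cooking_text))   # low, near 0
```

With real embeddings the numbers come from a model rather than by hand, but the comparison step is exactly this.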
What a Vector Database Does
At its simplest, a vector database stores vectors and enables fast similarity searches. You give it a query vector, and it returns the vectors (and their associated documents) that are most similar.
The challenge is doing this at scale. A brute-force comparison of a query vector against every stored vector is O(n) — fine for thousands of documents, unacceptable for millions. Vector databases use approximate nearest neighbor (ANN) algorithms to make this search fast at scale.
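The brute-force baseline that ANN algorithms are competing against fits in a few lines of NumPy — one dot product per stored vector, then a sort (function name and the random data are illustrative):

```python
import numpy as np

def brute_force_search(index: np.ndarray, query: np.ndarray, k: int = 5) -> np.ndarray:
    """Exact top-k by cosine similarity: O(n) dot products per query."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm          # one similarity score per stored vector
    return np.argsort(scores)[::-1][:k]       # indices of the k most similar

rng = np.random.default_rng(0)
vectors = rng.normal(size=(10_000, 128))      # 10k fake 128-dim embeddings
top5 = brute_force_search(vectors, vectors[42], k=5)
# vector 42 is its own nearest neighbor, so top5[0] == 42
```

At 10,000 vectors this is effectively instant; the linear scan only becomes a problem as n grows into the millions, which is the regime ANN indexes are built for.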
The main ANN algorithms in use:
HNSW (Hierarchical Navigable Small World): The most common approach. Builds a multi-layer graph structure that allows fast traversal from any point to the nearest neighbors of a query. Good balance of speed and accuracy. Used by Qdrant, Weaviate, and pgvector.
IVF (Inverted File Index): Partitions the vector space into clusters and only searches relevant clusters. Faster to build than HNSW but slightly less accurate for out-of-distribution queries. Used by FAISS and some Milvus configurations.
ScaNN (Scalable Nearest Neighbors): Google’s approach, which combines partitioning with quantization for very large-scale deployments. Available through Google’s Vertex AI and the open-source ScaNN library.
For most practical applications, HNSW is the default choice. It’s well-understood, broadly supported, and provides excellent recall (typically 95%+ of true nearest neighbors) at sub-millisecond query times for millions of vectors.
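Of the three families, IVF is the easiest to sketch from scratch. The version below is a deliberately simplified illustration, not production code: the function names are made up, k-means is a few plain Lloyd iterations, and distances are squared L2 rather than cosine. The key idea survives the simplification — at query time, only the `nprobe` clusters nearest the query are scanned:

```python
import numpy as np

def build_ivf(vectors, n_clusters=16, iters=5, seed=0):
    """Crude k-means partition of the vectors: the 'inverted file' lists."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_clusters, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest centroid, then recompute centroids.
        assign = np.argmin(((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_clusters):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    lists = [np.where(assign == c)[0] for c in range(n_clusters)]
    return centroids, lists

def ivf_search(vectors, centroids, lists, query, k=5, nprobe=4):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    near = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    candidates = np.concatenate([lists[c] for c in near])
    dists = ((vectors[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
vecs = rng.normal(size=(2000, 32))
centroids, lists = build_ivf(vecs, n_clusters=16)
hits = ivf_search(vecs, centroids, lists, vecs[7], k=3, nprobe=4)
```

With `nprobe=4` out of 16 clusters, only about a quarter of the vectors are compared against the query — that ratio is the speedup, and the chance that the true nearest neighbor lives in an unprobed cluster is the accuracy cost. Raising `nprobe` trades speed back for recall; at `nprobe == n_clusters` the search is exhaustive again.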
Do You Actually Need a Dedicated Vector Database?
This is the question that doesn’t get asked enough. For many projects, the answer is no.
If your dataset is small (under 100,000 documents): A simple in-memory solution like FAISS or even NumPy-based cosine similarity is fast enough. You don’t need a server process, managed service, or dedicated infrastructure. Keep it simple.
If you’re already using PostgreSQL: The pgvector extension adds vector similarity search directly to Postgres. For moderate-scale applications (up to a few million vectors), this is often the pragmatic choice — you keep your existing database infrastructure and add vector search as a feature rather than introducing a new system.
If you need a dedicated solution: Once you’re dealing with tens of millions of vectors, need sub-millisecond latency at scale, or want advanced features like hybrid search (combining vector and keyword search), a purpose-built vector database earns its place.
Comparing the Major Options
Pinecone
Fully managed, cloud-hosted. You don’t run anything — just send vectors via API and query them. The simplest to get started with and the lowest operational burden. Pricing is based on storage and compute, which can get expensive at scale. Good for teams that don’t want to manage infrastructure and are already building on managed cloud services.
Qdrant
Open-source with a managed cloud option. Written in Rust, so performance is strong. Supports rich filtering alongside vector search, which is important for production systems where you need to combine semantic similarity with metadata constraints (e.g., “find similar documents, but only from the last 6 months”).
Weaviate
Open-source, focused on developer experience. Has built-in support for generating embeddings (you can send text directly rather than pre-computing vectors). Supports hybrid search (combining BM25 keyword search with vector search) out of the box, which tends to improve retrieval quality in practice.
Milvus/Zilliz
Open-source (Milvus) with a managed option (Zilliz). Built for very large-scale deployments — billions of vectors. If your scale demands justify it, Milvus handles horizontal scaling well. For most projects, this is more infrastructure than you need.
ChromaDB
Open-source, designed for simplicity. It’s essentially an embedded database — runs in your application process, no separate server. Great for prototyping and small applications. Not designed for production scale, but for getting a RAG system running quickly, it’s the fastest path.
Practical Recommendations
Start with the simplest option that works. For a proof of concept, ChromaDB or FAISS in a notebook is fine. For a production system on moderate data, pgvector is underrated. For dedicated vector search at scale, Qdrant or Weaviate are both solid.
Don’t optimize prematurely. The embedding model you choose affects retrieval quality more than the database. A mediocre embedding model with the world’s best vector database will produce worse results than a good embedding model with a basic similarity search. Spend your optimization effort on embedding quality first.
Hybrid search matters. Pure vector search misses things that keyword search catches, and vice versa. Systems that combine both — typically using Reciprocal Rank Fusion to merge results — perform measurably better in production. Weaviate and some configurations of Qdrant support this natively.
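Reciprocal Rank Fusion itself is simple enough to implement directly if your database doesn’t provide it. A sketch of the standard formula, with the conventional k=60 damping constant (the document IDs are made up):

```python
def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: score(doc) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]   # BM25 keyword ranking
vector_hits  = ["doc1", "doc5", "doc3"]   # embedding similarity ranking
print(rrf_merge([keyword_hits, vector_hits]))
# → ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear in both lists (doc1, doc3) accumulate score from each and rise to the top — which is exactly the behavior you want from hybrid search: agreement between the two retrievers is a strong relevance signal.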
Plan for filtering. Real applications almost always need filtered search — “find similar documents where category = X and date > Y.” Not all vector databases handle this equally well. Test your actual filter patterns during evaluation, not just pure vector similarity.
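Conceptually, filtered search means restricting the similarity ranking to rows that satisfy a metadata predicate. A naive pre-filtering sketch in NumPy (function name, metadata schema, and data are all made up; real engines integrate the filter into index traversal rather than scanning like this, which is precisely why their performance on your filter patterns varies and is worth testing):

```python
import numpy as np

def filtered_search(vectors, metadata, query, predicate, k=3):
    """Pre-filter: apply the metadata predicate, then rank only the survivors."""
    keep = np.array([i for i, m in enumerate(metadata) if predicate(m)])
    if keep.size == 0:
        return keep
    subset = vectors[keep]
    subset = subset / np.linalg.norm(subset, axis=1, keepdims=True)
    scores = subset @ (query / np.linalg.norm(query))
    return keep[np.argsort(scores)[::-1][:k]]   # map back to original indices

rng = np.random.default_rng(2)
vecs = rng.normal(size=(100, 16))
meta = [{"category": "news" if i % 2 else "blog", "year": 2020 + i % 5}
        for i in range(100)]
hits = filtered_search(vecs, meta, vecs[10],
                       lambda m: m["category"] == "blog" and m["year"] >= 2022)
```

Every returned index is guaranteed to satisfy the predicate, which is the correctness property to verify against any candidate database — some ANN implementations post-filter instead, and can return fewer than k results when the filter is selective.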
A pattern we see consistently in consulting work on RAG deployments is that teams over-invest in vector database selection and under-invest in chunking strategy and embedding quality. The database is important infrastructure, but it’s rarely the bottleneck. The quality of what goes into it matters more than the mechanics of how it’s stored.
What’s Coming Next
The vector database space is consolidating. Traditional databases are adding vector search (Postgres, MongoDB, Elasticsearch all support it now). Purpose-built vector databases are adding traditional database features. In a few years, “vector database” may not be a distinct category — it’ll just be a feature that every database supports.
For practitioners building today, this means the safest long-term bet is probably extending your existing database with vector capabilities rather than introducing a new system. Unless your scale or performance requirements genuinely demand a dedicated solution, pgvector or Elasticsearch’s vector search gets you 90% of the way there with 10% of the operational complexity.