Vector Databases: When You Actually Need One


Vector databases have become a standard component of the “modern AI stack.” Every RAG (Retrieval-Augmented Generation) tutorial includes setting up Pinecone, Weaviate, or Qdrant. Venture funding flows to vector database companies, and architectural diagrams feature them prominently.

For certain use cases, vector databases are genuinely valuable. For many others, they’re unnecessary complexity that simpler solutions handle better.

Here’s an honest assessment of when you need a vector database and when you don’t.

What Vector Databases Do

A vector database stores high-dimensional vectors (typically embeddings generated by machine learning models) and supports efficient similarity search — finding the vectors most similar to a query vector.

In AI applications, text, images, and other data are converted to embeddings (vectors representing semantic meaning). When a user queries the system, their query is embedded, and the vector database finds the most semantically similar stored items.

This enables semantic search (finding documents that mean something similar, not just keyword matches), recommendation systems (finding items similar to what a user likes), and RAG (retrieving relevant context to feed to an LLM).

Traditional databases aren’t optimized for this. A brute-force similarity scan touches every vector, so comparing high-dimensional vectors across millions of records is computationally expensive. Vector databases use specialized approximate nearest neighbor (ANN) indexes, such as HNSW (hierarchical navigable small world graphs) and IVF (inverted file indexes), that trade a small amount of recall for large speedups.
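To make that concrete, here’s a minimal sketch of approximate indexing using the faiss library’s HNSW index. The random vectors stand in for real embeddings, and the parameters are illustrative defaults, not tuned recommendations:

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 384
vectors = np.random.rand(100_000, dim).astype("float32")  # stand-in embeddings

# Build an HNSW graph index; 32 is the graph connectivity parameter (M).
index = faiss.IndexHNSWFlat(dim, 32)
index.add(vectors)

# Approximate top-10 nearest neighbors for a query vector.
query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 10)
```

Comparing the query against all 100,000 vectors directly gives exact results but costs a full scan per query; the index answers approximately in a fraction of the time, and the gap widens as the corpus grows.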

When You Actually Need a Vector Database

Large-scale semantic search (millions of documents or more). If you’re building a search system over millions of documents and need semantic similarity matching, a vector database delivers meaningful performance improvements over brute-force similarity calculations.

At scale, the difference between scanning 10 million embeddings by brute force (seconds to minutes, depending on implementation and hardware) and querying an optimized vector index (milliseconds) is the difference between usable and unusable.

Real-time recommendation systems. Finding similar items for recommendations across large catalogs requires fast similarity search. Vector databases handle this efficiently.

Multi-modal search. If you’re searching across text, images, and other media types using a shared embedding space, a vector database provides unified storage and retrieval.

High-throughput RAG applications. RAG systems with high query volumes and large knowledge bases benefit from vector database performance characteristics. If you’re serving thousands of queries per second against a 50GB embedding corpus, specialized vector infrastructure makes sense.

When You Don’t Need One

Small knowledge bases (<100k documents). For small datasets, brute-force similarity search is fast enough. Computing cosine similarity across 50,000 1536-dimensional vectors takes milliseconds on a modern CPU.

You can store embeddings in a regular database (PostgreSQL with pgvector, SQLite with extensions) or even in-memory and compute similarity on-demand. The performance difference is negligible at this scale.
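To show how little machinery this takes, here’s a sketch of brute-force cosine similarity in NumPy. The random arrays stand in for precomputed embeddings:

```python
import numpy as np

n, d = 50_000, 1536
embeddings = np.random.rand(n, d).astype("float32")  # stand-in for stored embeddings
query = np.random.rand(d).astype("float32")          # stand-in for the query embedding

# Normalize once so cosine similarity reduces to a dot product.
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
query /= np.linalg.norm(query)

scores = embeddings @ query                 # one matrix-vector product
top_10 = np.argsort(scores)[::-1][:10]      # indices of the 10 most similar items
```

The matrix-vector product is the entire search. There’s no index to build, tune, or keep in sync.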

Low query volume. If your RAG application serves 10 queries per hour rather than 10 per second, response time in the 100-500ms range is fine. Optimizing retrieval to sub-10ms with specialized infrastructure provides no user-facing benefit.

Prototyping and development. For early-stage projects validating whether RAG or semantic search solves your problem, start simple. Use pgvector or in-memory search. Add vector database complexity when you’ve proven the concept and have scaling needs.

Keyword search is sufficient. Many “semantic search” problems are actually solved fine with good keyword search (Elasticsearch, Solr) plus synonym expansion and basic NLP. If traditional search works, don’t replace it with vector search just because it’s trendy.

The Simpler Alternatives

PostgreSQL with pgvector. If you’re already using PostgreSQL, the pgvector extension adds vector similarity search. Performance is adequate up to hundreds of thousands of vectors. It’s not as fast as purpose-built vector databases at scale, but for many applications, “fast enough” beats “theoretically optimal but complex.”
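A sketch of what this looks like from Python, assuming PostgreSQL with pgvector installed and the psycopg driver; the table and column names are illustrative:

```python
import psycopg  # pip install "psycopg[binary]"

conn = psycopg.connect("dbname=app")  # illustrative connection string
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id bigserial PRIMARY KEY,
        content text,
        embedding vector(1536)
    )
""")

embedding = [0.1] * 1536  # stand-in for a real embedding
vec = "[" + ",".join(map(str, embedding)) + "]"  # pgvector's text format
conn.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
    ("example document", vec),
)

# <=> is pgvector's cosine distance operator; smaller means more similar.
rows = conn.execute(
    "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT 5",
    (vec,),
).fetchall()
conn.commit()
```

Recent pgvector versions also support their own approximate indexes (ivfflat, and HNSW as of 0.5.0), which stretches how far a plain PostgreSQL setup can go before a dedicated vector database earns its keep.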

SQLite with extensions. For embedded applications or small-scale systems, SQLite vector extensions (such as sqlite-vec) provide similarity search without external dependencies.

In-memory vector storage. For datasets under 1GB, loading embeddings into memory (numpy arrays or similar) and computing similarity with libraries like FAISS works well. No database required.
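A sketch of the in-memory approach with a flat (exact, brute-force) FAISS index; again, the random data stands in for real embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 1536
embeddings = np.random.rand(100_000, dim).astype("float32")  # ~600MB stand-in data
faiss.normalize_L2(embeddings)       # normalize so inner product equals cosine

index = faiss.IndexFlatIP(dim)       # flat index: exact search, no approximation
index.add(embeddings)

query = np.random.rand(1, dim).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 10)  # exact top-10, still fast at this scale
```

Everything lives in the process’s memory: no service to deploy, nothing to synchronize, and the whole “database” rebuilds from your primary store in one pass.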

Traditional search with embeddings for reranking. Retrieve candidates with keyword search, then rerank with embedding similarity. This hybrid approach often performs better than pure vector search and doesn’t require vector database infrastructure.
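Here’s a sketch of that two-stage pattern, using the rank_bm25 package as a stand-in for Elasticsearch or Solr; embed() is a hypothetical embedding function, shown here as a deterministic stub you’d replace with a real model:

```python
import numpy as np
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = ["how to reset a password", "billing and invoices", "reset a 2fa device"]
bm25 = BM25Okapi([doc.split() for doc in corpus])

def embed(text: str) -> np.ndarray:
    # Hypothetical stub: replace with a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.random(384).astype("float32")
    return v / np.linalg.norm(v)

def hybrid_search(query: str, n_candidates: int = 100, k: int = 5) -> list[str]:
    # Stage 1: cheap keyword retrieval narrows the field.
    scores = bm25.get_scores(query.split())
    candidates = np.argsort(scores)[::-1][:n_candidates]

    # Stage 2: embedding similarity reranks only the candidates.
    q = embed(query)
    reranked = sorted(candidates, key=lambda i: -np.dot(embed(corpus[i]), q))
    return [corpus[i] for i in reranked[:k]]
```

In production you’d cache document embeddings rather than recompute them per query, but the shape of the pipeline stays the same.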

The Operational Cost

Vector databases add operational complexity:

  • Another service to run, monitor, and maintain
  • Data synchronization (keeping vector DB in sync with your primary database)
  • Backup and recovery procedures distinct from your other data stores
  • Query performance tuning specific to vector indexes
  • Cost (hosted vector DB services charge based on storage and queries; at scale, this isn’t trivial)

For organizations with small engineering teams, every additional infrastructure component is meaningful overhead. Add complexity only when the benefit clearly justifies it.

When to Upgrade From Simple to Specialized

Start simple. Use PostgreSQL with pgvector or in-memory search. Measure performance under realistic load.
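Measurement doesn’t need to be elaborate. A minimal latency check, where run_query is a stand-in for whatever retrieval call you already have:

```python
import time
import statistics

def measure(run_query, queries):
    latencies = []
    for q in queries:
        start = time.perf_counter()
        run_query(q)
        latencies.append((time.perf_counter() - start) * 1000)  # milliseconds
    latencies.sort()
    p95 = latencies[min(int(len(latencies) * 0.95), len(latencies) - 1)]
    return statistics.median(latencies), p95

# median_ms, p95_ms = measure(run_query, sample_queries)
```

Run it against a realistic query sample; if the p95 sits comfortably inside your latency budget, the upgrade triggers below don’t apply yet.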

Upgrade to a specialized vector database when you hit concrete limitations:

  • Query latency exceeds acceptable thresholds (typically >200ms for user-facing applications)
  • Dataset size exceeds what fits comfortably in memory or performs adequately in PostgreSQL (typically >1M vectors)
  • Query volume requires horizontal scaling beyond what a single database instance provides

Don’t upgrade preemptively. Premature optimization applies to infrastructure choices as much as it does to code.

The Vendor Landscape

Pinecone — Fully managed, easy to get started, expensive at scale. Good for prototyping and projects where operational simplicity trumps cost optimization.

Weaviate — Open source, self-hostable, or managed. More complex to operate than Pinecone but cheaper at scale. Supports hybrid search (vectors + keywords).

Qdrant — Open source, Rust-based, focused on performance. Good for self-hosting. Fewer managed options than Pinecone or Weaviate.

Milvus — Open source, designed for massive scale. Overkill for most projects but solid for genuinely large deployments.

ChromaDB — Lightweight, embedded option. Good for development and small deployments. Not suitable for production scale.

Choose based on your scale, operational capacity, and whether you’re optimizing for ease-of-use or cost-efficiency.

RAG Without a Vector Database

RAG (Retrieval-Augmented Generation) is the killer app for vector databases, but you don’t need one to implement RAG.

A minimal RAG system:

  1. Chunk your documents
  2. Generate embeddings (OpenAI, Cohere, or open source models)
  3. Store embeddings in PostgreSQL with pgvector
  4. On query: embed the query, find the top-k most similar chunks, and pass them to the LLM along with the query

This works for knowledge bases up to 100k documents without specialized infrastructure. It’s not infinitely scalable, but it’s simple, maintainable, and sufficient for most organizational knowledge base use cases.
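Here’s a condensed sketch of the query path (steps 2-4), assuming the OpenAI Python SDK (v1+) and the pgvector documents table sketched earlier; the model names and prompt are illustrative choices:

```python
import psycopg
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
conn = psycopg.connect("dbname=app")

def answer(question: str, k: int = 4) -> str:
    # Embed the query with the same model used for the stored chunks.
    emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding

    # Retrieve the k most similar chunks via pgvector cosine distance.
    vec = "[" + ",".join(map(str, emb)) + "]"
    rows = conn.execute(
        "SELECT content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s",
        (vec, k),
    ).fetchall()
    context = "\n\n".join(row[0] for row in rows)

    # Pass the retrieved context and the question to the LLM.
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return reply.choices[0].message.content
```

That’s the whole retrieval path: one embedding call, one SQL query, one LLM call.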

The Bottom Line

Vector databases solve a real problem: fast similarity search at scale. For applications that need this — large-scale semantic search, high-throughput recommendations, massive RAG deployments — they’re valuable.

For small to medium projects, simpler solutions often work better. PostgreSQL with pgvector, in-memory search, or even clever use of traditional search covers a surprising range of use cases.

The AI infrastructure hype cycle encourages adding every hot technology to your stack. Resist this. Start simple, measure actual performance under realistic conditions, and add complexity only when you’ve proven you need it.

Vector databases are tools, not requirements. Use them when they solve a problem you actually have, not because they’re on every “modern AI stack” diagram.

Your users don’t care whether you’re using Pinecone or PostgreSQL. They care whether the application works, responds quickly, and returns relevant results. Choose infrastructure that delivers this with the least operational overhead for your team.