Vector Search vs Keyword Search: Choose the Right Tool, Not the Trendy One
Imagine you’re looking for a book in two different libraries. The first library has a perfect card catalogue; every word in every book is indexed. You walk in, say “ERROR 1045”, and the librarian hands you the exact page. Fast, precise, surgical.
The second library has a librarian with deep reading comprehension. You walk in and say, “I can’t connect to my database,” and she understands what you mean, walks you to the right shelf, and pulls three books that might help, even if none of them uses the phrase “can’t connect.”
Both librarians are valuable. The mistake is assuming the second one makes the first one obsolete.
That’s the vector search vs keyword search debate in a nutshell. And in this post, I’ll break down exactly when each approach wins, where hybrid search bridges the gap, and why choosing vector search purely because it sounds more “AI” is how you build an expensive solution to the wrong problem.
Let’s get started.
What is Keyword Search?
Keyword search (also called lexical search) finds documents by matching the exact tokens in your query against an inverted index of your corpus. The dominant algorithm is BM25 (Best Match 25), a ranking function that weighs matches by term frequency and how rare the term is across the document collection.
BM25 is 30 years old. It powers Elasticsearch, OpenSearch, and Solr. It needs no model, no GPU, no embedding API. It’s deterministic (the same query always returns the same results) and fast: sub-millisecond retrieval across millions of documents.
It is, to put it plainly, still extremely good.
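To make the mechanics concrete, here is a minimal, self-contained sketch of BM25 scoring. The corpus and the parameter values (`k1=1.5`, `b=0.75` are common defaults) are illustrative; production systems rely on Lucene’s battle-tested implementation rather than hand-rolled code like this:

```python
import math
from collections import Counter

def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each document in `corpus` against `query` with BM25."""
    docs = [doc.lower().split() for doc in corpus]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n_docs = len(docs)
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            # Inverse document frequency: rare terms weigh more.
            df = sum(1 for d in docs if term in d)
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            # Term frequency, saturated by k1 and length-normalised by b.
            freq = tf[term]
            score += idf * (freq * (k1 + 1)) / (
                freq + k1 * (1 - b + b * len(doc) / avgdl)
            )
        scores.append(score)
    return scores

corpus = [
    "ERROR 1045 access denied for user",
    "how to configure connection pooling",
    "ERROR 2003 cannot connect to server",
]
scores = bm25_scores("ERROR 1045", corpus)
best = max(range(len(corpus)), key=scores.__getitem__)
print(best)  # → 0, the document containing the exact tokens
```

Note how the document sharing only the common token “ERROR” scores lower than the exact match, and the document sharing no tokens scores zero — exactly the behaviour you want for error codes and IDs.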
What is Vector Search?
Vector search (semantic search) converts your documents and queries into high-dimensional numerical vectors using an embedding model, something like OpenAI’s `text-embedding-3-small`, Cohere Embed, or open-source models like E5 or BGE. Once embedded, documents and queries live in the same vector space, and similarity is measured by cosine similarity or dot product.
The key insight: semantically similar content ends up close together in vector space, even if the words are completely different. “I can’t log in” and “authentication failure” end up near each other. BM25 would miss that connection entirely.
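The distance measure itself is simple; all the intelligence lives in the embedding model. Here is a sketch of cosine similarity using toy 4-dimensional vectors standing in for real embeddings (actual models emit hundreds or thousands of dimensions, and the numbers below are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings": imagine a model placed these phrases in vector space.
cant_log_in  = [0.9, 0.1, 0.8, 0.2]
auth_failure = [0.8, 0.2, 0.9, 0.1]
pasta_recipe = [0.1, 0.9, 0.0, 0.7]

print(cosine_similarity(cant_log_in, auth_failure))  # high: similar meaning
print(cosine_similarity(cant_log_in, pasta_recipe))  # low: unrelated topic
```

The two phrases share zero tokens, yet their vectors point in nearly the same direction. That is the entire value proposition of semantic search.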
Vector search enables retrieval based on meaning, not just tokens. That’s genuinely powerful. But it comes with a cost that teams consistently underestimate.
When Keyword Search Wins
This is the part that gets skipped in most “AI search” blog posts.
Exact Matches That Must Be Exact
If a user searches for `ERROR 1045`, they want documents containing exactly that. A vector search might surface related database authentication errors that are conceptually relevant, but not the one the user typed. The same applies to:
- Product SKUs: “iPhone 15 Pro Max 256GB” must return that exact model, not the nearest semantic neighbour
- Order IDs, serial numbers, account numbers
- Medical codes (ICD-10), legal citations, regulatory references
- API function names, config flags, library import paths
In these cases, BM25’s exact precision, combined with a performance advantage that can reach 30x, makes it the obvious choice.
Cost and Operational Reality
Vector search has hidden costs that only reveal themselves in production:
Embedding latency: Small embedding models run at ~16ms. Large models (7B+ parameters) sit at 187-221ms — over 10x slower. At user-facing latency budgets, this matters significantly.
Infrastructure overhead: You need an embedding model (hosted or self-managed), a vector database, and a sync pipeline to keep it up to date. BM25 runs on Elasticsearch you’re already operating.
Embedding model lock-in: If you switch embedding models, you must re-embed your entire document corpus. New query vectors won’t align with old document vectors. That’s a silent, expensive migration with no warning signs until results degrade.
Index degradation at scale: HNSW, the approximate nearest-neighbour algorithm powering most vector databases, trades accuracy for speed, and recall can drop by 10%+ as the database grows from 50k to 200k vectors. Your infrastructure dashboards look fine. Your search quality is quietly getting worse.
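One way to catch this silent degradation is to periodically measure recall@k: run a sample of queries through both your approximate index and an exact brute-force search, and compare the result sets. A minimal sketch (the ID lists below are made-up stand-ins for your index’s output):

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbours that the ANN index returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Suppose for one query the exact top-10 and the HNSW top-10 are:
exact  = [3, 7, 12, 19, 25, 31, 40, 44, 52, 60]
approx = [3, 7, 12, 19, 31, 40, 44, 52, 61, 88]  # two true neighbours missing

print(recall_at_k(approx, exact))  # → 0.8
```

Tracking this number over time, averaged across a fixed query sample, turns an invisible quality regression into a dashboard metric.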
When Vector Search Wins
Vector search earns its complexity in specific scenarios.
The Words Don’t Match
The most common scenario: a user describes what they need without knowing the exact terminology used in your documents. “My app is getting slow under load” should surface articles about performance optimisation, connection pooling, and caching strategies, even if none of them uses the phrase “getting slow.”
This is where keyword search fails completely, and vector search excels.
Multilingual Search
Many embedding models are trained on multilingual corpora. A query in English can retrieve semantically similar documents written in Spanish, French, or German. BM25 requires explicit multilingual tokenisation pipelines to even attempt this.
Recommendation and Similarity
“More like this article” queries, duplicate detection, and recommendation engines are a natural fit for vector similarity. Find the documents closest to a given embedding; there’s no keyword equivalent.
The Real Answer: Hybrid Search
Here’s the thing neither camp wants to admit: the best production search systems use both.
**Hybrid search** combines BM25 keyword results and vector search results, then merges them using **Reciprocal Rank Fusion (RRF)**. Instead of trying to normalise incompatible scores, RRF uses rank position. The formula:
`score = 1 / (rank + k)`

Where `rank` is the document’s position in either result list, and `k` (typically 60) prevents top-ranked documents from dominating too aggressively.
A document appearing at position 1 in the BM25 list and position 2 in the vector list scores very high. A document appearing at position 1 in only one list scores lower. The result: exact-match precision from BM25, semantic recall from vector search, merged into a single ranked list that consistently outperforms either approach alone.
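The fusion step itself is only a few lines. Here is a sketch of RRF over two ranked lists of document IDs, with `k=60` and made-up IDs for illustration:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked result lists by summing 1/(rank + k) per document."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank + k)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_results   = ["doc_error_1045", "doc_auth_guide", "doc_pooling"]
vector_results = ["doc_error_1045", "doc_pooling", "doc_caching"]

fused = reciprocal_rank_fusion([bm25_results, vector_results])
print(fused[0])  # → "doc_error_1045": top of both lists, so it wins
```

Because only rank positions are used, BM25’s unbounded scores and cosine similarities in [-1, 1] never need to be reconciled on a common scale.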
Here’s what this looks like in practice with LangChain:
```python
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain_community.vectorstores import Chroma

# BM25 retriever over your document corpus
bm25_retriever = BM25Retriever.from_documents(docs)
bm25_retriever.k = 5

# Vector retriever with your embedding model
vectorstore = Chroma.from_documents(docs, embedding_function)
vector_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Hybrid retriever using RRF (weights: 50/50)
ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, vector_retriever],
    weights=[0.5, 0.5],
)

results = ensemble_retriever.invoke("authentication timeout error")
```

This retriever does the right thing for both query types: a search for “ERROR 1045” gets exact BM25 precision; a search for “why does my login keep failing” gets semantic vector recall. You don’t have to choose.
You can tune the weights based on your domain. Code search might run 70% BM25 / 30% vector. A customer support chatbot might flip to 30% BM25 / 70% vector. The EnsembleRetriever makes this trivially adjustable.
Native hybrid search is now supported in: Azure AI Search, Elasticsearch 8+, OpenSearch 2.19, Weaviate, Chroma, pgvector + pg_bm25 (ParadeDB), SingleStore, and MariaDB.
The Decision Framework
Like everything in software engineering, this comes down to weighing trade-offs. Here’s a practical guide:
- Error codes, SKUs, order IDs: keyword only (BM25)
- Code search, API documentation: keyword-dominant hybrid (70/30)
- Customer support FAQ: balanced hybrid (50/50)
- Conversational / intent-driven search: vector-dominant hybrid (30/70)
- “More like this” recommendations: vector only
- Multilingual search: vector only
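As a starting point before you have real query data, some teams route queries heuristically: detect exact-match intent with a pattern and bias the weights accordingly. A hypothetical sketch — the regex, thresholds, and weight values below are illustrative defaults, not recommendations:

```python
import re

# Patterns that strongly suggest exact-match intent (illustrative only):
# error codes, ICD-10-style codes, long uppercase SKUs, long numeric IDs.
EXACT_MATCH_PATTERN = re.compile(
    r"(ERROR\s+\d+|[A-Z]\d{2}\.\d+|\b[A-Z0-9]{8,}\b|\b\d{6,}\b)"
)

def choose_weights(query):
    """Return (bm25_weight, vector_weight) for a hybrid retriever."""
    if EXACT_MATCH_PATTERN.search(query):
        return (1.0, 0.0)  # error codes, SKUs, IDs: keyword only
    if len(query.split()) >= 6:
        return (0.3, 0.7)  # long, conversational queries lean semantic
    return (0.5, 0.5)      # default: balanced hybrid

print(choose_weights("ERROR 1045"))                                # (1.0, 0.0)
print(choose_weights("why does my login keep failing every morning"))  # (0.3, 0.7)
```

Treat a router like this as scaffolding: once you can log queries and click-through data, replace the hand-written rules with weights tuned from what users actually do.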
I need to emphasise this: the question is never “which is better?” The question is “what does my query distribution actually look like, and what does my current infrastructure already support?”
Conclusion
Vector search is a genuinely powerful tool. Embedding-based retrieval unlocks search experiences that keyword matching simply cannot deliver. I’m not arguing against it.
But I’ve seen teams rip out working Elasticsearch deployments to stand up a vector database, a managed embedding API, a sync pipeline, and a new infrastructure dependency, for a search use case that was doing just fine with BM25.
The hype around vector search is real. The problems it solves are real. So are the costs, the operational complexity, and the cases where a 30-year-old ranking function still does the job better.
Don’t choose based on what sounds more modern. Choose based on what your users are actually searching for. And if you’re unsure, hybrid search gives you both, with a tunable dial between precision and recall.
Start there. Optimise from data, not from trends.

