Hybrid Retrieval: Why Vector Alone Isn’t Enough

Zorost Intelligence — Tue, 17 Feb 2026 09:00:00 +0000

Pull-quote: “Pure vector retrieval is the most common production-grade RAG mistake. Pure BM25 is the second most common.”

Why this matters

A pattern repeats in RAG projects that go wrong: someone embeds the corpus, runs vector search, and ships. The system works in demos and disappoints in production. The fix is architectural: hybrid retrieval.

The components

Query
  │
  ├──► Dense (vector)   — pgvector / Weaviate / Qdrant + an embedding model
  │
  ├──► Sparse (BM25)    — Postgres FTS / Elasticsearch / OpenSearch
  │
  ├──► Optional filters — date range, source, entity tags
  │
  └──► Merge (RRF or weighted) ──► Cross-encoder re-rank ──► Top-K
                                                                │
                                                                ▼
                                                Citation-grounded generation

Why each piece matters

Vector: excellent at semantic similarity, finding documents that are about the same topic in different words. Weak at exact matching: named entities, exact terms, IDs, dates.
BM25: the opposite. Excellent at exact terms and named entities, weaker on semantic similarity.
Filters: when the question is bounded (“just look at 2024 reports about Boeing 737”), filters shrink the candidate set before ranking.
Merge: Reciprocal Rank Fusion (RRF) is a clean default because it needs only rank positions, not raw scores. Weighted merges work only when the dense and sparse scores are calibrated to a comparable scale.
Cross-encoder re-rank: sees the query and the candidate document together and scores them jointly. More expensive than bi-encoder vector search, but the precision gain on the top-K is usually large enough to pay for itself.
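
As a rough illustration of the merge step, here is a minimal sketch of Reciprocal Rank Fusion in Python. It assumes each retriever returns an ordered list of document IDs; the function name rrf_merge, the smoothing constant k=60 (a commonly used default), and the document IDs are illustrative assumptions, not details from the pipeline described here.

```python
from collections import defaultdict

def rrf_merge(rankings, k=60, top_n=None):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a tuned value.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n] if top_n is not None else fused

dense_hits = ["doc_a", "doc_b", "doc_c"]   # hypothetical vector results
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # hypothetical BM25 results
merged = rrf_merge([dense_hits, sparse_hits])
print([doc_id for doc_id, _ in merged])
```

Documents that both retrievers rank highly rise to the top, and no score calibration is needed because only rank positions enter the formula.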

What changes when you do this right

Hallucination rate drops. The model has better evidence to ground in.
Citation precision goes up. The cited documents actually support the claim.
Edge cases (rare entity queries, exact-quote queries) work properly.
Generation latency stays low because the model only sees the top-K (typically 6–10), not the top-100.

Common mistakes

No re-ranker. Top-50 from vector + top-50 from BM25 with RRF is a starting point, but without a re-ranker the top-K still contains noise.
No filtering. Filtering before retrieval is essentially free if your data is properly indexed.
No evaluation. Without a golden Q&A dataset and grounding scoring, you have no way to compare retrieval architectures.
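
To make the evaluation point concrete, the sketch below computes mean recall@k over a tiny golden set. It covers only the retrieval half of the comparison (grounding scoring is not shown). The retrievers are stand-in lookup tables and every question and document ID is hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents that appear in the top-k retrieved."""
    if not relevant:
        return 0.0
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)

def evaluate(retriever, golden, k=8):
    """Mean recall@k of a retriever over (question, relevant_ids) pairs."""
    scores = [recall_at_k(retriever(question), relevant, k)
              for question, relevant in golden]
    return sum(scores) / len(scores)

# Hypothetical golden set: each question maps to the IDs that should be found.
golden = [("q1", ["d1", "d2"]), ("q2", ["d3"])]

# Stand-in retrievers: canned results keyed by question, for illustration only.
vector_only = {"q1": ["d1", "d9"], "q2": ["d8", "d7"]}.__getitem__
hybrid = {"q1": ["d1", "d2"], "q2": ["d3", "d8"]}.__getitem__

print(evaluate(vector_only, golden, k=2))
print(evaluate(hybrid, golden, k=2))
```

With a harness like this in place, swapping the merge strategy or adding a re-ranker becomes a measurable change rather than a guess.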

Closing

Pure vector retrieval is the most common production-grade RAG mistake. Hybrid retrieval — vector + sparse + filters + re-rank — is the boring, reliable, production answer. Every Zorost RAG system runs this architecture.


A Retrieval Engine over the World’s Aviation Safety Corpus

Zorost Intelligence — Tue, 13 Jan 2026 09:00:00 +0000

Pull-quote: “Vector search alone is not retrieval. It is one signal among several.”

Why this matters

Aviation safety knowledge sits in two enormous public-domain corpora: the U.S. NTSB accident reports and the NASA ASRS voluntary safety reports. Together, that’s 247,000+ documents of structured incident narratives. Pilots, controllers, and operations engineers have written them under the assumption that they would be searched, cross-referenced, and learned from.

Most platforms reduce this to keyword search. Better platforms add full-text search. The frontier is citation-grounded retrieval-augmented generation: the assistant retrieves, the model writes, and every claim links back to the source documents.

Why hybrid retrieval

The naive approach to a RAG system is “embed everything and run a vector search.” It does not work in production. Vector search is excellent at finding semantically similar documents and bad at finding specifically named entities. BM25 is the opposite. Production retrieval needs both.

Our retrieval pipeline:

Question
   │
   ├──► dense (pgvector + BGE-large) ──► top 50
   ├──► sparse (BM25)                  ──► top 50
   │
   └──► merge + cross-encoder re-rank   ──► top 8
                            │
                            ▼
                Citation-grounded generation
                (Gemini 2.5 Flash for fast answers,
                 Claude / GPT for detailed analysis)

Why a re-ranker

The re-ranker (a cross-encoder, not a bi-encoder) sees the query and the candidate document together and scores them jointly. This is more expensive per call than vector search, but the precision improvement on the top-8 is large enough that it pays for itself — fewer retrievals, fewer hallucinations, better answers.
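
The re-rank step can be sketched as a function that scores each (query, document) pair jointly and keeps the best few. In the sketch below, score_pair is a pluggable callable; in a real system it would wrap a cross-encoder model. The toy_overlap_score function is only a stand-in so the example runs without a model, and the report IDs and texts are invented for illustration.

```python
from typing import Callable, Sequence

def rerank(query: str,
           candidates: Sequence[tuple[str, str]],
           score_pair: Callable[[str, str], float],
           top_k: int = 8) -> list[tuple[str, float]]:
    """Score every (query, document text) pair jointly and keep the top_k."""
    scored = [(doc_id, score_pair(query, text)) for doc_id, text in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]

def toy_overlap_score(query: str, text: str) -> float:
    """Stand-in scorer (NOT a cross-encoder): share of query words in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

# Hypothetical merged candidates as (doc_id, text) pairs.
candidates = [
    ("r2", "Bird strike on final approach"),
    ("r3", "Hydraulic pressure warning during taxi"),
    ("r1", "Crew reported hydraulic failure during flap retraction after takeoff"),
]
top = rerank("hydraulic failure during flap retraction", candidates,
             toy_overlap_score, top_k=2)
print(top)
```

Keeping the scorer behind a plain callable makes it easy to compare different re-rankers against the same candidate set.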

Why citation grounding

Left unconstrained, an LLM will produce plausible-sounding answers whether or not the evidence supports them. The fix is structural: the model is constrained to write its answer with bracketed citation tokens, and the citation tokens must reference documents that actually exist in the retrieval set. Generation is post-processed to validate the citations and reject any answer that fails validation.

This is a small structural change with a large operational impact. It moves the system from “talking to a model that has ingested aviation knowledge” to “asking a model to summarize specific source documents.”
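
The post-processing check described above might look roughly like the following. This is a sketch under assumptions: the citation token format ([ID] with letters, digits, hyphens, and underscores), the function name validate_citations, and the document IDs are all hypothetical, and the rule that an answer with no citations is rejected is one possible policy rather than a confirmed detail of this system.

```python
import re

# Assumed token format: a document ID inside square brackets, e.g. [ASRS-042].
CITATION_PATTERN = re.compile(r"\[([A-Za-z0-9_\-]+)\]")

def validate_citations(answer: str, retrieved_ids: set[str]) -> tuple[bool, set[str]]:
    """Return (is_valid, unknown_ids) for a generated answer.

    Valid means the answer cites at least one document and every cited ID
    is present in the retrieval set.
    """
    cited = set(CITATION_PATTERN.findall(answer))
    unknown = cited - retrieved_ids
    return bool(cited) and not unknown, unknown

retrieved = {"NTSB-001", "ASRS-042"}  # hypothetical IDs from the retrieval step

grounded = "The failure followed flap retraction [ASRS-042]."
fabricated = "The cause was crew fatigue [NTSB-999]."
uncited = "The cause was crew fatigue."

for answer in (grounded, fabricated, uncited):
    print(validate_citations(answer, retrieved))
```

This check confirms only that cited documents exist in the retrieval set. Whether a cited document actually supports the claim is a separate grounding question.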

What this is good at

“What are the leading causes of runway incursions for regional jets in low-visibility conditions?”
“Show me ASRS reports that match the pattern of sudden hydraulic failure during flap retraction.”
“What are the recurring training gaps that show up in cargo operations CRM reports?”

What it is not good at: real-time operational queries that need current schedule data — those go to the predictive and causal layers.

Closing

Vector search alone is not retrieval. It is one signal. Production-grade RAG over a regulated safety corpus requires hybrid retrieval, a real re-ranker, and structural citation grounding. The result is an assistant analysts trust enough to use — which is the only metric that matters.

