Vector Search Archives - Zorost Intelligence | AI, Cloud & Data Experts

Hybrid Retrieval: Why Vector Alone Isn’t Enough

Zorost Intelligence — Tue, 17 Feb 2026 09:00:00 +0000

Pull-quote: “Pure vector retrieval is the most common production-grade RAG mistake. Pure BM25 is the second most common.”

Why this matters

A pattern repeats in every RAG project that goes wrong: someone embeds the corpus, runs vector search, and ships. The system works in demos and disappoints in production. The fix is structural: hybrid retrieval.

The components

Query
  │
  ├──► Dense (vector)   — pgvector / Weaviate / Qdrant + an embedding model
  │
  ├──► Sparse (BM25)    — Postgres FTS / Elasticsearch / OpenSearch
  │
  ├──► Optional filters — date range, source, entity tags
  │
  └──► Merge (RRF or weighted) ──► Cross-encoder re-rank ──► Top-K
                                                                │
                                                                ▼
                                                Citation-grounded generation

Why each piece matters

Vector: excellent at semantic similarity, finding documents that are about the same topic in different words. It is weak at named entities: exact terms, IDs, dates.
BM25: the opposite. Excellent at named entities and exact terms, weaker on semantic similarity.
Filters: when the question is bounded (“just look at 2024 reports about Boeing 737”), filters shrink the candidate set before ranking, so both retrievers rank fewer irrelevant documents.
Merge: Reciprocal Rank Fusion (RRF) is a clean default because it uses only ranks, so dense and sparse scores never need to share a scale. Weighted merges work only when the two score distributions have been calibrated against each other.
Cross-encoder re-rank: sees the query and the candidate document together and scores them jointly. More expensive than bi-encoder vector search, but the precision improvement on the top-K is large enough to pay for itself.
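To make the merge step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of document IDs. The function name `rrf_merge`, the document IDs, and the constant `k=60` are assumptions for illustration (60 is a commonly used default for RRF, not a value taken from this post).

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, top_n=None):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; a document missing from a list contributes nothing.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so output is deterministic.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused if top_n is None else fused[:top_n]


# Hypothetical result lists from the dense and sparse retrievers.
dense_hits = ["doc_a", "doc_b", "doc_c"]
sparse_hits = ["doc_c", "doc_a", "doc_d"]

fused = rrf_merge([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Documents that appear in both lists (`doc_a`, `doc_c`) rise above documents that appear in only one, which is the behavior hybrid retrieval relies on. Only ranks are used, so the raw vector distances and BM25 scores never need to be normalized.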

What changes when you do this right

Hallucination rate drops. The model has better evidence to ground in.
Citation precision goes up. The cited documents actually support the claim.
Edge cases (rare entity queries, exact-quote queries) work properly.
Generation latency stays low because the model only sees the top-K (typically 6–10), not the top-100.

Common mistakes

No re-ranker. Top-50 from vector + top-50 from BM25 with RRF is a starting point, but without a re-ranker the top-K still contains noise.
No filtering. Filtering before retrieval is essentially free if your data is properly indexed.
No evaluation. Without a golden Q&A dataset and grounding scoring, you have no way to compare retrieval architectures.
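As a rough illustration of the filtering point, the sketch below narrows a candidate set by date range, source, and entity tags before any ranking happens. It runs in memory purely to show the logic; in a real deployment the same constraints would normally be pushed into the database or search index as a query filter. The `Doc` fields, the `prefilter` helper, and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    doc_id: str
    source: str
    published: date
    tags: set = field(default_factory=set)


def prefilter(docs, sources=None, start=None, end=None, required_tags=None):
    """Keep only documents that satisfy every constraint that was supplied."""
    required = set(required_tags or ())
    kept = []
    for doc in docs:
        if sources is not None and doc.source not in sources:
            continue
        if start is not None and doc.published < start:
            continue
        if end is not None and doc.published > end:
            continue
        if not required <= doc.tags:  # every required tag must be present
            continue
        kept.append(doc)
    return kept


# Hypothetical corpus; the query is "2024 reports about Boeing 737".
corpus = [
    Doc("r1", "reports", date(2023, 5, 1), {"Boeing 737"}),
    Doc("r2", "reports", date(2024, 3, 9), {"Boeing 737"}),
    Doc("r3", "reports", date(2024, 7, 2), {"Airbus A320"}),
    Doc("r4", "news", date(2024, 8, 15), {"Boeing 737"}),
]
kept = prefilter(
    corpus,
    sources={"reports"},
    start=date(2024, 1, 1),
    end=date(2024, 12, 31),
    required_tags={"Boeing 737"},
)
print([doc.doc_id for doc in kept])
```

Only the surviving candidates are then passed to the dense and sparse retrievers, so the re-ranker spends its budget on documents that could actually answer the question.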

Closing

Pure vector retrieval is the most common production-grade RAG mistake. Hybrid retrieval — vector + sparse + filters + re-rank — is the boring, reliable, production answer. Every Zorost RAG system runs this architecture.

The post Hybrid Retrieval: Why Vector Alone Isn’t Enough appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

Production-Grade RAG on the Lakehouse with Mosaic AI Vector Search

Zorost Intelligence — Tue, 03 Feb 2026 09:00:00 +0000

Pull-quote: “RAG works in demos. RAG that works in production requires hybrid retrieval, a re-ranker, citation grounding, and an evaluation harness.”

Why this matters

Most RAG projects pilot well and disappoint in production. The pattern is the same: embed the corpus, run vector search, ship. Production-grade RAG requires more.

The production RAG architecture

                     ┌────────────────────┐
        Question ───►│  AI Gateway        │  ← key mgmt, routing, observability
                     └──────────┬─────────┘
                                ▼
        ┌────────────────────────────────────────────┐
        │                 Retrieval                  │
        │  ┌────────────────┐  ┌────────────────┐    │
        │  │ Mosaic AI      │  │ BM25 (lexical) │    │
        │  │ Vector Search  │  │ on Delta SQL   │    │
        │  │ (Delta-synced) │  │                │    │
        │  └───────┬────────┘  └────────┬───────┘    │
        │          └──── merge (RRF) ───┘            │
        │                  │                         │
        │              cross-encoder                 │
        │              re-rank                       │
        └────────────────┬───────────────────────────┘
                         ▼
              top-K (typically 6–10)
                         │
                         ▼
              Citation-grounded generation
              (Mosaic AI Model Serving)
                         │
                         ▼
              Validated answer with source links

Why Mosaic AI Vector Search specifically

Mosaic AI Vector Search synchronizes with Delta tables: update the source table and the index updates, with no orchestration glue. Tagging, ACLs, and lineage flow through Unity Catalog. For RAG over enterprise data that changes, this matters more than people initially appreciate, because a stale index silently serves outdated evidence.

Hybrid retrieval is the pattern

Pure vector search is the most common production RAG mistake. Pure BM25 is the second most common. Hybrid — vector + BM25 + filters + re-rank — is the answer that actually works.

Citation grounding as a structural fix

Constrain the model to write with bracketed citation tokens. Validate every citation against the retrieval set. Reject answers that fail validation. This is a small structural change with a large operational impact.
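The validation step can be sketched in a few lines. This version assumes the bracketed citation tokens look like `[1]`, `[2]`, and so on, and that each number identifies a document in the retrieval set; the post does not specify the token format, so treat both the pattern and the `validate_citations` helper as assumptions.

```python
import re

# Assumed token format: a positive integer in square brackets, e.g. [3].
CITATION = re.compile(r"\[(\d+)\]")


def validate_citations(answer, retrieved_ids):
    """Return (ok, unknown): ok is True only if the answer cites at least one
    document and every cited number is in the retrieval set."""
    cited = {int(number) for number in CITATION.findall(answer)}
    unknown = sorted(cited - set(retrieved_ids))
    ok = bool(cited) and not unknown
    return ok, unknown


retrieved = {1, 2, 3}

good = validate_citations("Flap asymmetry was reported twice [1][3].", retrieved)
bad = validate_citations("The crew skipped the checklist [9].", retrieved)
uncited = validate_citations("The crew skipped the checklist.", retrieved)

print(good, bad, uncited)
```

An answer with no citations at all is rejected as well as one that cites a document outside the retrieval set. Note that this checks only that the citation exists, not that the cited document supports the claim; that second check belongs in the evaluation harness.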

Evaluation harness — non-negotiable

A production RAG system without an evaluation harness is a guess. The harness has three components:

Golden Q&A dataset — questions paired with the documents that should ground the answers
Grounding rate — what fraction of generated claims are supported by retrieved documents
Hallucination detection — flagging unsupported claims

The harness runs as a Databricks Job on every model or retrieval change. Regressions are caught before deployment.
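A minimal sketch of two harness metrics follows, assuming the golden dataset is a list of (question, expected document IDs) pairs and that some upstream judge has already labeled each generated claim as supported or not. The function names, the `k=8` default, and the stub retriever are hypothetical; how claims are judged is outside this sketch.

```python
def recall_at_k(retrieved, expected, k):
    """Fraction of the expected documents that appear in the top-k results."""
    expected = set(expected)
    if not expected:
        return 0.0
    return len(set(retrieved[:k]) & expected) / len(expected)


def mean_recall(golden, retrieve, k=8):
    """Average recall@k over a golden set of (question, expected_ids) pairs."""
    if not golden:
        return 0.0
    per_question = [recall_at_k(retrieve(q), expected, k) for q, expected in golden]
    return sum(per_question) / len(per_question)


def grounding_rate(claim_supported):
    """Fraction of generated claims labeled as supported by retrieved documents."""
    if not claim_supported:
        return 0.0
    return sum(1 for supported in claim_supported if supported) / len(claim_supported)


# Hypothetical golden set and a stub retriever standing in for the real pipeline.
golden = [("q1", ["a", "b"]), ("q2", ["c"])]
stub_results = {"q1": ["a", "x", "y"], "q2": ["c", "z"]}

retrieval_score = mean_recall(golden, stub_results.get, k=8)
grounding_score = grounding_rate([True, True, False, True])
print(retrieval_score, grounding_score)
```

Running the same golden set against two retrieval configurations and comparing the scores is what turns an architecture debate into a measurement. A scheduled job can fail the build when either number drops below the last accepted baseline.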

Closing

Production RAG on the Lakehouse with Mosaic AI is straightforward when you adopt the architecture: hybrid retrieval, re-ranker, citation grounding, evaluation harness. The result is a RAG system analysts trust enough to use.


A Retrieval Engine over the World’s Aviation Safety Corpus

Zorost Intelligence — Tue, 13 Jan 2026 09:00:00 +0000

Pull-quote: “Vector search alone is not retrieval. It is one signal among several.”

Why this matters

Aviation safety knowledge sits in two enormous public-domain corpora: the U.S. NTSB accident reports and the NASA ASRS voluntary safety reports. Together, that’s 247,000+ documents of structured incident narratives. Pilots, controllers, and operations engineers have written them under the assumption that they would be searched, cross-referenced, and learned from.

Most platforms reduce this to keyword search. Better platforms add full-text search. The frontier is citation-grounded retrieval-augmented generation — the assistant retrieves, the model writes, every claim links back to the source documents.

Why hybrid retrieval

The naive approach to a RAG system is “embed everything and run a vector search.” It does not work in production. Vector search is excellent at finding semantically similar documents and bad at finding specifically named entities. BM25 is the opposite. Production retrieval needs both.

Our retrieval pipeline:

Question
   │
   ├──► dense (pgvector + BGE-large) ──► top 50
   ├──► sparse (BM25)                  ──► top 50
   │
   └──► merge + cross-encoder re-rank   ──► top 8
                            │
                            ▼
                Citation-grounded generation
                (Gemini 2.5 Flash for fast answers,
                 Claude / GPT for detailed analysis)

Why a re-ranker

The re-ranker (a cross-encoder, not a bi-encoder) sees the query and the candidate document together and scores them jointly. This is more expensive per call than vector search, but the precision improvement on the top-8 is large enough that it pays for itself — fewer retrievals, fewer hallucinations, better answers.
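The shape of the re-rank step can be sketched as below. The `rerank` helper takes any function that scores a (query, document) pair jointly; in a real system that function would wrap a cross-encoder model. The `toy_overlap_score` used here is only a term-overlap stand-in so the example runs without a model, and it is not a cross-encoder. All names and sample texts are hypothetical.

```python
def rerank(query, candidates, score_pair, top_k=8):
    """Score each (doc_id, text) candidate against the query and keep the best top_k."""
    scored = [(score_pair(query, text), doc_id) for doc_id, text in candidates]
    # Highest score first; ties broken by doc ID for deterministic output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def toy_overlap_score(query, text):
    """Stand-in scorer: fraction of query terms that occur in the text.
    A real cross-encoder would read both strings together and output a relevance score."""
    query_terms = set(query.lower().split())
    text_terms = set(text.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & text_terms) / len(query_terms)


# Hypothetical merged candidates from the dense and sparse retrievers.
candidates = [
    ("r1", "engine fire on takeoff"),
    ("r2", "hydraulic failure during flap retraction"),
    ("r3", "flap retraction delayed"),
]
top = rerank("hydraulic failure flap retraction", candidates, toy_overlap_score, top_k=2)
print(top)
```

Because the scorer is called once per candidate, cost grows linearly with the size of the merged list, which is why the re-ranker runs on roughly 100 candidates rather than on the whole corpus.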

Why citation grounding

Left unconstrained, an LLM will produce plausible-sounding answers whether or not it has evidence for them. The fix is structural: the model is constrained to write its answer with bracketed citation tokens, and the citation tokens must reference documents that actually exist in the retrieval set. The generated answer is then post-processed to validate the citations, and any answer that fails validation is rejected.

This is a small structural change with a large operational impact. It moves the system from “talking to a model that has ingested aviation knowledge” to “asking a model to summarize specific source documents.”
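One way to set up that constraint on the input side is to number the retrieved documents in the prompt and instruct the model to cite by number. The sketch below assumes a `[n]` token format and a plain-text prompt; the `build_grounded_prompt` helper, the instruction wording, and the document IDs are hypothetical, not the prompt used in this system.

```python
def build_grounded_prompt(question, documents):
    """Build a prompt that numbers each retrieved document and asks for [n] citations.

    documents: list of (doc_id, text) pairs, already re-ranked.
    Returns the prompt and a map from citation number back to doc_id.
    """
    lines = [
        "Answer using only the numbered sources below.",
        "Cite every claim with its source number in brackets, for example [2].",
        "If the sources do not answer the question, say so.",
        "",
    ]
    number_to_id = {}
    for number, (doc_id, text) in enumerate(documents, start=1):
        number_to_id[number] = doc_id
        lines.append(f"[{number}] ({doc_id}) {text}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines), number_to_id


# Hypothetical re-ranked documents.
documents = [
    ("report-a", "Hydraulic pressure dropped during flap retraction."),
    ("report-b", "Crew reported a runway incursion in low visibility."),
]
prompt, number_to_id = build_grounded_prompt(
    "What happened during flap retraction?", documents
)
print(prompt)
```

Keeping the number-to-ID map alongside the prompt lets the post-processing step resolve each citation back to a source document and produce the source links shown to the analyst.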

What this is good at

“What are the leading causes of runway incursions for regional jets in low-visibility conditions?”
“Show me ASRS reports that match the pattern of sudden hydraulic failure during flap retraction.”
“What are the recurring training gaps that show up in cargo operations CRM reports?”

What it is not good at: real-time operational queries that need current schedule data — those go to the predictive and causal layers.

Closing

Vector search alone is not retrieval. It is one signal. Production-grade RAG over a regulated safety corpus requires hybrid retrieval, a real re-ranker, and structural citation grounding. The result is an assistant analysts trust enough to use — which is the only metric that matters.
