Hybrid Retrieval: Why Vector Alone Isn’t Enough

Zorost Intelligence — Tue, 17 Feb 2026 09:00:00 +0000

Pull-quote: “Pure vector retrieval is the most common production-grade RAG mistake. Pure BM25 is the second most common.”

Why this matters

A pattern repeats in every RAG project that goes wrong: someone embeds the corpus, runs vector search, and ships. The system works in demos and disappoints in production. The fix is structural: hybrid retrieval.

The components

Query
  │
  ├──► Dense (vector)   — pgvector / Weaviate / Qdrant + an embedding model
  │
  ├──► Sparse (BM25)    — Postgres FTS / Elasticsearch / OpenSearch
  │
  ├──► Optional filters — date range, source, entity tags
  │
  └──► Merge (RRF or weighted) ──► Cross-encoder re-rank ──► Top-K
                                                                │
                                                                ▼
                                                Citation-grounded generation
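
The flow in the diagram can be sketched as one small orchestration function. This is an illustration under stated assumptions, not a description of any particular stack: `hybrid_search`, the `Retriever` / `Merger` / `Reranker` callable types, and every `toy_*` helper are hypothetical stand-ins for a real vector store, BM25 index, and cross-encoder.

```python
from typing import Callable

# Hypothetical interfaces: each stage takes and returns doc ids, best first.
Retriever = Callable[[str, set[str]], list[str]]
Merger = Callable[[list[list[str]]], list[str]]
Reranker = Callable[[str, list[str]], list[str]]


def hybrid_search(query: str, allowed: set[str], dense: Retriever,
                  sparse: Retriever, merge: Merger, rerank: Reranker,
                  top_k: int = 8) -> list[str]:
    """Filtered dense + sparse retrieval, then merge, re-rank, and cut to top-K."""
    candidates = merge([dense(query, allowed), sparse(query, allowed)])
    return rerank(query, candidates)[:top_k]


# Toy corpus and stages, for illustration only.
DOCS = {
    "d1": "2024 report on Boeing 737 production delays",
    "d2": "overview of narrow-body aircraft supply chains",
    "d3": "2023 report on Airbus A320 deliveries",
}
YEAR = {"d1": 2024, "d2": 2024, "d3": 2023}


def overlap(query: str, doc_id: str) -> int:
    """Number of query terms that appear verbatim in the document."""
    return len(set(query.lower().split()) & set(DOCS[doc_id].lower().split()))


def toy_sparse(query, allowed):
    # Stand-in for BM25: rank by exact term overlap, drop zero-overlap docs.
    hits = [d for d in sorted(allowed) if overlap(query, d) > 0]
    return sorted(hits, key=lambda d: -overlap(query, d))


def toy_dense(query, allowed):
    # Stand-in for an embedding search: a fixed, made-up "semantic" order.
    return [d for d in ["d2", "d1", "d3"] if d in allowed]


def toy_merge(rankings):
    # Placeholder merge: round-robin across the lists, dropping duplicates.
    seen, merged = set(), []
    for rank in range(max(map(len, rankings), default=0)):
        for ranking in rankings:
            if rank < len(ranking) and ranking[rank] not in seen:
                seen.add(ranking[rank])
                merged.append(ranking[rank])
    return merged


def toy_rerank(query, candidates):
    # Stand-in for a cross-encoder: scores each (query, document) pair.
    return sorted(candidates, key=lambda d: -overlap(query, d))


allowed_2024 = {d for d, year in YEAR.items() if year == 2024}  # metadata filter
result = hybrid_search("Boeing 737 report", allowed_2024,
                       toy_dense, toy_sparse, toy_merge, toy_rerank)
print(result)
```

Keeping each stage behind a plain callable makes it possible to swap one piece (for example the merge strategy) and re-run the same evaluation without touching the rest.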

Why each piece matters

Vector is excellent at semantic similarity — finding documents that are about the same topic in different words. It is bad at named entities — exact terms, IDs, dates.
BM25 is the opposite — excellent at named entities, weaker on semantic similarity.
Filters — when the question is bounded (“just look at 2024 reports about Boeing 737”), filters dramatically reduce the candidate set before ranking.
Merge — Reciprocal Rank Fusion (RRF) is a clean default. Weighted merges work with calibrated scores.
Cross-encoder re-rank — sees the query and the candidate document together and scores them jointly. More expensive than bi-encoder vector search, but the precision improvement on the top-K is large enough to pay for itself.
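
A minimal sketch of the merge step, assuming the commonly used RRF formulation in which each document scores the sum of 1 / (k + rank) over the lists it appears in, with 1-based ranks and k = 60. The function name and the sample doc ids are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids using ranks only, never raw scores."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


dense_hits = ["d2", "d1", "d4"]   # e.g. from vector search
sparse_hits = ["d1", "d3", "d2"]  # e.g. from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['d1', 'd2', 'd3', 'd4']
```

Because only ranks enter the formula, cosine similarities and BM25 scores never have to be put on a common scale. A weighted merge, by contrast, needs that calibration before the weights mean anything.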

What changes when you do this right

Hallucination rate drops. The model has better evidence to ground in.
Citation precision goes up. The cited documents actually support the claim.
Edge cases (rare entity queries, exact-quote queries) work properly.
Generation latency stays low because the model only sees the top-K (typically 6–10), not the top-100.

Common mistakes

No re-ranker. Top-50 from vector + top-50 from BM25 with RRF is a starting point, but without a re-ranker the top-K still contains noise.
No filtering. Filtering before retrieval is essentially free if the filter fields (date, source, entity tags) are indexed, and it shrinks the candidate set before any ranking happens.
No evaluation. Without a golden Q&A dataset and grounding scoring, you have no way to compare retrieval architectures.
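
The retrieval half of such an evaluation can be sketched in a few lines. This assumes a golden set of (question, relevant doc ids) pairs and two standard metrics, recall@k and mean reciprocal rank. All names are hypothetical, and the canned result lists are made-up toy data, not measurements. Grounding scoring of generated answers is a separate step and is not shown.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant docs that appear in the first k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(retriever, golden, k: int = 5) -> dict[str, float]:
    """Average both metrics over the golden set for one retriever."""
    runs = [(retriever(question), relevant) for question, relevant in golden]
    n = len(runs)
    return {
        "recall@k": sum(recall_at_k(r, rel, k) for r, rel in runs) / n,
        "mrr": sum(reciprocal_rank(r, rel) for r, rel in runs) / n,
    }


# Toy golden set and canned outputs for two hypothetical architectures.
GOLDEN = [
    ("737 max grounding", {"d1"}),
    ("supply chain risk", {"d2", "d3"}),
]
VECTOR_ONLY = {"737 max grounding": ["d4", "d1"], "supply chain risk": ["d2", "d3"]}
HYBRID = {"737 max grounding": ["d1", "d4"], "supply chain risk": ["d2", "d3"]}

vector_report = evaluate(VECTOR_ONLY.get, GOLDEN, k=1)
hybrid_report = evaluate(HYBRID.get, GOLDEN, k=1)
print(vector_report, hybrid_report)
```

Running every candidate architecture through the same `evaluate` call on the same golden set is what turns "hybrid feels better" into a comparison that can be regression-tested.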

Closing

Pure vector retrieval is the most common production-grade RAG mistake. Hybrid retrieval — vector + sparse + filters + re-rank — is the boring, reliable, production answer. Every Zorost RAG system runs this architecture.

The post Hybrid Retrieval: Why Vector Alone Isn’t Enough appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

The Agent Factory: Planner, Executor, Critic, Referee

Zorost Intelligence — Tue, 23 Dec 2025 09:00:00 +0000

Pull-quote: “The four-role pattern is not an opinion. It’s the architecture every production multi-agent system converges on once it survives the first round of real users.”

Why this matters

Multi-agent AI starts as a clever idea (let agents talk to each other!) and dies in production as an unreliable mess (agents hallucinate to each other, disagreements never resolve, the audit trail is unreadable). The fix is structural: four roles, typed contracts, deterministic logs.

The four roles

Planner — decomposes the high-level goal into sub-goals and decides the sequence. Reads the task, the available tools, and the agent’s memory; emits a structured plan.
Executor(s) — carries out sub-goals. Calls tools. Returns structured outputs. Knows nothing about the high-level plan; just executes its assigned sub-goal honestly.
Critic — reviews each executor output adversarially. Looks for unsupported claims, broken citations, missed evidence, alternative interpretations. Does not propose new actions; only critiques.
Referee — adjudicates when the critic disagrees with the executor. Has explicit criteria. Produces the final decision with explicit reasoning.
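
The division of labour above can be sketched with four functions and typed records passed between them. This is a minimal illustration: each function body is a deterministic stub standing in for what would be a model call, and every name (`SubGoal`, `Result`, `Critique`, `Verdict`, `run`) is hypothetical. For simplicity the referee here issues a verdict on every result, accepting trivially when the critic raises no objection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubGoal:
    id: int
    description: str


@dataclass(frozen=True)
class Result:
    sub_goal: SubGoal
    claim: str
    citations: tuple[str, ...]


@dataclass(frozen=True)
class Critique:
    result: Result
    objections: tuple[str, ...]


@dataclass(frozen=True)
class Verdict:
    result: Result
    accepted: bool
    reasoning: str


def plan(goal: str) -> list[SubGoal]:
    # Planner: decompose and order. The stub just splits on " then ".
    return [SubGoal(i, part.strip()) for i, part in enumerate(goal.split(" then "))]


def execute(sub_goal: SubGoal) -> Result:
    # Executor: sees only its own sub-goal, never the full plan.
    citations = ("doc-1",) if "cite" in sub_goal.description else ()
    return Result(sub_goal, f"done: {sub_goal.description}", citations)


def critique(result: Result) -> Critique:
    # Critic: raises objections only; it proposes no new actions.
    objections = () if result.citations else ("claim has no citation",)
    return Critique(result, objections)


def referee(crit: Critique) -> Verdict:
    # Referee: one explicit criterion, one final decision, no loop back.
    accepted = not crit.objections
    reasoning = "no objections" if accepted else "; ".join(crit.objections)
    return Verdict(crit.result, accepted, reasoning)


def run(goal: str) -> list[Verdict]:
    return [referee(critique(execute(sub_goal))) for sub_goal in plan(goal)]


verdicts = run("summarise and cite sources then draft conclusion")
for v in verdicts:
    print(v.result.sub_goal.description, v.accepted, v.reasoning)
```

Note that `execute` receives a `SubGoal` and nothing else, and `critique` returns objections rather than a revised result; the types make the role boundaries hard to blur by accident.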

Why this works

Planner / executor separation prevents the planner from drifting into execution and getting confused by tool errors.
Critic separation prevents the executors from grading their own work, which is a category error.
Referee separation prevents endless executor-vs-critic loops.

Common variations

Single executor vs. multi-executor (parallelism). Parallel executors for independent sub-goals; serial for dependent ones.
Critic per executor or shared critic. Per-executor for specialized critique; shared for consistency across the run.
Hierarchical planning. A meta-planner produces a plan that includes “now plan this sub-task in detail” steps.
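
One way to get "parallel for independent, serial for dependent" is to treat the plan as a dependency graph and run it in waves. The sketch below uses Python's standard `graphlib.TopologicalSorter` and a thread pool; `run_plan`, `toy_execute`, and the sub-goal names are hypothetical, and passing all earlier results to each executor is a simplification.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_plan(dependencies: dict[str, set[str]], execute):
    """Run sub-goals wave by wave.

    `dependencies` maps each sub-goal to the sub-goals it depends on.
    Sub-goals in the same wave are independent and run in parallel;
    a wave starts only after every earlier wave has finished.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular plans
    results: dict[str, str] = {}
    waves: list[list[str]] = []
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            # list() blocks until the whole wave is done, so `results`
            # is not modified while executors are reading it.
            outputs = list(pool.map(lambda name: execute(name, results), ready))
            results.update(zip(ready, outputs))
            sorter.done(*ready)
    return results, waves


def toy_execute(name: str, upstream: dict[str, str]) -> str:
    # Stand-in executor: reports which earlier results it could see.
    return f"{name} saw {sorted(upstream)}"


plan = {
    "fetch_filings": set(),
    "fetch_news": set(),
    "summarise": {"fetch_filings", "fetch_news"},
}
results, waves = run_plan(plan, toy_execute)
print(waves)
```

Here the two fetch sub-goals land in the first wave and run concurrently, while `summarise` waits for both.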

What we standardize

We standardize three things across every production agentic system:

Typed tool contracts — every tool has explicit input/output schemas. No improvisation.
Deterministic logs — every call (planner → executor, executor → tool, critic → executor) is logged with timestamps and parameters.
Evaluation harnesses — every system ships with a golden dataset, a regression suite, hallucination detection, and grounding scoring. New versions are evaluated before promotion.
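
The first two items can be illustrated together: a tool wrapped in an explicit input/output contract, and a call wrapper that logs every invocation with a timestamp and its parameters. This is a sketch under assumptions, not the production design: `Tool`, `logged_call`, `SearchInput`, `SearchOutput`, and the in-memory `LOG` are hypothetical, and the `isinstance` checks validate only the record type, not each field.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class SearchInput:
    query: str
    top_k: int


@dataclass(frozen=True)
class SearchOutput:
    doc_ids: tuple[str, ...]


@dataclass(frozen=True)
class Tool:
    name: str
    input_type: type
    output_type: type
    fn: Callable[[Any], Any]


LOG: list[str] = []  # append-only; one JSON line per call


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def logged_call(caller: str, tool: Tool, payload: Any, clock=utc_now) -> Any:
    """Enforce the tool's contract on both sides and record the call."""
    if not isinstance(payload, tool.input_type):
        raise TypeError(f"{tool.name} expects {tool.input_type.__name__}")
    result = tool.fn(payload)
    if not isinstance(result, tool.output_type):
        raise TypeError(f"{tool.name} must return {tool.output_type.__name__}")
    LOG.append(json.dumps({
        "ts": clock().isoformat(),
        "caller": caller,
        "tool": tool.name,
        "params": asdict(payload),
        "result": asdict(result),
    }, sort_keys=True))  # sorted keys keep the log line format stable
    return result


# Toy tool: returns the first top_k ids from a fixed list.
SEARCH = Tool("search", SearchInput, SearchOutput,
              lambda inp: SearchOutput(("d1", "d2", "d3")[: inp.top_k]))

fixed_clock = lambda: datetime(2026, 1, 1, tzinfo=timezone.utc)
out = logged_call("executor", SEARCH, SearchInput("737", 2), clock=fixed_clock)
record = json.loads(LOG[-1])
print(record)
```

Injecting the clock is a small design choice that makes log output reproducible in tests; passing an untyped dict instead of a `SearchInput` fails before the tool runs.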

Where we run this pattern

AeroFarr — multi-tool aviation analyst (planner / executor / critic over the prediction core, the cascade GNN, the causal engine, and the RAG corpus)
EvidAI — 4-model consensus screening with explicit critic and referee
FreightCortex — 16-tool AI freight analyst with planner / executor and a critic on report quality
Aquil — sourcers / analysts / critic / referee for OSINT
SPCio (with a manufacturing intelligence partner) — 8 specialized agents with a meta-coordinator

Closing

The four-role pattern is not an opinion. It is the architecture every production multi-agent system converges on once it survives the first round of real users. Skipping it is a tax you pay later.
