Evaluation Archives - Zorost Intelligence

Hybrid Retrieval: Why Vector Alone Isn’t Enough

Vector search is excellent at semantic similarity and bad at named entities. BM25 is the opposite. Production-grade retrieval is hybrid — and the architecture decisions matter.

Agentic AI Engineering

BM25, Evaluation, Hybrid Retrieval, RAG, Vector Search

February 10, 2026

Zorost Intelligence

Why Calibration Matters More Than Accuracy: an ECE 0.012 Story

Headline accuracy is a misleading metric for high-stakes decisions. Calibration is the real one. Here is what ECE 0.012 means and how we got there.

Aviation Intelligence

AeroFarr, Calibration, Conformal Prediction, ECE, Evaluation, LACP

February 3, 2026

Zorost Intelligence

Production-Grade RAG on the Lakehouse with Mosaic AI Vector Search

How to design, build, and evaluate a production RAG system on Databricks using Mosaic AI Vector Search, hybrid retrieval, and a real evaluation harness.

Databricks Modernization

Evaluation, Hybrid Retrieval, Mosaic AI, RAG, Vector Search

January 20, 2026

Zorost Intelligence

Multi-Agent OSINT with a Critic and a Referee

A swarm of agents producing summaries is not analysis. Adding a critic and a referee changes what the system is. Here is how Aquil’s OSINT architecture is structured.

Geopolitical Intelligence

Aquil, Causal Inference, Evaluation, Multi-Agent, OSINT

December 23, 2025

Zorost Intelligence

The Agent Factory: Planner, Executor, Critic, Referee

Most production agentic systems converge on the same architecture: a planner, an executor, a critic, and a referee. Here is the pattern, why it works, and how we apply it across industries.

Agentic AI Engineering

Agentic AI, Evaluation, Governance, LangGraph, Multi-Agent

December 16, 2025

Zorost Intelligence

Living Systematic Reviews: Evidence That Stays Current

A traditional systematic review is a snapshot, frozen at the search date. A living review is a stream, refreshed as new evidence appears. Here is the architecture that makes living reviews operationally feasible.

Pharmaceutical Research

Benchmarking, Evaluation, EvidAI, PRISMA 2020, RAG

November 4, 2025

Zorost Intelligence

Multi-Agent Consensus for Systematic Literature Review

Single-LLM screening makes the SLR process faster but no more accurate. Multi-agent consensus screening — with four models, explanations, and disagreement detection — preserves PRISMA 2020 rigor.

Pharmaceutical Research

Evaluation, EvidAI, Multi-Agent, PRISMA 2020, Risk of Bias, ROBINS-I

Hybrid Retrieval: Why Vector Alone Isn’t Enough

