The Agent Factory: Planner, Executor, Critic, Referee - Zorost Intelligence

Pull-quote: “The four-role pattern is not an opinion. It’s the architecture every production multi-agent system converges on once it survives the first round of real users.”

Why this matters

Multi-agent AI starts as a clever idea (let agents talk to each other!) and dies in production as an unreliable mess (agents hallucinate to each other, disagreements never resolve, the audit trail is unreadable). The fix is structural: four roles, typed contracts, deterministic logs.

The four roles

Planner — decomposes the high-level goal into sub-goals and decides the sequence. Reads the task, the available tools, and the agent’s memory; emits a structured plan.
Executor(s) — carry out sub-goals, call tools, and return structured outputs. An executor knows nothing about the high-level plan; it just executes its assigned sub-goal honestly.
Critic — reviews each executor output adversarially. Looks for unsupported claims, broken citations, missed evidence, alternative interpretations. Does not propose new actions; only critiques.
Referee — adjudicates when the critic disagrees with the executor. Has explicit criteria. Produces the final decision with explicit reasoning.
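One way to picture the four roles is as functions that exchange typed records. The sketch below is illustrative only: the record types, the `EVIDENCE` store, the semicolon-splitting planner, and the "no citations means unsupported" rule are all assumptions made for the example, not the contracts used in any system named in this post. For brevity the referee is called on every output, including those the critic did not dispute.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SubGoal:
    id: str
    description: str


@dataclass(frozen=True)
class ExecutorOutput:
    sub_goal_id: str
    claim: str
    citations: List[str]


@dataclass(frozen=True)
class Critique:
    sub_goal_id: str
    objections: List[str]  # empty means the critic found nothing to flag


@dataclass(frozen=True)
class Decision:
    sub_goal_id: str
    accepted: bool
    reasoning: str


# Hypothetical evidence store standing in for real tool calls.
EVIDENCE: Dict[str, List[str]] = {"summarize report": ["report.pdf"]}


def plan(goal: str) -> List[SubGoal]:
    """Planner: split the goal into ordered sub-goals (toy rule: split on ';')."""
    parts = [p.strip() for p in goal.split(";") if p.strip()]
    return [SubGoal(id=f"sg{i}", description=p) for i, p in enumerate(parts, 1)]


def execute(sub_goal: SubGoal) -> ExecutorOutput:
    """Executor: sees only its own sub-goal, never the full plan."""
    citations = EVIDENCE.get(sub_goal.description, [])
    return ExecutorOutput(sub_goal.id, f"done: {sub_goal.description}", citations)


def critique(output: ExecutorOutput) -> Critique:
    """Critic: flags problems and proposes no new actions."""
    objections = [] if output.citations else ["unsupported claim: no citations"]
    return Critique(output.sub_goal_id, objections)


def referee(output: ExecutorOutput, review: Critique) -> Decision:
    """Referee: applies an explicit criterion and records its reasoning."""
    if review.objections:
        return Decision(output.sub_goal_id, False, "; ".join(review.objections))
    return Decision(output.sub_goal_id, True, "no objections raised")


def run(goal: str) -> List[Decision]:
    """Wire the roles together: plan, then execute, critique, and decide."""
    decisions = []
    for sub_goal in plan(goal):
        output = execute(sub_goal)
        decisions.append(referee(output, critique(output)))
    return decisions
```

Calling `run("summarize report; guess the cause")` yields one `Decision` per sub-goal. Because each role only sees the record type handed to it, the executor cannot read the plan and the critic cannot issue actions.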

Why this works

Planner / executor separation prevents the planner from drifting into execution and getting confused by tool errors.
Critic separation prevents the executors from grading their own work; an agent that produced an output is not an independent check on it.
Referee separation prevents endless executor-vs-critic loops.
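The loop-termination point can be sketched as a bounded revise-and-review cycle that hands off to a referee when the round budget runs out. Everything here is a hypothetical illustration: the `resolve` function, the `max_rounds` budget, and the "blocking:" prefix used as the referee's criterion are assumptions, not a description of a specific production system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Resolution:
    output: str
    accepted: bool
    rounds: int
    reasoning: str


def referee(objections: List[str]) -> Tuple[bool, str]:
    """Hypothetical criterion: any objection tagged 'blocking:' rejects the output."""
    blocking = [o for o in objections if o.startswith("blocking:")]
    if blocking:
        return False, f"rejected: {len(blocking)} blocking objection(s) unresolved"
    return True, "accepted: remaining objections are non-blocking"


def resolve(
    execute: Callable[[List[str]], str],
    critique: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Resolution:
    """Let the executor revise against critic objections for a bounded number of rounds."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    objections: List[str] = []
    output = ""
    for round_no in range(1, max_rounds + 1):
        output = execute(objections)    # executor sees the previous objections
        objections = critique(output)   # critic only reports problems
        if not objections:
            return Resolution(output, True, round_no, "critic raised no objections")
    # Budget exhausted: the referee makes the final call, so the loop cannot run forever.
    accepted, reasoning = referee(objections)
    return Resolution(output, accepted, max_rounds, reasoning)
```

The design choice worth noting is that the critic never decides acceptance on a disputed output. It only returns objections, and the referee's criterion is the single place where a disagreement gets closed.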

Common variations

Single executor vs. multi-executor (parallelism). Parallel executors for independent sub-goals; serial for dependent ones.
Critic per executor or shared critic. Per-executor for specialized critique; shared for consistency across the run.
Hierarchical planning. A meta-planner produces a plan that includes “now plan this sub-task in detail” steps.
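The parallel-versus-serial choice follows from the dependency graph between sub-goals. Below is a minimal sketch using the standard library's `graphlib.TopologicalSorter` and a thread pool: sub-goals whose dependencies are all finished run together, and dependent ones wait. The `run_plan` function, the sub-goal names, and the shape of the `execute` callback are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_plan(
    deps: Dict[str, Set[str]],
    execute: Callable[[str, Dict[str, str]], str],
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run sub-goals in dependency order, parallelizing each independent batch.

    deps maps a sub-goal name to the set of sub-goals it depends on.
    execute receives the sub-goal name and the results of its dependencies.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every sub-goal in this batch has all of its dependencies done,
            # so the batch members are independent of one another.
            ready = list(sorter.get_ready())
            inputs = [{d: results[d] for d in deps.get(name, ())} for name in ready]
            outputs = list(pool.map(execute, ready, inputs))
            for name, out in zip(ready, outputs):
                results[name] = out
                sorter.done(name)
    return results
```

For example, with `deps = {"report": {"fetch_a", "fetch_b"}, "fetch_a": set(), "fetch_b": set()}`, the two fetches run in the same batch and `report` runs afterwards with both results as input. This batch-by-batch scheduling is simpler than fully dynamic scheduling and can leave some workers idle while a slow sub-goal finishes.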

What we standardize

We standardize three things across every production agentic system:

Typed tool contracts — every tool has explicit input/output schemas. No improvisation.
Deterministic logs — every call (planner → executor, executor → tool, critic → executor) is logged with timestamps and parameters.
Evaluation harnesses — every system ships with a golden dataset, a regression suite, hallucination detection, and grounding scoring. New versions are evaluated before promotion.
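As a rough illustration of the first two items, the sketch below wraps a tool in explicit input and output schemas and writes one log line per call. The `Tool` wrapper, the `word_count` tool, the log field names, and the injectable `clock` are all hypothetical choices made for this example; the type check only handles plain class annotations such as `str` and `int`.

```python
import json
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone
from typing import Any, Callable, List


@dataclass(frozen=True)
class TextIn:          # hypothetical input schema
    text: str


@dataclass(frozen=True)
class CountOut:        # hypothetical output schema
    words: int


@dataclass(frozen=True)
class Tool:
    name: str
    input_type: type
    output_type: type
    fn: Callable[[Any], Any]


def check_schema(obj: Any, expected: type) -> None:
    """Reject values that are not the declared dataclass or have wrongly typed fields."""
    if not isinstance(obj, expected):
        raise TypeError(f"expected {expected.__name__}, got {type(obj).__name__}")
    for f in fields(obj):
        if not isinstance(getattr(obj, f.name), f.type):
            raise TypeError(f"{expected.__name__}.{f.name} must be {f.type.__name__}")


def utc_now() -> str:
    return datetime.now(timezone.utc).isoformat()


def call_tool(
    tool: Tool,
    payload: Any,
    log: List[str],
    caller: str,
    clock: Callable[[], str] = utc_now,
) -> Any:
    """Validate input, run the tool, validate output, then append one JSON log line."""
    check_schema(payload, tool.input_type)
    result = tool.fn(payload)
    check_schema(result, tool.output_type)
    entry = {
        "ts": clock(),
        "caller": caller,
        "tool": tool.name,
        "params": asdict(payload),
        "result": asdict(result),
    }
    # sort_keys gives a stable field order, so identical calls produce identical lines.
    log.append(json.dumps(entry, sort_keys=True))
    return result


WORD_COUNT = Tool(
    name="word_count",
    input_type=TextIn,
    output_type=CountOut,
    fn=lambda q: CountOut(words=len(q.text.split())),
)
```

Passing a fixed `clock` in tests makes the log output reproducible, which is what lets a regression suite compare log lines between versions.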

Where we run this pattern

AeroFarr — multi-tool aviation analyst (planner / executor / critic over the prediction core, the cascade GNN, the causal engine, and the RAG corpus)
EvidAI — 4-model consensus screening with explicit critic and referee
FreightCortex — 16-tool AI freight analyst with planner / executor and a critic on report quality
Aquil — sourcers / analysts / critic / referee for OSINT
SPCio (with a manufacturing intelligence partner) — 8 specialized agents with a meta-coordinator

Closing

The four-role pattern is not an opinion. It is the architecture every production multi-agent system converges on once it survives the first round of real users. Skipping it is a tax you pay later.
