Multi-Agent Archives - Zorost Intelligence | AI, Cloud & Data Experts

When Agents Call Agents: Why the MCP Server Matters in Freight

Zorost Intelligence — Tue, 24 Feb 2026 09:00:00 +0000

Pull-quote: “If your platform isn’t callable by other agents, your platform isn’t future-proof.”

Why this matters

The next generation of enterprise software is being shaped by a simple fact: users have agents now. Claude Desktop, custom internal agents, vendor-provided agents — they’re all going to call your platform. Either they call it through your REST API (and the agent has to know your URL structure, your authentication, your error semantics) or they call it through a standard protocol.

That standard is Model Context Protocol (MCP).

What MCP is

MCP is an open protocol introduced by Anthropic and adopted across the agent ecosystem. It defines how a server describes the tools it offers, how a host (the agent’s runtime) discovers and calls those tools, and how results are returned. The result is a clean separation: servers advertise tools, agents discover and call them, and you can swap tool servers without touching the agent.

For FreightCortex, the MCP server is a thin layer that exposes our 16 tools using the protocol. An external agent — a customer’s internal Claude Desktop, an OEM’s analytics chatbot, or a third-party tool — can connect to our MCP endpoint and use FreightCortex like a native tool.

What this unlocks

Three things:

Native callability from any MCP-compatible agent. Customers do not need to write custom integrations. Their agent just connects to our MCP server.
Composability with other tools. A customer agent can use FreightCortex tools alongside their own internal tools. The agent decides when to call which.
Future-proofing. As the agent ecosystem grows, MCP-compatible platforms are accessible by default. REST-only platforms have to be manually integrated, one customer at a time.

What it requires

Three engineering investments:

Tool contracts — every tool we want to expose has a typed schema. (We already had this.)
The MCP server itself — a thin transport layer over those tools.
Authentication and rate limiting — MCP doesn’t replace your existing auth; it sits on top of it.
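To make the "thin layer" idea concrete, here is a minimal sketch of the request handling. It assumes MCP's JSON-RPC 2.0 message shape with the `tools/list` and `tools/call` methods. It is not the FreightCortex server and does not use an MCP SDK: the token check, the schema fields, and the stub tool body are hypothetical, and a real deployment would normally enforce auth at the transport layer rather than inside the dispatcher.

```python
import json


def query_corridor_metrics(origin: str, destination: str, window_days: int = 90) -> dict:
    # Stub handler: a real one would query the data layer.
    return {"corridor": f"{origin}-{destination}", "window_days": window_days}


# Hypothetical registry; the schema fields are illustrative assumptions.
TOOLS = {
    "query_corridor_metrics": {
        "description": "Lane-level KPIs for a corridor",
        "inputSchema": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "window_days": {"type": "integer"},
            },
            "required": ["origin", "destination"],
        },
        "handler": query_corridor_metrics,
    }
}

VALID_TOKENS = {"demo-token"}  # assumption: stands in for the existing auth layer


def _error(rid, code: int, message: str) -> dict:
    return {"jsonrpc": "2.0", "id": rid, "error": {"code": code, "message": message}}


def handle(request: dict, token: str) -> dict:
    """Answer one JSON-RPC request; auth is checked before any tool logic."""
    rid = request.get("id")
    if token not in VALID_TOKENS:
        return _error(rid, -32001, "unauthorized")
    method = request.get("method")
    if method == "tools/list":
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(rid, -32602, "unknown tool")
        payload = tool["handler"](**params.get("arguments", {}))
        content = [{"type": "text", "text": json.dumps(payload)}]
        return {"jsonrpc": "2.0", "id": rid, "result": {"content": content, "isError": False}}
    return _error(rid, -32601, "method not found")
```

The dispatcher holds no business logic of its own: it lists what the registry contains and forwards calls to the existing handlers, which is what keeps the layer thin.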

A concrete example

An analyst is using Claude Desktop on her workstation. She asks “what’s driving the cost increase on the Atlanta–Dallas corridor?” Claude knows about the FreightCortex MCP server (configured once per workstation) and decides to use it. It calls query_corridor_metrics, compute_anomaly_score, query_carrier_metrics, and run_capacity_simulation — and produces an answer with the same structure as the answer it would have given inside the FreightCortex web app, except this time it is in her existing analyst environment.

The analyst never had to open the FreightCortex web app.

Closing

If your platform isn’t callable by other agents, your platform isn’t future-proof. MCP is how you make it callable. It is a small engineering investment with very high leverage.

The post When Agents Call Agents: Why the MCP Server Matters in Freight appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

Building Multi-Agent Workflows on Databricks (Mosaic AI Agent Framework)

Zorost Intelligence — Tue, 24 Feb 2026 09:00:00 +0000

Pull-quote: “Agents on the Lakehouse mean tools that read and write Delta tables, models that serve under MLflow, and evaluations that ship as Delta tables themselves.”

Why this matters

Agentic workflows are the next layer on the Lakehouse — agents that reason, plan, call tools, and produce verifiable artifacts. The Mosaic AI Agent Framework provides the runtime. The architectural decisions still belong to you.

Reference architecture

┌──────────────────────────────────────────────────────────────────┐
│                    AGENT (LangGraph / LlamaIndex / Custom)        │
│                                                                    │
│   Planner ──► Executor ──► Critic ──► Referee                    │
└─────────────────────┬────────────────────────────────────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Typed Tools                 │ ◄── Tool catalog
       │   - read Delta tables         │     (Unity Catalog)
       │   - write Delta tables        │
       │   - call MLflow models        │
       │   - call REST APIs            │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Mosaic AI Model Serving     │
       │   - foundation models         │
       │   - fine-tuned models         │
       │   - per-agent traffic split   │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Evaluations as Delta tables │ ◄── Versioned
       │   - golden datasets           │
       │   - regression suite          │
       │   - hallucination detection   │
       └──────────────────────────────┘

What “typed tools” means

Every tool has a JSON schema for inputs and outputs. The agent cannot call a tool with invalid inputs — the schema rejects the call. This eliminates an entire class of failure that plagues unconstrained agents.
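A minimal sketch of what "the schema rejects the call" can look like, using only the standard library. The tool name `read_table`, its fields, and the simplified schema format are assumptions for illustration; a production system would more likely use a full JSON Schema validator.

```python
from typing import Any

# Hypothetical, simplified schema for an illustrative tool.
READ_TABLE_SCHEMA = {
    "required": {"table": str, "limit": int},
    "optional": {"columns": list},
}


class ToolInputError(ValueError):
    """Raised when arguments do not satisfy the tool's input schema."""


def validate(args: dict[str, Any], schema: dict) -> dict[str, Any]:
    allowed = {**schema["required"], **schema["optional"]}
    for name in schema["required"]:
        if name not in args:
            raise ToolInputError(f"missing required field: {name}")
    for name, value in args.items():
        if name not in allowed:
            raise ToolInputError(f"unknown field: {name}")
        expected = allowed[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            raise ToolInputError(f"{name} must be {expected.__name__}")
    return args


def read_table(args: dict[str, Any]) -> dict:
    # Validation runs first, so an invalid call never reaches the data layer.
    checked = validate(args, READ_TABLE_SCHEMA)
    return {"table": checked["table"], "limit": checked["limit"], "rows": []}  # stub
```

The point of the design is ordering: the check sits in front of the tool body, so a malformed call fails with a clear message instead of producing a partial or misleading result.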

What “evaluations as Delta tables” means

Evaluation results are stored as rows in versioned Delta tables. Each row is (agent_version, input, expected_output, actual_output, score, metadata). Regression analysis is a JOIN between two agent_version slices. A new version is not promoted unless it passes the regression suite.
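The regression JOIN can be sketched as follows. An in-memory SQLite table stands in for the Delta table so the example runs anywhere; the table name `evals`, the sample rows, and the "no input may score lower than the baseline" promotion rule are assumptions, not a description of a specific harness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a versioned Delta table
conn.execute(
    """CREATE TABLE evals (
           agent_version TEXT, input TEXT, expected_output TEXT,
           actual_output TEXT, score REAL, metadata TEXT)"""
)
conn.executemany(
    "INSERT INTO evals VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("v1", "q1", "a", "a", 1.0, "{}"),
        ("v1", "q2", "b", "b", 1.0, "{}"),
        ("v2", "q1", "a", "a", 1.0, "{}"),
        ("v2", "q2", "b", "c", 0.0, "{}"),  # v2 regresses on q2
    ],
)


def regressions(conn: sqlite3.Connection, baseline: str, candidate: str) -> list:
    """Inputs where the candidate version scores lower than the baseline."""
    query = """
        SELECT b.input, b.score, c.score
        FROM evals AS b
        JOIN evals AS c ON b.input = c.input
        WHERE b.agent_version = ? AND c.agent_version = ? AND c.score < b.score
        ORDER BY b.input
    """
    return conn.execute(query, (baseline, candidate)).fetchall()


def can_promote(conn: sqlite3.Connection, baseline: str, candidate: str) -> bool:
    # Assumed gate: promote only if no input regressed.
    return not regressions(conn, baseline, candidate)
```

The same self-join translates directly to SQL over a Delta table; only the connection and table reference would change.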

The agent / human contract

Where humans fit:

High-risk operations require human-in-the-loop checkpoints. Agents can propose; humans approve.
Critic disagreements with the executor route to humans when the referee cannot adjudicate.
Periodic spot-checks on agent decisions are scheduled into the evaluation harness.

This is not “manual override.” This is a designed-in contract about which decisions are agent-final and which are human-final.

Common architectural decisions

Decision	Default
Number of executors	One unless sub-goals are independent
Critic per executor or shared	Shared unless executors are heterogeneous
Memory model	Working memory in agent state; long-term memory in Delta table
Tool call timeout	30 s default, with retries on idempotent tools
Cost ceiling per session	Configurable; defaults to a hard cap
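The timeout row can be sketched as a small wrapper. This is an illustration under stated assumptions, not framework code: the function name `call_tool`, the attempt count, and the choice to retry on any exception are all hypothetical. Only tools marked idempotent are retried, because repeating a non-idempotent call could apply its side effect twice.

```python
from concurrent.futures import ThreadPoolExecutor


def call_tool(fn, *, idempotent: bool, timeout_s: float = 30.0, max_attempts: int = 3):
    """Run fn() with a timeout; retry only when the tool is idempotent."""
    attempts = max_attempts if idempotent else 1
    last_error = None
    for _ in range(attempts):
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except Exception as exc:  # covers both timeouts and tool errors
            last_error = exc
        finally:
            # Python cannot kill a running thread; a timed-out call keeps
            # running in the background, so we only stop waiting for it.
            pool.shutdown(wait=False)
    raise RuntimeError(f"tool failed after {attempts} attempt(s)") from last_error
```

A production version would usually add backoff between attempts and distinguish retryable errors from permanent ones.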

Closing

Multi-agent workflows on Databricks are productive when the framework is paired with discipline: typed tools, deterministic logging, evaluations as Delta tables, and a designed-in agent / human contract. The Mosaic AI Agent Framework is the runtime; the architecture is yours.


Multi-Agent OSINT with a Critic and a Referee

Zorost Intelligence — Tue, 20 Jan 2026 09:00:00 +0000

Pull-quote: “Speed of agents matters less than honesty of agents. Critic and referee are how you build honesty into the swarm.”

Why this matters

The first wave of multi-agent OSINT systems was a swarm: ten agents reading the same inputs and producing summaries, which were then averaged. The result was confident-sounding mediocrity. The agents reinforced each other’s biases. The aggregator could not tell whether the consensus was real or echo.

The second wave adds structure to the swarm. Specifically, two roles that are missing in the naive design:

Critic — adversarial review. The critic’s job is to find the weakest link in the analysts’ reasoning and challenge it.
Referee — adjudicates when analysts disagree. The referee’s job is to apply explicit decision criteria and produce a final answer with explicit reasoning.

This is not a UI improvement. It is a structural change in what the system is.

Aquil’s swarm

Aquil runs a structured OSINT swarm with four roles:

Sourcers — discover and ingest open-source signals (news, public data, leaks, public records, satellite imagery sources where licensed)
Analysts — produce hypotheses, summarize evidence, and propose causal explanations
Critic — reviews analyst output for unsupported claims, missing evidence, plausible alternative explanations, and reasoning gaps
Referee — adjudicates when the analysts and the critic disagree, with explicit criteria

The critic is structurally different from the analysts: it does not propose new claims. Its only function is to challenge existing ones. The referee is structurally different again: it does not propose or challenge. It decides, with explicit reasoning that goes into the audit trail.

Causal-graph synthesis

On top of the swarm, Aquil produces a causal graph of the assessed situation — events as nodes, hypothesized causal relationships as edges, with confidence weights. The graph is the team’s shared mental model. It is updateable, queryable, and exportable.

A causal graph is not just a visualization. It is a structured commitment to what we think is going on. New evidence updates the graph; missing evidence flags weak edges; alternative hypotheses are visible as competing edges.
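As a data structure, such a graph can be quite small. The sketch below is an assumption about shape only, not Aquil's implementation: the class and method names, the 0.5 confidence threshold, and the rule that an edge without evidence counts as weak are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CausalEdge:
    cause: str
    effect: str
    confidence: float  # 0.0 to 1.0
    evidence: list[str] = field(default_factory=list)


@dataclass
class CausalGraph:
    events: set[str] = field(default_factory=set)
    edges: list[CausalEdge] = field(default_factory=list)

    def add_edge(self, cause: str, effect: str, confidence: float, evidence=()) -> None:
        self.events.update((cause, effect))
        self.edges.append(CausalEdge(cause, effect, confidence, list(evidence)))

    def weak_edges(self, threshold: float = 0.5) -> list[CausalEdge]:
        """Edges with low confidence or no supporting evidence."""
        return [e for e in self.edges if e.confidence < threshold or not e.evidence]

    def competing_explanations(self, effect: str) -> list[CausalEdge]:
        """All hypothesized causes of one event, strongest first."""
        rivals = [e for e in self.edges if e.effect == effect]
        return sorted(rivals, key=lambda e: e.confidence, reverse=True)
```

Keeping rival hypotheses as separate edges into the same event is what makes alternatives visible instead of silently overwritten.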

Why this works

The naive swarm fails because mediocre answers can hide behind a chorus. The structured swarm makes the chorus disagree on purpose, and then makes a referee adjudicate. The agents’ weaknesses are surfaced rather than averaged. The team gets a more honest answer.

Closing

Speed of agents matters less than honesty of agents. The critic and the referee are how you build honesty into the swarm. Aquil is structured around that thesis.


The Agent Factory: Planner, Executor, Critic, Referee

Zorost Intelligence — Tue, 23 Dec 2025 09:00:00 +0000

Pull-quote: “The four-role pattern is not an opinion. It’s the architecture every production multi-agent system converges on once it survives the first round of real users.”

Why this matters

Multi-agent AI starts as a clever idea (let agents talk to each other!) and dies in production as an unreliable mess (agents hallucinate to each other, disagreements never resolve, the audit trail is unreadable). The fix is structural: four roles, typed contracts, deterministic logs.

The four roles

Planner — decomposes the high-level goal into sub-goals and decides the sequence. Reads the task, the available tools, and the agent’s memory; emits a structured plan.
Executor(s) — carries out sub-goals. Calls tools. Returns structured outputs. Knows nothing about the high-level plan; just executes its assigned sub-goal honestly.
Critic — reviews each executor output adversarially. Looks for unsupported claims, broken citations, missed evidence, alternative interpretations. Does not propose new actions; only critiques.
Referee — adjudicates when the critic disagrees with the executor. Has explicit criteria. Produces the final decision with explicit reasoning.

Why this works

Planner / executor separation prevents the planner from drifting into execution and getting confused by tool errors.
Critic separation prevents the executors from grading their own work, which is a category error.
Referee separation prevents endless executor-vs-critic loops.
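The control flow implied by these separations can be sketched with plain functions standing in for the four roles. Everything here is a toy under stated assumptions: the role implementations, the `Verdict` type, and the cap of two revision rounds are hypothetical, and a real system would put model calls behind each role.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    sub_goal: str
    output: str
    accepted: bool
    reasoning: str


def run(goal, planner, executor, critic, referee, max_rounds: int = 2) -> list[Verdict]:
    verdicts = []
    for sub_goal in planner(goal):  # the executor only ever sees one sub-goal
        output = executor(sub_goal)
        objections = critic(sub_goal, output)
        rounds = 0
        # Bounded revision: the loop cannot run forever.
        while objections and rounds < max_rounds:
            output = executor(f"{sub_goal} (address: {'; '.join(objections)})")
            objections = critic(sub_goal, output)
            rounds += 1
        # The referee always produces the final, reasoned decision.
        verdicts.append(referee(sub_goal, output, objections))
    return verdicts


# Toy roles for illustration only.
def planner(goal: str) -> list[str]:
    return [f"{goal}: gather data", f"{goal}: summarize"]


def executor(sub_goal: str) -> str:
    return f"done: {sub_goal}"


def critic(sub_goal: str, output: str) -> list[str]:
    return [] if "source" in output else ["missing source"]


def referee(sub_goal: str, output: str, objections: list[str]) -> Verdict:
    reasoning = "no open objections" if not objections else f"unresolved: {objections}"
    return Verdict(sub_goal, output, not objections, reasoning)
```

Note that the critic returns objections only and never a replacement output, which mirrors the rule that it critiques but does not act.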

Common variations

Single executor vs. multi-executor (parallelism). Parallel executors for independent sub-goals; serial for dependent ones.
Critic per executor or shared critic. Per-executor for specialized critique; shared for consistency across the run.
Hierarchical planning. A meta-planner produces a plan that includes “now plan this sub-task in detail” steps.

What we standardize

We standardize three things across every production agentic system:

Typed tool contracts — every tool has explicit input/output schemas. No improvisation.
Deterministic logs — every call (planner → executor, executor → tool, critic → executor) is logged with timestamps and parameters.
Evaluation harnesses — every system ships with a golden dataset, a regression suite, hallucination detection, and grounding scoring. New versions are evaluated before promotion.

Where we run this pattern

AeroFarr — multi-tool aviation analyst (planner / executor / critic over the prediction core, the cascade GNN, the causal engine, and the RAG corpus)
EvidAI — 4-model consensus screening with explicit critic and referee
FreightCortex — 16-tool AI freight analyst with planner / executor and a critic on report quality
Aquil — sourcers / analysts / critic / referee for OSINT
SPCio (with a manufacturing intelligence partner) — 8 specialized agents with a meta-coordinator

Closing

The four-role pattern is not an opinion. It is the architecture every production multi-agent system converges on once it survives the first round of real users. Skipping it is a tax you pay later.


Multi-Agent Quality: a New Architecture for the QMS

Zorost Intelligence — Tue, 25 Nov 2025 09:00:00 +0000

Pull-quote: “An SPC chart is not a decision. The decision is what to do about it. That’s where agents earn their keep.”

Why this matters

Quality management in regulated manufacturing has been essentially the same shape for thirty years: a set of forms (FMEA, Control Plans, MSA, NCR/CAPA, 8D), a statistical engine (control charts, capability indices, Gage R&R), and an audit trail. The forms get filled out, the charts get run, the audits pass. Operations engineers spend more time documenting than analyzing.

A multi-agent QMS is structurally different. It is not a forms engine with AI bolted on. It is an engine of cooperating agents that observe data, run analysis, recommend actions, and document what they did.

The agent architecture (eight specialized agents)

SPCio (co-developed with a manufacturing intelligence partner) ships eight specialized quality agents:

Process Monitor — watches SPC charts and triggers analysis on out-of-control patterns
Capability Analyst — runs Cp/Cpk/Pp/Ppk and interprets results in context
MSA Engineer — runs Gage R&R, ANOVA, and bias studies
FMEA Author — drafts and updates Failure Mode and Effects Analysis with severity / occurrence / detection scoring
Control Plan Author — drafts and updates Control Plans tied to FMEA and PPAP
8D Investigator — runs Eight Disciplines problem-solving with root-cause analysis
NCR / CAPA Coordinator — manages non-conformance reports and corrective/preventive actions through closure
APQP Coordinator — orchestrates Advanced Product Quality Planning across phase gates

Each agent has a typed tool contract (inputs, outputs, side effects) and a deterministic call log. Agent-to-agent communication is mediated and recorded.

Why the architecture works

The classical QMS treats every form as an isolated artifact. The multi-agent QMS treats them as nodes in a graph: an FMEA refers to a Control Plan, which refers to an MSA, which refers to historical SPC data, which refers to the current production run. When an out-of-control pattern emerges on a chart, the Process Monitor doesn’t just raise an alert — it asks the Capability Analyst whether the process is still capable, asks the FMEA Author whether the relevant failure mode is documented, and asks the 8D Investigator to start a structured investigation if the pattern persists.

The result is a system that continuously maintains the QMS rather than waiting for the team to maintain it during audit prep.
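The hand-off described above can be sketched in a few functions. This is a simplified illustration, not SPCio code: the 3-sigma rule, the 1.33 capability threshold, the "two or more flagged points means the pattern persists" rule, and the action names are assumptions. The capability function uses the overall sample standard deviation; the strict Cp/Cpk definitions use a within-subgroup sigma estimate instead.

```python
from statistics import mean, stdev


def out_of_control(points: list[float], baseline: list[float]) -> list[int]:
    """Indexes of points outside 3-sigma limits derived from baseline data."""
    center, sigma = mean(baseline), stdev(baseline)
    ucl, lcl = center + 3 * sigma, center - 3 * sigma
    return [i for i, x in enumerate(points) if x > ucl or x < lcl]


def capability_indices(samples: list[float], lsl: float, usl: float) -> tuple[float, float]:
    """Return (Cp, Cpk) computed from the overall sample standard deviation."""
    mu, sigma = mean(samples), stdev(samples)
    cp = (usl - lsl) / (6 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3 * sigma)
    return cp, cpk


def process_monitor(points, baseline, lsl, usl, min_cpk: float = 1.33, persist: int = 2) -> dict:
    flagged = out_of_control(points, baseline)
    if not flagged:
        return {"status": "in_control", "actions": []}
    _, cpk = capability_indices(points, lsl, usl)  # Capability Analyst's question
    actions = ["fmea_author:check_failure_mode_documented"]
    if len(flagged) >= persist:  # assumed definition of a persistent pattern
        actions.append("8d_investigator:open_investigation")
    return {
        "status": "out_of_control",
        "flagged": flagged,
        "capable": cpk >= min_cpk,
        "actions": actions,
    }
```

In a real deployment each action string would be a mediated, logged call to the corresponding agent.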

Tool counts and the RAG corpus

SPCio’s eight agents share a tool catalog of fifty-seven callable tools ranging from statistical computations to chart generation to FMEA cross-referencing to PPAP documentation. The RAG layer is built over a 765,000-chunk quality knowledge corpus covering IATF 16949, ISO 9001, AIAG core tools, and customer-specific quality manuals.

Closing

A multi-agent QMS is not a UI improvement on the old model. It is a different model. The implication for quality engineers is significant: less time documenting, more time analyzing, and a continuously updated system that is ready for an audit at any time because it never falls behind.


An AI Freight Analyst with 16 Tools

Zorost Intelligence — Tue, 11 Nov 2025 09:00:00 +0000

Pull-quote: “The AI analyst is not a chatbot bolted on the side. It is the center of the platform.”

Why this matters

Most freight intelligence platforms have followed the same pattern with generative AI: keep the existing dashboards, add a chatbot in the corner, ship a press release. The chatbot answers FAQ-class questions and sometimes summarizes a dashboard. Senior freight analysts ignore it.

FreightCortex is built around the AI analyst, not the other way around. The analyst is a multi-tool agent with sixteen callable tools that can pull data, run statistical tests, run simulations, and produce structured outputs. It is more like a junior analyst with access to the full platform than like a chatbot.

The 16 tools

#	Tool	What it does
1	`query_corridor_metrics`	Lane-level KPIs (cost, transit time, capacity, on-time %)
2	`query_carrier_metrics`	Carrier-level KPIs and ranking
3	`query_origin_destination_flows`	OD-pair flows with filters
4	`compute_anomaly_score`	Z-score / isolation forest / CUSUM on a metric series
5	`run_capacity_simulation`	What-if capacity reduction or expansion
6	`run_demand_simulation`	What-if demand shock scenarios
7	`run_disruption_simulation`	What-if disruption (port closure, weather, strike)
8	`run_routing_simulation`	Reroute optimization under constraints
9	`run_modal_shift_simulation`	Mode-shift impact (truck ↔ rail ↔ intermodal)
10	`run_emissions_simulation`	CO₂ impact under scenarios
11	`run_network_stress_test`	Network-wide stress scenarios
12	`compute_shortest_path`	Multi-modal shortest path
13	`compute_betweenness`	Node centrality
14	`compute_communities`	Network communities
15	`generate_report`	Compose structured report from analytical session
16	`generate_chart`	Render a specific chart type with provided data

Each tool is a typed contract: inputs, outputs, and side effects are documented. Every call is logged with the requesting question, the parameters, the result, and timestamps.
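One way to get that logging without touching each tool body is a decorator. The sketch below is an assumption about approach, not the FreightCortex implementation: the in-memory `CALL_LOG`, the `question` argument convention, and the z-score body given to `compute_anomaly_score` are hypothetical stand-ins.

```python
import functools
from datetime import datetime, timezone
from statistics import mean, stdev

CALL_LOG: list[dict] = []  # stand-in for a durable, append-only store


def logged_tool(fn):
    """Record the question, parameters, result, and timestamps of every call."""
    @functools.wraps(fn)
    def wrapper(question: str, **params):
        entry = {
            "tool": fn.__name__,
            "question": question,
            "parameters": params,
            "started_at": datetime.now(timezone.utc).isoformat(),
        }
        try:
            entry["result"] = fn(**params)
            return entry["result"]
        except Exception as exc:
            entry["error"] = repr(exc)
            raise
        finally:
            # Failed calls are logged too, so the trail has no gaps.
            entry["finished_at"] = datetime.now(timezone.utc).isoformat()
            CALL_LOG.append(entry)
    return wrapper


@logged_tool
def compute_anomaly_score(series: list[float]) -> float:
    # Illustrative body: z-score of the latest value against the earlier history.
    history, latest = series[:-1], series[-1]
    return (latest - mean(history)) / stdev(history)
```

A call such as `compute_anomaly_score("why did cost rise?", series=[1.0, 2.0, 3.0, 6.0])` returns the score and appends one entry to the log as a side effect.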

Why typed tools matter

The single most important architectural decision in agent design is whether your tools have contracts. Untyped tools — give the model a vague description and let it improvise — are unreliable. Typed tools — with explicit input schemas, output schemas, and validation — are reliable.

FreightCortex’s analyst will not call a tool with an invalid input. The schema rejects the call before it reaches the data layer. That eliminates an entire class of failure that plagues unconstrained agents.

What this lets analysts do

A typical session: an analyst asks “what’s driving the cost increase on the Atlanta-Dallas corridor over the last quarter?” The AI analyst then:

Calls query_corridor_metrics for Atlanta-Dallas with a 90-day window
Calls compute_anomaly_score on the cost series
Calls query_carrier_metrics to see which carriers’ rates moved
Calls run_capacity_simulation to test whether the increase tracks capacity changes
Generates a structured report with charts

This is fifteen minutes of senior-analyst work. With FreightCortex, it is one question and a structured answer with citations.

Closing

A chatbot bolted on a dashboard is a feature. An AI analyst at the center of the platform is a product. The difference shows up the moment senior analysts compare them in real engagements.


Multi-Agent Consensus for Systematic Literature Review

Zorost Intelligence — Tue, 04 Nov 2025 09:00:00 +0000

Pull-quote: “If four independent reasoners agree, the inclusion decision is high-confidence. If they disagree, the question goes to a human. That’s the design contract.”

Why this matters

Systematic literature reviews underpin regulatory submissions, clinical practice guidelines, and HTA decisions. Doing them well is expensive and slow — typically 4–6 months and a six-figure investment for a single review. Doing them badly is dangerous.

The first wave of LLM-assisted screening was a single model judging each title/abstract against the inclusion criteria. It was faster than manual review. It was no more accurate. In some cases, it was less accurate, because a single model has systematic biases that a human reviewer doesn’t share.

What multi-agent consensus does

EvidAI runs every screening decision through four independent LLMs, each with a structured prompt that includes the protocol’s inclusion and exclusion criteria, a brief excerpt from the abstract, and a request for explicit reasoning.

The four models vote. Five patterns emerge:

Pattern	Frequency	Action
4–0 unanimous include	~78%	Auto-include
4–0 unanimous exclude	~13%	Auto-exclude
3–1 majority	~6%	Flag for human reviewer with explanations
2–2 split	~2%	Mandatory human reviewer with adjudication
Disagreement on reasoning	varies	Flag for human reviewer regardless of outcome

(Frequencies are typical for a well-designed protocol; they vary with topic.)
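The routing in the table can be written as a small function. This is a sketch, not EvidAI's code: the vote labels, the action names, and the idea that reasoning disagreement arrives as a precomputed flag are assumptions.

```python
from collections import Counter


def route(votes: list[str], reasons_conflict: bool = False) -> str:
    """Map four include/exclude votes to an action, following the table above."""
    if len(votes) != 4 or any(v not in ("include", "exclude") for v in votes):
        raise ValueError("expected exactly four 'include'/'exclude' votes")
    counts = Counter(votes)
    top = max(counts.values())
    if top == 2:  # 2-2 split
        return "mandatory_human_adjudication"
    if top == 3 or reasons_conflict:  # 3-1 majority, or agreement for different reasons
        return "flag_for_human_review"
    return "auto_include" if counts["include"] == 4 else "auto_exclude"
```

Only a unanimous vote with consistent reasoning is decided automatically; every other outcome reaches a human.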

Why the design works

The key insight is that errors made by independent reasoners are largely uncorrelated. Different LLMs have different systematic biases: different training data, different RLHF preferences, different prompt sensitivities. When four independent reasoners agree, the probability that all four made the same mistake drops sharply. When they disagree, they are often reproducing the disagreement that human reviewers would have had, which is exactly what should be escalated.

Single-model screening hides disagreement. Multi-agent consensus surfaces it.

Auditability

Every screening decision is stored as a row with: paper ID, protocol version, model identifiers, raw model outputs, parsed decisions, the reason for inclusion/exclusion in each model’s words, the consensus result, and (if applicable) the human reviewer’s adjudication. The complete chain is replayable by an auditor and reproducible by a successor team.

This is the difference between an AI tool that merely speeds up the SLR process and one that also preserves the audit standard that process requires.

Closing

The multi-agent consensus pattern is the right answer for any high-stakes screening problem where accountability and auditability matter. EvidAI applies it to systematic reviews. The same pattern transfers cleanly to compliance screening, regulatory document review, due diligence, and grant assessment.
