Agentic AI Archives - Zorost Intelligence | AI, Cloud & Data Experts

Air-Gapped Agentic Stacks for Sovereign Environments

Zorost Intelligence — Tue, 14 Apr 2026 09:00:00 +0000

Pull-quote: “Sovereign AI is not ‘AI minus features.’ It is ‘AI plus discipline.’”

Why this matters

Some federal mission environments cannot accept internet egress. Some cannot accept any data leaving the customer boundary. Some cannot accept models that the customer cannot inspect end-to-end. Cloud-only AI vendors do not serve these environments.

The good news: air-gapped agentic AI is operationally feasible in 2026. The bad news: it requires engineering discipline that most vendors don’t have.

The reference stack (engineering view)

Local LLM serving. Open-weights models (Llama 3.x, Qwen 2.5, Mistral, Phi-4, Gemma 3, code-tuned variants) served via Ollama, vLLM, or llama.cpp on customer hardware.
Local embeddings. Open-source embedding models on the same stack.
Local vector database. pgvector, Weaviate, or Qdrant on a private subnet.
Local model registry. MLflow Model Registry running inside the boundary.
Local RAG pipeline. Ingestion, chunking, embedding, retrieval, re-ranking, generation — all inside the boundary.
Local evaluation harness. Golden datasets, regression suites, hallucination detection, grounding scoring — version-controlled and runnable inside the boundary.
Local observability. Grafana, Prometheus, Loki running inside the boundary.
Local update pipeline. Models, weights, and corpus updates delivered as signed bundles via approved transfer.
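To make the "signed bundle" step concrete, here is a minimal sketch of verifying an update bundle after it crosses the boundary. It is an illustration under stated assumptions, not our production tooling: the function names are hypothetical, and HMAC-SHA256 with a shared key stands in for the asymmetric signature a real transfer process would more likely use.

```python
import hashlib
import hmac
import json


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name in the bundle to its SHA-256 digest."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())}


def sign_manifest(manifest: dict[str, str], key: bytes) -> str:
    """Sign a canonical JSON encoding of the manifest.

    Assumption: HMAC-SHA256 is a stdlib stand-in for a real signature scheme.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_bundle(files: dict[str, bytes], manifest: dict[str, str],
                  signature: str, key: bytes) -> bool:
    """Accept the bundle only if the signature and every file digest match."""
    expected = sign_manifest(manifest, key)
    if not hmac.compare_digest(expected, signature):
        return False  # manifest was altered or signed with a different key
    return build_manifest(files) == manifest  # any changed, added, or missing file fails
```

The importer runs verify_bundle before anything is loaded into the registry; a failed check means the bundle is rejected as a whole.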

The reference stack (governance view)

Documented model selection — which model, which version, which quantization, why
Documented evaluation — what the golden dataset is, what it tests, what passing looks like
Documented update procedure — who signs the update bundle, who imports it, who validates it post-import
Documented retirement — when and why a model is retired
Audit trail — every decision the system makes is logged with model version, prompt, output, and grounding evidence
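One way to picture the audit trail is as an append-only list of records carrying the four fields named above. The sketch below is illustrative: the AuditRecord type and the hash chaining are assumptions added to show how later edits to a record could be detected, not a description of a specific product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


@dataclass(frozen=True)
class AuditRecord:
    model_version: str
    prompt: str
    output: str
    grounding_evidence: tuple[str, ...]  # e.g. identifiers of retrieved passages


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_record(log: list[dict], record: AuditRecord) -> dict:
    """Append a record, linking it to the previous entry's hash."""
    body = asdict(record)
    body["prev_hash"] = log[-1]["entry_hash"] if log else GENESIS
    entry = {**body, "entry_hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```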

Trade-offs vs. cloud

Latency. Comparable for the smaller models; better for chained calls (no network round-trip).
Capability. Behind the absolute frontier of closed-source models. Open-weights models in 2026 are excellent but not at parity with the strongest closed-source options.
Cost. Higher up-front (hardware), lower over time (no per-token bills).
Update cadence. Slower because updates must clear the boundary.
Evaluation discipline. Tighter, because there is no vendor evaluation to lean on.
Sovereignty. Complete. The customer owns the stack end-to-end.

Where it fits in federal posture

Air-gapped agentic stacks fit:

Classified or otherwise sensitive environments without internet egress
Mission environments where data cannot leave the customer boundary
Programs where the agency requires end-to-end inspection and audit of the AI stack

They do not fit:

Environments where the very latest closed-source model capability is required and the data sensitivity allows cloud
Environments where rapid model iteration is more important than sovereignty

Closing

Sovereign agentic AI is real. It requires engineering discipline. We’ve built it for our manufacturing-quality platform (with a partner) and we apply the same discipline to federal mission environments. The deployment shape is different from cloud. The trade-offs are real. For the customers who need it, no other shape fits.


When Agents Call Agents: Why the MCP Server Matters in Freight

Zorost Intelligence — Tue, 24 Feb 2026 09:00:00 +0000

Pull-quote: “If your platform isn’t callable by other agents, your platform isn’t future-proof.”

Why this matters

The next generation of enterprise software is being shaped by a simple fact: users have agents now. Claude Desktop, custom internal agents, vendor-provided agents — they’re all going to call your platform. Either they call it through your REST API (and the agent has to know your URL structure, your authentication, your error semantics) or they call it through a standard protocol.

That standard is Model Context Protocol (MCP).

What MCP is

MCP is an open protocol developed by Anthropic and adopted across the agent ecosystem. It defines how a tool server describes its tools, how a host (the agent’s runtime) discovers and calls those tools, and how results are returned. The result is a clean separation: tools are advertised, agents discover and call them, and you can swap tool servers without touching the agent.

For FreightCortex, the MCP server is a thin layer that exposes our 16 tools using the protocol. An external agent — a customer’s internal Claude Desktop, an OEM’s analytics chatbot, or a third-party tool — can connect to our MCP endpoint and use FreightCortex like a native tool.
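The advertise / discover / call cycle can be sketched in a few lines. MCP messages are JSON-RPC 2.0, and the tools/list and tools/call methods and the inputSchema field below follow our reading of the public specification. Everything else is an assumption for illustration: the parameters and return value of query_corridor_metrics are invented, and a real server would use an MCP SDK and also handle initialization and transport.

```python
import json

# Hypothetical tool table. The tool name appears in this post; its schema and
# handler are placeholders, not the FreightCortex contract.
TOOLS = {
    "query_corridor_metrics": {
        "description": "Return cost metrics for a freight corridor.",
        "inputSchema": {
            "type": "object",
            "properties": {"origin": {"type": "string"}, "destination": {"type": "string"}},
            "required": ["origin", "destination"],
        },
        "handler": lambda args: {"corridor": f"{args['origin']}-{args['destination']}"},
    }
}


def handle(request: dict) -> dict:
    """Answer one JSON-RPC request for the two tool-related methods."""
    req_id, method = request["id"], request["method"]
    if method == "tools/list":  # discovery: advertise names and schemas only
        tools = [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":  # invocation: run the named tool with its arguments
        params = request["params"]
        tool = TOOLS.get(params["name"])
        if tool is None:
            return {"jsonrpc": "2.0", "id": req_id,
                    "error": {"code": -32602, "message": f"Unknown tool: {params['name']}"}}
        payload = tool["handler"](params.get("arguments", {}))
        content = [{"type": "text", "text": json.dumps(payload)}]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"content": content, "isError": False}}
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": -32601, "message": "Method not found"}}
```

Because the agent learns the tool list at runtime, swapping the server behind the endpoint does not require changing the agent.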

What this unlocks

Three things:

Native callability from any MCP-compatible agent. Customers do not need to write custom integrations. Their agent just connects to our MCP server.
Composability with other tools. A customer agent can use FreightCortex tools alongside their own internal tools. The agent decides when to call which.
Future-proofing. As the agent ecosystem grows, MCP-compatible platforms are accessible by default. REST-only platforms have to be manually integrated, one customer at a time.

What it requires

Three engineering investments:

Tool contracts — every tool we want to expose has a typed schema. (We already had this.)
The MCP server itself — a thin transport layer over those tools.
Authentication and rate limiting — MCP doesn’t replace your existing auth; it sits on top of it.
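The third investment is the one most often skipped, so here is a minimal sketch of what "sits on top of your existing auth" can look like: an API-key check followed by a per-key token bucket. The class and function names are hypothetical, and the key store is a plain set for illustration only.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate_per_s` tokens per second."""

    def __init__(self, rate_per_s: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def authorize(api_key: str, known_keys: set[str], buckets: dict[str, TokenBucket]) -> str:
    """Gate a tool call: authenticate first, then apply the caller's rate limit."""
    if api_key not in known_keys:
        return "unauthorized"
    if not buckets[api_key].allow():
        return "rate_limited"
    return "ok"
```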

A concrete example

An analyst is using Claude Desktop on her workstation. She asks “what’s driving the cost increase on the Atlanta–Dallas corridor?” Claude knows about the FreightCortex MCP server (configured once per workstation) and decides to use it. It calls query_corridor_metrics, compute_anomaly_score, query_carrier_metrics, and run_capacity_simulation — and produces an answer with the same structure as the answer it would have given inside the FreightCortex web app, except this time it is in her existing analyst environment.

The analyst never had to log in to the FreightCortex web app.

Closing

If your platform isn’t callable by other agents, your platform isn’t future-proof. MCP is how you make it callable. It is a small engineering investment with very high leverage.


Building Multi-Agent Workflows on Databricks (Mosaic AI Agent Framework)

Zorost Intelligence — Tue, 24 Feb 2026 09:00:00 +0000

Pull-quote: “Agents on the Lakehouse mean tools that read and write Delta tables, models that serve under MLflow, and evaluations that ship as Delta tables themselves.”

Why this matters

Agentic workflows are the next layer on the Lakehouse — agents that reason, plan, call tools, and produce verifiable artifacts. The Mosaic AI Agent Framework provides the runtime. The architectural decisions still belong to you.

Reference architecture

┌──────────────────────────────────────────────────────────────────┐
│                    AGENT (LangGraph / LlamaIndex / Custom)        │
│                                                                    │
│   Planner ──► Executor ──► Critic ──► Referee                    │
└─────────────────────┬────────────────────────────────────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Typed Tools                 │ ◄── Tool catalog
       │   - read Delta tables         │     (Unity Catalog)
       │   - write Delta tables        │
       │   - call MLflow models        │
       │   - call REST APIs            │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Mosaic AI Model Serving     │
       │   - foundation models         │
       │   - fine-tuned models         │
       │   - per-agent traffic split   │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Evaluations as Delta tables │ ◄── Versioned
       │   - golden datasets           │
       │   - regression suite          │
       │   - hallucination detection   │
       └──────────────────────────────┘

What “typed tools” means

Every tool has a JSON schema for inputs and outputs. The agent cannot call a tool with invalid inputs — the schema rejects the call. This eliminates an entire class of failure that plagues unconstrained agents.
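A small sketch shows what "the schema rejects the call" means in practice. It checks only a subset of JSON Schema (type, required, properties) using the standard library; a real deployment would more likely rely on a full validator such as the jsonschema package. The example schema in the usage note is hypothetical.

```python
TYPE_MAP = {
    "string": str, "number": (int, float), "integer": int,
    "boolean": bool, "object": dict, "array": list,
}


def validate(value, schema: dict, path: str = "$") -> list[str]:
    """Return a list of violations; an empty list means the value conforms."""
    expected = schema.get("type")
    if expected:
        matches = isinstance(value, TYPE_MAP[expected])
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("number", "integer") and isinstance(value, bool):
            matches = False
        if not matches:
            return [f"{path}: expected {expected}"]
    errors = []
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{path}.{key}: missing required field")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    return errors


def call_tool(fn, input_schema: dict, args: dict):
    """Run the tool only when its arguments pass validation."""
    errors = validate(args, input_schema)
    if errors:
        raise ValueError("; ".join(errors))
    return fn(**args)
```

With a schema requiring a string corridor and an optional integer window_days, a call passing window_days="30" fails before the tool body ever runs.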

What “evaluations as Delta tables” means

Evaluation results are stored as rows in versioned Delta tables. Each row is (agent_version, input, expected_output, actual_output, score, metadata). Regression analysis is a JOIN between two agent_version slices. New versions don’t promote unless they pass.
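The regression JOIN is simple enough to show directly. The sketch below uses an in-memory SQLite table as a stand-in for a Delta table so it runs anywhere; the column names follow the row layout above, the table name evals is an assumption, and the rows are toy data, not real results.

```python
import sqlite3

# Toy rows: (agent_version, input, expected_output, actual_output, score, metadata)
TOY_ROWS = [
    ("v1", "q1", "a", "a", 1.0, "{}"),
    ("v1", "q2", "b", "b", 1.0, "{}"),
    ("v2", "q1", "a", "a", 1.0, "{}"),
    ("v2", "q2", "b", "c", 0.0, "{}"),  # v2 regresses on q2
]

REGRESSION_SQL = """
SELECT base.input, base.score AS base_score, cand.score AS cand_score
FROM evals AS base
JOIN evals AS cand ON base.input = cand.input
WHERE base.agent_version = ? AND cand.agent_version = ?
  AND cand.score < base.score
ORDER BY base.input
"""


def make_eval_table(rows) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE evals (agent_version TEXT, input TEXT, expected_output TEXT, "
        "actual_output TEXT, score REAL, metadata TEXT)"
    )
    conn.executemany("INSERT INTO evals VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def regressions(conn, baseline: str, candidate: str) -> list[tuple]:
    """Inputs where the candidate version scores lower than the baseline."""
    return conn.execute(REGRESSION_SQL, (baseline, candidate)).fetchall()


def may_promote(conn, baseline: str, candidate: str) -> bool:
    """Promotion gate: block the candidate if any input regressed."""
    return not regressions(conn, baseline, candidate)
```

A stricter or looser gate (for example, a tolerated score drop) is a policy choice; the point is that the gate is a query over versioned rows, not a judgment call.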

The agent / human contract

Where humans fit:

High-risk operations require human-in-the-loop checkpoints. Agents can propose; humans approve.
Critic disagreements with the executor route to humans when the referee cannot adjudicate.
Periodic spot-checks on agent decisions are scheduled into the evaluation harness.

This is not “manual override.” This is a designed-in contract about which decisions are agent-final and which are human-final.
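The contract can be written down as a routing rule rather than left as a convention. This is a minimal sketch with hypothetical names; how a decision gets flagged as high-risk, and the spot-check sampling scheme, are assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AGENT_FINAL = "agent_final"
    HUMAN_FINAL = "human_final"


@dataclass
class Decision:
    action: str
    high_risk: bool
    critic_disagrees: bool = False
    referee_resolved: bool = True


def route(decision: Decision) -> Route:
    """Apply the contract: agents propose, humans approve the risky or unresolved cases."""
    if decision.high_risk:
        return Route.HUMAN_FINAL
    if decision.critic_disagrees and not decision.referee_resolved:
        return Route.HUMAN_FINAL
    return Route.AGENT_FINAL


def selected_for_spot_check(decision_id: str, every_n: int) -> bool:
    """Deterministically sample roughly one in `every_n` decisions for human review."""
    digest = hashlib.sha256(decision_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % every_n == 0
```

Hashing the decision identifier keeps the sample reproducible, so the same decisions are selected when the evaluation harness is rerun.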

Common architectural decisions

Decision	Default
Number of executors	One unless sub-goals are independent
Critic per executor or shared	Shared unless executors are heterogeneous
Memory model	Working memory in agent state; long-term memory in Delta table
Tool call timeout	30 s default, with retries on idempotent tools
Cost ceiling per session	Configurable; defaults to a hard cap
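These defaults translate naturally into a configuration object plus a retry rule. The sketch below is an assumption-laden illustration: the retry count and the cost ceiling value are placeholders the table does not specify, and the tool function is assumed to honor the timeout it is given and raise TimeoutError when it expires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentDefaults:
    executors: int = 1              # one unless sub-goals are independent
    shared_critic: bool = True      # shared unless executors are heterogeneous
    tool_timeout_s: float = 30.0    # from the table above
    max_retries: int = 2            # placeholder: the table gives no retry count
    cost_ceiling_usd: float = 10.0  # placeholder: the table gives no amount


def call_tool(fn, *, idempotent: bool, defaults: AgentDefaults = AgentDefaults()):
    """Call a tool with the default timeout; retry timeouts only if the tool is idempotent.

    Assumption: `fn` accepts `timeout_s` and raises TimeoutError when it expires.
    """
    attempts = 1 + (defaults.max_retries if idempotent else 0)
    for attempt in range(1, attempts + 1):
        try:
            return fn(timeout_s=defaults.tool_timeout_s)
        except TimeoutError:
            if attempt == attempts:
                raise  # out of attempts: surface the timeout to the caller
```

Retrying only idempotent tools matters because a timed-out write may still have succeeded; repeating it could apply the same change twice.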

Closing

Multi-agent workflows on Databricks are productive when the framework is paired with discipline: typed tools, deterministic logging, evaluations as Delta tables, and a designed-in agent / human contract. The Mosaic AI Agent Framework is the runtime; the architecture is yours.


The Agent Factory: Planner, Executor, Critic, Referee

Zorost Intelligence — Tue, 23 Dec 2025 09:00:00 +0000

Pull-quote: “The four-role pattern is not an opinion. It’s the architecture every production multi-agent system converges on once it survives the first round of real users.”

Why this matters

Multi-agent AI starts as a clever idea (let agents talk to each other!) and dies in production as an unreliable mess (agents hallucinate to each other, disagreements never resolve, the audit trail is unreadable). The fix is structural: four roles, typed contracts, deterministic logs.

The four roles

Planner — decomposes the high-level goal into sub-goals and decides the sequence. Reads the task, the available tools, and the agent’s memory; emits a structured plan.
Executor(s) — carries out sub-goals. Calls tools. Returns structured outputs. Knows nothing about the high-level plan; just executes its assigned sub-goal honestly.
Critic — reviews each executor output adversarially. Looks for unsupported claims, broken citations, missed evidence, alternative interpretations. Does not propose new actions; only critiques.
Referee — adjudicates when the critic disagrees with the executor. Has explicit criteria. Produces the final decision with explicit reasoning.
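The four roles compose into a short control loop. The sketch below is a minimal illustration, not any of our production systems: each role is a plain function, the StepResult and Critique types are hypothetical, and the toy roles exist only to make the flow runnable.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    sub_goal: str
    output: str
    evidence: list[str]


@dataclass
class Critique:
    accepted: bool
    reason: str


def run(goal, planner, executor, critic, referee, log: list) -> list[StepResult]:
    """Plan, execute each sub-goal, critique it, and let the referee settle disputes."""
    accepted = []
    for sub_goal in planner(goal):
        log.append(("planner->executor", sub_goal))
        result = executor(sub_goal)          # executor sees only its sub-goal
        critique = critic(result)            # critic reviews, never acts
        log.append(("critic->executor", sub_goal, critique.accepted))
        if not critique.accepted:
            upheld = referee(result, critique)  # True means the result stands
            log.append(("referee", sub_goal, upheld))
            if not upheld:
                continue
        accepted.append(result)
    return accepted


# Toy roles, for illustration only.
def toy_planner(goal: str) -> list[str]:
    return [part.strip() for part in goal.split(" and ")]

def toy_executor(sub_goal: str) -> StepResult:
    evidence = [] if "guess" in sub_goal else [f"doc:{sub_goal}"]
    return StepResult(sub_goal, f"result for {sub_goal}", evidence)

def toy_critic(result: StepResult) -> Critique:
    if result.evidence:
        return Critique(True, "claims are cited")
    return Critique(False, "unsupported claim")

def toy_referee(result: StepResult, critique: Critique) -> bool:
    return bool(result.evidence)  # explicit criterion: no evidence, no acceptance
```

Calling run("measure cost and guess cause", toy_planner, toy_executor, toy_critic, toy_referee, log) keeps the cited result and drops the unsupported one, with every hand-off recorded in log.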

Why this works

Planner / executor separation prevents the planner from drifting into execution and getting confused by tool errors.
Critic separation prevents the executors from grading their own work, which is a category error.
Referee separation prevents endless executor-vs-critic loops.

Common variations

Single executor vs. multi-executor (parallelism). Parallel executors for independent sub-goals; serial for dependent ones.
Critic per executor or shared critic. Per-executor for specialized critique; shared for consistency across the run.
Hierarchical planning. A meta-planner produces a plan that includes “now plan this sub-task in detail” steps.
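The first variation, parallel versus serial executors, comes down to whether a sub-goal needs the previous result. A minimal sketch, assuming a hypothetical execute(sub_goal, previous) callable and a thread pool for the parallel case:

```python
from concurrent.futures import ThreadPoolExecutor


def run_sub_goals(sub_goals, execute, *, independent: bool, max_workers: int = 4) -> list:
    """Run independent sub-goals in parallel; chain dependent ones serially."""
    if independent:
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves input order, so results line up with sub_goals.
            return list(pool.map(lambda goal: execute(goal, None), sub_goals))
    results, previous = [], None
    for goal in sub_goals:
        previous = execute(goal, previous)  # each step sees the prior step's output
        results.append(previous)
    return results
```

Threads suit executors that mostly wait on tool or model calls; CPU-bound executors would need a different pool.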

What we standardize

We standardize three things across every production agentic system:

Typed tool contracts — every tool has explicit input/output schemas. No improvisation.
Deterministic logs — every call (planner → executor, executor → tool, critic → executor) is logged with timestamps and parameters.
Evaluation harnesses — every system ships with a golden dataset, a regression suite, hallucination detection, and grounding scoring. New versions are evaluated before promotion.

Where we run this pattern

AeroFarr — multi-tool aviation analyst (planner / executor / critic over the prediction core, the cascade GNN, the causal engine, and the RAG corpus)
EvidAI — 4-model consensus screening with explicit critic and referee
FreightCortex — 16-tool AI freight analyst with planner / executor and a critic on report quality
Aquil — sourcers / analysts / critic / referee for OSINT
SPCio (with a manufacturing intelligence partner) — 8 specialized agents with a meta-coordinator

Closing

The four-role pattern is not an opinion. It is the architecture every production multi-agent system converges on once it survives the first round of real users. Skipping it is a tax you pay later.
