<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Zorost Intelligence, Author at Zorost Intelligence | AI, Cloud &amp; Data Experts</title>
	<atom:link href="https://zorost.com/author/mehdih/feed/" rel="self" type="application/rss+xml" />
	<link>https://zorost.com/author/mehdih/</link>
	<description>Production AI systems for aviation, manufacturing, pharma, government, finance, freight, and geopolitical intelligence.</description>
	<lastBuildDate>Wed, 20 May 2026 18:52:38 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://zorost.com/wp-content/uploads/2025/08/ZOROST-Intel-Logo3_512-150x150.png</url>
	<title>Zorost Intelligence, Author at Zorost Intelligence | AI, Cloud &amp; Data Experts</title>
	<link>https://zorost.com/author/mehdih/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">81719879</site>	<item>
		<title>Databricks Cost Optimization &#038; FinOps: Where the Real Savings Are</title>
		<link>https://zorost.com/databricks-cost-optimization-finops/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 21 Apr 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Databricks Modernization]]></category>
		<category><![CDATA[Cost Optimization]]></category>
		<category><![CDATA[Databricks]]></category>
		<category><![CDATA[FinOps]]></category>
		<category><![CDATA[Performance Tuning]]></category>
		<guid isPermaLink="false">https://zorost.com/databricks-cost-optimization-finops/</guid>

					<description><![CDATA[<p>A practical FinOps playbook for Databricks. Cluster types, file compaction, caching, serverless, and BI rationalization — with realistic savings ranges.</p>
<p>The post <a href="https://zorost.com/databricks-cost-optimization-finops/">Databricks Cost Optimization &#038; FinOps: Where the Real Savings Are</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Cost optimization is not a one-time project. It&#8217;s a recurring discipline. The tooling is there. The discipline is the ask.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Most Databricks deployments have 30–60% slack in their spend within twelve months of go-live. Some of it is unavoidable (early-stage discovery). Some of it is technical (file layout, cluster sizing). Most of it is organizational (no cost ownership, no tagging, no review cadence).</p>
<h4>Where the real savings are</h4>
<table>
<thead>
<tr>
<th>Lever</th>
<th>Typical impact</th>
</tr>
</thead>
<tbody>
<tr>
<td>Right-sized cluster types (Photon, autoscaling, spot)</td>
<td>15–30%</td>
</tr>
<tr>
<td>Job orchestration (concurrent runs, dependencies, retries)</td>
<td>5–15%</td>
</tr>
<tr>
<td>File compaction and layout (<code>OPTIMIZE</code>, <code>ZORDER BY</code>, liquid clustering)</td>
<td>10–25% on read-heavy workloads</td>
</tr>
<tr>
<td>Caching strategies (disk cache, formerly Delta cache; query result cache)</td>
<td>5–15%</td>
</tr>
<tr>
<td>Workload migration to Serverless SQL where appropriate</td>
<td>10–25%</td>
</tr>
<tr>
<td>BI semantic-model rationalization</td>
<td>10–20% on Power BI / Tableau queries</td>
</tr>
<tr>
<td>Autoscaling thresholds</td>
<td>5–10%</td>
</tr>
<tr>
<td>Tombstone management (<code>VACUUM</code>)</td>
<td>Cleanup, not a direct saving, but sustainable</td>
</tr>
</tbody>
</table>
<blockquote>
<p>Ranges are typical for engagements where the team has not previously focused on cost. Mature deployments have less to find.</p>
</blockquote>
<h4>Tagging and ownership — the prerequisite</h4>
<p>Without tagging, you can&#8217;t optimize. Required tags:</p>
<ul>
<li><code>cost_center</code></li>
<li><code>environment</code> (dev / stage / prod)</li>
<li><code>owner</code> (team or person)</li>
<li><code>workload</code> (training / serving / ETL / BI / ad-hoc)</li>
</ul>
<p>These flow into the <strong>system tables</strong> for cost reporting (<code>system.billing.usage</code>).</p>
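<p>A tag policy is easier to hold if something checks it. The sketch below is one possible check, not a Databricks API: it takes hypothetical cluster records (the names, values, and record shape are assumptions for illustration) and reports which of the four required tags are absent or blank.</p>
<pre><code># Required tag keys, taken from the list above.
REQUIRED_TAGS = {"cost_center", "environment", "owner", "workload"}


def missing_tags(custom_tags):
    """Return the required tag keys that are absent or blank, sorted."""
    present = {key for key, value in custom_tags.items() if str(value).strip()}
    return sorted(REQUIRED_TAGS - present)


# Hypothetical cluster records; real ones would come from a workspace export.
clusters = [
    {"name": "nightly-etl",
     "custom_tags": {"cost_center": "fin-01", "environment": "prod",
                     "owner": "data-eng", "workload": "ETL"}},
    {"name": "scratch",
     "custom_tags": {"owner": "", "environment": "dev"}},
]

report = {c["name"]: missing_tags(c["custom_tags"]) for c in clusters}
for name, missing in report.items():
    print(name, "OK" if not missing else "missing: " + ", ".join(missing))</code></pre>
<p>Run on a schedule, a check like this turns &#8220;tags get fixed&#8221; from a review-meeting item into a short list of named clusters.</p>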
<h4>The audit, in twelve hours</h4>
<p>A typical audit takes about twelve hours of senior engineering time:</p>
<ol>
<li>Pull <code>system.billing.usage</code> for the last 90 days, joined with cluster metadata</li>
<li>Identify the top 10 jobs by cost</li>
<li>For each, evaluate: is the cluster the right type? Is autoscaling tuned? Are files compacted? Is the workload running at the right cadence?</li>
<li>Identify candidates for serverless migration</li>
<li>Identify candidates for materialized view replacement</li>
<li>Produce a prioritized list with estimated savings</li>
</ol>
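<p>Step 2 of the audit is a group-and-rank. A minimal sketch, assuming the usage rows have already been exported and priced: the field names <code>job_id</code>, <code>dbus</code>, and <code>usd_per_dbu</code> and all figures are illustrative, not the real <code>system.billing.usage</code> schema.</p>
<pre><code>from collections import defaultdict

# Hypothetical rows shaped loosely like a priced usage export: one row per job per day.
usage_rows = [
    {"job_id": "nightly_etl", "dbus": 420.0, "usd_per_dbu": 0.15},
    {"job_id": "nightly_etl", "dbus": 380.0, "usd_per_dbu": 0.15},
    {"job_id": "feature_build", "dbus": 150.0, "usd_per_dbu": 0.30},
    {"job_id": "adhoc_report", "dbus": 40.0, "usd_per_dbu": 0.55},
]


def top_jobs_by_cost(rows, limit=10):
    """Sum estimated cost per job and return the most expensive jobs first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["job_id"]] += row["dbus"] * row["usd_per_dbu"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


ranking = top_jobs_by_cost(usage_rows, limit=3)
for job_id, cost in ranking:
    print(f"{job_id}: {cost:.2f} USD")</code></pre>
<p>The same ranking, restricted to the top ten, gives the list that steps 3 to 5 work through job by job.</p>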
<p>Most teams find five to ten actions that together deliver 20–40% savings.</p>
<h4>Common findings</h4>
<ul>
<li>A nightly batch job running on an oversized cluster when a smaller Photon-enabled cluster would do</li>
<li>A streaming pipeline running with a cluster sized for peak when traffic is bimodal</li>
<li>A Power BI model importing 80% of data that nobody queries</li>
<li>A <code>SELECT *</code> materialized in a downstream view, doubling storage cost on a hot dataset</li>
<li>An ad-hoc cluster left running over a weekend</li>
</ul>
<h4>Cost ownership cadence</h4>
<p>The discipline that holds savings: a monthly cost review with data leadership and the FinOps lead. Each owner explains anomalies. Tags get fixed. Wasteful patterns get retired.</p>
<h4>Closing</h4>
<p>Cost optimization on Databricks is not a one-time project. It is a recurring discipline backed by tagging, system tables, and a monthly review. The platform tooling is there. The discipline is the ask.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24309</post-id>	</item>
		<item>
		<title>Air-Gapped Agentic Stacks for Sovereign Environments</title>
		<link>https://zorost.com/air-gapped-agentic-stacks-sovereign/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 14 Apr 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Government & Federal]]></category>
		<category><![CDATA[Agentic AI]]></category>
		<category><![CDATA[Air-Gapped]]></category>
		<category><![CDATA[FedRAMP]]></category>
		<category><![CDATA[Local LLM]]></category>
		<category><![CDATA[Ollama]]></category>
		<category><![CDATA[vLLM]]></category>
		<guid isPermaLink="false">https://zorost.com/air-gapped-agentic-stacks-sovereign/</guid>

					<description><![CDATA[<p>Sovereign-AI deployment is now operationally feasible. Here is what an air-gapped agentic stack looks like, what it costs, and where it fits in federal mission environments.</p>
<p>The post <a href="https://zorost.com/air-gapped-agentic-stacks-sovereign/">Air-Gapped Agentic Stacks for Sovereign Environments</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Sovereign AI is not &#8216;AI minus features.&#8217; It is &#8216;AI plus discipline.&#8217;&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Some federal mission environments cannot accept internet egress. Some cannot accept any data leaving the customer boundary. Some cannot accept models that the customer cannot inspect end-to-end. Cloud-only AI vendors do not serve these environments.</p>
<p>The good news: <strong>air-gapped agentic AI is operationally feasible in 2026</strong>. The bad news: it requires engineering discipline that most vendors don&#8217;t have.</p>
<h4>The reference stack (engineering view)</h4>
<ul>
<li><strong>Local LLM serving.</strong> Open-weights models (Llama 3.x, Qwen 2.5, Mistral, Phi-4, Gemma 3, code-tuned variants) served via Ollama, vLLM, or llama.cpp on customer hardware.</li>
<li><strong>Local embeddings.</strong> Open-source embedding models on the same stack.</li>
<li><strong>Local vector database.</strong> pgvector, Weaviate, or Qdrant on a private subnet.</li>
<li><strong>Local model registry.</strong> MLflow Model Registry running inside the boundary.</li>
<li><strong>Local RAG pipeline.</strong> Ingestion, chunking, embedding, retrieval, re-ranking, generation — all inside the boundary.</li>
<li><strong>Local evaluation harness.</strong> Golden datasets, regression suites, hallucination detection, grounding scoring — version-controlled and runnable inside the boundary.</li>
<li><strong>Local observability.</strong> Grafana, Prometheus, Loki running inside the boundary.</li>
<li><strong>Local update pipeline.</strong> Models, weights, and corpus updates delivered as signed bundles via approved transfer.</li>
</ul>
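<p>On the import side of the update pipeline, the first check is usually that every file in a bundle matches the digest recorded for it. The sketch below shows only that integrity step, under stated assumptions: the file names and contents are hypothetical, and the manifest itself is assumed to have been signed and verified separately, since a digest comparison alone is not a signature.</p>
<pre><code>import hashlib
import hmac


def sha256_hex(payload):
    """Hex SHA-256 digest of a bytes payload."""
    return hashlib.sha256(payload).hexdigest()


def verify_bundle(files, manifest):
    """Return names of files whose digest is missing from or differs from the manifest."""
    failures = []
    for name, payload in files.items():
        expected = manifest.get(name, "")
        # compare_digest avoids leaking timing information during the comparison.
        if not hmac.compare_digest(sha256_hex(payload), expected):
            failures.append(name)
    return sorted(failures)


# Hypothetical bundle contents; in practice these would be read from transfer media.
bundle = {"weights.bin": b"model-weights-v2", "corpus.jsonl": b'{"doc": 1}\n'}
manifest = {name: sha256_hex(data) for name, data in bundle.items()}

tampered = {**bundle, "weights.bin": b"model-weights-v2-modified"}
print(verify_bundle(bundle, manifest))    # []
print(verify_bundle(tampered, manifest))  # ['weights.bin']</code></pre>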
<h4>The reference stack (governance view)</h4>
<ul>
<li><strong>Documented model selection</strong> — which model, which version, which quantization, why</li>
<li><strong>Documented evaluation</strong> — what the golden dataset is, what it tests, what passing looks like</li>
<li><strong>Documented update procedure</strong> — who signs the update bundle, who imports it, who validates it post-import</li>
<li><strong>Documented retirement</strong> — when and why a model is retired</li>
<li><strong>Audit trail</strong> — every decision the system makes is logged with model version, prompt, output, and grounding evidence</li>
</ul>
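<p>The audit-trail item can be made concrete with a small record format. This is a sketch, not a description of any shipped system: the field names mirror the four items listed above, the model version string is hypothetical, and the hash chain is an added assumption that makes after-the-fact edits detectable.</p>
<pre><code>import hashlib
import json


def audit_record(prev_hash, model_version, prompt, output, evidence):
    """Build one audit entry whose hash covers its content and the previous entry's hash."""
    body = {
        "prev_hash": prev_hash,
        "model_version": model_version,
        "prompt": prompt,
        "output": output,
        "grounding_evidence": evidence,
    }
    # sort_keys makes the serialization, and therefore the hash, deterministic.
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
    return {**body, "hash": digest}


def chain_is_intact(records):
    """Recompute every hash and check that each entry points at its predecessor."""
    prev = "genesis"
    for record in records:
        expected = audit_record(prev, record["model_version"], record["prompt"],
                                record["output"], record["grounding_evidence"])
        if expected["hash"] != record["hash"] or record["prev_hash"] != prev:
            return False
        prev = record["hash"]
    return True


first = audit_record("genesis", "llama-3.1-8b-q4", "Summarize doc 17", "Summary ...", ["doc17#p2"])
second = audit_record(first["hash"], "llama-3.1-8b-q4", "List risks", "Risk A ...", ["doc17#p5"])
log = [first, second]
print(chain_is_intact(log))</code></pre>
<p>Changing any field of an earlier entry changes its hash, so the check fails at that entry and the edit is visible to an auditor.</p>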
<h4>Trade-offs vs. cloud</h4>
<ul>
<li><strong>Latency.</strong> Comparable for the smaller models; better for chained calls (no network round-trip).</li>
<li><strong>Capability.</strong> Behind the absolute frontier of closed-source models. Open-weights models in 2026 are excellent but not at parity with the strongest closed-source options.</li>
<li><strong>Cost.</strong> Higher up-front (hardware), lower over time (no per-token bills).</li>
<li><strong>Update cadence.</strong> Slower because updates must clear the boundary.</li>
<li><strong>Evaluation discipline.</strong> Tighter, because there is no vendor evaluation to lean on.</li>
<li><strong>Sovereignty.</strong> Complete. The customer owns the stack end-to-end.</li>
</ul>
<h4>Where it fits in federal posture</h4>
<p>Air-gapped agentic stacks fit:</p>
<ul>
<li>Classified or otherwise sensitive environments without internet egress</li>
<li>Mission environments where data cannot leave the customer boundary</li>
<li>Programs where the agency requires end-to-end inspection and audit of the AI stack</li>
</ul>
<p>It does not fit:</p>
<ul>
<li>Environments where the very latest closed-source model capability is required and the data sensitivity allows cloud</li>
<li>Environments where rapid model iteration is more important than sovereignty</li>
</ul>
<h4>Closing</h4>
<p>Sovereign agentic AI is real. It requires engineering discipline. We&#8217;ve built it for our manufacturing-quality platform (with a partner) and we apply the same discipline to federal mission environments. The deployment shape is different from cloud. The trade-offs are real. For the customers who need it, no other shape fits.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24299</post-id>	</item>
		<item>
		<title>Power BI Direct Lake on Databricks SQL: a Modernization Playbook</title>
		<link>https://zorost.com/power-bi-direct-lake-databricks-sql/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 31 Mar 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Databricks Modernization]]></category>
		<category><![CDATA[BI]]></category>
		<category><![CDATA[Databricks SQL]]></category>
		<category><![CDATA[Direct Lake]]></category>
		<category><![CDATA[Power BI]]></category>
		<category><![CDATA[Semantic Model]]></category>
		<guid isPermaLink="false">https://zorost.com/power-bi-direct-lake-databricks-sql/</guid>

					<description><![CDATA[<p>Migrate Power BI semantic models from import / DirectQuery to Direct Lake on Databricks SQL. Performance, governance, and migration patterns.</p>
<p>The post <a href="https://zorost.com/power-bi-direct-lake-databricks-sql/">Power BI Direct Lake on Databricks SQL: a Modernization Playbook</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Direct Lake is not faster DirectQuery. It is a different mode that eliminates a class of refreshes that should never have existed.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Power BI has been deployed in three modes for a decade: <strong>Import</strong>, <strong>DirectQuery</strong>, and <strong>Composite</strong>. Each has trade-offs. Import is fast but stale; DirectQuery is fresh but slow; Composite is a compromise. Direct Lake — Power BI talking directly to Delta tables in Databricks SQL — is a fourth mode that eliminates a class of refresh problems that should never have existed.</p>
<h4>The four modes</h4>
<table>
<thead>
<tr>
<th>Mode</th>
<th>Freshness</th>
<th>Performance</th>
<th>When to use</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Import</strong></td>
<td>Stale until next refresh</td>
<td>Fast</td>
<td>Small models, infrequent updates</td>
</tr>
<tr>
<td><strong>DirectQuery</strong></td>
<td>Live</td>
<td>Slow on large fact tables</td>
<td>Real-time-ish dashboards over modest volume</td>
</tr>
<tr>
<td><strong>Composite</strong></td>
<td>Mixed</td>
<td>Mixed</td>
<td>Hybrid scenarios</td>
</tr>
<tr>
<td><strong>Direct Lake</strong></td>
<td>Live (on Delta)</td>
<td>Fast</td>
<td>Lakehouse-native consumption</td>
</tr>
</tbody>
</table>
<h4>Why Direct Lake works</h4>
<p>Direct Lake reads Delta files directly into Power BI&#8217;s analytics engine without import. There is no refresh schedule. There is no DirectQuery overhead. The semantic model points at Unity Catalog tables and the engine handles the rest.</p>
<p>The conditions for it to work:</p>
<ul>
<li>Source data must be in Delta format</li>
<li>Tables must be in Unity Catalog</li>
<li>Model size must fit in the engine&#8217;s memory budget for the SKU</li>
<li>DAX must be Direct Lake-compatible (most is; some isn&#8217;t)</li>
</ul>
<h4>Migration playbook</h4>
<table>
<thead>
<tr>
<th>Phase</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>Discovery</td>
<td>Catalog of existing Power BI models · usage telemetry</td>
</tr>
<tr>
<td>Source landing in Delta</td>
<td>Sources moved to Delta tables in Unity Catalog</td>
</tr>
<tr>
<td>Semantic model rebuild</td>
<td>New model on Direct Lake</td>
</tr>
<tr>
<td>Visual rebuild</td>
<td>Reports and dashboards rebuilt against the new model</td>
</tr>
<tr>
<td>Parallel run</td>
<td>Old and new models in production simultaneously</td>
</tr>
<tr>
<td>Cutover</td>
<td>Old retired</td>
</tr>
</tbody>
</table>
<h4>Governance benefits</h4>
<ul>
<li>Row and column security live in <strong>dynamic views</strong> in Unity Catalog, not in the semantic model. One source of truth for security.</li>
<li>Lineage covers the entire path from source through Delta to Power BI.</li>
<li>Performance tuning happens at the Delta layer (liquid clustering, OPTIMIZE, Z-order) and benefits every consumer, not just Power BI.</li>
</ul>
<h4>Closing</h4>
<p>Direct Lake is the modern Power BI mode for Lakehouse-native consumption. The migration is methodical, the trade-offs are clear, and the result is faster, fresher dashboards with simpler operations.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24308</post-id>	</item>
		<item>
		<title>Calibration-First AI for Federal Decision Support</title>
		<link>https://zorost.com/calibration-first-ai-federal/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 24 Mar 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Government & Federal]]></category>
		<category><![CDATA[Calibration]]></category>
		<category><![CDATA[Conformal Prediction]]></category>
		<category><![CDATA[ECE]]></category>
		<category><![CDATA[Governance]]></category>
		<category><![CDATA[LACP]]></category>
		<category><![CDATA[NIST AI RMF]]></category>
		<guid isPermaLink="false">https://zorost.com/calibration-first-ai-federal/</guid>

					<description><![CDATA[<p>Federal decision support cannot run on headline accuracy. Calibration and conformal prediction are the standards a procurement officer should require — and the standards we hold ourselves to.</p>
<p>The post <a href="https://zorost.com/calibration-first-ai-federal/">Calibration-First AI for Federal Decision Support</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Federal procurement should require calibration metrics in every AI proposal. Anything less is buying a black box.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Federal decision support runs on AI now. Risk scoring, fraud detection, predictive maintenance, safety analysis, mission planning — every category has at least one AI vendor pitching the agency. The procurement question is: <em>how does an agency tell the credible vendors from the rest?</em></p>
<p>Headline accuracy doesn&#8217;t help. Every vendor claims high accuracy. The number doesn&#8217;t translate into operational trust.</p>
<p>The right standard is <strong>calibration</strong> — and <strong>conformal prediction</strong> for individual uncertainty.</p>
<h4>Calibration as a procurement requirement</h4>
<p><strong>Expected Calibration Error (ECE)</strong> is the standard metric. Lower is better: below 0.02 is very good, and below 0.01 is excellent. The metric is widely adopted in academic ML evaluation and is the right floor for any high-stakes federal AI use.</p>
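<p>For readers who want to see what the number measures, here is a minimal sketch of ECE for a binary classifier. It uses equal-width bins over the predicted probability of the positive class, which is one common variant; the bin count and the toy data are assumptions for illustration.</p>
<pre><code>def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE: bin-weighted gap between mean predicted probability and observed rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 goes into the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_conf - observed)
    return ece


# Ten predictions of 0.9: nine positives is well calibrated, five is overconfident.
probs = [0.9] * 10
print(expected_calibration_error(probs, [1] * 9 + [0]))      # close to 0.0
print(expected_calibration_error(probs, [1] * 5 + [0] * 5))  # close to 0.4</code></pre>
<p>Both toy models predict the same probabilities, so a headline metric computed on the predictions alone cannot separate them; the ECE can.</p>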
<p>A procurement RFP for an AI system should require:</p>
<ul>
<li>ECE on a documented holdout slice of representative size</li>
<li>Reliability diagrams showing calibration across the full probability range</li>
<li>Sensitivity analysis on how calibration degrades under common distribution shifts (seasonal, regime change, missing data)</li>
<li>A monitoring plan for calibration drift in production</li>
</ul>
<p>Every vendor that ships calibrated models can produce this. Every vendor that ships only headline accuracy will struggle to.</p>
<h4>Conformal prediction as the second standard</h4>
<p>Calibration tells you the <em>average</em> probability is honest. Conformal prediction tells you the <em>individual</em> uncertainty is honest. <strong>Locally Adaptive Conformal Prediction (LACP)</strong> produces distribution-free prediction intervals: when the model says &#8220;between 18 and 47 minutes with 90% coverage,&#8221; the actual answer falls in that interval about 90% of the time, regardless of the shape of the underlying distribution, provided the calibration data and the production data are exchangeable.</p>
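<p>The mechanics are short enough to sketch. What follows is a generic split-conformal procedure with normalized residuals, offered as an illustration of the locally adaptive idea rather than as any specific LACP implementation: the <code>spread</code> values stand in for a hypothetical per-input difficulty estimate, and the coverage guarantee is marginal and assumes calibration and production data are exchangeable.</p>
<pre><code>import math


def conformal_quantile(scores, alpha):
    """Finite-sample quantile used by split conformal prediction."""
    n = len(scores)
    # 1-based rank. Capped at n for simplicity; strictly, the interval is
    # unbounded when the rank exceeds the number of calibration points.
    rank = min(math.ceil((n + 1) * (1 - alpha)), n)
    return sorted(scores)[rank - 1]


def calibrate(predictions, actuals, spreads, alpha=0.1):
    """Normalize each calibration residual by its local spread, then take the quantile."""
    scores = [abs(y - pred) / spread
              for pred, y, spread in zip(predictions, actuals, spreads)]
    return conformal_quantile(scores, alpha)


def interval(prediction, spread, q):
    """Interval whose width scales with the local spread estimate."""
    return (prediction - q * spread, prediction + q * spread)


# Toy calibration set: predictions of 30 minutes, actuals 31..39, unit spread.
cal_predictions = [30.0] * 9
cal_actuals = [30.0 + k for k in range(1, 10)]
cal_spreads = [1.0] * 9

q = calibrate(cal_predictions, cal_actuals, cal_spreads, alpha=0.1)
low, high = interval(30.0, 2.0, q)
print(q, low, high)</code></pre>
<p>A new input with twice the spread gets an interval twice as wide, which is the sense in which the method adapts locally.</p>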
<p>For federal decision support, this is non-negotiable. A point estimate without coverage is operationally meaningless.</p>
<h4>NIST AI RMF alignment</h4>
<p>The NIST AI Risk Management Framework articulates four functions: Map, Measure, Manage, Govern. Calibration and conformal prediction sit squarely in <strong>Measure</strong>. They are the operationally meaningful measurements of model trustworthiness — far more useful than the marketing accuracy a vendor leads with.</p>
<h4>What this implies for vendor evaluation</h4>
<p>Three concrete recommendations for federal AI procurement:</p>
<ol>
<li>Require ECE and reliability diagrams in every AI proposal.</li>
<li>Require a stated coverage method (preferably conformal) for any system that produces numerical estimates.</li>
<li>Require a monitoring plan for calibration drift, not just accuracy drift.</li>
</ol>
<p>A vendor that cannot answer those is not a credible vendor for high-stakes use.</p>
<h4>Closing</h4>
<p>Federal decision support is too consequential to run on headline accuracy. Calibration and conformal prediction are the right standards. Procurement should require them. Vendors should ship them. We do, and we think the field should follow.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24298</post-id>	</item>
		<item>
		<title>What We Open-Sourced This Year — and Why</title>
		<link>https://zorost.com/what-we-open-sourced-and-why/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 10 Mar 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Open Source]]></category>
		<category><![CDATA[DevOps Monitor]]></category>
		<category><![CDATA[MarkForge]]></category>
		<category><![CDATA[Open Source Releases]]></category>
		<category><![CDATA[Sigma Axion]]></category>
		<category><![CDATA[Weaviate Local UI]]></category>
		<guid isPermaLink="false">https://zorost.com/what-we-open-sourced-and-why/</guid>

					<description><![CDATA[<p>Four open-source projects, four different reasons. A short manifesto on what we open-source, what we don't, and why.</p>
<p>The post <a href="https://zorost.com/what-we-open-sourced-and-why/">What We Open-Sourced This Year — and Why</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;We don&#8217;t open-source everything. We open-source the things that should belong to the community.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>A lot of AI startups treat &#8220;open source&#8221; as a marketing posture. We treat it as a deliberate decision per project. Some projects belong in the commons because the community is better served by everyone using and improving them. Others stay proprietary because the R&amp;D investment is significant and the value flows back to customers through the product.</p>
<p>This year we open-sourced four projects.</p>
<h4>MarkForge</h4>
<p><strong>What it is.</strong> Bi-directional Markdown ↔ PDF / Office / HTML conversion. Built on Microsoft&#8217;s MarkItDown with extensions for PDF rendering, page sizes, and a WordPress plugin.</p>
<p><strong>Why open source.</strong> Document conversion is plumbing. Plumbing should be free. Every team — internal docs, technical writers, AI ingestion engineers — needs it, and there is no defensible business advantage in hoarding it.</p>
<p><strong>Status.</strong> Production. Used inside several of our platforms as an ingestion stage for RAG.</p>
<h4>Weaviate Local UI</h4>
<p><strong>What it is.</strong> A local desktop interface for the Weaviate vector database. Schema browsing, object inspection, vector search, RAG chat with multi-provider LLM support, document upload with chunking and embedding.</p>
<p><strong>Why open source.</strong> Vector databases are an active part of the agentic AI stack. Tooling that makes them accessible benefits the entire community. Weaviate is excellent and deserves a great local UX.</p>
<p><strong>Status.</strong> Production. Used inside our development workflow for any RAG system in early design.</p>
<h4>DevOps Monitor</h4>
<p><strong>What it is.</strong> A complete Docker-based monitoring stack: Grafana, Prometheus, Loki, Alertmanager, cAdvisor, Node Exporter. Configuration-driven target management. HTTP health checks. Per-application dashboards.</p>
<p><strong>Why open source.</strong> Every multi-service deployment needs this. Most teams either rebuild it from scratch (slow) or adopt a vendor SaaS (expensive, and it ships telemetry outside the boundary). The reference stack is a community good.</p>
<p><strong>Status.</strong> Production. Runs in front of every internal Zorost service.</p>
<h4>Sigma Axion (selected components)</h4>
<p><strong>What it is.</strong> Components of our quantitative research framework — indicator chains, walk-forward backtesting infrastructure, transaction-cost modeling — published under MIT.</p>
<p><strong>Why open source for these components.</strong> The plumbing of a quant stack should be a community good. The actual edges (the strategies themselves) are not open-sourced, because that is where the R&amp;D investment lives.</p>
<p><strong>Status.</strong> Production. Live at <a href="https://sigmaaxion.com">sigmaaxion.com</a>.</p>
<h4>What we don&#8217;t open-source</h4>
<p>We don&#8217;t open-source the platforms with significant proprietary R&amp;D investment: AeroFarr (causal AI for aviation), EvidAI (pharma evidence synthesis), FreightCortex (freight intelligence), Aquil (geopolitical intelligence), SPCio (co-developed with a manufacturing intelligence partner), or ComplyGrid. The investment is real and the value flows back to customers through the product.</p>
<h4>Closing</h4>
<p>Open source is a deliberate decision, not a posture. Some things belong to the community; some things belong to customers. We try to draw the line clearly.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24297</post-id>	</item>
		<item>
		<title>Production ML on Databricks: MLflow, Feature Store, Calibration</title>
		<link>https://zorost.com/production-ml-databricks-mlflow-feature-store-calibration/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 03 Mar 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Databricks Modernization]]></category>
		<category><![CDATA[Calibration]]></category>
		<category><![CDATA[Feature Store]]></category>
		<category><![CDATA[MLflow]]></category>
		<category><![CDATA[MLOps]]></category>
		<category><![CDATA[Mosaic AI]]></category>
		<guid isPermaLink="false">https://zorost.com/production-ml-databricks-mlflow-feature-store-calibration/</guid>

					<description><![CDATA[<p>A reference MLOps stack on Databricks — MLflow Model Registry, Feature Store with online serving, calibration-first model evaluation, and Mosaic AI Model Serving.</p>
<p>The post <a href="https://zorost.com/production-ml-databricks-mlflow-feature-store-calibration/">Production ML on Databricks: MLflow, Feature Store, Calibration</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Production ML is not training a model. It&#8217;s the disciplines around training, registering, serving, monitoring, retraining, and retiring.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Most teams shipping their first ML model on Databricks underestimate the discipline required. Training is the small part. The system around training is the large part.</p>
<h4>The reference stack</h4>
<pre><code>   Data ──►  Feature Store  ◄────  online + offline serving
                  │
                  ▼
   Training pipeline (Databricks Job)
                  │
                  ▼
   MLflow Model Registry  ◄────  versions, stages, approvals
                  │
                  ▼
   Mosaic AI Model Serving  ◄────  A/B + canary
                  │
                  ▼
   Monitoring (drift, calibration, performance)
                  │
                  ▼
   Retraining trigger (event, schedule, drift threshold)</code></pre>
<h4>Feature Store — point-in-time correctness</h4>
<p>The Feature Store enforces <strong>point-in-time correctness</strong>: each training row is joined to feature values as they stood at the moment its label was generated. This prevents leakage of future information into training, which would otherwise inflate offline evaluation results. Online serving uses the same feature definitions, which keeps training and serving consistent.</p>
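<p>The join rule itself is simple to state in code. This is a plain-Python sketch of the idea, not the Feature Store API: for each label, take the latest feature value recorded at or before the label timestamp. The feature name, dates, and values are hypothetical.</p>
<pre><code>import bisect


def point_in_time_value(feature_history, as_of):
    """Latest feature value recorded at or before `as_of`, or None if there is none.

    feature_history is a list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_history]
    position = bisect.bisect_right(timestamps, as_of)
    if position == 0:
        return None
    return feature_history[position - 1][1]


# Hypothetical history of one customer's 30-day spend feature, keyed by ISO date.
spend_30d = [("2026-01-01", 120.0), ("2026-01-15", 180.0), ("2026-02-01", 90.0)]

# Each label carries the time it was generated; the join must not look past it.
labels = [("2026-01-10", 0), ("2026-01-20", 1), ("2025-12-25", 0)]
training_rows = [(ts, point_in_time_value(spend_30d, ts), y) for ts, y in labels]
print(training_rows)</code></pre>
<p>A naive join on the latest value would hand all three rows the February figure of 90.0, which none of them could have known at label time.</p>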
<h4>MLflow Model Registry — lifecycle stages</h4>
<p>Models progress through stages with explicit gates:</p>
<table>
<thead>
<tr>
<th>Stage</th>
<th>Gate</th>
</tr>
</thead>
<tbody>
<tr>
<td>Staging</td>
<td>Passes regression suite + calibration checks</td>
</tr>
<tr>
<td>Production</td>
<td>Passes A/B + canary criteria</td>
</tr>
<tr>
<td>Archived</td>
<td>Replaced by a newer Production model</td>
</tr>
</tbody>
</table>
<p>Every stage transition is logged with the user, the reason, and the metrics that justified it.</p>
<h4>Calibration-first evaluation</h4>
<p>We require every model to ship with <strong>Expected Calibration Error (ECE)</strong> and <strong>conformal prediction</strong> intervals (LACP). Headline accuracy is reported but is not the gate.</p>
<table>
<thead>
<tr>
<th>Gate</th>
<th>Default threshold</th>
</tr>
</thead>
<tbody>
<tr>
<td>ECE</td>
<td>&lt; 0.02 on holdout</td>
</tr>
<tr>
<td>Reliability diagram</td>
<td>No bin &gt; 0.05 deviation</td>
</tr>
<tr>
<td>Conformal coverage</td>
<td>Within 2pp of stated coverage</td>
</tr>
<tr>
<td>Performance regression</td>
<td>No metric below the prior production model</td>
</tr>
</tbody>
</table>
<h4>Mosaic AI Model Serving — A/B and canary</h4>
<p>Traffic splits and canary rollouts are first-class. A new version starts at 5% of traffic, is observed against SLAs and metrics, and then ramps up. Rollback is one click.</p>
<h4>Monitoring — drift, calibration, performance</h4>
<p>Three things to monitor:</p>
<ul>
<li><strong>Feature drift</strong> — input distribution shift</li>
<li><strong>Calibration drift</strong> — ECE moving</li>
<li><strong>Performance drift</strong> — labeled outcomes degrading</li>
</ul>
<p>Monitoring runs as a Databricks Job. Alerts go to Slack / Teams / PagerDuty.</p>
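<p>As an illustration of what a feature-drift check inside such a job might compute, the sketch below uses the Population Stability Index (PSI). PSI is a swapped-in example metric, not one named in this post, and the bin count, value range, and sample data are assumptions.</p>
<pre><code>import math


def psi(baseline, current, n_bins=10, low=0.0, high=1.0):
    """Population Stability Index between two samples of a feature scaled to [low, high]."""
    def proportions(sample):
        counts = [0] * n_bins
        for x in sample:
            position = (x - low) / (high - low)
            counts[min(max(int(position * n_bins), 0), n_bins - 1)] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(sample), 1e-6) for c in counts]

    base, cur = proportions(baseline), proportions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


# Training-time sample versus a production sample shifted upward by 0.3.
baseline = [i / 100 for i in range(100)]
shifted = [min(x + 0.3, 0.99) for x in baseline]

print(psi(baseline, baseline))  # identical samples give 0.0
print(psi(baseline, shifted))   # a large shift gives a large positive value</code></pre>
<p>A common rule of thumb treats values above roughly 0.25 as a significant shift, but the alert threshold is a per-feature decision rather than a constant.</p>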
<h4>Closing</h4>
<p>Production ML on Databricks is straightforward when the stack is right: Feature Store for consistency, MLflow Registry for lifecycle, Mosaic AI Model Serving for delivery, calibration-first evaluation, and disciplined monitoring. The training is the easy part.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24307</post-id>	</item>
		<item>
		<title>Building Multi-Agent Workflows on Databricks (Mosaic AI Agent Framework)</title>
		<link>https://zorost.com/multi-agent-databricks-mosaic-ai-agent-framework/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 24 Feb 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Databricks Modernization]]></category>
		<category><![CDATA[Agent Framework]]></category>
		<category><![CDATA[Agentic AI]]></category>
		<category><![CDATA[MLflow]]></category>
		<category><![CDATA[Mosaic AI]]></category>
		<category><![CDATA[Multi-Agent]]></category>
		<guid isPermaLink="false">https://zorost.com/multi-agent-databricks-mosaic-ai-agent-framework/</guid>

					<description><![CDATA[<p>Multi-agent workflows native to the Lakehouse — designed, built, evaluated, and deployed on the Mosaic AI Agent Framework with typed tools and an evaluation harness.</p>
<p>The post <a href="https://zorost.com/multi-agent-databricks-mosaic-ai-agent-framework/">Building Multi-Agent Workflows on Databricks (Mosaic AI Agent Framework)</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Agents on the Lakehouse mean tools that read and write Delta tables, models that serve under MLflow, and evaluations that ship as Delta tables themselves.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Agentic workflows are the next layer on the Lakehouse — agents that reason, plan, call tools, and produce verifiable artifacts. The Mosaic AI Agent Framework provides the runtime. The architectural decisions still belong to you.</p>
<h4>Reference architecture</h4>
<pre><code>┌──────────────────────────────────────────────────────────────────┐
│                    AGENT (LangGraph / LlamaIndex / Custom)        │
│                                                                    │
│   Planner ──► Executor ──► Critic ──► Referee                    │
└─────────────────────┬────────────────────────────────────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Typed Tools                 │ ◄── Tool catalog
       │   - read Delta tables         │     (Unity Catalog)
       │   - write Delta tables        │
       │   - call MLflow models        │
       │   - call REST APIs            │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Mosaic AI Model Serving     │
       │   - foundation models         │
       │   - fine-tuned models         │
       │   - per-agent traffic split   │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Evaluations as Delta tables │ ◄── Versioned
       │   - golden datasets           │
       │   - regression suite          │
       │   - hallucination detection   │
       └──────────────────────────────┘</code></pre>
<h4>What &#8220;typed tools&#8221; means</h4>
<p>Every tool has a JSON schema for inputs and outputs. The agent cannot call a tool with invalid inputs — the schema rejects the call. This eliminates an entire class of failure that plagues unconstrained agents.</p>
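<p>A minimal sketch of that rejection step follows. It hand-rolls a tiny subset of JSON Schema (required fields and primitive types) so it stays self-contained; a real system would more likely use a full schema validator. The tool contract <code>READ_TABLE_SCHEMA</code> and both helper functions are hypothetical.</p>

```python
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

# Hypothetical tool contract; the field names are illustrative.
READ_TABLE_SCHEMA = {
    "type": "object",
    "required": ["table", "limit"],
    "properties": {
        "table": {"type": "string"},
        "limit": {"type": "integer"},
    },
}


def validate_args(schema, args):
    """Return a list of validation errors; an empty list means the call may proceed."""
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in args]
    for name, value in args.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")
            continue
        expected = TYPE_MAP[spec["type"]]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        if isinstance(value, bool) and spec["type"] != "boolean":
            errors.append(f"{name}: expected {spec['type']}, got boolean")
        elif not isinstance(value, expected):
            errors.append(f"{name}: expected {spec['type']}, got {type(value).__name__}")
    return errors


def call_tool(tool, schema, args):
    """Run the tool only if its arguments pass validation."""
    errors = validate_args(schema, args)
    if errors:
        raise ValueError("; ".join(errors))
    return tool(**args)
```

<p>Because the error list is returned as data, it can be fed back to the agent as a tool result so the planner can correct the call instead of failing silently.</p>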
<h4>What &#8220;evaluations as Delta tables&#8221; means</h4>
<p>Evaluation results are stored as rows in versioned Delta tables. Each row is <code>(agent_version, input, expected_output, actual_output, score, metadata)</code>. Regression analysis is a <code>JOIN</code> between two <code>agent_version</code> slices. A new version is not promoted unless it passes.</p>
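<p>The regression join can be sketched as follows. To keep the example runnable anywhere, it uses an in-memory SQLite table as a stand-in for the Delta table; the table name <code>agent_evals</code>, the sample rows, and the rule "any input whose score dropped blocks promotion" are assumptions for illustration.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_evals (agent_version TEXT, input TEXT, expected_output TEXT, "
    "actual_output TEXT, score REAL, metadata TEXT)"
)
conn.executemany(
    "INSERT INTO agent_evals VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("v1", "q1", "a1", "a1", 1.0, "{}"),
        ("v1", "q2", "a2", "a2", 1.0, "{}"),
        ("v2", "q1", "a1", "a1", 1.0, "{}"),
        ("v2", "q2", "a2", "wrong", 0.0, "{}"),
    ],
)

# Self-join on input: one slice per agent_version, keep rows where the score dropped.
REGRESSION_SQL = """
SELECT base.input, base.score AS base_score, cand.score AS cand_score
FROM agent_evals AS base
JOIN agent_evals AS cand ON base.input = cand.input
WHERE base.agent_version = ? AND cand.agent_version = ? AND cand.score < base.score
ORDER BY base.input
"""

regressions = conn.execute(REGRESSION_SQL, ("v1", "v2")).fetchall()
can_promote = not regressions
```

<p>The same query shape carries over to Spark SQL against a Delta table; only the connection and parameter syntax would change.</p>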
<h4>The agent / human contract</h4>
<p>Where humans fit:</p>
<ul>
<li><strong>High-risk operations</strong> require human-in-the-loop checkpoints. Agents can propose; humans approve.</li>
<li><strong>Critic disagreements with the executor</strong> route to humans when the referee cannot adjudicate.</li>
<li><strong>Periodic spot-checks</strong> on agent decisions are scheduled into the evaluation harness.</li>
</ul>
<p>This is not &#8220;manual override.&#8221; This is a designed-in contract about which decisions are agent-final and which are human-final.</p>
<h4>Common architectural decisions</h4>
<table>
<thead>
<tr>
<th>Decision</th>
<th>Default</th>
</tr>
</thead>
<tbody>
<tr>
<td>Number of executors</td>
<td>One unless sub-goals are independent</td>
</tr>
<tr>
<td>Critic per executor or shared</td>
<td>Shared unless executors are heterogeneous</td>
</tr>
<tr>
<td>Memory model</td>
<td>Working memory in agent state; long-term memory in Delta table</td>
</tr>
<tr>
<td>Tool call timeout</td>
<td>30 s default, with retries on idempotent tools</td>
</tr>
<tr>
<td>Cost ceiling per session</td>
<td>Configurable; defaults to a hard cap</td>
</tr>
</tbody>
</table>
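<p>The timeout-and-retry default can be sketched with the standard library alone. This is an illustrative wrapper, not part of the Mosaic AI Agent Framework: <code>call_with_timeout</code>, its parameters, and the backoff schedule are hypothetical. Note that a Python thread cannot be force-killed, so a timed-out call may keep running in the background.</p>

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(tool, args, timeout_s=30.0, idempotent=False, max_retries=2):
    """Call tool(**args) with a per-attempt timeout; retry only idempotent tools."""
    attempts = 1 + (max_retries if idempotent else 0)
    last_error = None
    for attempt in range(attempts):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(tool, **args).result(timeout=timeout_s)
        except FutureTimeout as exc:
            last_error = exc
            if attempt + 1 < attempts:
                time.sleep(min(0.1 * 2 ** attempt, 1.0))  # short exponential backoff
        finally:
            # Do not block on a hung worker; it finishes (or not) on its own.
            pool.shutdown(wait=False)
    raise TimeoutError(f"tool timed out after {attempts} attempt(s)") from last_error
```

<p>Restricting retries to idempotent tools matters because a write that timed out may still have succeeded; retrying it blindly could apply the same change twice.</p>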
<h4>Closing</h4>
<p>Multi-agent workflows on Databricks are productive when the framework is paired with discipline: typed tools, deterministic logging, evaluations as Delta tables, and a designed-in agent / human contract. The Mosaic AI Agent Framework is the runtime; the architecture is yours.</p>
<hr>
<p>The post <a href="https://zorost.com/multi-agent-databricks-mosaic-ai-agent-framework/">Building Multi-Agent Workflows on Databricks (Mosaic AI Agent Framework)</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24306</post-id>	</item>
		<item>
		<title>When Agents Call Agents: Why the MCP Server Matters in Freight</title>
		<link>https://zorost.com/mcp-server-freight-agents-call-agents/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 24 Feb 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Freight & Logistics]]></category>
		<category><![CDATA[Agentic AI]]></category>
		<category><![CDATA[FreightCortex]]></category>
		<category><![CDATA[MCP]]></category>
		<category><![CDATA[Model Context Protocol]]></category>
		<category><![CDATA[Multi-Agent]]></category>
		<guid isPermaLink="false">https://zorost.com/mcp-server-freight-agents-call-agents/</guid>

					<description><![CDATA[<p>Model Context Protocol lets external AI agents call FreightCortex tools natively. Here is why that matters — and what it unlocks for the freight intelligence stack.</p>
<p>The post <a href="https://zorost.com/mcp-server-freight-agents-call-agents/">When Agents Call Agents: Why the MCP Server Matters in Freight</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;If your platform isn&#8217;t callable by other agents, your platform isn&#8217;t future-proof.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>The next generation of enterprise software is being shaped by a simple fact: <strong>users have agents now</strong>. Claude Desktop, custom internal agents, vendor-provided agents — they&#8217;re all going to call your platform. Either they call it through your REST API (and the agent has to know your URL structure, your authentication, your error semantics) or they call it through a standard protocol.</p>
<p>That standard is <strong>Model Context Protocol (MCP)</strong>.</p>
<h4>What MCP is</h4>
<p>MCP is an open protocol developed by Anthropic and adopted across the agent ecosystem. It defines how a tool server describes its tools, how a host (the agent&#8217;s runtime) discovers and calls those tools, and how results are returned. The result is a clean separation: tools are <em>advertised</em>, agents <em>discover and call them</em>, and you can swap tool servers without touching the agent.</p>
<p>For FreightCortex, the MCP server is a thin layer that exposes our 16 tools using the protocol. An external agent — a customer&#8217;s internal Claude Desktop, an OEM&#8217;s analytics chatbot, or a third-party tool — can connect to our MCP endpoint and <em>use FreightCortex like a native tool</em>.</p>
<h4>What this unlocks</h4>
<p>Three things:</p>
<ol>
<li><strong>Native callability from any MCP-compatible agent.</strong> Customers do not need to write custom integrations. Their agent just connects to our MCP server.</li>
<li><strong>Composability with other tools.</strong> A customer agent can use FreightCortex tools alongside their own internal tools. The agent decides when to call which.</li>
<li><strong>Future-proofing.</strong> As the agent ecosystem grows, MCP-compatible platforms are accessible by default. REST-only platforms have to be manually integrated, one customer at a time.</li>
</ol>
<h4>What it requires</h4>
<p>Three engineering investments:</p>
<ol>
<li><strong>Tool contracts</strong> — every tool we want to expose has a typed schema. (We already had this.)</li>
<li><strong>The MCP server itself</strong> — a thin transport layer over those tools.</li>
<li><strong>Authentication and rate limiting</strong> — MCP doesn&#8217;t replace your existing auth; it sits on top of it.</li>
</ol>
<h4>A concrete example</h4>
<p>An analyst is using Claude Desktop on her workstation. She asks &#8220;what&#8217;s driving the cost increase on the Atlanta–Dallas corridor?&#8221; Claude knows about the FreightCortex MCP server (configured once per workstation) and decides to use it. It calls <code>query_corridor_metrics</code>, <code>compute_anomaly_score</code>, <code>query_carrier_metrics</code>, and <code>run_capacity_simulation</code> — and produces an answer with the same structure as the answer it would have given inside the FreightCortex web app, except this time it is in her existing analyst environment.</p>
<p>The analyst never had to open the FreightCortex web app.</p>
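<p>On the wire, the first of those calls might look roughly like the sketch below. MCP messages are JSON-RPC 2.0; a server advertises tools in its <code>tools/list</code> result and a client invokes one with <code>tools/call</code>. The tool name comes from the example above, but its description and the <code>origin</code> / <code>destination</code> parameters are hypothetical, not the actual FreightCortex contract.</p>

```python
import json

# Tool descriptor as a server would advertise it in a "tools/list" result.
# The description and parameters are hypothetical.
corridor_tool = {
    "name": "query_corridor_metrics",
    "description": "Return cost and volume metrics for a freight corridor.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "origin": {"type": "string"},
            "destination": {"type": "string"},
        },
        "required": ["origin", "destination"],
    },
}


def tools_list_response(request_id, tools):
    """JSON-RPC 2.0 response advertising the available tools."""
    return {"jsonrpc": "2.0", "id": request_id, "result": {"tools": tools}}


def tools_call_request(request_id, name, arguments):
    """JSON-RPC 2.0 request an MCP client sends to invoke one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tools_call_request(2, "query_corridor_metrics",
                             {"origin": "ATL", "destination": "DFW"})
wire_message = json.dumps(request)
```

<p>The agent never sees a URL structure or a custom error format; it sees a named tool with a typed input schema, which is the whole point of the protocol.</p>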
<h4>Closing</h4>
<p>If your platform isn&#8217;t callable by other agents, your platform isn&#8217;t future-proof. MCP is how you make it callable. It is a small engineering investment with very high leverage.</p>
<hr>
<p>The post <a href="https://zorost.com/mcp-server-freight-agents-call-agents/">When Agents Call Agents: Why the MCP Server Matters in Freight</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24291</post-id>	</item>
		<item>
		<title>Hybrid Retrieval: Why Vector Alone Isn&#8217;t Enough</title>
		<link>https://zorost.com/hybrid-retrieval-vector-alone-not-enough/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 17 Feb 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Agentic AI Engineering]]></category>
		<category><![CDATA[BM25]]></category>
		<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[Hybrid Retrieval]]></category>
		<category><![CDATA[RAG]]></category>
		<category><![CDATA[Vector Search]]></category>
		<guid isPermaLink="false">https://zorost.com/hybrid-retrieval-vector-alone-not-enough/</guid>

					<description><![CDATA[<p>Vector search is excellent at semantic similarity and bad at named entities. BM25 is the opposite. Production-grade retrieval is hybrid — and the architecture decisions matter.</p>
<p>The post <a href="https://zorost.com/hybrid-retrieval-vector-alone-not-enough/">Hybrid Retrieval: Why Vector Alone Isn&#8217;t Enough</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Pure vector retrieval is the most common production-grade RAG mistake. Pure BM25 is the second most common.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>A pattern repeats in every RAG project that goes wrong: someone embeds the corpus, runs vector search, and ships. The system works in demos and disappoints in production. The fix is structural: <strong>hybrid retrieval</strong>.</p>
<h4>The components</h4>
<pre><code>Query
  │
  ├──► Dense (vector)   — pgvector / Weaviate / Qdrant + an embedding model
  │
  ├──► Sparse (BM25)    — Postgres FTS / Elasticsearch / OpenSearch
  │
  ├──► Optional filters — date range, source, entity tags
  │
  └──► Merge (RRF or weighted) ──► Cross-encoder re-rank ──► Top-K
                                                                │
                                                                ▼
                                                Citation-grounded generation</code></pre>
<h4>Why each piece matters</h4>
<ul>
<li><strong>Vector</strong> is excellent at <em>semantic similarity</em> — finding documents that are about the same topic in different words. It is bad at <em>named entities</em> — exact terms, IDs, dates.</li>
<li><strong>BM25</strong> is the opposite — excellent at named entities, weaker on semantic similarity.</li>
<li><strong>Filters</strong> — when the question is bounded (&#8220;just look at 2024 reports about Boeing 737&#8221;), filters dramatically reduce the candidate set before ranking.</li>
<li><strong>Merge</strong> — Reciprocal Rank Fusion (RRF) is a clean default. Weighted merges work with calibrated scores.</li>
<li><strong>Cross-encoder re-rank</strong> — sees the query and the candidate document together and scores them jointly. More expensive than bi-encoder vector search, but the precision improvement on the top-K is large enough to pay for itself.</li>
</ul>
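<p>The merge step is small enough to show in full. Below is a minimal sketch of Reciprocal Rank Fusion: each document scores the sum of <code>1 / (k + rank)</code> over the lists it appears in, with <code>k = 60</code> as the commonly used constant. The function name and the sample document ids are illustrative.</p>

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse ranked lists of document ids; each list is ordered best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


dense_hits = ["doc_a", "doc_b", "doc_c"]   # from vector search
sparse_hits = ["doc_c", "doc_a", "doc_d"]  # from BM25
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
# fused == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

<p>RRF uses only ranks, never raw scores, which is why it works as a default: cosine similarities and BM25 scores live on different scales and need no calibration before merging.</p>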
<h4>What changes when you do this right</h4>
<ul>
<li>Hallucination rate drops. The model has better evidence to ground in.</li>
<li>Citation precision goes up. The cited documents actually support the claim.</li>
<li>Edge cases (rare entity queries, exact-quote queries) work properly.</li>
<li>Generation latency stays low because the model only sees the top-K (typically 6–10), not the top-100.</li>
</ul>
<h4>Common mistakes</h4>
<ul>
<li><strong>No re-ranker.</strong> Top-50 from vector + top-50 from BM25 with RRF is a starting point, but without a re-ranker the top-K still contains noise.</li>
<li><strong>No filtering.</strong> Filtering before retrieval is essentially free if your data is properly indexed.</li>
<li><strong>No evaluation.</strong> Without a golden Q&amp;A dataset and grounding scoring, you have no way to compare retrieval architectures.</li>
</ul>
<h4>Closing</h4>
<p>Pure vector retrieval is the most common mistake in production RAG. Hybrid retrieval (vector + sparse + filters + re-rank) is the boring, reliable, production answer. Every Zorost RAG system runs this architecture.</p>
<hr>
<p>The post <a href="https://zorost.com/hybrid-retrieval-vector-alone-not-enough/">Hybrid Retrieval: Why Vector Alone Isn&#8217;t Enough</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24296</post-id>	</item>
		<item>
		<title>Why Calibration Matters More Than Accuracy: An ECE 0.012 Story</title>
		<link>https://zorost.com/calibration-matters-more-than-accuracy/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 10 Feb 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Aviation Intelligence]]></category>
		<category><![CDATA[AeroFarr]]></category>
		<category><![CDATA[Calibration]]></category>
		<category><![CDATA[Conformal Prediction]]></category>
		<category><![CDATA[ECE]]></category>
		<category><![CDATA[Evaluation]]></category>
		<category><![CDATA[LACP]]></category>
		<guid isPermaLink="false">https://zorost.com/calibration-matters-more-than-accuracy/</guid>

					<description><![CDATA[<p>Headline accuracy is a misleading metric for high-stakes decisions. Calibration is the real one. Here is what ECE 0.012 means and how we got there.</p>
<p>The post <a href="https://zorost.com/calibration-matters-more-than-accuracy/">Why Calibration Matters More Than Accuracy: an ECE 0.012 Story</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;When the model says 70%, it should be right 70% of the time. That&#8217;s calibration. Anything less is dishonest.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>&#8220;Our model is 92% accurate&#8221; is a marketing line. It tells you almost nothing about whether you should trust the model with a decision. The real question is: <strong>when the model says it is 70% confident, is it actually right 70% of the time?</strong></p>
<p>That is <strong>calibration</strong>. The metric is <strong>Expected Calibration Error (ECE)</strong>.</p>
<h4>The metric, briefly</h4>
<p>Group predictions into bins by their stated probability. For each bin, compare the average predicted probability to the observed frequency of the outcome. ECE is the average of the absolute differences, weighted by the share of predictions in each bin. Lower is better. As a rule of thumb, below 0.02 is very good in production and below 0.01 is excellent.</p>
<p>AeroFarr&#8217;s gate classifier achieves <strong>ECE 0.012 on 581,316 held-out flights</strong>. That means the predicted probabilities track the actual observed frequencies very tightly across the full probability range — not just at the mean.</p>
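<p>For readers who want to compute the metric themselves, here is a minimal sketch of binary ECE with equal-width bins. It assumes <code>probs</code> holds the predicted probability of the positive class and <code>labels</code> holds 0/1 outcomes; the bin count of 10 is a conventional default, not necessarily the one used for the figure above.</p>

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width bins over predicted probability."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        # Weight each bin's gap by its share of all predictions.
        ece += (len(members) / total) * abs(avg_conf - observed)
    return ece


# Ten predictions of 0.9 where only half the events occurred: a gap of about 0.4.
overconfident = expected_calibration_error([0.9] * 10, [1, 0] * 5)
```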
<h4>How we got there</h4>
<p>Three ingredients:</p>
<ol>
<li><strong>A multi-head stacked architecture</strong> — separate heads for gate / severity / regression / quantile, each tuned on the loss most appropriate for its job, then combined under a non-linear meta-learner. The meta sees the heads&#8217; outputs and learns how to combine them. Calibration is enforced at each head and at the meta.</li>
<li><strong>Loss functions chosen for calibration, not accuracy.</strong> Cross-entropy with label smoothing for classifiers; quantile loss for the quantile heads.</li>
<li><strong>Post-hoc calibration on a holdout slice.</strong> Platt scaling and isotonic regression are applied as a final stage on a slice of data the heads never saw.</li>
</ol>
<p>Calibration has to be designed in from the start. Bolting it on at the end as a band-aid does not work for high-stakes operational use.</p>
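<p>To make the post-hoc stage concrete, here is a small sketch of isotonic regression fitted with the pool-adjacent-violators algorithm. It is a simplified, dependency-free illustration and not AeroFarr&#8217;s implementation; in practice a library such as scikit-learn&#8217;s <code>IsotonicRegression</code> would be the usual choice. The function names are hypothetical, and the inputs are assumed to be raw scores and 0/1 labels from the holdout slice.</p>

```python
def _mean(block):
    return block[0] / block[1]


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit; returns (upper_score_per_block, calibrated_value)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block: [sum_of_labels, count, max_score_in_block]
    for s, y in pairs:
        blocks.append([float(y), 1, s])
        # Merge tied scores, and merge whenever an earlier block's mean is higher.
        while len(blocks) > 1 and (blocks[-2][2] == blocks[-1][2]
                                   or _mean(blocks[-2]) > _mean(blocks[-1])):
            total, count, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = hi
    return [b[2] for b in blocks], [_mean(b) for b in blocks]


def apply_isotonic(model, score):
    """Step-function lookup: value of the first block whose upper score covers `score`."""
    thresholds, values = model
    for t, v in zip(thresholds, values):
        if score <= t:
            return v
    return values[-1]


model = fit_isotonic([0.1, 0.2, 0.3, 0.4], [0, 1, 0, 1])
calibrated = apply_isotonic(model, 0.25)
```

<p>The fitted mapping is monotone by construction, so it can correct over- or under-confidence without changing the ranking of predictions.</p>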
<h4>Why it matters operationally</h4>
<p>If a planner is making a &#8220;should we keep this aircraft on the gate?&#8221; decision and the model says 30% chance of cancellation, the planner&#8217;s mental model is: <em>roughly one in three.</em> If the model is poorly calibrated and 30% is actually 60%, the planner&#8217;s prior is wrong, and every decision downstream is wrong.</p>
<p>Calibrated probabilities preserve the planner&#8217;s intuition. Uncalibrated probabilities corrupt it.</p>
<h4>Conformal prediction on top</h4>
<p>Calibration tells you about average behavior. <strong>Conformal prediction</strong> tells you about <em>individual</em> uncertainty. We use <strong>Locally Adaptive Conformal Prediction (LACP)</strong> to produce distribution-free prediction intervals. When AeroFarr says &#8220;delay between 18 and 47 minutes with 90% coverage,&#8221; the actual delay falls inside such intervals about 90% of the time across flights, regardless of the shape of the underlying distribution, provided new flights resemble the data the intervals were calibrated on.</p>
<p>This is the second ingredient of honesty in a production model. Calibration says the model&#8217;s stated probabilities mean what they say. Conformal prediction says the model&#8217;s stated intervals mean what they say.</p>
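<p>A simplified version of the idea can be sketched as normalized split conformal prediction, one basic form of locally adaptive intervals. This is not AeroFarr&#8217;s LACP implementation: the function names are hypothetical, and <code>scale</code> is assumed to be a positive per-example difficulty estimate produced by a separate model.</p>

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this coverage level
    return sorted(scores)[rank - 1]


def calibrate(y_true, y_pred, scale, alpha=0.1):
    """Scores are absolute residuals normalized by the per-example difficulty estimate."""
    scores = [abs(y - p) / s for y, p, s in zip(y_true, y_pred, scale)]
    return conformal_quantile(scores, alpha)


def predict_interval(pred, scale, q):
    """The interval widens where the difficulty estimate is larger."""
    return pred - q * scale, pred + q * scale


# Calibrate on held-out data, then reuse q for every new prediction.
q = calibrate([10, 20, 30], [12, 20, 27], [1, 2, 3], alpha=0.5)
low, high = predict_interval(30.0, 2.0, q)
```

<p>The coverage guarantee is marginal (averaged over examples) and holds when calibration and new data are exchangeable, which is why coverage also belongs in ongoing monitoring.</p>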
<h4>Closing</h4>
<p>Headline accuracy is a misleading metric for high-stakes decisions. Calibration and conformal prediction are the real ones. ECE 0.012 is what we ship. We don&#8217;t quote accuracy without calibration, and we don&#8217;t quote intervals without coverage.</p>
<hr>
<p>The post <a href="https://zorost.com/calibration-matters-more-than-accuracy/">Why Calibration Matters More Than Accuracy: an ECE 0.012 Story</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24285</post-id>	</item>
	</channel>
</rss>
