Mosaic AI Archives - Zorost Intelligence | AI, Cloud & Data Experts

Production ML on Databricks: Mlflow, Feature Store, Calibration

Zorost Intelligence — Tue, 03 Mar 2026 09:00:00 +0000

Pull-quote: “Production ML is not training a model. It’s the disciplines around training, registering, serving, monitoring, retraining, and retiring.”

Why this matters

Most teams shipping their first ML model on Databricks underestimate the discipline required. Training is the small part. The system around training is the large part.

The reference stack

   Data ──►  Feature Store  ◄────  online + offline serving
                  │
                  ▼
   Training pipeline (Databricks Job)
                  │
                  ▼
   MLflow Model Registry  ◄────  versions, stages, approvals
                  │
                  ▼
   Mosaic AI Model Serving  ◄────  A/B + canary
                  │
                  ▼
   Monitoring (drift, calibration, performance)
                  │
                  ▼
   Retraining trigger (event, schedule, drift threshold)

Feature Store — point-in-time correctness

The Feature Store enforces point-in-time correctness: training features are joined as they were at the historical point in time the label was generated. This eliminates leakage that destroys offline evaluation reliability. Online serving uses the same feature definitions to keep training and serving consistent.

MLflow Model Registry — lifecycle stages

Models progress through stages with explicit gates:

Stage	Gate
Staging	Passes regression suite + calibration checks
Production	Passes A/B + canary criteria
Archived	Replaced by a newer Production model

Every stage transition is logged with the user, the reason, and the metrics that justified it.

Calibration-first evaluation

We require every model to ship with Expected Calibration Error (ECE) and conformal prediction intervals (LACP). Headline accuracy is reported but is not the gate.

Gate	Default threshold
ECE	< 0.02 on holdout
Reliability diagram	No bin > 0.05 deviation
Conformal coverage	Within 2pp of stated coverage
Performance regression	No metric below the prior production model

Mosaic AI Model Serving — A/B and canary

Traffic splits and canary rollouts are first-class. New versions get 5% of traffic, observed for SLAs and metrics, then ramp. Rollback is one click.

Monitoring — drift, calibration, performance

Three things to monitor:

Feature drift — input distribution shift
Calibration drift — ECE moving
Performance drift — labeled outcomes degrading

Monitoring runs as a Databricks Job. Alerts go to Slack / Teams / PagerDuty.

Closing

Production ML on Databricks is straightforward when the stack is right: Feature Store for consistency, MLflow Registry for lifecycle, Mosaic AI Model Serving for delivery, calibration-first evaluation, and disciplined monitoring. The training is the easy part.

The post Production ML on Databricks: Mlflow, Feature Store, Calibration appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

Building Multi-Agent Workflows on Databricks (mosaic AI Agent Framework)

Zorost Intelligence — Tue, 24 Feb 2026 09:00:00 +0000

Pull-quote: “Agents on the Lakehouse mean tools that read and write Delta tables, models that serve under MLflow, and evaluations that ship as Delta tables themselves.”

Why this matters

Agentic workflows are the next layer on the Lakehouse — agents that reason, plan, call tools, and produce verifiable artifacts. The Mosaic AI Agent Framework provides the runtime. The architectural decisions still belong to you.

Reference architecture

┌──────────────────────────────────────────────────────────────────┐
│                    AGENT (LangGraph / LlamaIndex / Custom)        │
│                                                                    │
│   Planner ──► Executor ──► Critic ──► Referee                    │
└─────────────────────┬────────────────────────────────────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Typed Tools                 │ ◄── Tool catalog
       │   - read Delta tables         │     (Unity Catalog)
       │   - write Delta tables        │
       │   - call MLflow models        │
       │   - call REST APIs            │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Mosaic AI Model Serving     │
       │   - foundation models         │
       │   - fine-tuned models         │
       │   - per-agent traffic split   │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Evaluations as Delta tables │ ◄── Versioned
       │   - golden datasets           │
       │   - regression suite          │
       │   - hallucination detection   │
       └──────────────────────────────┘

What “typed tools” means

Every tool has a JSON schema for inputs and outputs. The agent cannot call a tool with invalid inputs — the schema rejects the call. This eliminates an entire class of failure that plagues unconstrained agents.

What “evaluations as Delta tables” means

Evaluation results are stored as rows in versioned Delta tables. Each row is (agent_version, input, expected_output, actual_output, score, metadata). Regression analysis is a JOIN between two agent_version slices. New versions don’t promote unless they pass.

The agent / human contract

Where humans fit:

High-risk operations require human-in-the-loop checkpoints. Agents can propose; humans approve.
Critic disagreements with the executor route to humans when the referee cannot adjudicate.
Periodic spot-checks on agent decisions are scheduled into the evaluation harness.

This is not “manual override.” This is a designed-in contract about which decisions are agent-final and which are human-final.

Common architectural decisions

Decision	Default
Number of executors	One unless sub-goals are independent
Critic per executor or shared	Shared unless executors are heterogeneous
Memory model	Working memory in agent state; long-term memory in Delta table
Tool call timeout	30 s default, with retries on idempotent tools
Cost ceiling per session	Configurable; defaults to a hard cap

Closing

Multi-agent workflows on Databricks are productive when the framework is paired with discipline: typed tools, deterministic logging, evaluations as Delta tables, and a designed-in agent / human contract. The Mosaic AI Agent Framework is the runtime; the architecture is yours.

The post Building Multi-Agent Workflows on Databricks (mosaic AI Agent Framework) appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

Production-Grade RAG on the Lakehouse with Mosaic AI Vector Search

Zorost Intelligence — Tue, 03 Feb 2026 09:00:00 +0000

Pull-quote: “RAG works in demos. RAG that works in production requires hybrid retrieval, a re-ranker, citation grounding, and an evaluation harness.”

Why this matters

Most RAG projects pilot well and disappoint in production. The pattern is the same: embed the corpus, run vector search, ship. Production-grade RAG requires more.

The production RAG architecture

                     ┌────────────────────┐
        Question ───►│  AI Gateway        │  ← key mgmt, routing, observability
                     └──────────┬─────────┘
                                ▼
        ┌────────────────────────────────────────────┐
        │                Retrieval                    │
        │  ┌────────────────┐  ┌────────────────┐   │
        │  │ Mosaic AI      │  │ BM25 (lexical) │   │
        │  │ Vector Search  │  │ on Delta SQL   │   │
        │  │ (Delta-synced) │  │                │   │
        │  └───────┬────────┘  └────────┬───────┘   │
        │          └──── merge (RRF) ───┘           │
        │                  │                          │
        │              cross-encoder                  │
        │              re-rank                        │
        └────────────────┬─────────────────────────────┘
                         ▼
              top-K (typically 6–10)
                         │
                         ▼
              Citation-grounded generation
              (Mosaic AI Model Serving)
                         │
                         ▼
              Validated answer with source links

Why Mosaic AI Vector Search specifically

Mosaic AI Vector Search synchronizes with Delta tables. Update the source table, the index updates. No orchestration glue. Tagging, ACLs, and lineage flow through Unity Catalog. For RAG over enterprise data that changes, this matters more than people initially appreciate.

Hybrid retrieval is the pattern

Pure vector search is the most common production RAG mistake. Pure BM25 is the second most common. Hybrid — vector + BM25 + filters + re-rank — is the answer that actually works.

Citation grounding as a structural fix

Constrain the model to write with bracketed citation tokens. Validate every citation against the retrieval set. Reject answers that fail validation. This is a small structural change with a large operational impact.

Evaluation harness — non-negotiable

A production RAG system without an evaluation harness is a guess. The harness has three components:

Golden Q&A dataset — questions paired with the documents that should ground the answers
Grounding rate — what fraction of generated claims are supported by retrieved documents
Hallucination detection — flagging unsupported claims

The harness runs as a Databricks Job on every model or retrieval change. Regressions are caught before deployment.

Closing

Production RAG on the Lakehouse with Mosaic AI is straightforward when you adopt the architecture: hybrid retrieval, re-ranker, citation grounding, evaluation harness. The result is a RAG system analysts trust enough to use.

The post Production-Grade RAG on the Lakehouse with Mosaic AI Vector Search appeared first on Zorost Intelligence | AI, Cloud & Data Experts.