Databricks Modernization Archives - Zorost Intelligence | AI, Cloud & Data Experts

Databricks Cost Optimization & FinOps: Where the Real Savings Are

Zorost Intelligence — Tue, 21 Apr 2026 09:00:00 +0000

Pull-quote: “Cost optimization is not a one-time project. It’s a recurring discipline. The tooling is there. The discipline is the ask.”

Why this matters

In the engagements we see, most Databricks deployments carry 30–60% slack in their spend within twelve months of go-live. Some of it is unavoidable (early-stage discovery). Some of it is technical (file layout, cluster sizing). Most of it is organizational (no cost ownership, no tagging, no review cadence).

Where the real savings are

Lever	Typical impact
Right-sized cluster types (Photon, autoscaling, spot)	15–30%
Job orchestration (concurrent runs, dependencies, retries)	5–15%
File compaction and layout (`OPTIMIZE`, `ZORDER BY`, liquid clustering)	10–25% on read-heavy workloads
Caching strategies (disk cache, formerly called Delta cache; query result cache)	5–15%
Workload migration to Serverless SQL where appropriate	10–25%
BI semantic-model rationalization	10–20% on Power BI / Tableau queries
Autoscaling thresholds	5–10%
Tombstone management (`VACUUM`)	Cleanup, not a direct saving, but sustainable

Ranges are typical for engagements where the team has not previously focused on cost. Mature deployments have less to find.
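The ranges in the table do not simply add up, because each lever acts on the spend that is left after the others. A minimal sketch of that arithmetic, assuming the levers are independent of each other (a simplifying assumption made here for illustration; the function name and the sample figures are hypothetical):

```python
from functools import reduce

def combined_saving(fractions):
    """Combine per-lever savings multiplicatively.

    Each fraction is the share of *remaining* spend a lever removes,
    so the total is 1 - prod(1 - f), not sum(f).
    """
    for f in fractions:
        if not 0.0 <= f < 1.0:
            raise ValueError(f"saving fraction out of range: {f}")
    remaining = reduce(lambda acc, f: acc * (1.0 - f), fractions, 1.0)
    return 1.0 - remaining

# Hypothetical example: three levers at 15%, 10% and 5%.
example = combined_saving([0.15, 0.10, 0.05])
```

Under that assumption, three levers at 15%, 10%, and 5% combine to roughly 27% rather than 30%, which is worth keeping in mind when estimating a total.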

Tagging and ownership — the prerequisite

Without tagging, you can’t optimize. Required tags:

cost_center
environment (dev / stage / prod)
owner (team or person)
workload (training / serving / ETL / BI / ad-hoc)

These flow into the system tables for cost reporting (system.billing.usage).
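One way to keep the tag set honest is to check it before a cluster or job is created. The sketch below is plain Python and does not call any Databricks API; the function name, the error wording, and the rule that `cost_center` and `owner` accept any non-empty value are assumptions for illustration.

```python
# Allowed values come from the tag list above; None means "any non-empty value".
REQUIRED_TAGS = {
    "cost_center": None,
    "environment": {"dev", "stage", "prod"},
    "owner": None,
    "workload": {"training", "serving", "ETL", "BI", "ad-hoc"},
}

def tag_violations(tags):
    """Return a sorted list of human-readable problems for one resource's tags."""
    problems = []
    for key, allowed in REQUIRED_TAGS.items():
        value = (tags.get(key) or "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif allowed is not None and value not in allowed:
            problems.append(f"invalid value for {key}: {value}")
    return sorted(problems)
```

A check like this could run in CI against job and cluster definitions, so untagged spend never reaches the billing tables in the first place.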

The audit, in twelve hours

A typical audit takes about twelve hours of senior engineering time:

Pull system.billing.usage for the last 90 days, joined with cluster metadata
Identify the top 10 jobs by cost
For each, evaluate: is the cluster the right type? Is autoscaling tuned? Are files compacted? Is the workload running at the right cadence?
Identify candidates for serverless migration
Identify candidates for materialized view replacement
Produce a prioritized list with estimated savings

Most teams find five to ten actions that together deliver 20–40% savings.
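The first two audit steps reduce to an aggregation. The sketch below works on rows already pulled into Python; in practice they would come from system.billing.usage joined to pricing data. The field names `job_id`, `dbus`, and `price_per_dbu` are hypothetical and do not mirror the system table schema.

```python
from collections import defaultdict

def top_jobs_by_cost(usage_rows, n=10):
    """Sum cost per job and return the n most expensive (job_id, cost) pairs."""
    totals = defaultdict(float)
    for row in usage_rows:
        job_id = row.get("job_id")
        if job_id is None:  # skip usage that is not attributable to a job
            continue
        totals[job_id] += row["dbus"] * row["price_per_dbu"]
    # Sort by cost descending, then by job id for a stable order on ties.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]
```

Usage that cannot be attributed to a job is skipped here on purpose; in a real audit that remainder is itself a finding, since it usually points at missing tags or interactive clusters.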

Common findings

A nightly batch job using a high-end cluster size when a Photon-enabled smaller cluster would do
A streaming pipeline running with a cluster sized for peak when traffic is bimodal
A Power BI model importing 80% of data that nobody queries
A SELECT * materialized into a downstream table or materialized view, doubling storage cost on a hot dataset
An ad-hoc cluster left running over a weekend

Cost ownership cadence

The discipline that holds savings is a monthly cost review with data leadership and the FinOps lead. Each owner explains anomalies. Tags get fixed. Wasteful patterns get retired.

Closing

Cost optimization on Databricks is not a one-time project. It is a recurring discipline backed by tagging, system tables, and a monthly review. The platform tooling is there. The discipline is the ask.

The post Databricks Cost Optimization & FinOps: Where the Real Savings Are appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

Power BI Direct Lake on Databricks SQL: a Modernization Playbook

Zorost Intelligence — Tue, 31 Mar 2026 09:00:00 +0000

Pull-quote: “Direct Lake is not faster DirectQuery. It is a different mode that eliminates a class of refreshes that should never have existed.”

Why this matters

Power BI has been deployed in three modes for a decade: Import, DirectQuery, and Composite. Each has trade-offs. Import is fast but stale; DirectQuery is fresh but slow; Composite is a compromise. Direct Lake — Power BI talking directly to Delta tables in Databricks SQL — is a fourth mode that eliminates a class of refresh problems that should never have existed.

The four modes

Mode	Freshness	Performance	When to use
Import	Stale until next refresh	Fast	Small models, infrequent updates
DirectQuery	Live	Slow on large fact tables	Real-time-ish dashboards over modest volume
Composite	Mixed	Mixed	Hybrid scenarios
Direct Lake	Live (on Delta)	Fast	Lakehouse-native consumption

Why Direct Lake works

Direct Lake reads Delta files directly into Power BI’s analytics engine without import. There is no refresh schedule. There is no DirectQuery overhead. The semantic model points at Unity Catalog tables and the engine handles the rest.

The conditions for it to work:

Source data must be in Delta format
Tables must be in Unity Catalog
Model size must fit in the engine’s memory budget for the SKU
DAX must be Direct Lake-compatible (most is; some isn’t)

Migration playbook

Phase	Output
Discovery	Catalog of existing Power BI models · usage telemetry
Source landing in Delta	Sources moved to Delta tables in Unity Catalog
Semantic model rebuild	New model on Direct Lake
Visual rebuild	Reports and dashboards rebuilt against the new model
Parallel run	Old and new models in production simultaneously
Cutover	Old retired

Governance benefits

Row and column security live in the dynamic views in Unity Catalog, not in the semantic model. One source of truth for security.
Lineage covers the entire path from source through Delta to Power BI.
Performance tuning happens at the Delta layer (liquid clustering, OPTIMIZE, Z-order) and benefits every consumer, not just Power BI.

Closing

Direct Lake is the modern Power BI mode for Lakehouse-native consumption. The migration is methodical, the trade-offs are clear, and the result is faster, fresher dashboards with simpler operations.


Production ML on Databricks: MLflow, Feature Store, Calibration

Zorost Intelligence — Tue, 03 Mar 2026 09:00:00 +0000

Pull-quote: “Production ML is not training a model. It’s the disciplines around training, registering, serving, monitoring, retraining, and retiring.”

Why this matters

Most teams shipping their first ML model on Databricks underestimate the discipline required. Training is the small part. The system around training is the large part.

The reference stack

   Data ──►  Feature Store  ◄────  online + offline serving
                  │
                  ▼
   Training pipeline (Databricks Job)
                  │
                  ▼
   MLflow Model Registry  ◄────  versions, stages, approvals
                  │
                  ▼
   Mosaic AI Model Serving  ◄────  A/B + canary
                  │
                  ▼
   Monitoring (drift, calibration, performance)
                  │
                  ▼
   Retraining trigger (event, schedule, drift threshold)

Feature Store — point-in-time correctness

The Feature Store enforces point-in-time correctness: training features are joined as they were at the historical point in time the label was generated. This eliminates leakage that destroys offline evaluation reliability. Online serving uses the same feature definitions to keep training and serving consistent.
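To make the idea concrete, here is a minimal plain-Python sketch of a point-in-time join. It is not the Feature Store API; the function names and the data layout (a sorted list of timestamped values per entity) are assumptions for illustration.

```python
from bisect import bisect_right

def point_in_time_value(history, as_of):
    """Return the latest feature value with timestamp <= as_of, or None.

    history: list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx > 0 else None

def build_training_rows(labels, feature_history):
    """Join each (entity, label_ts, label) to the feature value known at label_ts."""
    rows = []
    for entity, label_ts, label in labels:
        value = point_in_time_value(feature_history.get(entity, []), label_ts)
        rows.append((entity, label_ts, value, label))
    return rows
```

The important property is that a feature value written after the label timestamp can never appear in the training row, even if it is the most recent value in the table today.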

MLflow Model Registry — lifecycle stages

Models progress through stages with explicit gates:

Stage	Gate
Staging	Passes regression suite + calibration checks
Production	Passes A/B + canary criteria
Archived	Replaced by a newer Production model

Every stage transition is logged with the user, the reason, and the metrics that justified it.

Calibration-first evaluation

We require every model to ship with Expected Calibration Error (ECE) and conformal prediction intervals (LACP). Headline accuracy is reported but is not the gate.

Gate	Default threshold
ECE	< 0.02 on holdout
Reliability diagram	No bin > 0.05 deviation
Conformal coverage	Within 2pp of stated coverage
Performance regression	No metric below the prior production model
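The first two gates can be computed in a few lines. The sketch below uses one common variant of ECE for binary classifiers (equal-width bins over the predicted positive-class probability); the binning scheme, the bin count, and the function names are assumptions, since the post does not specify them.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE for binary predictions using equal-width probability bins.

    probs: predicted probability of the positive class, each in [0, 1].
    labels: 0/1 outcomes. Returns (ece, max_bin_gap).
    """
    if len(probs) != len(labels) or len(probs) == 0:
        raise ValueError("probs and labels must be non-empty and equal length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece, max_gap = 0.0, 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        gap = abs(mean_prob - frac_pos)
        ece += gap * len(members) / len(probs)  # weight by bin population
        max_gap = max(max_gap, gap)
    return ece, max_gap

def passes_calibration_gate(probs, labels, ece_max=0.02, bin_max=0.05):
    """Apply the ECE and reliability-diagram thresholds from the table above."""
    ece, max_gap = expected_calibration_error(probs, labels)
    return ece < ece_max and max_gap <= bin_max
```

The per-bin gap is the same quantity a reliability diagram plots, so the two gates share one pass over the holdout set.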

Mosaic AI Model Serving — A/B and canary

Traffic splits and canary rollouts are first-class. New versions get 5% of traffic, observed for SLAs and metrics, then ramp. Rollback is one click.
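The routing logic behind a 5% canary can be illustrated with a deterministic hash split. This is a generic sketch of the technique, not the serving platform's configuration API, where the split is set as traffic percentages on the endpoint; the function name and the choice of SHA-256 are assumptions.

```python
import hashlib

def route_version(request_key, canary_percent=5.0):
    """Deterministically route a request to 'canary' or 'stable'.

    Hashing the key keeps a given caller on the same version across requests.
    """
    if not 0.0 <= canary_percent <= 100.0:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return "canary" if bucket < canary_percent * 100 else "stable"
```

Ramping is then a matter of raising `canary_percent`; callers already on the canary stay on it because their bucket does not change.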

Monitoring — drift, calibration, performance

Three things to monitor:

Feature drift — input distribution shift
Calibration drift — ECE moving
Performance drift — labeled outcomes degrading

Monitoring runs as a Databricks Job. Alerts go to Slack / Teams / PagerDuty.
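For feature drift, one widely used statistic is the Population Stability Index (PSI). The post does not name a specific drift metric, so PSI is used here purely as an example; the bin edges, the `eps` floor, and the function name are assumptions.

```python
import math

def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between a baseline sample and a current sample of one numeric feature.

    edges: ascending interior bin boundaries; values fall into len(edges) + 1 bins.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            idx = sum(1 for e in edges if v >= e)  # index of the bin containing v
            counts[idx] += 1
        return [max(c / len(values), eps) for c in counts]  # eps avoids log(0)

    e_shares, a_shares = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))
```

A commonly quoted rule of thumb treats PSI above roughly 0.2 as worth investigating, but the alert threshold should be tuned per feature rather than taken as given.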

Closing

Production ML on Databricks is straightforward when the stack is right: Feature Store for consistency, MLflow Registry for lifecycle, Mosaic AI Model Serving for delivery, calibration-first evaluation, and disciplined monitoring. The training is the easy part.


Building Multi-Agent Workflows on Databricks (Mosaic AI Agent Framework)

Zorost Intelligence — Tue, 24 Feb 2026 09:00:00 +0000

Pull-quote: “Agents on the Lakehouse mean tools that read and write Delta tables, models that serve under MLflow, and evaluations that ship as Delta tables themselves.”

Why this matters

Agentic workflows are the next layer on the Lakehouse — agents that reason, plan, call tools, and produce verifiable artifacts. The Mosaic AI Agent Framework provides the runtime. The architectural decisions still belong to you.

Reference architecture

┌──────────────────────────────────────────────────────────────────┐
│                    AGENT (LangGraph / LlamaIndex / Custom)        │
│                                                                    │
│   Planner ──► Executor ──► Critic ──► Referee                    │
└─────────────────────┬────────────────────────────────────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Typed Tools                 │ ◄── Tool catalog
       │   - read Delta tables         │     (Unity Catalog)
       │   - write Delta tables        │
       │   - call MLflow models        │
       │   - call REST APIs            │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Mosaic AI Model Serving     │
       │   - foundation models         │
       │   - fine-tuned models         │
       │   - per-agent traffic split   │
       └──────────────┬───────────────┘
                      │
                      ▼
       ┌──────────────────────────────┐
       │   Evaluations as Delta tables │ ◄── Versioned
       │   - golden datasets           │
       │   - regression suite          │
       │   - hallucination detection   │
       └──────────────────────────────┘

What “typed tools” means

Every tool has a JSON schema for inputs and outputs. The agent cannot call a tool with invalid inputs — the schema rejects the call. This eliminates an entire class of failure that plagues unconstrained agents.
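A minimal sketch of the idea follows. It implements only a small subset of JSON Schema in plain Python to stay self-contained; a real deployment would more likely use a full JSON Schema validator. The tool definition `READ_TABLE_SCHEMA` and both function names are hypothetical.

```python
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def validate_tool_input(schema, payload):
    """Return a list of violations of a small JSON Schema subset.

    Supports only top-level 'required', 'properties' with 'type', and
    'additionalProperties': False. Not a full JSON Schema validator.
    """
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in payload:
            errors.append(f"missing required field: {name}")
    for name, value in payload.items():
        if name not in properties:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {name}")
            continue
        type_name = properties[name]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if bool_mismatch or not isinstance(value, _JSON_TYPES[type_name]):
            errors.append(f"wrong type for {name}: expected {type_name}")
    return errors

# Hypothetical tool definition, for illustration only.
READ_TABLE_SCHEMA = {
    "type": "object",
    "properties": {"table": {"type": "string"}, "limit": {"type": "integer"}},
    "required": ["table"],
    "additionalProperties": False,
}

def call_tool(schema, tool_fn, payload):
    """Invoke tool_fn only if the payload satisfies the schema."""
    errors = validate_tool_input(schema, payload)
    if errors:
        raise ValueError("; ".join(errors))
    return tool_fn(**payload)
```

Because the check happens before the tool runs, a malformed call fails with a message the agent can read and correct, rather than as a side effect halfway through a write.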

What “evaluations as Delta tables” means

Evaluation results are stored as rows in versioned Delta tables. Each row is (agent_version, input, expected_output, actual_output, score, metadata). Regression analysis is a JOIN between two agent_version slices. New versions don’t promote unless they pass.
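The regression JOIN can be sketched in plain Python over rows shaped like the tuple above. In production this would be a SQL join over the Delta table; the function name, the `tolerance` parameter, and the rule that an empty comparison never promotes are assumptions.

```python
def regression_report(rows, baseline_version, candidate_version, tolerance=0.0):
    """Join two agent_version slices on input and list inputs whose score dropped.

    rows: dicts with at least agent_version, input, and score keys.
    """
    def slice_for(version):
        return {r["input"]: r["score"] for r in rows if r["agent_version"] == version}

    baseline = slice_for(baseline_version)
    candidate = slice_for(candidate_version)
    shared = sorted(baseline.keys() & candidate.keys())  # inner join on input
    regressions = [
        (key, baseline[key], candidate[key])
        for key in shared
        if candidate[key] < baseline[key] - tolerance
    ]
    return {
        "compared": len(shared),
        "regressions": regressions,
        # Never promote on an empty comparison: no evidence is not a pass.
        "promote": bool(shared) and not regressions,
    }
```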

The agent / human contract

Where humans fit:

High-risk operations require human-in-the-loop checkpoints. Agents can propose; humans approve.
Critic disagreements with the executor route to humans when the referee cannot adjudicate.
Periodic spot-checks on agent decisions are scheduled into the evaluation harness.

This is not “manual override.” This is a designed-in contract about which decisions are agent-final and which are human-final.

Common architectural decisions

Decision	Default
Number of executors	One unless sub-goals are independent
Critic per executor or shared	Shared unless executors are heterogeneous
Memory model	Working memory in agent state; long-term memory in Delta table
Tool call timeout	30 s default, with retries on idempotent tools
Cost ceiling per session	Configurable; defaults to a hard cap

Closing

Multi-agent workflows on Databricks are productive when the framework is paired with discipline: typed tools, deterministic logging, evaluations as Delta tables, and a designed-in agent / human contract. The Mosaic AI Agent Framework is the runtime; the architecture is yours.


Production-Grade RAG on the Lakehouse with Mosaic AI Vector Search

Zorost Intelligence — Tue, 03 Feb 2026 09:00:00 +0000

Pull-quote: “RAG works in demos. RAG that works in production requires hybrid retrieval, a re-ranker, citation grounding, and an evaluation harness.”

Why this matters

Most RAG projects pilot well and disappoint in production. The pattern is the same: embed the corpus, run vector search, ship. Production-grade RAG requires more.

The production RAG architecture

                     ┌────────────────────┐
        Question ───►│  AI Gateway        │  ← key mgmt, routing, observability
                     └──────────┬─────────┘
                                ▼
        ┌────────────────────────────────────────────┐
        │                Retrieval                    │
        │  ┌────────────────┐  ┌────────────────┐   │
        │  │ Mosaic AI      │  │ BM25 (lexical) │   │
        │  │ Vector Search  │  │ on Delta SQL   │   │
        │  │ (Delta-synced) │  │                │   │
        │  └───────┬────────┘  └────────┬───────┘   │
        │          └──── merge (RRF) ───┘           │
        │                  │                          │
        │              cross-encoder                  │
        │              re-rank                        │
        └────────────────┬─────────────────────────────┘
                         ▼
              top-K (typically 6–10)
                         │
                         ▼
              Citation-grounded generation
              (Mosaic AI Model Serving)
                         │
                         ▼
              Validated answer with source links

Why Mosaic AI Vector Search specifically

Mosaic AI Vector Search synchronizes with Delta tables. Update the source table, the index updates. No orchestration glue. Tagging, ACLs, and lineage flow through Unity Catalog. For RAG over enterprise data that changes, this matters more than people initially appreciate.

Hybrid retrieval is the pattern

Pure vector search is the most common production RAG mistake. Pure BM25 is the second most common. Hybrid — vector + BM25 + filters + re-rank — is the answer that actually works.
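The merge step in the diagram is labeled RRF (Reciprocal Rank Fusion). A minimal sketch follows, assuming each retriever returns an ordered list of document ids; `k=60` is a commonly used default, and the function name and `top_n` parameter are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Merge ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by doc id so the output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n] if top_n is not None else fused
```

RRF only needs ranks, not scores, which is why it works for combining vector similarity and BM25 without normalizing two incompatible score scales. The fused list is what the cross-encoder then re-ranks.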

Citation grounding as a structural fix

Constrain the model to write with bracketed citation tokens. Validate every citation against the retrieval set. Reject answers that fail validation. This is a small structural change with a large operational impact.
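The validation step is small enough to show in full. The sketch assumes citations are written as bracketed integers such as `[2]` that index into the retrieved set; that format, the function name, and the rule that an answer with no citations fails are all assumptions.

```python
import re

CITATION_PATTERN = re.compile(r"\[(\d+)\]")

def validate_citations(answer, retrieved_ids):
    """Check that an answer cites at least one source and only retrieved ones.

    Returns (is_valid, cited_ids, unknown_ids).
    """
    cited = [int(m) for m in CITATION_PATTERN.findall(answer)]
    allowed = set(retrieved_ids)
    unknown = sorted({c for c in cited if c not in allowed})
    is_valid = bool(cited) and not unknown
    return is_valid, sorted(set(cited)), unknown
```

This only proves that every citation points at a document that was actually retrieved. Whether the cited passage supports the claim is a separate check, which is what the grounding rate in the evaluation harness measures.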

Evaluation harness — non-negotiable

A production RAG system without an evaluation harness is a guess. The harness has three components:

Golden Q&A dataset — questions paired with the documents that should ground the answers
Grounding rate — what fraction of generated claims are supported by retrieved documents
Hallucination detection — flagging unsupported claims

The harness runs as a Databricks Job on every model or retrieval change. Regressions are caught before deployment.

Closing

Production RAG on the Lakehouse with Mosaic AI is straightforward when you adopt the architecture: hybrid retrieval, re-ranker, citation grounding, evaluation harness. The result is a RAG system analysts trust enough to use.


Unity Catalog: Governance Done Right

Zorost Intelligence — Tue, 13 Jan 2026 09:00:00 +0000

Pull-quote: “Governance that the team can’t navigate is governance that the team will route around.”

Why this matters

Most data-governance projects fail because they start with policy. The good ones start with structure. Unity Catalog’s hierarchy (Catalog → Schema → Table) is the structural foundation that makes policy enforceable.

Reference layout (data-mesh)

catalog: zorost
├── domain_aviation
│   ├── flights_silver
│   ├── delays_gold
│   └── safety_rag
├── domain_manufacturing
│   ├── spc_silver
│   └── capability_gold
├── domain_freight
│   ├── corridors_silver
│   └── emissions_gold
├── domain_finance
│   └── ...
└── domain_governance       ← cross-cutting
    ├── audit_logs
    ├── pii_register
    └── data_quality_metrics

Permission model

Principal	What they get
Domain Steward	OWNER on `domain_X.*`
Domain Engineer	USE CATALOG on the parent catalog + USE SCHEMA on `domain_X` + CREATE TABLE on `domain_X`
Cross-domain Analyst	SELECT on Gold tables only
Auditor	SELECT on `domain_governance.*`
Service Principal (apps)	SELECT on specific Gold tables · scoped by token

Row and column security with dynamic views

Unity Catalog supports dynamic views — views whose behavior depends on the current user. A typical pattern:

CREATE VIEW domain_aviation.flights_secure AS
SELECT
  flight_id,
  origin_airport,
  destination_airport,
  CASE WHEN is_member('phi_authorized') THEN passenger_count ELSE NULL END
    AS passenger_count,
  ...
FROM domain_aviation.flights_silver
WHERE
  CASE
    WHEN is_member('all_regions') THEN TRUE
    ELSE region IN (SELECT region FROM domain_governance.user_region_grants
                     WHERE user = current_user())
  END;

is_member() (or is_account_group_member() for account-level groups) and current_user(), combined with column masks and row filters, cover row-level, column-level, and attribute-based access control (ABAC) patterns.

Tags and classification

Every column and table can carry tags. We standardize a tag taxonomy:

Tag	Values	Use
`pii_class`	`pii`, `pii_sensitive`, `phi`, `pci`, `none`	Drives masking and access policy
`data_owner`	domain steward email	Clear accountability
`freshness_sla`	`realtime`, `1h`, `1d`, `1w`	Drives monitoring
`retention`	`30d`, `1y`, `7y`, `permanent`	Drives lifecycle

Tags make policy queryable: “show me all PII-tagged columns in domain_finance” returns a row, not an email thread.

Lineage and audit

Unity Catalog captures column-level lineage across SQL, Python, ML, and BI consumption. Audit logs go to a sink the security team owns. Both are queryable through system tables: system.access.audit for audit events and system.access.column_lineage for column-level lineage.

Closing

Governance done right starts with structure. Unity Catalog’s hierarchy + permission model + tagging + dynamic views + lineage + audit are the primitives. The implementation is workshop-driven, but the building blocks are stable and the patterns are reproducible.


Streaming on the Lakehouse: Auto Loader + DLT in Practice

Zorost Intelligence — Tue, 30 Dec 2025 09:00:00 +0000

Pull-quote: “Streaming pipelines that wake people at 3 AM are not real-time. They’re real-painful.”

Why this matters

Real-time pipelines are easy to demo and hard to operate. The pattern that fails: a clever Spark Structured Streaming job that works in dev, struggles in prod under skew, and breaks at the first schema evolution. The pattern that survives: Auto Loader for ingestion, DLT for transformations, expectations for quality, and SLOs that the team monitors like uptime.

The reference architecture

   Sources                Ingestion              Transformation          Consumption
   ───────                ─────────              ──────────────          ───────────
   Cloud storage  ──►  Auto Loader (cloudFiles) ──►  Bronze
   Kafka / EH     ──►  Structured Streaming    ──►  Bronze
   CDC (Debezium) ──►  Auto Loader / SS        ──►  Bronze
                                                        │
                                              DLT expectations
                                              (drop / quarantine)
                                                        ▼
                                                     Silver
                                                        │
                                              joins / aggregations
                                                        ▼
                                                      Gold ──►  BI · ML · Apps

Auto Loader: incremental, schema-evolving, exactly-once

Auto Loader is the foundation. For file-based ingestion at scale, it handles:

Incremental discovery of new files
Schema inference with versioned schema files
Schema evolution with rescued data column for unexpected fields
Exactly-once semantics via durable file tracking

For event streams, Structured Streaming directly from Kafka, Event Hubs, or Kinesis covers the same role.
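The file-tracking idea can be illustrated with a toy model. This is not how Auto Loader is implemented and it does not use the `cloudFiles` source; the class name, the JSON checkpoint format, and the in-memory listing are assumptions chosen to keep the sketch self-contained.

```python
import json

class FileTracker:
    """Toy model of incremental file discovery with a serializable checkpoint."""

    def __init__(self, checkpoint_json="[]"):
        self.seen = set(json.loads(checkpoint_json))

    def ingest(self, listing, process):
        """Process files in `listing` that were not seen in earlier runs."""
        processed = []
        for name in sorted(listing):
            if name in self.seen:
                continue
            process(name)
            self.seen.add(name)  # record only after processing succeeds
            processed.append(name)
        return processed

    def checkpoint(self):
        """Serialize the tracking state so a later run can resume from it."""
        return json.dumps(sorted(self.seen))
```

In this toy version a crash between `process` and persisting the checkpoint would cause a file to be processed again, so on its own it is at-least-once. Exactly-once behavior additionally depends on the write to the target table and the checkpoint update being committed together, or on the write being idempotent.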

DLT: declarative streaming with managed dependencies

DLT lets you describe what the pipeline computes, not how. The runtime handles dependency ordering, retry semantics, schema validation, and metric capture. Expectations express data-quality contracts:

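-- Sketch (select list elided). DLT has no built-in QUARANTINE action:
-- the supported behaviors are warn (no ON VIOLATION clause), DROP ROW, and FAIL UPDATE.
CREATE OR REFRESH STREAMING TABLE silver_orders (
  CONSTRAINT valid_id  EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_amt EXPECT (amount > 0)           ON VIOLATION DROP ROW,
  -- Warn only: the row is kept and counted. Route failing rows to a
  -- separate quarantine table downstream using the inverse predicate.
  CONSTRAINT plausible EXPECT (amount < 1e7)
)
AS SELECT ... FROM STREAM(LIVE.bronze_orders);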

The metrics on those expectations become part of the pipeline’s observability surface.
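The drop and quarantine routing, together with the per-rule metrics, can be modeled in plain Python. This is an illustration of the pattern, not the DLT runtime; the function name and the rule dictionaries are hypothetical, with the three rules taken from the example above.

```python
def apply_expectations(rows, drop_rules, quarantine_rules):
    """Split rows into kept and quarantined lists and count violations per rule.

    drop_rules / quarantine_rules: dicts mapping a rule name to a predicate
    that returns True when the row is valid. Returns (kept, quarantined, metrics).
    """
    kept, quarantined = [], []
    metrics = {name: 0 for name in list(drop_rules) + list(quarantine_rules)}
    for row in rows:
        failed_drop = [n for n, ok in drop_rules.items() if not ok(row)]
        for name in failed_drop:
            metrics[name] += 1
        if failed_drop:
            continue  # dropped rows are counted but go nowhere
        failed_q = [n for n, ok in quarantine_rules.items() if not ok(row)]
        for name in failed_q:
            metrics[name] += 1
        (quarantined if failed_q else kept).append(row)
    return kept, quarantined, metrics

# The three rules from the example above, written as predicates.
DROP_RULES = {
    "valid_id": lambda r: r.get("order_id") is not None,
    "valid_amt": lambda r: r.get("amount") is not None and r["amount"] > 0,
}
QUARANTINE_RULES = {"plausible": lambda r: r["amount"] < 1e7}
```

Drop rules run first so that quarantine predicates can assume the basic fields are present and well formed.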

SLOs that survive production

SLO	Target
End-to-end latency P95	< 60 s for “near-real-time” use cases
Drop rate	< 0.5% of input records
Quarantine rate	< 2% of input records
Pipeline uptime	99.9% monthly
Backfill capability	< 24 h for last-7-day reprocessing

These are the right targets to commit to, not the latency benchmarks vendors quote in marketing.
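Checking the first three targets is simple arithmetic once the pipeline emits latencies and record counts. The sketch below uses the nearest-rank definition of a percentile, which is one of several in common use; the function names and input shapes are assumptions.

```python
import math

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

def check_slos(latencies_s, n_input, n_dropped, n_quarantined):
    """Evaluate the latency, drop-rate, and quarantine-rate targets from the table."""
    return {
        "latency_p95_ok": percentile_nearest_rank(latencies_s, 95) < 60,
        "drop_rate_ok": n_dropped / n_input < 0.005,
        "quarantine_rate_ok": n_quarantined / n_input < 0.02,
    }
```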

Closing

Streaming on the Lakehouse is operationally feasible when you adopt Auto Loader, DLT, and expectations as the standard pattern. The team’s job becomes monitoring SLOs and reviewing quarantine, not babysitting jobs.


Dimensional Modeling on Delta Lake — and When to Choose Data Vault Instead

Zorost Intelligence — Tue, 09 Dec 2025 09:00:00 +0000

Pull-quote: “There is no single right model. There is the right model for the workload.”

Why this matters

Dimensional modeling is a forty-year-old discipline. Lakehouse architecture is a five-year-old discipline. Most teams import their old habits into the new platform and produce models that work but underperform — or models that look modern but break under load.

The right approach is workload-driven.

Four patterns to choose from

Pattern	When to use	Strengths	Weaknesses
Star schema	Reporting and dashboards dominate; Power BI / Tableau is the primary consumer	Familiar; BI-tool friendly; fast slicing on Photon-enabled Delta	Less agile to change; many-to-many requires bridge tables
Data Vault 2.0	Many sources; auditability is required; the model needs to evolve continuously	Auditable; agile; handles many sources; clear separation of business keys, satellites, and links	More tables; queries usually need a presentation layer
One Big Table	API-driven sub-second queries dominate; consumers are applications, not analysts	Sub-second queries; simple semantics for app developers	Joins move into ETL; updates can be expensive
Lakehouse Federation	Cross-system reporting without governance ownership	No data movement; fast to deliver	Performance depends on source; governance has to be explicit

Decision tree

Primary consumer of the model?
   ├── Analysts / BI tools  ──► Star schema (consider Direct Lake)
   ├── Apps / APIs          ──► One Big Table or Star with caching
   ├── Many sources, audit  ──► Data Vault 2.0
   └── Cross-system reporting, no copy possible ──► Lakehouse Federation

How we structure the medallion architecture

Regardless of model pattern, we maintain a Bronze/Silver/Gold separation:

Layer	Purpose	Typical retention
Bronze	Raw + arrival timestamp + source ID; immutable	Long (years)
Silver	Parsed, conformed, deduplicated; data quality enforced	Medium (months to years)
Gold	Business-ready aggregates / dimensions / facts	Short to medium

The model pattern (star, vault, OBT) lives in Gold.

When to mix

Mixing is normal. A typical enterprise customer ends up with:

Data Vault 2.0 for the foundational integration of multiple sources
Star schema in Gold for analytical consumers
One Big Table in Gold for app consumers
Lakehouse Federation for occasional cross-system reporting

Closing

Dimensional modeling on Delta Lake is dimensional modeling, with new physics. Photon on the engine side, and liquid clustering and Z-order on the storage side, change the economics of query performance. The choice of model still depends on the workload, but the trade-offs are different now than they were a decade ago.


Modernizing ETL: Informatica/SSIS/DataStage to Lakeflow + DLT

Zorost Intelligence — Tue, 25 Nov 2025 09:00:00 +0000

Pull-quote: “If your modernization plan doesn’t replace the ETL tool, you didn’t modernize. You just changed where the data lands.”

Why this matters

A migration that moves data into Databricks but leaves Informatica running is an incomplete migration. Half the cost and operational pain of legacy stacks lives in the ETL tool: license fees, scheduling brittleness, lineage gaps, and fragile dependencies on legacy connectors.

The right modernization replaces the ETL tool. Lakeflow Declarative Pipelines (DLT), Auto Loader, and Databricks Jobs together cover the full surface area.

The conversion table

Legacy pattern	Lakehouse equivalent
Source-to-stage mapping	Auto Loader (`cloudFiles`) — incremental, schema-evolving, exactly-once
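Slowly Changing Dimension Type 1	DLT `APPLY CHANGES INTO` with `STORED AS SCD TYPE 1` (Python: `apply_changes` with `stored_as_scd_type=1`)
SCD Type 2 with effective dating	DLT `APPLY CHANGES INTO` with `STORED AS SCD TYPE 2` (Python: `apply_changes` with `stored_as_scd_type=2`)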
Aggregations & roll-ups	Materialized views in Databricks SQL
Workflow scheduling	Databricks Jobs with retries, alerts, lineage
Data quality rules	DLT expectations with quarantine and metric capture
Custom logging & audit	Unity Catalog lineage + `audit_logs`
Reusable transformations	DLT pipelines with shared notebooks/libraries
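For teams coming from mapping-based tools, it can help to see what the SCD Type 2 row of the table does to the data. The sketch below models the behavior in plain Python; it is not the DLT implementation, and the column names `valid_from`, `valid_to`, and `effective_at` are hypothetical.

```python
def apply_scd2(dimension, changes, key, tracked):
    """Apply change records to an SCD Type 2 dimension held as a list of dicts.

    dimension rows carry 'valid_from' and 'valid_to' (None means current).
    changes: dicts with the key, the tracked columns, and 'effective_at',
    assumed to arrive in effective_at order.
    """
    rows = [dict(r) for r in dimension]  # do not mutate the input
    for change in changes:
        current = next(
            (r for r in rows if r[key] == change[key] and r["valid_to"] is None),
            None,
        )
        if current is not None:
            if all(current[c] == change[c] for c in tracked):
                continue  # nothing changed, keep history as is
            current["valid_to"] = change["effective_at"]  # close the old version
        new_row = {key: change[key], **{c: change[c] for c in tracked}}
        new_row["valid_from"] = change["effective_at"]
        new_row["valid_to"] = None
        rows.append(new_row)
    return rows
```

The legacy equivalent is typically a lookup, a comparison, and an update-else-insert router spread across several mapping components; the declarative form collapses that into a key, a sequencing column, and a storage type.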

A reference DLT pipeline

                  ┌────────────────────────────┐
   Cloud Storage ──►│ Auto Loader (schema evol.) │──► Bronze
                  └────────────────────────────┘
                                                      │
                          DLT expectations           ▼
                          (drop / quarantine)    Silver
                                                      │
                          Aggregations / joins        ▼
                                                  Gold

How we treat data quality

Data quality is part of the pipeline, not bolted on after. Every Silver table has DLT expectations that:

Drop obviously bad rows (null business keys, malformed dates)
Quarantine suspicious rows (range violations, referential gaps) for review
Capture metrics so dashboards show data-quality trends, not just data volume

Quality is a first-class output of the pipeline. The data team monitors it like they monitor latency.

Migration sequence

Phase	Output
Inventory	Mappings, jobs, sessions, schedules, lineage gaps
Pattern library	Templates for the top 8–12 conversion patterns in your stack
Iteration 1 (highest-volume sources)	First migrated DLT pipelines · parallel run
Iterations 2–N	Wave-by-wave conversion with parallel run, cutover, decommission
Hyper-care	30/60/90 day stabilization

Closing

ETL modernization done right replaces the legacy tool, not just the destination. Lakeflow + DLT + Auto Loader covers the full surface. The savings are measurable in license fees, operational toil, and time-to-insight.


OBIEE to Databricks: a Practical Migration Pattern

Zorost Intelligence — Tue, 04 Nov 2025 09:00:00 +0000

Pull-quote: “The RPD is not a black box. It is a graph of joins, hierarchies, and security predicates. Treat it that way and migration becomes tractable.”

Why this matters

Oracle BI EE is one of the most widely deployed enterprise BI platforms. It also has accumulated technical debt — schema drift, layered RPDs, undocumented session variables, and report logic split between the BMM and the report itself. Most “migration” projects start by trying to lift-and-shift everything, get blocked, and stall.

The right approach is methodological. The RPD is treatable as three layers, each of which has a clean Databricks SQL equivalent.

The three-layer translation

   OBIEE                                Databricks
   ─────                                ──────────
   Physical layer    ─────►  Delta Lake tables in Unity Catalog
                              + Lakehouse Federation for live sources

   BMM (logical)     ─────►  Databricks SQL semantic model
                              (Lakehouse views with row/column security)

   Presentation      ─────►  Power BI / Tableau on Databricks SQL
                              (dimensions, measures, time intelligence)

Migration sequence

Phase	Length (typical)	Output
1. Discovery	2–4 wks	Catalog of subject areas, RPDs, repositories, presentation catalogs · usage telemetry · report criticality matrix
2. Source mapping	2–4 wks	Mapping of physical layer to landing tables in Bronze/Silver Delta · federated sources documented
3. Semantic model design	4–8 wks	Logical-to-Databricks-SQL semantic model with row/column security
4. ETL conversion	parallel with 3	Native ETL → Lakeflow / DLT / Spark with DLT expectations
5. Report rebuild	4–10 wks	Top reports rebuilt in Power BI Direct Lake or Tableau
6. Cutover & decom.	2–6 wks	Parallel run · UAT · sign-off · legacy decommissioning
7. Hyper-care	30/60/90 days	Stabilization with SLA-backed support

Security translation

OBIEE security primitive	Databricks equivalent
Application Roles	Unity Catalog groups (Entra/IDP-mapped)
Data filters on logical tables	Dynamic views with `current_user()` and `is_member()`
Column-level filters	`mask()` functions in dynamic views
Session variables	Catalog-scoped configuration tables
Init blocks	Replaced by IDP/Entra group claims

Common pitfalls

Trying to lift-and-shift the BMM. Some logic in the BMM is a workaround for OBIEE limitations. Rebuild as Lakehouse views; don’t translate one-for-one.
Skipping usage telemetry. Half the reports in a typical OBIEE deployment are unused. Don’t migrate them.
Translating session variables literally. Most session variables become dynamic-view predicates or IDP claims.
Building the semantic model in Power BI instead of Databricks SQL. Power BI imports work in the short term and create future modernization debt. Direct Lake is the target.

Closing

The OBIEE → Databricks migration pattern is reproducible when you treat the RPD as a graph of joins, hierarchies, and security predicates rather than as a black box. The result is a cleaner semantic model on a platform that supports SQL, ML, streaming, and agentic AI — instead of a single-purpose BI server.
