Databricks Cost Optimization & FinOps: Where the Real Savings Are

Zorost Intelligence — Tue, 21 Apr 2026 09:00:00 +0000

Pull-quote: “Cost optimization is not a one-time project. It’s a recurring discipline. The tooling is there. The discipline is the ask.”

Why this matters

Most Databricks deployments have 30–60% slack in their spend within twelve months of go-live. Some of it is unavoidable (early-stage discovery). Some of it is technical (file layout, cluster sizing). Most of it is organizational (no cost ownership, no tagging, no review cadence).

Where the real savings are

Lever	Typical impact
Right-sized cluster types (Photon, autoscaling, spot)	15–30%
Job orchestration (concurrent runs, dependencies, retries)	5–15%
File compaction and layout (`OPTIMIZE`, `ZORDER BY`, liquid clustering)	10–25% on read-heavy workloads
Caching strategies (Delta cache, query cache)	5–15%
Workload migration to Serverless SQL where appropriate	10–25%
BI semantic-model rationalization	10–20% on Power BI / Tableau queries
Autoscaling thresholds	5–10%
Tombstone management (`VACUUM`)	Storage hygiene rather than a compute saving; keeps storage growth in check

Ranges are typical for engagements where the team has not previously focused on cost. Mature deployments have less to find.
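The file-layout and `VACUUM` levers come down to a handful of Delta SQL statements. Here is a minimal sketch, assuming a hypothetical helper `maintenance_sql` that only builds the statements as strings (in a notebook each one would be passed to `spark.sql`). The table name, clustering columns, and 168-hour (seven-day) retention are illustrative choices, not recommendations for any particular table.

```python
def maintenance_sql(table, zorder_cols=None, liquid_cols=None, retain_hours=168):
    """Build Delta maintenance statements for one table (hypothetical helper).

    Z-ordering and liquid clustering are alternatives, so passing both is an error.
    """
    if zorder_cols and liquid_cols:
        raise ValueError("choose ZORDER BY or liquid clustering, not both")
    statements = []
    if liquid_cols:
        # Declare clustering keys once; later OPTIMIZE runs cluster data incrementally.
        statements.append(f"ALTER TABLE {table} CLUSTER BY ({', '.join(liquid_cols)})")
        statements.append(f"OPTIMIZE {table}")
    elif zorder_cols:
        # Compact small files and co-locate rows on frequently filtered columns.
        statements.append(f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})")
    else:
        # Plain compaction of small files.
        statements.append(f"OPTIMIZE {table}")
    # Delete unreferenced data files older than the retention window.
    statements.append(f"VACUUM {table} RETAIN {retain_hours} HOURS")
    return statements


# Usage: in a Databricks notebook, each statement would go to spark.sql(...).
for stmt in maintenance_sql("main.sales.orders", zorder_cols=["customer_id", "order_date"]):
    print(stmt)
```

The sketch treats liquid clustering and Z-ordering as alternatives because `ZORDER BY` cannot be used on a liquid-clustered table.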

Tagging and ownership — the prerequisite

Without tagging, spend cannot be attributed to an owner, and spend nobody owns does not get optimized. Required tags:

cost_center
environment (dev / stage / prod)
owner (team or person)
workload (training / serving / ETL / BI / ad-hoc)

Tags set on compute resources flow into the custom_tags column of system.billing.usage, the system table used for cost reporting.
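Tag coverage is easy to check before spend lands in the billing table. Here is a minimal sketch, assuming tags are available as a plain dictionary (for example, read from a cluster or job definition). `tag_problems` and the allowed-value sets are hypothetical and simply mirror the list above.

```python
REQUIRED_TAGS = ("cost_center", "environment", "owner", "workload")

# Allowed values mirror the list above; compared case-insensitively.
ALLOWED_VALUES = {
    "environment": {"dev", "stage", "prod"},
    "workload": {"training", "serving", "etl", "bi", "ad-hoc"},
}


def tag_problems(tags):
    """Return human-readable problems with one resource's tags (hypothetical helper)."""
    problems = []
    for key in REQUIRED_TAGS:
        value = (tags.get(key) or "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif key in ALLOWED_VALUES and value.lower() not in ALLOWED_VALUES[key]:
            problems.append(f"unexpected {key}: {value!r}")
    return problems


print(tag_problems({"environment": "qa", "owner": "data-eng", "workload": "ETL"}))
```

Running a check like this in a deployment pipeline catches missing tags before the spend becomes unattributable, rather than at the monthly review.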

The audit, in twelve hours

A typical audit takes about twelve hours of senior engineering time:

Pull system.billing.usage for the last 90 days, joined with system.billing.list_prices (to turn DBUs into cost) and with cluster and job metadata
Identify the top 10 jobs by cost
For each, evaluate: is the cluster the right type? Is autoscaling tuned? Are files compacted? Is the workload running at the right cadence?
Identify candidates for serverless migration
Identify candidates for materialized view replacement
Produce a prioritized list with estimated savings
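The first two audit steps amount to an aggregation. In practice this is a SQL query over system.billing.usage joined with system.billing.list_prices. The sketch below shows the same logic in plain Python so it can run anywhere. The row shape is a simplifying assumption: in the real table the job ID sits inside the usage_metadata struct, and the price comes from the joined price list.

```python
from collections import defaultdict
from datetime import date, timedelta


def top_jobs_by_cost(rows, as_of, days=90, n=10):
    """Rank jobs by list-price cost over the trailing window ending at as_of.

    Each row is assumed to carry job_id, usage_date (a date), usage_quantity
    (DBUs) and price_per_dbu (the list price for that row's SKU).
    """
    start = as_of - timedelta(days=days)
    cost = defaultdict(float)
    for row in rows:
        job_id = row.get("job_id")
        if job_id is None:
            continue  # interactive, SQL warehouse, or other non-job usage
        if start < row["usage_date"] <= as_of:
            cost[job_id] += row["usage_quantity"] * row["price_per_dbu"]
    # Highest spend first; these are the jobs to examine one by one.
    return sorted(cost.items(), key=lambda item: item[1], reverse=True)[:n]
```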

Most teams find five to ten actions that together deliver 20–40% savings.
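Combining several actions into one savings figure takes a little care. Savings on different jobs add up, but two levers applied to the same job compound on whatever the first one leaves. Here is a minimal sketch, assuming each action carries an estimated saving fraction for a single job. `Action` and `prioritize` are hypothetical names, and the estimates are inputs supplied from the audit.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    job: str
    job_cost: float     # current spend of the targeted job over the audit window
    saving_pct: float   # estimated fraction of that job's spend removed (0 to 1)


def prioritize(actions, total_spend):
    """Rank actions by standalone saving and estimate the combined platform share."""
    ranked = sorted(actions, key=lambda a: a.job_cost * a.saving_pct, reverse=True)
    # Levers on the same job compound: two 20% savings leave 0.8 * 0.8 = 64%
    # of the spend, i.e. 36% saved rather than 40%. Different jobs simply add.
    residual, cost = {}, {}
    for a in actions:
        cost[a.job] = a.job_cost
        residual[a.job] = residual.get(a.job, 1.0) * (1.0 - a.saving_pct)
    saved = sum(cost[job] * (1.0 - residual[job]) for job in cost)
    return ranked, saved / total_spend
```

The ranking uses each action's standalone saving, which slightly overstates the second lever on the same job. The combined share accounts for that.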

Common findings

A nightly batch job on an oversized cluster when a smaller Photon-enabled cluster would do
A streaming pipeline running with a cluster sized for peak when traffic is bimodal
A Power BI model importing data, 80% of which nobody queries
A SELECT * materialized in a downstream view, doubling storage cost on a hot dataset
An ad-hoc cluster left running over a weekend
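Several of these findings show up just by reading cluster definitions. Here is a minimal sketch, assuming cluster specs shaped like the Databricks Clusters API JSON (`autotermination_minutes`, `autoscale`, `num_workers`, `runtime_engine`). The 60-minute idle policy and the wording of the findings are assumptions, and a flagged cluster still needs a human look at its actual load profile.

```python
def lint_cluster(spec, max_idle_minutes=60):
    """Flag cost-relevant settings in a Clusters-API-style spec (hypothetical policy)."""
    findings = []
    # Relevant for all-purpose clusters; job clusters shut down when their run ends.
    idle = spec.get("autotermination_minutes", 0)
    if not idle:
        findings.append("auto-termination disabled: an idle cluster can run all weekend")
    elif idle > max_idle_minutes:
        findings.append(f"auto-termination after {idle} min exceeds the {max_idle_minutes} min policy")
    # A fixed num_workers means the cluster is sized for one load level, often peak.
    if "autoscale" not in spec:
        findings.append("fixed size: compare against the load profile (e.g. bimodal streaming)")
    # Simplification: older specs may signal Photon via spark_version instead.
    if spec.get("runtime_engine") != "PHOTON":
        findings.append("Photon not enabled: test a smaller Photon cluster")
    return findings
```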

Cost ownership cadence

The discipline that holds savings: a monthly cost review with data leadership and the FinOps lead. Each owner explains anomalies. Tags get fixed. Wasteful patterns get retired.
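The monthly review runs faster when anomalies are computed in advance. Here is a minimal sketch of a month-over-month check per `owner` tag. `cost_anomalies` and both thresholds (25% and 1,000 in currency units) are hypothetical defaults, and the monthly totals are assumed to come from the tagged billing data.

```python
def cost_anomalies(prev, curr, pct_threshold=0.25, min_delta=1000.0):
    """Flag owners whose monthly cost moved by at least pct_threshold and min_delta.

    prev and curr map an owner tag to that month's cost. Spend with no owner tag
    is assumed to be bucketed by the caller under a key such as "untagged".
    """
    flagged = []
    for owner in sorted(set(prev) | set(curr)):
        before, after = prev.get(owner, 0.0), curr.get(owner, 0.0)
        delta = after - before
        if abs(delta) < min_delta:
            continue  # too small to discuss, whatever the percentage
        change = delta / before if before else float("inf")  # new spend has no baseline
        if abs(change) >= pct_threshold:
            flagged.append((owner, before, after, change))
    return flagged
```

Requiring both a relative and an absolute change keeps small owners with noisy percentages off the agenda.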

Closing

Cost optimization on Databricks is not a one-time project. It is a recurring discipline backed by tagging, system tables, and a monthly review. The platform tooling is there. The discipline is the ask.

The post Databricks Cost Optimization & FinOps: Where the Real Savings Are appeared first on Zorost Intelligence | AI, Cloud & Data Experts.

OBIEE to Databricks: a Practical Migration Pattern

Zorost Intelligence — Tue, 04 Nov 2025 09:00:00 +0000

Pull-quote: “The RPD is not a black box. It is a graph of joins, hierarchies, and security predicates. Treat it that way and migration becomes tractable.”

Why this matters

Oracle BI EE (OBIEE) is one of the most widely deployed enterprise BI platforms. Long-lived deployments have also accumulated technical debt: schema drift, layered RPDs, undocumented session variables, and report logic split between the BMM and the report itself. Most “migration” projects start by trying to lift-and-shift everything, get blocked, and stall.

The right approach is methodical: treat the RPD as three layers, each of which has a clean Databricks equivalent.

The three-layer translation

   OBIEE                                Databricks
   ─────                                ──────────
   Physical layer    ─────►  Delta Lake tables in Unity Catalog
                              + Lakehouse Federation for live sources

   BMM (logical)     ─────►  Databricks SQL semantic model
                              (Lakehouse views with row/column security)

   Presentation      ─────►  Power BI / Tableau on Databricks SQL
                              (dimensions, measures, time intelligence)
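Treating the BMM as a graph of joins makes the middle arrow close to mechanical for simple cases. Here is a minimal sketch, assuming a logical table source has already been reduced to a driving table, column mappings, and join predicates. `LogicalTableSource`, `Join`, and `to_view_sql` are hypothetical, and a real RPD also carries hierarchies, aggregation rules, and security predicates that this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Join:
    table: str          # physical table (with alias), already landed as a Delta table
    on: str             # join predicate carried over from the RPD
    kind: str = "LEFT"


@dataclass
class LogicalTableSource:
    name: str                                   # target view, catalog.schema.view
    base: str                                   # driving physical table (with alias)
    columns: dict                               # logical column -> physical expression
    joins: list = field(default_factory=list)   # Join objects, in order


def to_view_sql(src):
    """Render one logical table source as a Lakehouse view definition."""
    cols = ",\n  ".join(f"{expr} AS {alias}" for alias, expr in src.columns.items())
    joins = "".join(f"\n{j.kind} JOIN {j.table} ON {j.on}" for j in src.joins)
    return f"CREATE OR REPLACE VIEW {src.name} AS\nSELECT\n  {cols}\nFROM {src.base}{joins}"


sales = LogicalTableSource(
    name="main.semantic.sales",
    base="main.silver.orders o",
    columns={"order_id": "o.order_id", "customer_name": "c.name"},
    joins=[Join("main.silver.customers c", "o.customer_id = c.customer_id")],
)
print(to_view_sql(sales))
```

Treat the generated view as a first draft. BMM logic that existed only to work around OBIEE limitations should be dropped rather than carried over.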

Migration sequence

Phase	Length (typical)	Output
1. Discovery	2–4 wks	Catalog of subject areas, RPDs, repositories, presentation catalogs · usage telemetry · report criticality matrix
2. Source mapping	2–4 wks	Mapping of physical layer to landing tables in Bronze/Silver Delta · federated sources documented
3. Semantic model design	4–8 wks	Logical-to-Databricks-SQL semantic model with row/column security
4. ETL conversion	parallel with 3	Legacy ETL → Lakeflow / DLT pipelines or Spark jobs, with DLT expectations for data quality
5. Report rebuild	4–10 wks	Top reports rebuilt in Power BI Direct Lake or Tableau
6. Cutover & decom.	2–6 wks	Parallel run · UAT · sign-off · legacy decommissioning
7. Hyper-care	30/60/90 days	Stabilization with SLA-backed support

Security translation

OBIEE security primitive	Databricks equivalent
Application Roles	Unity Catalog groups (Entra/IDP-mapped)
Data filters on logical tables	Dynamic views with `current_user()` and `is_account_group_member()` (or the workspace-level `is_member()`)
Column-level filters	`mask()` functions in dynamic views
Session variables	Catalog-scoped configuration tables
Init blocks	Replaced by IDP/Entra group claims
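Several rows of this table often end up in the same view. A session variable that used to hold a user's allowed regions becomes a lookup in a configuration table keyed by `current_user()`, and a column filter becomes a `mask()` branch. Here is a minimal sketch that generates such a view. `secure_view_sql`, the entitlement table, and its `user_email` column are hypothetical, while `current_user()`, `is_account_group_member()`, and `mask()` are Databricks SQL functions.

```python
def secure_view_sql(view, source, columns, masked, entitlements, filter_col, admin_group):
    """Generate a dynamic view replacing OBIEE data filters and session variables.

    masked maps a column to the group allowed to see it unmasked. entitlements is
    a hypothetical config table (user_email, <filter_col>) standing in for the
    session variable an init block used to populate. Inputs are assumed to come
    from a trusted migration inventory; nothing here escapes identifiers.
    """
    select = []
    for col in columns:
        if col in masked:
            # mask() works on strings; cast other types first.
            select.append(
                f"CASE WHEN is_account_group_member('{masked[col]}') "
                f"THEN {col} ELSE mask({col}) END AS {col}"
            )
        else:
            select.append(col)
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT {', '.join(select)}\n"
        f"FROM {source}\n"
        f"WHERE is_account_group_member('{admin_group}')\n"
        f"   OR {filter_col} IN (SELECT {filter_col} FROM {entitlements} "
        f"WHERE user_email = current_user())"
    )
```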

Common pitfalls

Trying to lift-and-shift the BMM. Some logic in the BMM is a workaround for OBIEE limitations. Rebuild as Lakehouse views; don’t translate one-for-one.
Skipping usage telemetry. Half the reports in a typical OBIEE deployment are unused. Don’t migrate them.
Translating session variables literally. Most session variables become dynamic-view predicates or IDP claims.
Building the semantic model in Power BI instead of Databricks SQL. Power BI Import models work in the short term but create future modernization debt. Direct Lake is the target.
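Usage telemetry turns “half the reports are unused” into a concrete retire list. Here is a minimal sketch, assuming report runs have been exported from OBIEE usage tracking as (report path, run date) pairs. `triage_reports`, the 90-day window, and the one-run threshold are illustrative assumptions.

```python
from datetime import date, timedelta


def triage_reports(catalog_paths, usage_rows, as_of, window_days=90, min_runs=1):
    """Split catalog reports into (migrate, retire) by runs in the trailing window.

    usage_rows is an iterable of (report_path, run_date) pairs; the export
    layout is an assumption, not OBIEE's native table shape.
    """
    start = as_of - timedelta(days=window_days)
    runs = {}
    for path, run_date in usage_rows:
        if start < run_date <= as_of:
            runs[path] = runs.get(path, 0) + 1
    # Most-used reports first, so rebuild effort goes where users are.
    migrate = sorted(
        (p for p in catalog_paths if runs.get(p, 0) >= min_runs),
        key=lambda p: (-runs.get(p, 0), p),
    )
    retire = sorted(p for p in catalog_paths if runs.get(p, 0) < min_runs)
    return migrate, retire
```

The retire list still needs a business sign-off, since some reports, such as year-end ones, run rarely but matter.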

Closing

The OBIEE → Databricks migration pattern is reproducible when you treat the RPD as a graph of joins, hierarchies, and security predicates rather than as a black box. The result is a cleaner semantic model on a platform that supports SQL, ML, streaming, and agentic AI — instead of a single-purpose BI server.

The post OBIEE to Databricks: a Practical Migration Pattern appeared first on Zorost Intelligence | AI, Cloud & Data Experts.