Databricks Cost Optimization & FinOps: Where the Real Savings Are - Zorost Intelligence

Pull-quote: “Cost optimization is not a one-time project. It’s a recurring discipline. The tooling is there. The discipline is the ask.”

Why this matters

Most Databricks deployments have 30–60% slack in their spend within twelve months of go-live. Some of it is unavoidable (early-stage discovery). Some of it is technical (file layout, cluster sizing). Most of it is organizational (no cost ownership, no tagging, no review cadence).

Where the real savings are

Lever	Typical saving
Right-sized cluster types (Photon, autoscaling, spot)	15–30%
Job orchestration (concurrent runs, dependencies, retries)	5–15%
File compaction and layout (`OPTIMIZE`, `ZORDER BY`, liquid clustering)	10–25% on read-heavy workloads
Caching strategies (Delta cache, query cache)	5–15%
Workload migration to Serverless SQL where appropriate	10–25%
BI semantic-model rationalization	10–20% on Power BI / Tableau queries
Autoscaling thresholds	5–10%
Tombstone management (`VACUUM`)	Cleanup rather than a direct saving; keeps storage growth sustainable

Ranges are typical for engagements where the team has not previously focused on cost. Mature deployments have less to find.
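For the file-layout lever, a quick way to spot tables that would benefit from `OPTIMIZE` is to compare their average file size against a target. This is a minimal sketch, assuming you already have the `numFiles` and `sizeInBytes` values that `DESCRIBE DETAIL` reports for a Delta table; the 32 MB threshold and the helper name `needs_compaction` are illustrative assumptions, not Databricks defaults.

```python
# Illustrative threshold only (an assumption); pick a target that suits your workload.
MB = 1024 * 1024
SMALL_FILE_THRESHOLD_BYTES = 32 * MB


def needs_compaction(num_files: int, size_in_bytes: int,
                     threshold_bytes: int = SMALL_FILE_THRESHOLD_BYTES) -> bool:
    """Return True when the table's average file size falls below the threshold.

    num_files and size_in_bytes correspond to the numFiles and sizeInBytes
    columns returned by DESCRIBE DETAIL on a Delta table.
    """
    if num_files <= 0:
        return False  # empty table: nothing to compact
    average_file_size = size_in_bytes / num_files
    return average_file_size < threshold_bytes


# 40 GiB spread over 20,000 files averages about 2 MiB per file: a compaction candidate.
print(needs_compaction(20_000, 40 * 1024**3))  # True
# The same 40 GiB in 40 files averages 1 GiB per file: already well compacted.
print(needs_compaction(40, 40 * 1024**3))      # False
```

Average file size is a coarse signal; a table can average well and still have a long tail of small files in recently written partitions.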

Tagging and ownership — the prerequisite

Without tagging, you can’t attribute spend to an owner, and without attribution you can’t optimize. Required tags:

cost_center
environment (dev / stage / prod)
owner (team or person)
workload (training / serving / ETL / BI / ad-hoc)
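A minimal sketch of a tag check that could run in CI before a cluster or job definition is deployed. The required keys and allowed values mirror the list above; the function name `tag_violations` and the choice to compare values case-insensitively are assumptions of this sketch.

```python
# Required keys and allowed values, mirroring the tag list above.
REQUIRED_TAGS = ("cost_center", "environment", "owner", "workload")
ALLOWED_VALUES = {
    "environment": {"dev", "stage", "prod"},
    "workload": {"training", "serving", "etl", "bi", "ad-hoc"},
}


def tag_violations(tags: dict) -> list:
    """List the problems with a resource's tags; an empty list means compliant."""
    problems = []
    for key in REQUIRED_TAGS:
        value = tags.get(key, "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif key in ALLOWED_VALUES and value.lower() not in ALLOWED_VALUES[key]:
            # Values are compared case-insensitively (an assumption of this sketch).
            problems.append(f"invalid value for {key}: {value!r}")
    return problems


print(tag_violations({"cost_center": "cc-104", "environment": "prod",
                      "owner": "data-eng", "workload": "ETL"}))
# []
print(tag_violations({"environment": "qa", "owner": "data-eng"}))
# ['missing tag: cost_center', "invalid value for environment: 'qa'", 'missing tag: workload']
```

Failing the deployment on a non-empty result keeps untagged compute from reaching production in the first place, which is cheaper than fixing tags after the bill arrives.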

These tags flow into the `custom_tags` column of `system.billing.usage`, the system table used for cost reporting.
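To turn tags into a cost report, usage quantities (DBUs) are multiplied by a price and grouped by tag value. The sketch below works on in-memory rows shaped like a few columns of `system.billing.usage` (`sku_name`, `usage_quantity`, `custom_tags`); the SKU names and per-DBU prices are placeholders, and in a real workspace you would run the equivalent aggregation in SQL against the system tables. Keeping an explicit `(untagged)` bucket shows how much spend the tagging gap hides.

```python
from collections import defaultdict

# Hypothetical rows shaped like a few columns of system.billing.usage.
# SKU names and prices below are placeholders, not real list prices.
USAGE_ROWS = [
    {"sku_name": "JOBS_COMPUTE", "usage_quantity": 120.0, "custom_tags": {"cost_center": "cc-104"}},
    {"sku_name": "JOBS_COMPUTE", "usage_quantity": 30.0, "custom_tags": {"cost_center": "cc-207"}},
    {"sku_name": "ALL_PURPOSE_COMPUTE", "usage_quantity": 50.0, "custom_tags": {}},
]
PRICE_PER_DBU = {"JOBS_COMPUTE": 0.25, "ALL_PURPOSE_COMPUTE": 0.50}


def spend_by_tag(rows, tag_key, prices):
    """Sum estimated cost (DBUs x price) per value of tag_key; untagged spend stays visible."""
    totals = defaultdict(float)
    for row in rows:
        tag_value = (row.get("custom_tags") or {}).get(tag_key, "(untagged)")
        totals[tag_value] += row["usage_quantity"] * prices[row["sku_name"]]
    return dict(totals)


report = spend_by_tag(USAGE_ROWS, "cost_center", PRICE_PER_DBU)
print(report)  # {'cc-104': 30.0, 'cc-207': 7.5, '(untagged)': 25.0}
untagged_share = report.get("(untagged)", 0.0) / sum(report.values())
print(f"untagged share: {untagged_share:.0%}")  # untagged share: 40%
```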

The audit, in twelve hours

A typical audit takes about twelve hours of senior engineering time:

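Pull system.billing.usage for the last 90 days, joined with cluster metadata (system.compute.clusters) and list prices (system.billing.list_prices) so DBUs can be converted to cost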
Identify the top 10 jobs by cost
For each, evaluate: is the cluster the right type? Is autoscaling tuned? Are files compacted? Is the workload running at the right cadence?
Identify candidates for serverless migration
Identify candidates for materialized view replacement
Produce a prioritized list with estimated savings
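The first two audit steps, filtering to a trailing window and ranking jobs by estimated cost, reduce to a group-by and a sort. This sketch assumes rows shaped like `system.billing.usage`, with the job ID nested under `usage_metadata` as it is in that table, plus a placeholder price map; `top_jobs_by_cost` is a hypothetical helper, and the SKU names and prices are illustrative.

```python
from collections import Counter
from datetime import date, timedelta


def top_jobs_by_cost(rows, prices, as_of, days=90, n=10):
    """Rank job IDs by estimated cost over the `days` days ending on `as_of`."""
    window_start = as_of - timedelta(days=days)
    cost = Counter()
    for row in rows:
        job_id = row["usage_metadata"].get("job_id")
        if job_id is None:
            continue  # usage not attributed to a job (e.g. interactive or SQL warehouse usage)
        if not (window_start < row["usage_date"] <= as_of):
            continue  # outside the trailing window
        cost[job_id] += row["usage_quantity"] * prices[row["sku_name"]]
    return cost.most_common(n)


# Placeholder SKUs and per-DBU prices, for illustration only.
prices = {"JOBS": 0.25, "SQL": 0.50}
rows = [
    {"usage_date": date(2024, 6, 1), "sku_name": "JOBS", "usage_quantity": 100.0, "usage_metadata": {"job_id": "nightly_etl"}},
    {"usage_date": date(2024, 6, 2), "sku_name": "JOBS", "usage_quantity": 40.0, "usage_metadata": {"job_id": "hourly_sync"}},
    {"usage_date": date(2024, 6, 3), "sku_name": "JOBS", "usage_quantity": 20.0, "usage_metadata": {"job_id": "nightly_etl"}},
    {"usage_date": date(2024, 1, 5), "sku_name": "JOBS", "usage_quantity": 999.0, "usage_metadata": {"job_id": "old_job"}},
    {"usage_date": date(2024, 6, 3), "sku_name": "SQL", "usage_quantity": 500.0, "usage_metadata": {}},
]
print(top_jobs_by_cost(rows, prices, as_of=date(2024, 6, 30)))
# [('nightly_etl', 30.0), ('hourly_sync', 10.0)]
```

The January row falls outside the 90-day window and the SQL row has no job ID, so neither reaches the ranking.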

Most teams find five to ten actions that together deliver 20–40% savings.
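When estimating the combined effect of several actions, simply adding their percentages overstates the result, because each action after the first applies to a bill that is already smaller. A minimal sketch, assuming each action is described by the share of total spend it touches and the fractional reduction on that share, and that the effects compound multiplicatively; the three actions and their numbers are hypothetical.

```python
from math import prod


def naive_saving(actions):
    """Sum of per-action savings as fractions of total spend."""
    return sum(share * reduction for share, reduction in actions)


def compounded_saving(actions):
    """Combined saving when each action cuts the already-reduced bill multiplicatively."""
    for share, reduction in actions:
        if not (0.0 <= share <= 1.0 and 0.0 <= reduction <= 1.0):
            raise ValueError("share and reduction must be fractions in [0, 1]")
    remaining = prod(1.0 - share * reduction for share, reduction in actions)
    return 1.0 - remaining


# Hypothetical audit output: (share of total spend touched, reduction on that share).
actions = [
    (0.40, 0.25),  # right-size the clusters behind the top jobs
    (0.30, 0.20),  # compact files on read-heavy tables
    (0.50, 0.10),  # tune autoscaling thresholds
]
print(f"naive: {naive_saving(actions):.2%}, compounded: {compounded_saving(actions):.2%}")
# naive: 21.00%, compounded: 19.63%
```

The compounded figure is always at or below the naive sum; treat both as planning estimates rather than forecasts.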

Common findings

A nightly batch job using a high-end cluster size when a Photon-enabled smaller cluster would do
A streaming pipeline running with a cluster sized for peak when traffic is bimodal
A Power BI model that imports data, 80% of which nobody queries
A SELECT * materialized in a downstream view, doubling storage cost on a hot dataset
An ad-hoc cluster left running over a weekend
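For the weekend-cluster finding, the usual guard is auto-termination. A minimal sketch that scans cluster specifications (for example, exported from the Clusters API) for entries whose `autotermination_minutes` is 0, missing, or above a policy ceiling. In the Clusters API a value of 0 disables auto-termination; treating a missing field the same way, the 60-minute ceiling, and the helper name are assumptions of this sketch.

```python
MAX_IDLE_MINUTES = 60  # hypothetical policy ceiling


def clusters_missing_autotermination(cluster_specs, max_idle_minutes=MAX_IDLE_MINUTES):
    """Names of clusters that could sit idle all weekend: auto-termination off or too lax."""
    flagged = []
    for spec in cluster_specs:
        # 0 disables auto-termination; a missing field is treated the same way here.
        minutes = spec.get("autotermination_minutes", 0)
        if minutes == 0 or minutes > max_idle_minutes:
            flagged.append(spec["cluster_name"])
    return flagged


specs = [
    {"cluster_name": "adhoc-analytics", "autotermination_minutes": 0},
    {"cluster_name": "ds-sandbox", "autotermination_minutes": 30},
    {"cluster_name": "shared-bi", "autotermination_minutes": 240},
    {"cluster_name": "legacy-notebooks"},
]
print(clusters_missing_autotermination(specs))
# ['adhoc-analytics', 'shared-bi', 'legacy-notebooks']
```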

Cost ownership cadence

The discipline that holds savings: a monthly cost review with data leadership and the FinOps lead. Each owner explains anomalies. Tags get fixed. Wasteful patterns get retired.
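One way to decide which owners are asked to explain an anomaly is a month-over-month threshold on spend per `owner` tag. The sketch assumes monthly totals per owner have already been computed (for example, by grouping `system.billing.usage` on the `owner` tag); the 20% and 500-unit thresholds and the name `cost_anomalies` are hypothetical. Requiring both a relative and an absolute increase keeps small owners from being flagged for noise.

```python
def cost_anomalies(prev_month, this_month, pct_threshold=0.20, min_increase=500.0):
    """Owners whose spend rose by more than pct_threshold AND by at least min_increase.

    Returns {owner: absolute increase}, largest increase first. Owners with no
    spend last month are flagged on the absolute test alone.
    """
    flagged = {}
    for owner, current in this_month.items():
        previous = prev_month.get(owner, 0.0)
        increase = current - previous
        if increase < min_increase:
            continue  # too small to be worth a conversation
        if previous == 0 or increase / previous > pct_threshold:
            flagged[owner] = increase
    return dict(sorted(flagged.items(), key=lambda item: item[1], reverse=True))


last = {"data-eng": 10_000.0, "ml-platform": 4_000.0, "bi": 2_000.0}
this = {"data-eng": 10_400.0, "ml-platform": 6_000.0, "bi": 2_300.0, "(untagged)": 900.0}
print(cost_anomalies(last, this))
# {'ml-platform': 2000.0, '(untagged)': 900.0}
```

A newly appearing `(untagged)` bucket is flagged too, which routes tagging gaps into the same review.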

Closing

Cost optimization on Databricks is not a one-time project. It is a recurring discipline backed by tagging, system tables, and a monthly review. The platform tooling is there. The discipline is the ask.
