Pull-quote: “Cost optimization is not a one-time project. It’s a recurring discipline. The tooling is there. The discipline is the ask.”

Why this matters

Most Databricks deployments have 30–60% slack in their spend within twelve months of go-live. Some of it is unavoidable (early-stage discovery). Some of it is technical (file layout, cluster sizing). Most of it is organizational (no cost ownership, no tagging, no review cadence).

Where the real savings are

Lever                                                       Typical impact
Right-sized cluster types (Photon, autoscaling, spot)       15–30%
Job orchestration (concurrent runs, dependencies, retries)  5–15%
File compaction (OPTIMIZE, Z-ORDER, liquid clustering)      10–25% on read-heavy workloads
Caching strategies (Delta cache, query cache)               5–15%
Workload migration to Serverless SQL where appropriate      10–25%
BI semantic-model rationalization                           10–20% on Power BI / Tableau queries
Autoscaling thresholds                                      5–10%
Tombstone management (VACUUM)                               Housekeeping rather than a direct saving, but it keeps storage growth sustainable

Ranges are typical for engagements where the team has not previously focused on cost. Mature deployments have less to find.

Tagging and ownership — the prerequisite

Without tagging, you can’t attribute spend, and you can’t optimize what you can’t attribute. Required tags:

  • cost_center
  • environment (dev / stage / prod)
  • owner (team or person)
  • workload (training / serving / ETL / BI / ad-hoc)

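Tags applied to clusters, jobs, and SQL warehouses flow into the billing system table (system.billing.usage), where they appear in the custom_tags column and can be used to group cost reports.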
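A tag policy is easier to hold when it is checked automatically. The following is a minimal sketch of such a check, assuming tags arrive as a plain dictionary; the function name tag_problems and the exact allowed values are illustrative choices based on the list above, not a Databricks API.

```python
REQUIRED_TAGS = ("cost_center", "environment", "owner", "workload")

# Allowed values mirror the list above (assumed spelling); tags without an
# entry here accept any non-empty value.
ALLOWED_VALUES = {
    "environment": {"dev", "stage", "prod"},
    "workload": {"training", "serving", "ETL", "BI", "ad-hoc"},
}


def tag_problems(tags: dict) -> list:
    """Return human-readable problems; an empty list means the tags pass."""
    problems = []
    for key in REQUIRED_TAGS:
        value = (tags.get(key) or "").strip()
        if not value:
            problems.append(f"missing tag: {key}")
        elif key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]:
            problems.append(f"invalid value for {key}: {value!r}")
    return problems
```

Calling tag_problems on a cluster's tags before it is created, or on every cluster in a nightly sweep, turns the tag list from a convention into something that can fail a check.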

The audit, in twelve hours

A typical audit takes about twelve hours of senior engineering time:

  1. Pull system.billing.usage for the last 90 days, joined with cluster metadata
  2. Identify the top 10 jobs by cost
  3. For each, evaluate: is the cluster the right type? Is autoscaling tuned? Are files compacted? Is the workload running at the right cadence?
  4. Identify candidates for serverless migration
  5. Identify candidates for materialized view replacement
  6. Produce a prioritized list with estimated savings
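Steps 1 and 2 amount to a group-by and a sort. The sketch below assumes the usage rows have already been pulled from system.billing.usage and priced in dollars; the field names job_id and cost_usd are simplified placeholders for illustration, not the system table's actual schema.

```python
from collections import defaultdict


def top_jobs_by_cost(rows, n=10):
    """Sum cost per job and return the n most expensive as (job_id, total_cost) pairs."""
    totals = defaultdict(float)
    for row in rows:
        job_id = row.get("job_id")
        if job_id is None:
            # Usage not attributed to a job (for example, interactive clusters)
            continue
        totals[job_id] += row["cost_usd"]
    # Highest cost first; ties broken by job_id so the order is deterministic
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:n]
```

Usage that carries no job identifier is skipped here on purpose; in a real audit it deserves its own line, since unattributed interactive spend is often where the surprises are.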

Most teams find five to ten actions that together deliver 20–40% savings.
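One way to turn findings into a prioritized list is to rank each action by estimated savings per hour of effort and then express the total against current spend. This is a sketch under stated assumptions: the Action fields are hypothetical, the savings figures are estimates rather than measurements, and the actions are treated as independent so that their savings simply add.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    monthly_savings_usd: float  # estimated, not measured
    effort_hours: float         # must be greater than zero


def prioritize(actions):
    """Order actions by estimated savings per hour of effort, highest first."""
    return sorted(
        actions,
        key=lambda a: a.monthly_savings_usd / a.effort_hours,
        reverse=True,
    )


def savings_share(actions, monthly_spend_usd):
    """Total estimated savings as a fraction of current monthly spend.

    Assumes independent actions; overlapping actions would save less.
    """
    return sum(a.monthly_savings_usd for a in actions) / monthly_spend_usd
```

Ranking by savings per hour rather than by raw savings tends to surface the quick wins first, which helps the first monthly review show a result.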

Common findings

  • A nightly batch job running on an oversized cluster when a smaller Photon-enabled cluster would do
  • A streaming pipeline running on a cluster sized for peak load even though traffic is bimodal
  • A Power BI model in which 80% of the imported data is never queried
  • A SELECT * materialized in a downstream view, doubling storage cost on a hot dataset
  • An ad-hoc cluster left running over a weekend
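The last finding is the easiest to catch mechanically. The sketch below scans cluster definitions for missing or overly generous auto-termination settings. It assumes each cluster is a dictionary carrying the cluster_name and autotermination_minutes fields used by the Databricks Clusters API, and that a value of 0 or an absent field means the cluster never auto-terminates; the 120-minute default threshold is an arbitrary illustration.

```python
def clusters_at_risk(clusters, max_idle_minutes=120):
    """Return names of clusters with auto-termination disabled or set above max_idle_minutes."""
    flagged = []
    for cluster in clusters:
        # Assumption: a missing field or 0 means the cluster never auto-terminates
        minutes = cluster.get("autotermination_minutes", 0)
        if minutes == 0 or minutes > max_idle_minutes:
            flagged.append(cluster["cluster_name"])
    return flagged
```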

Cost ownership cadence

The discipline that holds the savings is a monthly cost review with data leadership and the FinOps lead. Each owner explains the anomalies in their spend. Tags get fixed. Wasteful patterns get retired.
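The review goes faster when anomalies are listed before the meeting. A minimal sketch, assuming monthly cost has already been totalled per owner tag; the 25% threshold and the function name cost_anomalies are illustrative choices, not a standard.

```python
def cost_anomalies(previous, current, threshold=0.25):
    """Return {owner: relative_change} for owners whose spend rose by more than threshold.

    previous and current map owner -> monthly cost. Owners with no prior spend
    are reported with a change of None, since a relative change is undefined.
    """
    anomalies = {}
    for owner, cost in current.items():
        before = previous.get(owner, 0.0)
        if before == 0:
            if cost > 0:
                anomalies[owner] = None  # new spend with no baseline
            continue
        change = (cost - before) / before
        if change > threshold:
            anomalies[owner] = change
    return anomalies
```

Each key in the result is an owner who should arrive at the review with an explanation.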

Closing

Cost optimization on Databricks is not a one-time project. It is a recurring discipline backed by tagging, system tables, and a monthly review. The platform tooling is there. The discipline is the ask.