<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Databricks Archives - Zorost Intelligence | AI, Cloud &amp; Data Experts</title>
	<atom:link href="https://zorost.com/tag/databricks/feed/" rel="self" type="application/rss+xml" />
	<link>https://zorost.com/tag/databricks/</link>
	<description>Production AI systems for aviation, manufacturing, pharma, government, finance, freight, and geopolitical intelligence.</description>
	<lastBuildDate>Wed, 20 May 2026 18:42:32 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://zorost.com/wp-content/uploads/2025/08/ZOROST-Intel-Logo3_512-150x150.png</url>
	<title>Databricks Archives - Zorost Intelligence | AI, Cloud &amp; Data Experts</title>
	<link>https://zorost.com/tag/databricks/</link>
	<width>32</width>
	<height>32</height>
</image> 
<site xmlns="com-wordpress:feed-additions:1">81719879</site>	<item>
		<title>Databricks Cost Optimization &#038; FinOps: Where the Real Savings Are</title>
		<link>https://zorost.com/databricks-cost-optimization-finops/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 21 Apr 2026 09:00:00 +0000</pubDate>
				<category><![CDATA[Databricks Modernization]]></category>
		<category><![CDATA[Cost Optimization]]></category>
		<category><![CDATA[Databricks]]></category>
		<category><![CDATA[FinOps]]></category>
		<category><![CDATA[Performance Tuning]]></category>
		<guid isPermaLink="false">https://zorost.com/databricks-cost-optimization-finops/</guid>

					<description><![CDATA[<p>A practical FinOps playbook for Databricks. Cluster types, file compaction, caching, serverless, and BI rationalization — with realistic savings ranges.</p>
<p>The post <a href="https://zorost.com/databricks-cost-optimization-finops/">Databricks Cost Optimization &#038; FinOps: Where the Real Savings Are</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;Cost optimization is not a one-time project. It&#8217;s a recurring discipline. The tooling is there. The discipline is the ask.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Most Databricks deployments have 30–60% slack in their spend within twelve months of go-live. Some of it is unavoidable (early-stage discovery). Some of it is technical (file layout, cluster sizing). Most of it is organizational (no cost ownership, no tagging, no review cadence).</p>
<h4>Where the real savings are</h4>
<table>
<thead>
<tr>
<th>Lever</th>
<th>Typical impact</th>
</tr>
</thead>
<tbody>
<tr>
<td>Right-sized cluster types (Photon, autoscaling, spot)</td>
<td>15–30%</td>
</tr>
<tr>
<td>Job orchestration (concurrent runs, dependencies, retries)</td>
<td>5–15%</td>
</tr>
<tr>
<td>File compaction and layout (<code>OPTIMIZE</code>, <code>ZORDER BY</code>, liquid clustering)</td>
<td>10–25% on read-heavy workloads</td>
</tr>
<tr>
<td>Caching strategies (disk cache, formerly Delta cache; query result cache)</td>
<td>5–15%</td>
</tr>
<tr>
<td>Workload migration to Serverless SQL where appropriate</td>
<td>10–25%</td>
</tr>
<tr>
<td>BI semantic-model rationalization</td>
<td>10–20% on Power BI / Tableau queries</td>
</tr>
<tr>
<td>Autoscaling thresholds</td>
<td>5–10%</td>
</tr>
<tr>
<td>Tombstone management (<code>VACUUM</code>)</td>
<td>Storage hygiene rather than a direct compute saving; keeps storage costs from creeping up over time</td>
</tr>
</tbody>
</table>
<blockquote>
<p>Ranges are typical for engagements where the team has not previously focused on cost. Mature deployments have less to find.</p>
</blockquote>
<h4>Tagging and ownership — the prerequisite</h4>
<p>Without tagging, you can&#8217;t optimize. Required tags:</p>
<ul>
<li><code>cost_center</code></li>
<li><code>environment</code> (dev / stage / prod)</li>
<li><code>owner</code> (team or person)</li>
<li><code>workload</code> (training / serving / ETL / BI / ad-hoc)</li>
</ul>
<p>Tags applied to compute resources flow into the <strong>system tables</strong> used for cost reporting, where they appear in the <code>custom_tags</code> column of <code>system.billing.usage</code>.</p>
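<p>As a minimal sketch of how the tag requirement could be checked before a cluster or job is created, the Python below validates a tag dictionary against the four required keys. The function names <code>missing_tags</code> and <code>invalid_tags</code> and the sample values are illustrative assumptions, not part of any Databricks API.</p>
<pre><code># Hypothetical pre-deployment check; not a Databricks API.
REQUIRED_TAGS = ("cost_center", "environment", "owner", "workload")

# Allowed values taken from the list of required tags above.
ALLOWED_VALUES = {
    "environment": {"dev", "stage", "prod"},
    "workload": {"training", "serving", "ETL", "BI", "ad-hoc"},
}


def missing_tags(tags):
    """Return the required tag keys that are absent or blank."""
    return [key for key in REQUIRED_TAGS if not str(tags.get(key) or "").strip()]


def invalid_tags(tags):
    """Return keys whose value falls outside the allowed set."""
    return [
        key
        for key, allowed in ALLOWED_VALUES.items()
        if key in tags and tags[key] not in allowed
    ]


if __name__ == "__main__":
    sample = {"cost_center": "fin-001", "environment": "production", "owner": "data-eng"}
    print("missing:", missing_tags(sample))
    print("invalid:", invalid_tags(sample))</code></pre>
<p>A check like this could run in CI or in a deployment script, so that untagged resources never reach the billing data in the first place.</p>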
<h4>The audit, in twelve hours</h4>
<p>A typical audit takes about twelve hours of senior engineering time:</p>
<ol>
<li>Pull <code>system.billing.usage</code> for the last 90 days, joined with cluster metadata</li>
<li>Identify the top 10 jobs by cost</li>
<li>For each, evaluate: is the cluster the right type? Is autoscaling tuned? Are files compacted? Is the workload running at the right cadence?</li>
<li>Identify candidates for serverless migration</li>
<li>Identify candidates for materialized view replacement</li>
<li>Produce a prioritized list with estimated savings</li>
</ol>
<p>Most teams find five to ten actions that together deliver 20–40% savings.</p>
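<p>The first two audit steps can be sketched in a few lines of Python. This assumes the usage data has already been exported to simple rows; the field names <code>job_id</code> and <code>dbus</code> and the flat <code>price_per_dbu</code> are simplifying assumptions for illustration, and a real audit would price each SKU separately.</p>
<pre><code>from collections import defaultdict


# Hypothetical, simplified rows; a real export from system.billing.usage
# has more columns and per-SKU pricing.
def top_jobs_by_cost(rows, price_per_dbu, limit=10):
    """Sum DBUs per job and return the `limit` most expensive jobs."""
    dbus_by_job = defaultdict(float)
    for row in rows:
        dbus_by_job[row["job_id"]] += row["dbus"]
    costs = {job: round(dbus * price_per_dbu, 2) for job, dbus in dbus_by_job.items()}
    # Sort by cost, highest first; ties are broken by job id for stable output.
    ranked = sorted(costs.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    usage = [
        {"job_id": "nightly_etl", "dbus": 120.0},
        {"job_id": "bi_refresh", "dbus": 40.0},
        {"job_id": "nightly_etl", "dbus": 80.0},
    ]
    for job, cost in top_jobs_by_cost(usage, price_per_dbu=0.5):
        print(job, cost)</code></pre>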
<h4>Common findings</h4>
<ul>
<li>A nightly batch job running on an oversized cluster when a smaller Photon-enabled cluster would do</li>
<li>A streaming pipeline on a cluster sized for peak load even though traffic is bimodal</li>
<li>A Power BI model in which 80% of the imported data is never queried</li>
<li>A <code>SELECT *</code> materialized in a downstream view, doubling storage cost on a hot dataset</li>
<li>An ad-hoc cluster left running over a weekend</li>
</ul>
<h4>Cost ownership cadence</h4>
<p>The discipline that makes savings stick is a monthly cost review with data leadership and the FinOps lead. Each owner explains anomalies. Tags get fixed. Wasteful patterns get retired.</p>
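<p>One way to prepare for that review is to rank owners by month-over-month change so the largest movements are discussed first. The sketch below assumes cost has already been totalled per <code>owner</code> tag for two months; the function name <code>rank_by_change</code> and the sample figures are hypothetical.</p>
<pre><code>def rank_by_change(previous, current):
    """Rank owners by relative month-over-month cost change, largest first."""
    changes = []
    for owner in sorted(set(previous) | set(current)):
        before = previous.get(owner, 0.0)
        after = current.get(owner, 0.0)
        # Owners with no baseline get None instead of a percentage.
        pct = None if before == 0 else round((after - before) / before * 100, 1)
        changes.append((owner, pct))
    # Owners without a baseline sort first so they are always reviewed.
    return sorted(changes, key=lambda c: (c[1] is not None, -(c[1] or 0.0)))


if __name__ == "__main__":
    last_month = {"etl": 100.0, "bi": 200.0}
    this_month = {"etl": 150.0, "bi": 190.0, "ml": 50.0}
    for owner, pct in rank_by_change(last_month, this_month):
        print(owner, pct)</code></pre>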
<h4>Closing</h4>
<p>Cost optimization on Databricks is not a one-time project. It is a recurring discipline backed by tagging, system tables, and a monthly review. The platform tooling is there. The discipline is the ask.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24309</post-id>	</item>
		<item>
		<title>OBIEE to Databricks: A Practical Migration Pattern</title>
		<link>https://zorost.com/obiee-to-databricks-migration-pattern/</link>
		
		<dc:creator><![CDATA[Zorost Intelligence]]></dc:creator>
		<pubDate>Tue, 04 Nov 2025 09:00:00 +0000</pubDate>
				<category><![CDATA[Databricks Modernization]]></category>
		<category><![CDATA[Databricks]]></category>
		<category><![CDATA[Migration]]></category>
		<category><![CDATA[OBIEE]]></category>
		<category><![CDATA[Semantic Layer]]></category>
		<category><![CDATA[Unity Catalog]]></category>
		<guid isPermaLink="false">https://zorost.com/obiee-to-databricks-migration-pattern/</guid>

					<description><![CDATA[<p>Move Oracle OBIEE / OAS to Databricks SQL with a clear semantic-layer methodology. RPD reconstruction, security translation, ETL conversion, and report rebuild — without losing business logic.</p>
<p>The post <a href="https://zorost.com/obiee-to-databricks-migration-pattern/">OBIEE to Databricks: A Practical Migration Pattern</a> appeared first on <a href="https://zorost.com">Zorost Intelligence | AI, Cloud &amp; Data Experts</a>.</p>
]]></description>
										<content:encoded><![CDATA[<blockquote>
<p><strong>Pull-quote:</strong> &#8220;The RPD is not a black box. It is a graph of joins, hierarchies, and security predicates. Treat it that way and migration becomes tractable.&#8221;</p>
</blockquote>
<h4>Why this matters</h4>
<p>Oracle BI EE (OBIEE) is one of the most widely deployed enterprise BI platforms. Typical deployments have also accumulated technical debt: schema drift, layered RPD repository files, undocumented session variables, and report logic split between the Business Model and Mapping (BMM) layer and the reports themselves. Most &#8220;migration&#8221; projects start by trying to lift and shift everything, get blocked, and stall.</p>
<p>The right approach is methodological. The RPD can be treated as three layers, each of which has a clean Databricks SQL equivalent.</p>
<h4>The three-layer translation</h4>
<pre><code>   OBIEE                                Databricks
   ─────                                ──────────
   Physical layer    ─────►  Delta Lake tables in Unity Catalog
                              + Lakehouse Federation for live sources

   BMM (logical)     ─────►  Databricks SQL semantic model
                              (Lakehouse views with row/column security)

   Presentation      ─────►  Power BI / Tableau on Databricks SQL
                              (dimensions, measures, time intelligence)</code></pre>
<h4>Migration sequence</h4>
<table>
<thead>
<tr>
<th>Phase</th>
<th>Length (typical)</th>
<th>Output</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>1. Discovery</strong></td>
<td>2–4 wks</td>
<td>Catalog of subject areas, RPDs, repositories, presentation catalogs · usage telemetry · report criticality matrix</td>
</tr>
<tr>
<td><strong>2. Source mapping</strong></td>
<td>2–4 wks</td>
<td>Mapping of physical layer to landing tables in Bronze/Silver Delta · federated sources documented</td>
</tr>
<tr>
<td><strong>3. Semantic model design</strong></td>
<td>4–8 wks</td>
<td>Logical-to-Databricks-SQL semantic model with row/column security</td>
</tr>
<tr>
<td><strong>4. ETL conversion</strong></td>
<td>parallel with 3</td>
<td>Native ETL → Lakeflow / DLT / Spark with DLT expectations</td>
</tr>
<tr>
<td><strong>5. Report rebuild</strong></td>
<td>4–10 wks</td>
<td>Top reports rebuilt in Power BI Direct Lake or Tableau</td>
</tr>
<tr>
<td><strong>6. Cutover &amp; decommissioning</strong></td>
<td>2–6 wks</td>
<td>Parallel run · UAT · sign-off · legacy decommissioning</td>
</tr>
<tr>
<td><strong>7. Hyper-care</strong></td>
<td>30/60/90 days</td>
<td>Stabilization with SLA-backed support</td>
</tr>
</tbody>
</table>
<h4>Security translation</h4>
<table>
<thead>
<tr>
<th>OBIEE security primitive</th>
<th>Databricks equivalent</th>
</tr>
</thead>
<tbody>
<tr>
<td>Application Roles</td>
<td>Unity Catalog groups (Entra/IDP-mapped)</td>
</tr>
<tr>
<td>Data filters on logical tables</td>
<td>Dynamic views with <code>current_user()</code> and <code>is_member()</code></td>
</tr>
<tr>
<td>Column-level filters</td>
<td><code>mask()</code> functions in dynamic views</td>
</tr>
<tr>
<td>Session variables</td>
<td>Catalog-scoped configuration tables</td>
</tr>
<tr>
<td>Init blocks</td>
<td>Replaced by IDP/Entra group claims</td>
</tr>
</tbody>
</table>
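<p>To make the data-filter row of the table concrete, the sketch below generates the SQL for a dynamic view that applies one row predicate per group using <code>is_member()</code>. It is an illustration under stated assumptions: the helper <code>dynamic_view_sql</code>, the catalog, table, and column names, and the group names are all hypothetical placeholders.</p>
<pre><code># Hypothetical helper: table, column, and group names are placeholders.
# Inputs are assumed to be trusted identifiers; nothing is escaped.
def dynamic_view_sql(view, table, columns, filter_column, group_values, admin_group):
    """Build a CREATE VIEW statement with one row predicate per group."""
    # Members of the admin group see every row.
    predicates = [f"is_member('{admin_group}')"]
    for group, value in sorted(group_values.items()):
        predicates.append(f"(is_member('{group}') AND {filter_column} = '{value}')")
    where = "\n   OR ".join(predicates)
    return (
        f"CREATE OR REPLACE VIEW {view} AS\n"
        f"SELECT {', '.join(columns)}\n"
        f"FROM {table}\n"
        f"WHERE {where}"
    )


if __name__ == "__main__":
    print(
        dynamic_view_sql(
            view="main.sales.orders_secure",
            table="main.sales.orders",
            columns=("order_id", "region", "amount"),
            filter_column="region",
            group_values={"sales_emea": "EMEA", "sales_apac": "APAC"},
            admin_group="sales_admins",
        )
    )</code></pre>
<p>Generating the views from a mapping of groups to filter values keeps the translated OBIEE data filters in one reviewable place rather than scattered across hand-written views.</p>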
<h4>Common pitfalls</h4>
<ul>
<li><strong>Trying to lift-and-shift the BMM.</strong> Some logic in the BMM is a workaround for OBIEE limitations. Rebuild it as Lakehouse views; don&#8217;t translate one-for-one.</li>
<li><strong>Skipping usage telemetry.</strong> Half the reports in a typical OBIEE deployment are unused. Don&#8217;t migrate them.</li>
<li><strong>Translating session variables literally.</strong> Most session variables become dynamic-view predicates or IDP claims.</li>
<li><strong>Building the semantic model in Power BI instead of Databricks SQL.</strong> Power BI imports work in the short term and create future modernization debt. Direct Lake is the target.</li>
</ul>
<h4>Closing</h4>
<p>The OBIEE → Databricks migration pattern is reproducible when you treat the RPD as a graph of joins, hierarchies, and security predicates rather than as a black box. The result is a cleaner semantic model on a platform that supports SQL, ML, streaming, and agentic AI — instead of a single-purpose BI server.</p>
]]></content:encoded>
					
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">24300</post-id>	</item>
	</channel>
</rss>
