Pull-quote: “If your modernization plan doesn’t replace the ETL tool, you didn’t modernize. You just changed where the data lands.”

Why this matters

A migration that moves data into Databricks but leaves Informatica running is an incomplete migration. Much of the cost and operational pain of a legacy stack lives in the ETL tool: license fees, brittle scheduling, lineage gaps, and dependencies on legacy connectors.

The right modernization replaces the ETL tool. Together, Lakeflow Declarative Pipelines (formerly Delta Live Tables, DLT), Auto Loader, and Databricks Jobs cover what the ETL tool used to do: ingestion, transformation, data quality, and orchestration.

The conversion table

Legacy pattern                          Lakehouse equivalent
Source-to-stage mapping                 Auto Loader (cloudFiles): incremental, schema-evolving, exactly-once
Slowly Changing Dimension (SCD) Type 1  DLT APPLY CHANGES (SQL: STORED AS SCD TYPE 1; Python: dlt.apply_changes(stored_as_scd_type=1))
SCD Type 2 with effective dating        DLT APPLY CHANGES (SQL: STORED AS SCD TYPE 2; Python: dlt.apply_changes(stored_as_scd_type=2))
Aggregations & roll-ups                 Materialized views in Databricks SQL
Workflow scheduling                     Databricks Jobs with retries, alerts, and lineage
Data quality rules                      DLT expectations, with a quarantine table and metric capture
Custom logging & audit                  Unity Catalog lineage + audit logs (system.access.audit)
Reusable transformations                DLT pipelines with shared notebooks/libraries
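The two SCD rows differ only in how history is kept. The plain-Python sketch below models the result that apply_changes is expected to produce for each type, assuming change records arrive as dicts with a business key and an ordering column (the role sequence_by plays in DLT). The function names and sample data are hypothetical, and this is not how DLT implements CDC internally; DLT's SCD Type 2 tables carry __START_AT and __END_AT columns, which the sketch mirrors as start_at and end_at.

```python
from typing import Any

Record = dict[str, Any]


def scd_type1(changes: list[Record], key: str, seq: str) -> dict[Any, Record]:
    """Latest version per key wins; earlier versions are overwritten."""
    current: dict[Any, Record] = {}
    for rec in sorted(changes, key=lambda r: r[seq]):  # apply changes in sequence order
        current[rec[key]] = rec
    return current


def scd_type2(changes: list[Record], key: str, seq: str) -> list[Record]:
    """Every version is kept; a new version closes the previous one."""
    history: list[Record] = []
    open_rows: dict[Any, Record] = {}  # current (still open) version per key
    for rec in sorted(changes, key=lambda r: r[seq]):
        previous = open_rows.get(rec[key])
        if previous is not None:
            previous["end_at"] = rec[seq]  # effective dating: old version ends here
        row = {**rec, "start_at": rec[seq], "end_at": None}
        history.append(row)
        open_rows[rec[key]] = row
    return history


changes = [
    {"customer_id": 1, "seq": 1, "city": "Austin"},
    {"customer_id": 2, "seq": 2, "city": "Boston"},
    {"customer_id": 1, "seq": 3, "city": "Denver"},
]

print(scd_type1(changes, "customer_id", "seq")[1]["city"])  # Denver
for row in scd_type2(changes, "customer_id", "seq"):
    print(row["customer_id"], row["city"], row["start_at"], row["end_at"])
# 1 Austin 1 3
# 2 Boston 2 None
# 1 Denver 3 None
```

One simplification: the sketch opens a new version for every change record, even when no attribute changed. DLT lets you limit which columns trigger new history (track_history_column_list).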

A reference DLT pipeline

                    ┌────────────────────────────┐
  Cloud Storage ──► │ Auto Loader (schema evol.) │ ──► Bronze
                    └────────────────────────────┘       │
                                                         │  DLT expectations
                                                         ▼  (drop / quarantine)
                                                       Silver
                                                         │
                                                         │  Aggregations / joins
                                                         ▼
                                                        Gold
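Auto Loader itself is configured through spark.readStream.format("cloudFiles") and records the files it has already processed in its checkpoint. To illustrate the two properties the Bronze step relies on, incremental ingestion and schema evolution, here is a plain-Python model. IncrementalLoader and the in-memory "landing zone" dict are hypothetical stand-ins, not Auto Loader's API. Real Auto Loader in its default addNewColumns mode stops the stream when it sees a new column and picks up the wider schema on restart, a step this sketch skips.

```python
from dataclasses import dataclass, field
from typing import Any

Row = dict[str, Any]


@dataclass
class IncrementalLoader:
    """Hypothetical stand-in for Auto Loader's checkpoint and schema tracking."""
    seen_files: set[str] = field(default_factory=set)  # plays the checkpoint's role
    schema: list[str] = field(default_factory=list)    # known columns, in arrival order

    def load(self, files: dict[str, list[Row]]) -> list[Row]:
        """Return rows only from files not ingested before, widening the schema as needed."""
        bronze: list[Row] = []
        for name in sorted(files):
            if name in self.seen_files:
                continue  # already ingested: skipping it avoids duplicate rows
            for row in files[name]:
                for col in row:
                    if col not in self.schema:
                        self.schema.append(col)  # new column: evolve the schema
                bronze.append({**row, "_source_file": name})
            self.seen_files.add(name)
        # Pad older-shape rows with None so every row matches the current schema.
        return [{c: r.get(c) for c in self.schema + ["_source_file"]} for r in bronze]


landing = {"day1.json": [{"id": 1, "amount": 10}]}
loader = IncrementalLoader()
loader.load(landing)  # ingests day1.json
landing["day2.json"] = [{"id": 2, "amount": 5}]
landing["day3.json"] = [{"id": 3, "amount": 7, "currency": "USD"}]
for row in loader.load(landing):  # day1.json is not re-read
    print(row)
# {'id': 2, 'amount': 5, 'currency': None, '_source_file': 'day2.json'}
# {'id': 3, 'amount': 7, 'currency': 'USD', '_source_file': 'day3.json'}
```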

How we treat data quality

Data quality is part of the pipeline, not bolted on after. Every Silver table has DLT expectations that:

  • Drop obviously bad rows (null business keys, malformed dates)
  • Quarantine suspicious rows (range violations, referential gaps) for review
  • Capture metrics so dashboards show data-quality trends, not just data volume

Quality is a first-class output of the pipeline. The data team monitors it like they monitor latency.
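The three behaviors in the list can be modeled in a few lines. In DLT the drop case maps to expect_or_drop (or expect_all_or_drop), and per-expectation results are recorded in the pipeline event log; quarantine is not a built-in expectation action and is usually built as a second table that holds the rows failing the rules. The plain-Python sketch below is illustrative only: apply_expectations, the rule names, and the sample rows are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Any, Callable

Row = dict[str, Any]
Rule = Callable[[Row], bool]


def apply_expectations(
    rows: list[Row],
    drop_rules: dict[str, Rule],
    quarantine_rules: dict[str, Rule],
) -> tuple[list[Row], list[Row], Counter]:
    """Split rows into (silver, quarantine) and count failures per rule."""
    silver: list[Row] = []
    quarantine: list[Row] = []
    failures: Counter = Counter()
    for row in rows:
        failed = [name for name, ok in drop_rules.items() if not ok(row)]
        if failed:
            failures.update(failed)
            continue  # obviously bad: discard, as expect_or_drop would
        # Quarantine rules run only on rows that survived the drop rules.
        flagged = [name for name, ok in quarantine_rules.items() if not ok(row)]
        failures.update(flagged)
        if flagged:
            quarantine.append({**row, "_failed_rules": flagged})  # held for review
        else:
            silver.append(row)
    return silver, quarantine, failures


known_customers = {101, 102}
drop_rules = {
    "key_not_null": lambda r: r.get("order_id") is not None,
    "valid_date": lambda r: isinstance(r.get("order_date"), date),
}
quarantine_rules = {
    "amount_in_range": lambda r: 0 <= r["amount"] <= 10_000,
    "known_customer": lambda r: r["customer_id"] in known_customers,
}
rows = [
    {"order_id": 1, "order_date": date(2024, 5, 1), "amount": 50, "customer_id": 101},
    {"order_id": None, "order_date": date(2024, 5, 1), "amount": 20, "customer_id": 101},
    {"order_id": 3, "order_date": "2024-13-45", "amount": 20, "customer_id": 102},
    {"order_id": 4, "order_date": date(2024, 5, 2), "amount": 99_999, "customer_id": 999},
]

silver, quarantine, metrics = apply_expectations(rows, drop_rules, quarantine_rules)
print(len(silver), len(quarantine))  # 1 1
print(dict(metrics))
# {'key_not_null': 1, 'valid_date': 1, 'amount_in_range': 1, 'known_customer': 1}
```

Running quarantine rules only on rows that passed the drop rules is a choice made here so range and reference checks never see null keys or malformed dates. The per-rule failure counts are the kind of series a data-quality dashboard would trend over time.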

Migration sequence

Phase                                   Output
Inventory                               Mappings, jobs, sessions, schedules, lineage gaps
Pattern library                         Templates for the top 8–12 conversion patterns in your stack
Iteration 1 (highest-volume sources)    First migrated DLT pipelines, run in parallel with the legacy jobs
Iterations 2–N                          Wave-by-wave conversion: parallel run, cutover, decommission
Hyper-care                              30/60/90-day stabilization
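A parallel run only pays off if each wave has an explicit cutover check. A minimal sketch, assuming the legacy job's output and the DLT pipeline's output can both be pulled into memory as rows sharing a business key; reconcile and ready_to_cut_over are hypothetical names, and at production volumes the same comparison would more likely run as a SQL or Spark join than in Python.

```python
from typing import Any

Row = dict[str, Any]


def reconcile(legacy: list[Row], lakehouse: list[Row], key: str) -> dict[str, Any]:
    """Compare two outputs of the same job, keyed by a business key."""
    old = {r[key]: r for r in legacy}
    new = {r[key]: r for r in lakehouse}
    shared = old.keys() & new.keys()
    return {
        # Raw row counts are kept separately, so duplicate keys still surface here.
        "legacy_rows": len(legacy),
        "lakehouse_rows": len(lakehouse),
        "missing_in_lakehouse": sorted(old.keys() - new.keys()),
        "extra_in_lakehouse": sorted(new.keys() - old.keys()),
        "value_mismatches": sorted(k for k in shared if old[k] != new[k]),
    }


def ready_to_cut_over(report: dict[str, Any]) -> bool:
    """Cut over only when counts match and every keyed row agrees."""
    return (
        report["legacy_rows"] == report["lakehouse_rows"]
        and not report["missing_in_lakehouse"]
        and not report["extra_in_lakehouse"]
        and not report["value_mismatches"]
    )


legacy = [{"id": 1, "total": 100}, {"id": 2, "total": 250}, {"id": 3, "total": 75}]
lakehouse = [{"id": 1, "total": 100}, {"id": 2, "total": 260}, {"id": 4, "total": 10}]
report = reconcile(legacy, lakehouse, key="id")
print(report)
# {'legacy_rows': 3, 'lakehouse_rows': 3, 'missing_in_lakehouse': [3], 'extra_in_lakehouse': [4], 'value_mismatches': [2]}
print(ready_to_cut_over(report))  # False
```

Matching row counts alone would have passed this example; the key-level diff is what exposes the missing, extra, and changed rows.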

Closing

ETL modernization done right replaces the legacy tool, not just the destination. Lakeflow Declarative Pipelines (DLT), Auto Loader, and Databricks Jobs cover the full surface. The savings are measurable in license fees, operational toil, and time-to-insight.