Pull-quote: “Saying ‘weather correlates with delays’ is not an operational claim. Saying ‘an upstream weather event caused 32 ± 6 minutes of average delay through a specific ATC mechanism — with an E-value of 1.9’ — is.”
Why this matters
Aviation operations centers run on correlations. Weather correlates with delay. Connecting traffic correlates with delay. Crew availability correlates with delay. Every dashboard in the industry shows you which inputs associate with disruption.
But operational decisions are causal decisions. If we cancel three flights at this hub now, what will the cascade look like in three hours? That is not a correlation question. It is a counterfactual question. To answer it credibly, you need a structural model — not a regression dashboard.
What we built
AeroFarr’s causal layer is built on DoWhy (Microsoft Research) and EconML. It produces three classes of output for any operational question:
- Average Treatment Effect (ATE) and Conditional Average Treatment Effect (CATE) — the average causal effect of an intervention, optionally conditional on subgroup features
- Counterfactual estimates via do-calculus — what would happen if we set a specific variable to a chosen value, leaving its upstream causes unchanged and letting its downstream effects respond
- Sensitivity analysis — E-values, Austen plots, and Rosenbaum bounds quantifying how much unmeasured confounding would be needed to overturn the conclusion
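The first two output classes can be illustrated on synthetic data. The sketch below is not AeroFarr's pipeline and does not use DoWhy or EconML; it is a standard-library toy, assuming a hypothetical three-variable structural model (high traffic drives both ground stops and delay) with a true effect of 30 minutes chosen purely for illustration. It compares a naive difference in means, a backdoor-adjusted ATE, and a simulated do-intervention.

```python
import random
from statistics import mean

TRUE_EFFECT = 30.0  # synthetic: minutes of delay caused by a ground stop


def simulate(n, seed=0, do_ground_stop=None):
    """Draw n synthetic flights from a toy structural model.

    Edges: high_traffic -> ground_stop, high_traffic -> delay,
    ground_stop -> delay. If do_ground_stop is given, the treatment is
    set by intervention instead of depending on traffic.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_traffic = rng.random() < 0.5
        if do_ground_stop is None:
            ground_stop = rng.random() < (0.8 if high_traffic else 0.2)
        else:
            ground_stop = do_ground_stop
        delay = (10.0 + TRUE_EFFECT * ground_stop
                 + 20.0 * high_traffic + rng.gauss(0.0, 5.0))
        rows.append((high_traffic, ground_stop, delay))
    return rows


def naive_difference(rows):
    """Correlational answer: mean delay with a stop minus mean delay without."""
    treated = [d for _, g, d in rows if g]
    control = [d for _, g, d in rows if not g]
    return mean(treated) - mean(control)


def backdoor_ate(rows):
    """Stratify on the confounder, then weight each stratum by its share of flights."""
    total = len(rows)
    ate = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        treated = [d for _, g, d in stratum if g]
        control = [d for _, g, d in stratum if not g]
        ate += (len(stratum) / total) * (mean(treated) - mean(control))
    return ate


def interventional_contrast(n, seed=1):
    """E[delay | do(stop=1)] - E[delay | do(stop=0)], simulating both interventions."""
    with_stop = mean(d for _, _, d in simulate(n, seed, do_ground_stop=True))
    without = mean(d for _, _, d in simulate(n, seed, do_ground_stop=False))
    return with_stop - without


rows = simulate(20000)
naive = naive_difference(rows)
adjusted = backdoor_ate(rows)
interventional = interventional_contrast(20000)
print(f"naive={naive:.1f} adjusted={adjusted:.1f} do()={interventional:.1f}")
```

In this toy model the naive contrast overstates the effect because busy periods produce both more ground stops and more delay. The stratified estimate and the simulated intervention both recover a value close to the 30 minutes that was built in. The per-stratum contrasts inside `backdoor_ate` are the simplest form of a conditional effect.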
The headline architectural decision is to keep the causal model separate from the prediction model. The prediction core (a multi-head stacked ensemble) tells you what is likely to happen. The causal layer tells you why. Different problems, different methodologies, deliberately decoupled.
Why sensitivity analysis is the heart of it
A causal claim without sensitivity analysis is a marketing claim. The classic critique is: “What if there’s an unmeasured confounder?” Sensitivity analysis answers that critique numerically. An E-value of 1.9 says: an unmeasured confounder would need to be associated with both the treatment and the outcome by a risk ratio of at least 1.9 each, beyond what the measured covariates already explain, to fully explain away the estimated effect. Operational stakeholders can decide whether that is plausible in their environment.
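For readers who want to see the arithmetic, the published E-value formula for an estimate on the risk-ratio scale is RR + sqrt(RR × (RR − 1)), applied to 1/RR when RR is below 1. The helper below is a minimal sketch of that formula only; the function names are illustrative and are not AeroFarr's or any library's API.

```python
import math


def e_value(risk_ratio):
    """E-value for a point estimate on the risk-ratio scale."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted so the formula applies.
    rr = risk_ratio if risk_ratio >= 1.0 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null.

    Returns 1.0 when the interval already contains a risk ratio of 1,
    because no unmeasured confounding is needed to reach the null.
    """
    if lower <= 1.0 <= upper:
        return 1.0
    return e_value(lower if lower > 1.0 else upper)


print(round(e_value(1.29), 1))  # a risk ratio of about 1.29 gives about 1.9
```

An effect measured in minutes of delay first has to be converted to an approximate risk ratio before this formula applies; that conversion is not shown here.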
This is the same standard you would expect from a peer-reviewed epidemiological paper. We hold our operational claims to it.
The operational pattern
A typical operational session uses the causal layer in three steps:
- Identify the question. “Why did the disruption at hub X spread north today?”
- Identify the candidate causal mechanism. “Was it weather acting through ATC ground-stops, or was it crew positioning?”
- Run the analysis. AeroFarr returns the estimated effect, its confidence interval, and the sensitivity analysis, along with the safety reports from the RAG layer that match the pattern.
Operations leaders get an answer with a confidence band, a stated mechanism, and a sensitivity result. That is the standard operational decision-support should meet.
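One way to picture what such an answer carries is a small record type. The sketch below is hypothetical: the class name, field names, and report identifiers are placeholders invented for illustration, not AeroFarr's actual schema, and the numbers simply reuse the figures from the pull-quote.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class CausalAnswer:
    """Hypothetical container for one answered operational question."""
    question: str
    mechanism: str
    effect_minutes: float
    interval_minutes: Tuple[float, float]  # lower and upper bound
    e_value: float
    matched_reports: List[str] = field(default_factory=list)

    def summary(self) -> str:
        low, high = self.interval_minutes
        return (
            f"{self.mechanism}: {self.effect_minutes:.0f} min "
            f"(interval {low:.0f} to {high:.0f}), "
            f"E-value {self.e_value:.1f}, "
            f"{len(self.matched_reports)} matching safety reports"
        )


answer = CausalAnswer(
    question="Why did the disruption at hub X spread north today?",
    mechanism="weather acting through ATC ground stops",
    effect_minutes=32.0,
    interval_minutes=(26.0, 38.0),  # 32 plus or minus 6
    e_value=1.9,
    matched_reports=["REPORT-PLACEHOLDER-1", "REPORT-PLACEHOLDER-2"],
)
print(answer.summary())
```

Keeping the mechanism and the sensitivity result in the same record as the effect means the estimate is never shown without its caveats.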
What this is not
Causal AI is not a substitute for prediction. AeroFarr’s ensemble — gate / severity / regression trio / quantile / non-linear meta — does the prediction work. Causal AI is a complement: it explains and quantifies the why that the prediction model cannot articulate.
It is also not a free lunch. Identification (what is actually identifiable from the data) and the assumptions behind it (no unmeasured confounding, also called ignorability, and a correctly specified DAG) are all live questions. We address them with explicit DAGs, sensitivity analysis, and documented limitations.
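To make "explicit DAG" concrete, the sketch below writes one down as plain data and derives an adjustment set from it. The graph and variable names are hypothetical, not AeroFarr's model. It relies on a standard result: if every parent of the treatment is observed, and the outcome is not itself a parent of the treatment, adjusting for those parents blocks all backdoor paths.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG: each node maps to the set of its direct causes (parents).
PARENTS = {
    "season": set(),
    "weather": {"season"},
    "traffic_volume": {"season"},
    "atc_ground_stop": {"weather", "traffic_volume"},
    "delay": {"atc_ground_stop", "weather", "traffic_volume"},
}


def validate_dag(parents):
    """Return a causal ordering of the nodes, or raise if the graph has a cycle."""
    try:
        return tuple(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError("graph is not a DAG: it contains a cycle") from exc


def parent_adjustment_set(parents, treatment, outcome):
    """Adjustment set made of the treatment's parents.

    Assumes all of those parents are measured; that is an assumption
    about the data, not something the graph can verify.
    """
    validate_dag(parents)
    if outcome in parents[treatment]:
        raise ValueError("outcome is a parent of the treatment")
    return set(parents[treatment])


order = validate_dag(PARENTS)
adjust_for = parent_adjustment_set(PARENTS, "atc_ground_stop", "delay")
print(sorted(adjust_for))
```

Writing the graph down this way lets reviewers challenge each assumed edge and each missing one, which is where a documented limitation usually begins.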
Closing
Operations decisions are causal decisions. Treating them with correlation tools and headline accuracy numbers is a category error. The decade in front of us is the decade of operational causal AI — and aviation is one of the domains best suited to it, because the data exists in volume and the questions are unambiguous.


