Causality in the Age of AI Scaling

Tangier, Morocco

Workshop Date: 5 May 2026

Submission Deadline: 3 March 2026 (extended from 27 February; Anywhere on Earth)

About the workshop

Reasoning about interventions, the core of causality, is fundamental to solving many of modern AI's most pressing challenges, including trustworthiness, reliability, explainability, and out-of-distribution generalization. Yet recent AI breakthroughs have been driven overwhelmingly by scaling models on simple predictive objectives without explicit causal modeling, such as next-token prediction for Large Language Models or denoising for diffusion models. This success raises a critical question for the community: can causal abilities emerge from scale alone, and if not, what can explicit causal modeling bring that scale cannot? This workshop aims to address this question and to explore the potential synergy between scaling predictive methods and formal causal modeling in building the next generation of AI.

Workshop Goals

  • Can causal abilities emerge from scale alone, and if not, what can explicit causal modeling bring that scale cannot?
  • Is scaling sufficient for building intelligent systems, or is causal reasoning also needed? What are the limitations and opportunities of scaling AI models?
  • What are the challenges in developing scalable causal algorithms that can be potentially integrated into existing AI models?
  • What ingredients are necessary to design robust, interactive world models?

Schedule

Time           Activity
9:00 - 9:10    Welcome & Opening
9:10 - 9:40    Invited Talk 1 - Mihaela van der Schaar
9:40 - 10:10   Invited Talk 2 - Alexander D'Amour
10:10 - 10:40  Morning Break
10:45 - 11:00  Contributed Talks
  • Causal Inference with Time Series Foundation Models
    Cyrus Illick, Saeyoung Rho, Vishal Misra
  • Towards Understanding Out-of-Distribution Generalization for In-Context Learning via Low-Dimensional Subspaces
    Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu
11:00 - 12:30  Poster Session I
12:30 - 2:00   Lunch
2:00 - 2:30    Invited Talk 3 - Francesco Locatello
2:30 - 3:00    Invited Talk 4 - Sara Magliacane
3:00 - 3:15    Contributed Talks
  • Scalable Policy Maximization Under Network Interference
    Aidan Gleich, Eric Laber, Alexander Volfovsky
  • Masking Unfairness: Hiding Causality within Zero ATE
    Zou Yang, Sophia Xiao, Bijan Mazaheri
3:15 - 4:30    Poster Session II
4:30 - 5:00    Invited Talk 5 - Sanmi Koyejo
5:00 - 5:55    Panel Discussion - Alexander D'Amour, Francesco Locatello, Sara Magliacane, Chandler Squires
5:55 - 6:00    Closing Remarks

Keynote Speakers

Mihaela van der Schaar
University of Cambridge

Sanmi Koyejo
Stanford University

Francesco Locatello
Institute of Science and Technology Austria

Sara Magliacane
University of Amsterdam

Alexander D'Amour
Google DeepMind

Invited Talks

Mihaela van der Schaar (University of Cambridge)

Title: Scaling Causal Reasoning

Causal reasoning requires answering interventional and counterfactual questions about the mechanisms that generate data—not just predicting what comes next. Yet today’s large-scale machine learning systems struggle precisely where this capability matters most: in complex, real-world settings where causal structure is large, only partially specified, and rarely accompanied by ground-truth supervision. This creates a fundamental bottleneck: how can we scale causal reasoning when both the underlying systems and the signals needed to learn them are inherently incomplete? In this talk, I will argue that the core challenge is not simply one of model capacity, but of representation and supervision. Real-world causal systems must be formalized, queried, and stress-tested in ways that current learning paradigms do not naturally support. I will outline a new perspective on scaling causal reasoning—one that reframes the problem as constructing and leveraging causal simulators capable of supporting interventional and counterfactual queries at scale. This perspective opens the door to new forms of supervision, evaluation, and generalization for causal reasoning in machine learning. This is joint work with my students Anita Kriz and Nicolas Astorga.
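To make the notion of a causal simulator concrete, here is a minimal sketch, assuming a toy linear-Gaussian structural causal model; the model, coefficients, and queries are illustrative choices, not taken from the talk.

```python
# Minimal sketch of a causal simulator (illustrative only, not from the talk).
# A linear-Gaussian SCM over (X, Y) that supports observational sampling,
# interventional queries via do(X = x), and counterfactuals via abduction.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    """Draw n samples from the SCM; do_x, if given, sets X by intervention."""
    u_x = rng.normal(size=n)              # exogenous noise for X
    u_y = rng.normal(size=n)              # exogenous noise for Y
    x = u_x if do_x is None else np.full(n, do_x)
    y = 2.0 * x + u_y                     # structural equation: Y := 2X + U_Y
    return x, y

# Interventional query: E[Y | do(X = 1)] -- should be close to 2.0.
_, y_do1 = sample(100_000, do_x=1.0)
print("E[Y | do(X=1)] ~", y_do1.mean())

# Counterfactual query: for a factual unit with X=0.5, Y=1.7, abduct the
# noise U_Y = Y - 2X, then set X to 1 and re-evaluate the structural equation.
x_f, y_f = 0.5, 1.7
u_y_abducted = y_f - 2.0 * x_f
y_cf = 2.0 * 1.0 + u_y_abducted
print("Y would have been", y_cf, "had X been 1")
```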

Sanmi Koyejo (Stanford University)

Title: AI Measurement is a Causal Inference Problem

Scaling laws are the field's main forecasting tool — but they are observational. What we want is interventional: the causal effect of changing compute or data. Getting there requires solving a harder problem first: our measurements are confounded. Item difficulty, contributor practices, and evaluation protocol all shape benchmark scores in ways that go unmeasured. Applying measurement models to large-scale LLM leaderboards reveals that contributor effects explain more ranking variance than model architecture, and that reliability degrades precisely at the frontier where it matters most. I argue that measurement modeling and interventional scaling laws are central open problems in AI evaluation, and that causal inference is the right language for both.
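As a toy illustration of the confounding the abstract describes, the sketch below simulates benchmark scores driven by both model skill and contributor effects, then compares their contributions to ranking variance. All magnitudes are assumptions chosen for illustration, not results from the talk.

```python
# Illustrative simulation (assumptions mine, not from the talk): benchmark
# scores generated with both model-quality and contributor effects, followed
# by a crude variance decomposition via group means.
import numpy as np

rng = np.random.default_rng(1)
n_models, n_contributors = 20, 10

model_skill = rng.normal(0, 0.5, n_models)               # what we want to measure
contributor_effect = rng.normal(0, 1.0, n_contributors)  # protocol confounder

# Each (model, contributor) cell is a noisy average of item-level scores.
scores = (model_skill[:, None]
          + contributor_effect[None, :]
          + rng.normal(0, 0.3, (n_models, n_contributors)))

var_model = scores.mean(axis=1).var()    # variance across models
var_contrib = scores.mean(axis=0).var()  # variance across contributors
print(f"model variance:       {var_model:.3f}")
print(f"contributor variance: {var_contrib:.3f}")
# With these (assumed) magnitudes, contributor effects dominate ranking
# variance, mirroring the confounding problem in leaderboard scores.
```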

Francesco Locatello (Institute of Science and Technology Austria)

Title: Causal Inference in Scientific Experiments with Large Models

Deciphering raw, high-dimensional, and temporal observations into causal knowledge is a key component of the scientific discovery process and a longstanding challenge for AI. Across scientific disciplines, the data that can be recorded do not directly expose causal variables, which often remain latent and only indirectly measured. In this talk, I present our recent work on accurate causal effect estimation from raw experimental data using deep learning and its interdisciplinary applications. I begin by defining when a predictor constitutes a causally valid proxy of a latent variable, and how deep learning models can process entire experiments to yield correct causal conclusions. I then show how AI models enable “looking at the data first” and discovering treatment effects without supervision.
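As a rough illustration of the proxy idea (my own construction, not the paper's method), the sketch below simulates a randomized experiment in which the causal variable is latent and only a high-dimensional measurement of it is recorded; a linear predictor fit on auxiliary labeled data then serves as the proxy outcome for effect estimation.

```python
# Minimal sketch (illustrative construction): a randomized experiment where
# the causal variable Z is latent and only a high-dimensional measurement X
# is observed. A predictor f(X) ~ Z, fit on auxiliary labeled data, acts as
# a proxy outcome for estimating the treatment effect on Z.
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 30
W = rng.normal(size=(1, d))                   # measurement map Z -> X

t = rng.integers(0, 2, n)                     # randomized treatment
z = 1.0 * t + rng.normal(size=n)              # true effect on latent Z is 1.0
x = z[:, None] @ W + 0.5 * rng.normal(size=(n, d))   # raw observations

# Auxiliary labeled data (Z observed) to fit the linear proxy f(X) = X @ beta.
z_aux = rng.normal(size=n)
x_aux = z_aux[:, None] @ W + 0.5 * rng.normal(size=(n, d))
beta, *_ = np.linalg.lstsq(x_aux, z_aux, rcond=None)

z_hat = x @ beta                              # proxy for the latent variable
ate_hat = z_hat[t == 1].mean() - z_hat[t == 0].mean()
print("estimated effect via proxy:", round(ate_hat, 3), "(true: 1.0)")
```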

Sara Magliacane (University of Amsterdam)

Title: Scalable Causal Discovery for Statistically Efficient Causal Inference

Causal discovery methods can identify valid adjustment sets for estimating causal effects among a small set of target variables, even when the underlying causal graph is unknown. Global causal discovery methods learn the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighbourhood of the target variables, but they are restricted to statistically suboptimal adjustment sets.

In this talk, I will present two recent methods that combine the computational efficiency of local methods with the statistical optimality of global causal discovery methods. First, I will describe the Sequential Non-Ancestor Pruning (SNAP) framework (arxiv.org/abs/2502.07857), which progressively identifies and prunes definite non-ancestors of the target variables during the causal discovery process. We show that the resulting subgraph is sufficient for identifying the causal relations between the targets and their efficient adjustment sets. Then, I will introduce Local Optimal Adjustments Discovery (LOAD) (arxiv.org/abs/2502.07857), a method for identifying optimal adjustment sets from local information. LOAD first identifies the causal relation between the targets and tests whether the causal effect is identifiable using only local information. If it is, LOAD finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection; otherwise, it returns the locally valid parent adjustment sets based on the learned local structure. For both methods, I will show that in our evaluation they outperform global methods in scalability while providing more accurate effect estimation than local methods.
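To see why the choice among valid adjustment sets matters statistically, here is a small simulation assuming a toy DAG Z1 -> X -> Y <- Z2 (the graph and coefficients are illustrative, not from the paper). Adjusting for the outcome parent Z2 gives the lowest-variance estimator, while adjusting for the treatment parent Z1 inflates variance, even though all three estimators are unbiased.

```python
# Toy comparison (illustrative, not from the paper) of adjustment-set choice.
# DAG: Z1 -> X -> Y <- Z2. All three sets below are valid; variances differ.
import numpy as np

rng = np.random.default_rng(3)

def fit_effect(covariates):
    """One replication: OLS coefficient of X on Y, given extra covariates."""
    n = 500
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    x = z1 + rng.normal(size=n)
    y = 1.0 * x + 2.0 * z2 + rng.normal(size=n)   # true effect of X is 1.0
    cols = {"z1": z1, "z2": z2}
    design = np.column_stack([x] + [cols[c] for c in covariates]
                             + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

for adj in ([], ["z2"], ["z1"]):
    estimates = [fit_effect(adj) for _ in range(2000)]
    print(f"adjusting for {adj or 'nothing'}: sd = {np.std(estimates):.3f}")
# Expected ordering: z2 (outcome parent) < nothing < z1 (treatment parent).
```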

Alexander D'Amour (Google DeepMind)

Title: Synthetic Experiments with Generative AI are Secretly Observational Studies

LLMs (large language models) and other generative AI models have shown promise for simulating natural phenomena. For example, LLMs are increasingly used to simulate users of interactive systems, such as conversational AI agents. In these cases, an LLM is initialized with a persona and instructions to play the role of that person; the LLM then interacts with the system, and often generates plausible user interactions. This setup gives the impression that the generative AI model can operate as a structural model: with the model as a user simulator, it seems that we can generate counterfactual outcomes by intervening on the simulation setting to answer causal questions. In this talk, we show a wrinkle in this story: although generative models provide a simulator-like interface, the data they generate is confounded. We argue that this confounding stems from generative models operating as intended, and explore how causal adjustment strategies can begin to address internal validity concerns. The work raises new questions in the broader conversation about how generative models can and cannot be used as causal world models.
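A toy numerical version of this wrinkle (my construction, not from the talk): a generator that reproduces the conditional distribution of outcomes given a persona answers an observational query, and a backdoor adjustment over the confounding trait is needed to recover the interventional one.

```python
# Toy illustration (assumptions mine): sampling from p(outcome | persona)
# learned from confounded data gives an observational, not interventional,
# answer; backdoor adjustment over the latent trait recovers the causal effect.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

u = rng.integers(0, 2, n)                        # latent user trait (confounder)
p = (rng.random(n) < 0.2 + 0.6 * u).astype(int)  # persona correlates with trait
y = 0.5 * p + 1.0 * u + rng.normal(0, 0.1, n)    # true persona effect is 0.5

# What a conditional generative model reproduces: E[Y | P = p] (confounded).
naive = y[p == 1].mean() - y[p == 0].mean()

# Backdoor adjustment over the trait recovers E[Y | do(P = p)].
adjusted = sum((y[(p == 1) & (u == v)].mean()
                - y[(p == 0) & (u == v)].mean()) * (u == v).mean()
               for v in (0, 1))

print(f"conditional (what the simulator gives): {naive:.2f}")
print(f"backdoor-adjusted (causal effect):      {adjusted:.2f}  # true: 0.50")
```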

Accepted Papers

You can access the PDFs of accepted papers on OpenReview.

  • Causal Sparse Concepts for Faithful Explanations of Large Models
    Khalid Oublal, Quentin Bouniot, Qi Gan, Stephan Clémençon, Zeynep Akata
  • Does Persona Change Reasoning? A Causal Mediation Analysis of System Prompt Interventions
    Aravilli Atchuta Ram
  • Towards Understanding Out-of-Distribution Generalization for In-Context Learning via Low-Dimensional Subspaces
    Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu
  • Scalable Policy Maximization Under Network Interference
    Aidan Gleich, Eric Laber, Alexander Volfovsky
  • Evaluation of Large Language Models via Coupled Token Generation
    Nina L. Corvelo Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco, Suhas Thejaswi, Manuel Gomez Rodriguez
  • Causal In-Context Learning in Transformers: Training Dynamics Across Heterogeneous Interventional Data
    Shanyun Gao, Murat Kocaoglu, Qifan Song
  • Towards Understanding When Causal Structure Improves Robustness: Evidence from Generative Models
    Manal Benhamza, Marianne Clausel, Myriam Tami
  • Neural Effect Modifier Search
    Riccardo Cadei, Falco J. Bargagli-Stoffi, Francesco Locatello
  • Evaluating Counterfactual Data Augmentation in Reinforcement Learning
    Shilpa Noushad, Sajan Kumar, Pratyush Uppuluri
  • Same Meaning, Different Tokens: Tokenization-Induced Shifts in Representations and Predictions
    Anthony Ragazzi, Eugene Santos
  • Causal Inference with Time Series Foundation Models
    Cyrus Illick, Saeyoung Rho, Vishal Misra
  • Provable Robustness to Spurious Correlations via Invariant Data For Robust Finetuning
    Ruqi Bai, Yao Ji, Mingyu Kim, Easton Currie, Zeyu Zhou, David I. Inouye
  • An Empirical Evaluation of Model Completion for Causal Inference
    Jiapeng Zhao, Elias Bareinboim, Rina Dechter
  • Causal Discovery Beyond Scaling: Mixed-Type DAG Learning with Native Missing-Data Inference
    Amine M'Charrak, Tommaso Salvatori, Thomas Lukasiewicz
  • Feature-Conditioned Causal Temporal Representation Learning for Human Motion-Inspired Dynamics
    Linghao Zeng, Alina Glushkova, Sotiris Manitsaris
  • CauScale: Neural Causal Discovery at Scale
    Bo Peng, Sirui Chen, Jiaguo Tian, Yu Qiao, Chaochao Lu
  • Causal Reasoning in Pieces: Modular In-Context Learning for Causal Discovery
    Kacper Kadziolka, Saber Salehkaleybar
  • Intervention-Based Stability as a Reliability Signal in Federated Graph Learning
    Yashmi Kumarasiri
  • On the identifiability of causal graphs with multiple environments
    Francesco Montagna
  • Demystifying amortized causal discovery with transformers
    Francesco Montagna, Max Cairney-Leeming, Dhanya Sridhar, Francesco Locatello
  • Masking Unfairness: Hiding Causality within Zero ATE
    Zou Yang, Sophia Xiao, Bijan Mazaheri
  • Scalable Neural Synthetic Control with Individual Counterfactuals under Hidden Confounding
    Maha Ouali, Badih Ghattas, Emmanuel Flachaire, Philippe Charpentier, Bozzi Laurent

Call for papers

We invite submissions exploring the synergy between scaling predictive methods and causal modeling to build the next generation of trustworthy and reliable AI.

Topics

Potential topics include, but are not limited to:

  • Emergence of causal abilities in foundation models (or the failure thereof)
  • OOD generalization and robustness of large models
  • Scaling causal generative modeling and representation learning
  • Causal, counterfactual, and logical reasoning in large models
  • Design of interactive causal world models
  • Trustworthy and interpretable AI
  • Causal discovery and abstraction (especially applied to AI)
  • Evaluation and benchmarking (and the limitations thereof)

Submission

We invite submissions of short papers presenting recent work on scaling and causality. Submissions are now being accepted through OpenReview.

Submissions should be formatted using the AISTATS LaTeX style. Papers are limited to 4 pages (excluding references and appendices). Accepted contributions will be presented as posters during the workshop. We will select a small number of contributed talks from the accepted submissions for short oral presentations at the workshop.

Submissions under review or accepted within the past year at other venues are allowed. All accepted papers are non-archival and will be made publicly available on OpenReview. Authors should create an OpenReview Profile at least two weeks in advance of the paper submission deadline.

Important dates

  • Submission deadline: March 3, 2026 (extended from February 27, 2026; Anywhere on Earth)
  • Notification of acceptance: March 18, 2026 (Anywhere on Earth)
  • Workshop date: May 5, 2026

Organizers

David Inouye
Purdue University

Bryon Aragam
University of Chicago

Murat Kocaoglu
Johns Hopkins University