Causality in the Age of AI Scaling

Tangier, Morocco

Workshop Date: 5 May 2026

Submission Deadline: 3 March 2026 (extended from 27 February; Anywhere on Earth)

About the workshop

Reasoning about interventions, the core of causality, is fundamental to solving many of modern AI's most pressing challenges, including trustworthiness, reliability, explainability, and out-of-distribution generalization. Yet recent AI breakthroughs have been driven overwhelmingly by scaling models on simple predictive objectives without explicit causal modeling, such as next-token prediction for Large Language Models or denoising for diffusion models. This success raises a critical question for the community: can causal abilities emerge from scale alone, and if not, what can explicit causal modeling bring that scale cannot? This workshop aims to address this question and to explore the potential synergy between scaling predictive methods and formal causal modeling in building the next generation of AI.

Workshop Goals

  • Can causal abilities emerge from scale alone, and if not, what can explicit causal modeling bring that scale cannot?
  • Is scaling sufficient for building intelligent systems, or is causal reasoning also needed? What are the limitations and opportunities of scaling AI models?
  • What are the challenges in developing scalable causal algorithms that can be potentially integrated into existing AI models?
  • What ingredients are necessary to design robust, interactive world models?

Schedule

Time           Activity
9:00 - 9:10    Welcome & Opening
9:10 - 9:40    Invited Talk 1 - Mihaela van der Schaar
9:40 - 10:10   Invited Talk 2 - Alexander D'Amour
10:10 - 10:40  Morning Break
10:45 - 11:00  Contributed Talks
  • Causal Inference with Time Series Foundation Models
    Cyrus Illick, Saeyoung Rho, Vishal Misra
  • Towards Understanding Out-of-Distribution Generalization for In-Context Learning via Low-Dimensional Subspaces
    Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu
11:00 - 12:30  Poster Session I
12:30 - 2:00   Lunch
2:00 - 2:30    Invited Talk 3 - Francesco Locatello
2:30 - 3:00    Invited Talk 4 - Sara Magliacane
3:00 - 3:15    Contributed Talks
  • Scalable Policy Maximization Under Network Interference
    Aidan Gleich, Eric Laber, Alexander Volfovsky
  • Masking Unfairness: Hiding Causality within Zero ATE
    Zou Yang, Sophia Xiao, Bijan Mazaheri
3:15 - 4:30    Poster Session II
4:30 - 5:00    Invited Talk 5 - Sanmi Koyejo
5:00 - 5:55    Panel Discussion - Alexander D'Amour, Francesco Locatello, Sara Magliacane, Chandler Squires
5:55 - 6:00    Closing Remarks

Keynote Speakers

Mihaela van der Schaar
University of Cambridge

Sanmi Koyejo
Stanford University

Francesco Locatello
Institute of Science and Technology Austria

Sara Magliacane
University of Amsterdam

Alexander D'Amour
Google DeepMind

Invited Talks

Mihaela van der Schaar (University of Cambridge)

Title: Scaling Causal Reasoning

Causal reasoning requires answering interventional and counterfactual questions about the mechanisms that generate data—not just predicting what comes next. Yet today’s large-scale machine learning systems struggle precisely where this capability matters most: in complex, real-world settings where causal structure is large, only partially specified, and rarely accompanied by ground-truth supervision. This creates a fundamental bottleneck: how can we scale causal reasoning when both the underlying systems and the signals needed to learn them are inherently incomplete? In this talk, I will argue that the core challenge is not simply one of model capacity, but of representation and supervision. Real-world causal systems must be formalized, queried, and stress-tested in ways that current learning paradigms do not naturally support. I will outline a new perspective on scaling causal reasoning—one that reframes the problem as constructing and leveraging causal simulators capable of supporting interventional and counterfactual queries at scale. This perspective opens the door to new forms of supervision, evaluation, and generalization for causal reasoning in machine learning. This is joint work with my students Anita Kriz and Nicolas Astorga.
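To make the notion of a causal simulator concrete, here is a minimal sketch, assuming a toy linear-Gaussian structural causal model; the model, coefficients, and queries are illustrative choices, not taken from the talk.

```python
# Minimal sketch of a causal simulator (illustrative only, not from the talk).
# A linear-Gaussian SCM over (X, Y) that supports observational sampling,
# interventional queries via do(X = x), and counterfactuals via abduction.
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    """Draw n samples from the SCM; do_x, if given, sets X by intervention."""
    u_x = rng.normal(size=n)              # exogenous noise for X
    u_y = rng.normal(size=n)              # exogenous noise for Y
    x = u_x if do_x is None else np.full(n, do_x)
    y = 2.0 * x + u_y                     # structural equation: Y := 2X + U_Y
    return x, y

# Interventional query: E[Y | do(X = 1)] -- should be close to 2.0.
_, y_do1 = sample(100_000, do_x=1.0)
print("E[Y | do(X=1)] ~", y_do1.mean())

# Counterfactual query: for a factual unit with X=0.5, Y=1.7, abduct the
# noise U_Y = Y - 2X, then set X to 1 and re-evaluate the structural equation.
x_f, y_f = 0.5, 1.7
u_y_abducted = y_f - 2.0 * x_f
y_cf = 2.0 * 1.0 + u_y_abducted
print("Y would have been", y_cf, "had X been 1")
```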

Sanmi Koyejo (Stanford University)

Title: AI Measurement is a Causal Inference Problem

Scaling laws are the field's main forecasting tool — but they are observational. What we want is interventional: the causal effect of changing compute or data. Getting there requires solving a harder problem first: our measurements are confounded. Item difficulty, contributor practices, and evaluation protocol all shape benchmark scores in ways that go unmeasured. Applying measurement models to large-scale LLM leaderboards reveals that contributor effects explain more ranking variance than model architecture, and that reliability degrades precisely at the frontier where it matters most. I argue that measurement modeling and interventional scaling laws are central open problems in AI evaluation, and that causal inference is the right language for both.
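As a toy illustration of the confounding the abstract describes, the sketch below simulates benchmark scores driven by both model skill and contributor effects, then compares their contributions to ranking variance. All magnitudes are assumptions chosen for illustration, not results from the talk.

```python
# Illustrative simulation (assumptions mine, not from the talk): benchmark
# scores generated with both model-quality and contributor effects, followed
# by a crude variance decomposition via group means.
import numpy as np

rng = np.random.default_rng(1)
n_models, n_contributors = 20, 10

model_skill = rng.normal(0, 0.5, n_models)               # what we want to measure
contributor_effect = rng.normal(0, 1.0, n_contributors)  # protocol confounder

# Each (model, contributor) cell is a noisy average of item-level scores.
scores = (model_skill[:, None]
          + contributor_effect[None, :]
          + rng.normal(0, 0.3, (n_models, n_contributors)))

var_model = scores.mean(axis=1).var()    # variance across models
var_contrib = scores.mean(axis=0).var()  # variance across contributors
print(f"model variance:       {var_model:.3f}")
print(f"contributor variance: {var_contrib:.3f}")
# With these (assumed) magnitudes, contributor effects dominate ranking
# variance, mirroring the confounding problem in leaderboard scores.
```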

Francesco Locatello (Institute of Science and Technology Austria)

Title: Causal Inference in Scientific Experiments with Large Models

Deciphering raw, high-dimensional, and temporal observations into causal knowledge is a key component of the scientific discovery process and a longstanding challenge for AI. Across scientific disciplines, the data that can be recorded do not directly expose causal variables, which often remain latent and only indirectly measured. In this talk, I present our recent work on accurate causal effect estimation from raw experimental data using deep learning and its interdisciplinary applications. I begin by defining when a predictor constitutes a causally valid proxy of a latent variable, and how deep learning models can process entire experiments to yield correct causal conclusions. I then show how AI models enable “looking at the data first” and discovering treatment effects without supervision.
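As a rough illustration of the proxy idea (my own construction, not the paper's method), the sketch below simulates a randomized experiment in which the causal variable is latent and only a high-dimensional measurement of it is recorded; a linear predictor fit on auxiliary labeled data then serves as the proxy outcome for effect estimation.

```python
# Minimal sketch (illustrative construction): a randomized experiment where
# the causal variable Z is latent and only a high-dimensional measurement X
# is observed. A predictor f(X) ~ Z, fit on auxiliary labeled data, acts as
# a proxy outcome for estimating the treatment effect on Z.
import numpy as np

rng = np.random.default_rng(2)
n, d = 5000, 30
W = rng.normal(size=(1, d))                   # measurement map Z -> X

t = rng.integers(0, 2, n)                     # randomized treatment
z = 1.0 * t + rng.normal(size=n)              # true effect on latent Z is 1.0
x = z[:, None] @ W + 0.5 * rng.normal(size=(n, d))   # raw observations

# Auxiliary labeled data (Z observed) to fit the linear proxy f(X) = X @ beta.
z_aux = rng.normal(size=n)
x_aux = z_aux[:, None] @ W + 0.5 * rng.normal(size=(n, d))
beta, *_ = np.linalg.lstsq(x_aux, z_aux, rcond=None)

z_hat = x @ beta                              # proxy for the latent variable
ate_hat = z_hat[t == 1].mean() - z_hat[t == 0].mean()
print("estimated effect via proxy:", round(ate_hat, 3), "(true: 1.0)")
```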

Sara Magliacane (University of Amsterdam)

Title: Scalable Causal Discovery for Statistically Efficient Causal Inference

Causal discovery methods can identify valid adjustment sets for estimating causal effects among a small set of target variables, even when the underlying causal graph is unknown. Global causal discovery methods learn the whole causal graph and therefore enable the recovery of optimal adjustment sets, i.e., sets with the lowest asymptotic variance, but they quickly become computationally prohibitive as the number of variables grows. Local causal discovery methods offer a more scalable alternative by focusing on the local neighbourhood of the target variables, but they are restricted to statistically suboptimal adjustment sets.

In this talk, I will present two recent methods that combine the computational efficiency of local methods with the statistical optimality of global causal discovery methods. First, I will describe the Sequential Non-Ancestor Pruning (SNAP) framework (arxiv.org/abs/2502.07857), which progressively identifies and prunes definite non-ancestors of the target variables during the causal discovery process. We show that the resulting subgraph is sufficient for identifying the causal relations between the targets and their efficient adjustment sets. Then, I will introduce Local Optimal Adjustments Discovery (LOAD) (arxiv.org/abs/2502.07857), a method for identifying optimal adjustment sets from local information. LOAD first identifies the causal relation between the targets and tests whether the causal effect is identifiable using only local information. If it is, LOAD finds the possible descendants of the treatment and infers the optimal adjustment set as the parents of the outcome in a modified forbidden projection; otherwise, it returns the locally valid parent adjustment sets based on the learned local structure. For both methods, I will show that in our evaluation they outperform global methods in scalability while providing more accurate effect estimation than local methods.
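To see why the choice among valid adjustment sets matters statistically, here is a small simulation assuming a toy DAG Z1 -> X -> Y <- Z2 (the graph and coefficients are illustrative, not from the paper). Adjusting for the outcome parent Z2 gives the lowest-variance estimator, while adjusting for the treatment parent Z1 inflates variance, even though all three estimators are unbiased.

```python
# Toy comparison (illustrative, not from the paper) of adjustment-set choice.
# DAG: Z1 -> X -> Y <- Z2. All three sets below are valid; variances differ.
import numpy as np

rng = np.random.default_rng(3)

def fit_effect(covariates):
    """One replication: OLS coefficient of X on Y, given extra covariates."""
    n = 500
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    x = z1 + rng.normal(size=n)
    y = 1.0 * x + 2.0 * z2 + rng.normal(size=n)   # true effect of X is 1.0
    cols = {"z1": z1, "z2": z2}
    design = np.column_stack([x] + [cols[c] for c in covariates]
                             + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

for adj in ([], ["z2"], ["z1"]):
    estimates = [fit_effect(adj) for _ in range(2000)]
    print(f"adjusting for {adj or 'nothing'}: sd = {np.std(estimates):.3f}")
# Expected ordering: z2 (outcome parent) < nothing < z1 (treatment parent).
```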

Alexander D'Amour (Google DeepMind)

Title: Synthetic Experiments with Generative AI are Secretly Observational Studies

LLMs (large language models) and other generative AI models have shown promise for simulating natural phenomena. For example, LLMs are increasingly used to simulate users of interactive systems, such as conversational AI agents. In these cases, an LLM is initialized with a persona and instructions to play the role of that person; the LLM then interacts with the system, and often generates plausible user interactions. This setup gives the impression that the generative AI model can operate as a structural model: with the model as a user simulator, it seems that we can generate counterfactual outcomes by intervening on the simulation setting to answer causal questions. In this talk, we show a wrinkle in this story: although generative models provide a simulator-like interface, the data they generate is confounded. We argue that this confounding stems from generative models operating as intended, and explore how causal adjustment strategies can begin to address internal validity concerns. The work raises new questions in the broader conversation about how generative models can and cannot be used as causal world models.
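A toy numerical version of this wrinkle (my construction, not from the talk): a generator that reproduces the conditional distribution of outcomes given a persona answers an observational query, and a backdoor adjustment over the confounding trait is needed to recover the interventional one.

```python
# Toy illustration (assumptions mine): sampling from p(outcome | persona)
# learned from confounded data gives an observational, not interventional,
# answer; backdoor adjustment over the latent trait recovers the causal effect.
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

u = rng.integers(0, 2, n)                        # latent user trait (confounder)
p = (rng.random(n) < 0.2 + 0.6 * u).astype(int)  # persona correlates with trait
y = 0.5 * p + 1.0 * u + rng.normal(0, 0.1, n)    # true persona effect is 0.5

# What a conditional generative model reproduces: E[Y | P = p] (confounded).
naive = y[p == 1].mean() - y[p == 0].mean()

# Backdoor adjustment over the trait recovers E[Y | do(P = p)].
adjusted = sum((y[(p == 1) & (u == v)].mean()
                - y[(p == 0) & (u == v)].mean()) * (u == v).mean()
               for v in (0, 1))

print(f"conditional (what the simulator gives): {naive:.2f}")
print(f"backdoor-adjusted (causal effect):      {adjusted:.2f}  # true: 0.50")
```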

Accepted Papers

You can access the PDFs of accepted papers on OpenReview.

  • Causal Sparse Concepts for Faithful Explanations of Large Models
    Khalid Oublal, Quentin Bouniot, Qi Gan, Stephan Clémençon, Zeynep Akata
  • Does Persona Change Reasoning? A Causal Mediation Analysis of System Prompt Interventions
    Aravilli Atchuta Ram
  • Towards Understanding Out-of-Distribution Generalization for In-Context Learning via Low-Dimensional Subspaces
    Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu
  • Scalable Policy Maximization Under Network Interference
    Aidan Gleich, Eric Laber, Alexander Volfovsky
  • Evaluation of Large Language Models via Coupled Token Generation
    Nina L. Corvelo Benz, Stratis Tsirtsis, Eleni Straitouri, Ivi Chatzi, Ander Artola Velasco, Suhas Thejaswi, Manuel Gomez Rodriguez
  • Causal In-Context Learning in Transformers: Training Dynamics Across Heterogeneous Interventional Data
    Shanyun Gao, Murat Kocaoglu, Qifan Song
  • Towards Understanding When Causal Structure Improves Robustness: Evidence from Generative Models
    Manal Benhamza, Marianne Clausel, Myriam Tami
  • Neural Effect Modifier Search
    Riccardo Cadei, Falco J. Bargagli-Stoffi, Francesco Locatello
  • Evaluating Counterfactual Data Augmentation in Reinforcement Learning
    Shilpa Noushad, Sajan Kumar, Pratyush Uppuluri
  • Same Meaning, Different Tokens: Tokenization-Induced Shifts in Representations and Predictions
    Anthony Ragazzi, Eugene Santos
  • Causal Inference with Time Series Foundation Models
    Cyrus Illick, Saeyoung Rho, Vishal Misra
  • Provable Robustness to Spurious Correlations via Invariant Data For Robust Finetuning
    Ruqi Bai, Yao Ji, Mingyu Kim, Easton Currie, Zeyu Zhou, David I. Inouye
  • An Empirical Evaluation of Model Completion for Causal Inference
    Jiapeng Zhao, Elias Bareinboim, Rina Dechter
  • Causal Discovery Beyond Scaling: Mixed-Type DAG Learning with Native Missing-Data Inference
    Amine M'Charrak, Tommaso Salvatori, Thomas Lukasiewicz
  • Feature-Conditioned Causal Temporal Representation Learning for Human Motion-Inspired Dynamics
    Linghao Zeng, Alina Glushkova, Sotiris Manitsaris
  • CauScale: Neural Causal Discovery at Scale
    Bo Peng, Sirui Chen, Jiaguo Tian, Yu Qiao, Chaochao Lu
  • Causal Reasoning in Pieces: Modular In-Context Learning for Causal Discovery
    Kacper Kadziolka, Saber Salehkaleybar
  • Intervention-Based Stability as a Reliability Signal in Federated Graph Learning
    Yashmi Kumarasiri
  • On the identifiability of causal graphs with multiple environments
    Francesco Montagna
  • Demystifying amortized causal discovery with transformers
    Francesco Montagna, Max Cairney-Leeming, Dhanya Sridhar, Francesco Locatello
  • Masking Unfairness: Hiding Causality within Zero ATE
    Zou Yang, Sophia Xiao, Bijan Mazaheri
  • Scalable Neural Synthetic Control with Individual Counterfactuals under Hidden Confounding
    Maha Ouali, Badih Ghattas, Emmanuel Flachaire, Philippe Charpentier, Bozzi Laurent

Call for papers

We invite submissions exploring the synergy between scaling predictive methods and causal modeling to build the next generation of trustworthy and reliable AI.

Topics

Potential topics include, but are not limited to:

  • Emergence of causal abilities in foundation models (or the failure thereof)
  • OOD generalization and robustness of large models
  • Scaling causal generative modeling and representation learning
  • Causal, counterfactual, and logical reasoning in large models
  • Design of interactive causal world models
  • Trustworthy and interpretable AI
  • Causal discovery and abstraction (especially applied to AI)
  • Evaluation and benchmarking (and the limitations thereof)

Submission

We invite submissions of short papers presenting recent work on scaling and causality. Submissions are now being accepted through OpenReview.

Submissions should be formatted using the AISTATS LaTeX style. Papers are limited to 4 pages (excluding references and appendices). Accepted contributions will be presented as posters during the workshop. We will select a small number of contributed talks from the accepted submissions for short oral presentations at the workshop.

Submissions under review or accepted within the past year at other venues are allowed. All accepted papers are non-archival and will be made publicly available on OpenReview. Authors should create an OpenReview Profile at least two weeks in advance of the paper submission deadline.

Important dates

  • Submission deadline: March 3, 2026 (extended from February 27, 2026; Anywhere on Earth)
  • Notification of acceptance: March 18, 2026 (Anywhere on Earth)
  • Workshop date: May 5, 2026

Organizers

David Inouye
Purdue University

Bryon Aragam
University of Chicago

Murat Kocaoglu
Johns Hopkins University