Tutorial on Causal Inference and Counterfactual Reasoning
Amit Sharma (@amt_shrma), Emre Kiciman (@emrek)
ACM KDD 2018 International Conference on Knowledge Discovery and Data Mining, London, UK
As computing systems intervene more frequently and more actively to improve people's work and daily lives, it is critical to correctly predict and understand the causal effects of these interventions. Conventional machine learning methods, built on pattern recognition and correlational analyses, are insufficient for causal analysis. This tutorial will introduce participants to concepts in causal inference and counterfactual reasoning, drawing from a broad literature on the topic from statistics, social sciences and machine learning.
We first motivate the use of causal inference through examples in domains such as recommender systems, social media datasets, health, education and governance. To answer causal questions in these domains, we will introduce the key ingredient that causal analysis depends on, counterfactual reasoning, and describe the two most popular frameworks, one based on Bayesian graphical models and the other on potential outcomes. Building on these foundations, we will cover a range of methods suitable for causal inference with large-scale online data, including randomized experiments, observational methods like matching and stratification, and natural experiment-based methods such as instrumental variables and regression discontinuity. We will also focus on best practices for evaluation and validation of causal inference techniques, drawing from our own experiences.
We demonstrate these techniques in Jupyter notebooks, showing how core concepts translate to empirical work. Throughout, we emphasize considerations of working with large-scale data from online systems, such as logs of user interactions or social data. The goal of this tutorial is to help you understand the basics of causal inference, apply the most common causal inference methods appropriately, and recognize situations where more complex methods are required.
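To give a flavour of why correlational analysis can mislead, here is a minimal sketch of stratification on simulated data. It is not taken from the tutorial notebooks. The data-generating process, the function names (`simulate`, `naive_estimate`, `stratified_estimate`) and all parameter values are assumptions made purely for illustration: a single binary confounder drives both treatment and outcome, and the true treatment effect is fixed at 1.0.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=1.0, seed=0):
    """Generate (confounder, treatment, outcome) rows from an assumed toy process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0       # binary confounder, e.g. "highly active user"
        p_treat = 0.8 if z else 0.2              # confounder makes treatment more likely
        t = 1 if rng.random() < p_treat else 0
        y = true_effect * t + 3.0 * z + rng.gauss(0, 1)  # confounder also raises the outcome
        rows.append((z, t, y))
    return rows


def naive_estimate(rows):
    """Difference in mean outcome between treated and untreated, ignoring the confounder."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    n = len(rows)
    estimate = 0.0
    for stratum in sorted({z for z, _, _ in rows}):
        treated = [y for z, t, y in rows if z == stratum and t == 1]
        control = [y for z, t, y in rows if z == stratum and t == 0]
        weight = sum(1 for z, _, _ in rows if z == stratum) / n
        estimate += weight * (mean(treated) - mean(control))
    return estimate


rows = simulate()
print("naive difference in means:", round(naive_estimate(rows), 2))
print("stratified estimate:      ", round(stratified_estimate(rows), 2))
```

Under these assumed parameters the naive comparison is biased upward (about 2.8 in expectation, since treated users are disproportionately those with the confounder), while the stratified estimate should land close to the true effect of 1.0. Stratification only works here because the confounder is observed; with unobserved confounding, other methods such as natural experiments are needed.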
- Introduction: Patterns and predictions are not enough
- Methods: Conditioning-based methods and natural experiments
- Considerations: Special considerations with large-scale and network data
- Broader Landscape: Heterogeneous treatment effects, machine learning and causal discovery
- References: Further reading