Methods for causal inference
In this part, we focus on basic methods for causal inference, together with the assumptions each method relies on and the validation tests that check them. Methods are demonstrated in a Jupyter Python notebook using examples of causal problems in online social data.
Conditioning-based methods are the workhorse of causal inference when running active experiments is not feasible. We discuss these methods by showing how each one is, in its own way, attempting to approximate the gold standard randomized experiment. For each method, we will describe how it works, how to recognize when it can be applied, and its relative advantages and disadvantages.
Conditioning on confounders
Conditioning on key causal variables is the simplest method for isolating a causal effect: outcomes are compared only among units that share the same values of the confounders. We show how Simpson's Paradox can be resolved using conditioning.
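As a minimal sketch of how conditioning resolves Simpson's Paradox, the snippet below compares two treatments on pooled data and then within strata of a confounder. The counts are illustrative (they are the figures commonly quoted for a kidney-stone treatment comparison) and are not data from this tutorial; the function names are hypothetical.

```python
# (successes, trials) for each treatment within each stratum of the confounder.
# Illustrative counts only; they are not data from this tutorial.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}
strata = ("small", "large")

def pooled_rate(treatment):
    """Success rate ignoring the confounder."""
    successes = sum(counts[treatment][z][0] for z in strata)
    trials = sum(counts[treatment][z][1] for z in strata)
    return successes / trials

def stratum_rate(treatment, stratum):
    """Success rate among units in one stratum of the confounder."""
    successes, trials = counts[treatment][stratum]
    return successes / trials

def adjusted_rate(treatment):
    """Stratum rates averaged with the overall stratum proportions as weights."""
    total = sum(counts[t][z][1] for t in counts for z in strata)
    rate = 0.0
    for z in strata:
        weight = sum(counts[t][z][1] for t in counts) / total
        rate += weight * stratum_rate(treatment, z)
    return rate

for name in ("A", "B"):
    print(name, "pooled:", round(pooled_rate(name), 3),
          "adjusted:", round(adjusted_rate(name), 3))
```

In these counts, treatment B looks better when the data are pooled, yet treatment A is better within each stratum and after adjustment, because A was given more often in the harder stratum.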
Matching and stratification
Matching and stratification approximate conditioning in high-dimensional and continuous-variable settings. The goal is to ensure overall balance between the treated and control groups, similar to what would be found in a randomized experiment. We show an example of matching in online social media with people's status timelines.
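The following is a rough sketch of one-to-one nearest-neighbor matching on simulated data, not the tutorial's social media example. The data-generating process, the true effect of 2.0, and all variable names are assumptions made for illustration. Each treated unit is paired with its closest control in covariate space (with replacement), and the average outcome difference across pairs estimates the effect on the treated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated data (assumption): two confounders drive both treatment and outcome.
X = rng.normal(size=(n, 2))
p_treat = 1.0 / (1.0 + np.exp(-(X[:, 0] + X[:, 1])))
t = rng.random(n) < p_treat
y = 2.0 * t + X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)

# Naive comparison ignores the confounders and is biased.
naive = y[t].mean() - y[~t].mean()

# Squared Euclidean distance from every treated unit to every control unit.
X_treated, X_control = X[t], X[~t]
dist = ((X_treated[:, None, :] - X_control[None, :, :]) ** 2).sum(axis=2)

# Match each treated unit to its nearest control (with replacement).
nearest = dist.argmin(axis=1)
att = (y[t] - y[~t][nearest]).mean()

print("naive difference:", round(naive, 2))
print("matched estimate:", round(att, 2))
```

Matching with replacement on raw Euclidean distance is only one of several reasonable choices; in practice covariates are often rescaled first, and balance between the matched groups should be checked.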
One common method for approximate matching and stratification is to estimate each unit's propensity to be treated, given its covariates, and to balance this score between the treatment and control groups. We demonstrate the application of propensity scores to the same problem.
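A minimal sketch of propensity score stratification on simulated data follows. The logistic model fitted by Newton's method, the choice of ten strata, the simulated data with a true effect of 1.0, and the helper names `fit_propensity` and `stratified_effect` are all assumptions for illustration rather than the tutorial's implementation.

```python
import numpy as np

def fit_propensity(X, t, n_iter=25):
    """Logistic regression by Newton's method; returns P(t = 1 | X) per unit."""
    Xb = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - p)
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def stratified_effect(y, t, score, n_strata=10):
    """Size-weighted average of within-stratum treated-minus-control differences."""
    edges = np.quantile(score, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, score, side="right") - 1,
                     0, n_strata - 1)
    total, weight = 0.0, 0.0
    for k in range(n_strata):
        in_k = strata == k
        treated, control = in_k & t, in_k & ~t
        if treated.any() and control.any():  # skip strata lacking one group
            total += in_k.sum() * (y[treated].mean() - y[control].mean())
            weight += in_k.sum()
    return total / weight

# Simulated data (assumption): true treatment effect is 1.0.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 2))
t = rng.random(n) < 1.0 / (1.0 + np.exp(-0.8 * (X[:, 0] + X[:, 1])))
y = 1.0 * t + X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n)

score = fit_propensity(X, t.astype(float))
naive = y[t].mean() - y[~t].mean()
estimate = stratified_effect(y, t, score)
print("naive:", round(naive, 2), "stratified:", round(estimate, 2))
```

Stratifying on a single score replaces balancing on every covariate at once; some residual bias remains within each stratum, which finer strata or matching on the score can reduce.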
As a complementary approach, we consider simple regression, where we try to predict the outcome from all available covariates. We show that we need to be careful with the bias-variance tradeoff, as the goal is to estimate the importance of a feature and no ground-truth test set is available.
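As a small illustration of regression adjustment, the sketch below fits ordinary least squares with and without a confounder and reads the treatment effect off the treatment coefficient. The simulated data, the true effect of 1.5, and the helper `ols` are assumptions; the estimate is only trustworthy here because the outcome really is linear in the simulated covariate.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000

# Simulated data (assumption): x confounds treatment t and outcome y.
x = rng.normal(size=n)
t = (x + rng.normal(size=n) > 0).astype(float)
y = 1.5 * t + 2.0 * x + rng.normal(size=n)

def ols(columns, y):
    """Least-squares coefficients for an intercept plus the given columns."""
    A = np.column_stack([np.ones(len(y))] + columns)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

unadjusted = ols([t], y)[1]    # treatment coefficient, confounder omitted
adjusted = ols([t, x], y)[1]   # treatment coefficient, confounder included

print("unadjusted:", round(unadjusted, 2), "adjusted:", round(adjusted, 2))
```

Unlike a prediction task, there is no held-out ground truth for the coefficient itself, so the choice of which covariates and functional form to include has to be justified by the causal assumptions rather than by test-set accuracy.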
Doubly robust estimator
Doubly robust methods combine propensity-based and regression-based methods so that the causal estimate remains consistent as long as at least one of the two models is correctly specified.
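One common doubly robust estimator is augmented inverse propensity weighting (AIPW); the sketch below is an illustration under stated assumptions, not necessarily the estimator used in the tutorial notebook. The simulated data (true effect 1.0), the clipping of propensities to [0.01, 0.99], and the helper names are assumptions. The outcome model is deliberately misspecified (it ignores the confounder) while the propensity model is correct, to show that the combined estimate can still land near the truth.

```python
import numpy as np

def fit_propensity(x, t, n_iter=25):
    """Logistic regression by Newton's method; returns P(t = 1 | x) per unit."""
    Xb = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (t - p)
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def aipw(y, t, e, mu1, mu0):
    """Outcome-model difference plus propensity-weighted residual corrections."""
    return np.mean(mu1 - mu0
                   + t * (y - mu1) / e
                   - (1.0 - t) * (y - mu0) / (1.0 - e))

# Simulated data (assumption): true treatment effect is 1.0.
rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)
t = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 1.0 * t + 2.0 * x + rng.normal(size=n)

# Correctly specified propensity model, clipped as a practical safeguard.
e = np.clip(fit_propensity(x, t), 0.01, 0.99)

# Deliberately misspecified outcome model: group means that ignore x.
mu1 = np.full(n, y[t == 1].mean())
mu0 = np.full(n, y[t == 0].mean())

outcome_model_only = mu1[0] - mu0[0]
doubly_robust = aipw(y, t, e, mu1, mu0)
print("outcome model only:", round(outcome_model_only, 2))
print("doubly robust:", round(doubly_robust, 2))
```

If both models are wrong, the doubly robust property offers no protection, so the estimate should still be checked against the validation tests discussed for the individual methods.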
Synthetic control method
Finally, if none of the above methods is suitable, we can consider building synthetic controls. These are especially useful in settings where the treatment is applied to the whole population, such as in marketing or broadcast social updates. We provide an example that estimates the effect of an outreach campaign.
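A rough sketch of the synthetic control idea on simulated data: a weighted combination of untreated "donor" series is fitted to the treated series before the intervention, and the post-intervention gap between the treated series and this synthetic series is read as the effect. The simulated series, the true effect of 3.0, and the projected-gradient solver with its helper names are assumptions for illustration; this is not the tutorial's outreach campaign example.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def synthetic_weights(donors_pre, treated_pre, n_iter=5000):
    """Non-negative weights summing to one that best fit the pre-period."""
    step = 1.0 / (2.0 * np.linalg.norm(donors_pre, 2) ** 2)
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    for _ in range(n_iter):
        grad = 2.0 * donors_pre.T @ (donors_pre @ w - treated_pre)
        w = project_simplex(w - step * grad)
    return w

# Simulated data (assumption): 5 donors, 40 periods, intervention at period 30.
rng = np.random.default_rng(4)
T, T0 = 40, 30
season = 3.0 * np.sin(2 * np.pi * np.arange(T) / 12)
loadings = np.array([1.0, 0.5, 1.5, 0.2, 0.8])
donors = season[:, None] * loadings[None, :] + rng.normal(size=(T, 5))
treated = 0.6 * donors[:, 0] + 0.4 * donors[:, 1] + rng.normal(scale=0.1, size=T)
treated[T0:] += 3.0  # true effect after the intervention

w = synthetic_weights(donors[:T0], treated[:T0])
synthetic = donors @ w
effect = (treated[T0:] - synthetic[T0:]).mean()
print("weights:", np.round(w, 2))
print("estimated effect:", round(effect, 2))
```

The estimate is only as credible as the pre-period fit, so it is worth inspecting how closely the synthetic series tracks the treated series before the intervention.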
Natural experiments are the other main approach to estimating causal effects. Conditioning methods can fail if some important confounders are unobserved. Natural experiments avoid this problem by finding an observed variable that acts like the random assignment in an experiment. The challenge is usually finding such a variable.
Simple natural experiment
We introduce the standard natural experiment with the classic example of estimating the cause of cholera in the 1850s.
We then move to the more advanced instrumental variable method. Given a valid instrument, that is, a variable that affects the treatment but influences the outcome only through the treatment, this method can recover the causal effect even if there are unobserved confounders. We provide examples from online recommender systems and analysis of fake news.
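The instrumental variable idea can be sketched with a ratio estimator on simulated data: the effect of the instrument on the outcome divided by its effect on the treatment. The data-generating process, the true effect of 2.0, and the names below are assumptions for illustration; the simulated instrument is valid by construction, which real instruments must be argued to be.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000

# Simulated data (assumption): u is never observed by the analyst.
u = rng.normal(size=n)                          # unobserved confounder
z = rng.integers(0, 2, size=n).astype(float)    # instrument, independent of u
t = 1.0 * z + u + rng.normal(size=n)            # treatment depends on z and u
y = 2.0 * t + 3.0 * u + rng.normal(size=n)      # z affects y only through t

def slope(a, b):
    """Slope from regressing b on a (with an intercept)."""
    c = np.cov(a, b)
    return c[0, 1] / c[0, 0]

ols_estimate = slope(t, y)        # biased by the unobserved confounder
first_stage = slope(z, t)         # effect of the instrument on the treatment
reduced_form = slope(z, y)        # effect of the instrument on the outcome
iv_estimate = reduced_form / first_stage

print("OLS:", round(ols_estimate, 2), "IV:", round(iv_estimate, 2))
```

A weak first stage (a ratio with a denominator near zero) makes the estimate unstable, which is one reason finding a good instrument is the hard part.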
Another way to discover a natural experiment is to look for discontinuities in observed data, such as a threshold on some score that determines who receives the treatment. This is called the regression discontinuity method.
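A minimal sketch of a sharp regression discontinuity estimate on simulated data: units at or above a cutoff on a running variable are treated, and a local linear fit on each side of the cutoff estimates the jump in the outcome. The simulated data (true jump 2.0), the bandwidth of 0.25, and the function name `rd_estimate` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4000

# Simulated data (assumption): treatment is assigned when r crosses 0.
r = rng.uniform(-1, 1, size=n)       # running variable
d = (r >= 0).astype(float)           # sharp treatment assignment
y = 1.0 + 0.8 * r + 2.0 * d + rng.normal(scale=0.3, size=n)

def rd_estimate(r, y, cutoff=0.0, bandwidth=0.25):
    """Jump at the cutoff from a local linear fit with separate slopes."""
    near = np.abs(r - cutoff) <= bandwidth
    rc = r[near] - cutoff
    dc = (rc >= 0).astype(float)
    # Columns: intercept, jump at cutoff, slope below, slope change above.
    A = np.column_stack([np.ones(rc.size), dc, rc, dc * rc])
    coef, *_ = np.linalg.lstsq(A, y[near], rcond=None)
    return coef[1]

jump = rd_estimate(r, y)
print("estimated jump at cutoff:", round(jump, 2))
```

The estimate applies to units near the cutoff, and it is usually worth repeating it for several bandwidths to see how sensitive the result is to that choice.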
Finally, we describe sensitivity analysis: how to estimate the impact on the measured causal effects of changing the assumptions made in the observational studies and natural experiments above. Using the Jupyter code from the methods above, we present key techniques for performing sensitivity analyses and explain how to interpret and report the results.
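One simple form of sensitivity analysis, sketched below under strong assumptions, asks how an estimate would change if there were a single unobserved confounder. It assumes a binary treatment, a confounder whose effect on the outcome is `gamma` (the same in both groups), and a difference `delta` in the confounder's mean between treated and control units after adjustment, so that the bias is `gamma * delta`. The starting estimate of 1.0, the grid values, and the function names are hypothetical.

```python
import numpy as np

def bias_adjusted(estimate, gamma, delta):
    """Estimate after subtracting the bias gamma * delta of a hypothetical confounder."""
    return estimate - gamma * delta

def tipping_delta(estimate, gamma):
    """Confounder imbalance that would drive the adjusted estimate to zero."""
    return estimate / gamma

estimate = 1.0                                # hypothetical observed estimate
gammas = np.array([0.0, 0.5, 1.0, 2.0])       # confounder effect on the outcome
deltas = np.array([0.0, 0.25, 0.5, 1.0])      # confounder imbalance across groups

# Rows index gamma, columns index delta.
grid = bias_adjusted(estimate, gammas[:, None], deltas[None, :])
print(grid)
print("imbalance needed to explain away the effect when gamma = 2:",
      tipping_delta(estimate, 2.0))
```

Reporting such a grid alongside the point estimate lets readers judge whether a confounder strong enough to overturn the conclusion is plausible in the setting at hand.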