# DoWhy example on the Lalonde dataset

Thanks to [@mizuy](https://github.com/mizuy) for providing this example. Here we apply the inverse probability weighting (IPW) estimator to the Lalonde dataset.

In [1]:

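import os, sys
# Add the directory two levels up to the import path
sys.path.append(os.path.abspath("../../"))

import dowhy
from dowhy.do_why import CausalModel
from rpy2.robjects import r as R

# Enable the %R magic used below
%load_ext rpy2.ipython

#%R install.packages("Matching")
%R library(Matching)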


/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: Loading required package: MASS

warnings.warn(x, RRuntimeWarning)
/usr/local/lib/python3.5/dist-packages/rpy2/rinterface/__init__.py:146: RRuntimeWarning: ##
##  Matching (Version 4.9-3, Build Date: 2018-05-03)
##  See http://sekhon.berkeley.edu/matching for additional documentation.
##   Jasjeet S. Sekhon. 2011. Multivariate and Propensity Score Matching
##   Software with Automated Balance Optimization: The Matching package for R.''
##   Journal of Statistical Software, 42(7): 1-52.
##

warnings.warn(x, RRuntimeWarning)

Out[1]:

array(['Matching', 'MASS', 'tools', 'stats', 'graphics', 'grDevices',
'utils', 'datasets', 'methods', 'base'],
dtype='<U9')


In [2]:

# Load the lalonde dataset in R, then export it to Python
%R data(lalonde)
%R -o lalonde



## Run DoWhy analysis: model, identify, estimate

In [3]:


model = CausalModel(
    data=lalonde,
    treatment='treat',
    outcome='re78',
    common_causes='nodegr+black+hisp+age+educ+married'.split('+'))
identified_estimand = model.identify_effect()
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_weighting")
#print(estimate)
print("Causal Estimate is " + str(estimate.value))

WARNING:dowhy.do_why:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:{'black', 'educ', 'U', 'nodegr', 'age', 'married', 'hisp'}

Model to find the causal effect of treatment treat on outcome re78
{'observed': 'yes'}
{'observed': 'yes'}
{'observed': 'no', 'label': 'Unobserved Confounders'}
There are unobserved common causes. Causal effect cannot be identified.
WARN: Do you want to continue by ignoring these unobserved confounders? [y/n] y

INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]

PropensityScoreWeightingEstimator

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: re78~treat+black+educ+nodegr+age+married+hisp

Causal Estimate is 1634.98683597


## Sanity check: compare to manual IPW estimate
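
For reference, the cell below computes the normalized form of the IPW estimate of the average treatment effect. Written out, with $y_i$ the outcome (`re78`), $z_i$ the treatment indicator (`treat`), and $e_i$ the propensity score (`ps`), it reads as follows. The symbols $\hat{\tau}$ and $e_i$ are notation chosen here for illustration and do not appear in the notebook.

```latex
\hat{\tau}
  = \frac{\sum_i z_i \, y_i / e_i}{\sum_i z_i / e_i}
  - \frac{\sum_i (1 - z_i) \, y_i / (1 - e_i)}{\sum_i (1 - z_i) / (1 - e_i)}
```

The first term is the weighted mean outcome of the treated units (`ey1.sum()`), and the second is the weighted mean outcome of the control units (`ey0.sum()`).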

In [4]:

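# 'ps' holds the propensity scores added to the model's data during estimation
df = model._data
ps = df['ps']
y = df['re78']
z = df['treat']

# Normalized IPW: weighted mean outcomes for treated and control units
ey1 = z*y/ps / sum(z/ps)
ey0 = (1-z)*y/(1-ps) / sum((1-z)/(1-ps))
ate = ey1.sum()-ey0.sum()
print("Causal Estimate is " + str(ate))

# Expected: same value as the DoWhy estimate above, 1634.9868359746906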

Causal Estimate is 1634.98683597
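
The manual check above can also be tried without R or DoWhy. The following is a minimal sketch, not part of the original notebook: the helper name `normalized_ipw_ate` and the synthetic data are assumptions made for illustration. It applies the same normalized IPW formula to simulated data where the true effect is set to 2.0 and the true propensity scores are known, and compares the result with the naive difference in group means.

```python
import numpy as np


def normalized_ipw_ate(y, z, ps):
    """Normalized IPW estimate of the average treatment effect.

    y: outcomes, z: 0/1 treatment indicator, ps: propensity scores P(z=1 | x).
    (Hypothetical helper, mirroring the manual computation in the notebook.)
    """
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    ps = np.asarray(ps, dtype=float)
    w1 = z / ps                # weights for treated units
    w0 = (1 - z) / (1 - ps)    # weights for control units
    ey1 = np.sum(w1 * y) / np.sum(w1)
    ey0 = np.sum(w0 * y) / np.sum(w0)
    return ey1 - ey0


# Synthetic data (an assumption for illustration): x confounds both
# treatment assignment and outcome; the true treatment effect is 2.0.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
ps_true = 1 / (1 + np.exp(-x))      # true propensity scores
z = rng.binomial(1, ps_true)        # treatment assignment
y = 2.0 * z + x + rng.normal(size=n)

ate_hat = normalized_ipw_ate(y, z, ps_true)
naive = y[z == 1].mean() - y[z == 0].mean()

print("IPW estimate:     " + str(ate_hat))
print("Naive difference: " + str(naive))
```

In this simulation the naive difference is biased upward because `x` raises both the chance of treatment and the outcome, while the weighted estimate should land close to 2.0. On real data such as Lalonde, the propensity scores are not known and have to be estimated, so the result also depends on how well that model is specified.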