CIS Primer Question 3.4.1
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
CIS Primer Question 3.4.1
Here are my solutions to question 3.4.1 of Causal Inference in Statistics: a Primer (CISP).
If we can only measure one additional variable to estimate the causal effect of X on Y in figure 3.8, then we should measure W. From question 3.3.1 we see that no single variable satisfies the backdoor criteria. Moreover, visual inspection of the graph verifies that W satisfies the frontdoor criteria:
- it intercepts all (the only) directed paths from X to Y;
- there is no unblocked path from X to W; and
- all backdoor paths from W to Y are blocked by X.
To illustrate this, lets simulate the causal effect in 3 separate ways:
- by intervention,
- via the backdoor, and
- via the frontdoor.
Here are the data. Note that we have created functions for W and Y for use later.
N <- 100000 W <- function(x) { N <- length(x) rbinom(N, 1, inv_logit(-x)) } Y <- function(d, w, z) { N <- length(d) rbinom(N, 1, inv_logit(-d - w + 3*z)) } df <- tibble(id = 1:N) %>% mutate( b = rnorm(N, 0, 1), a = b + rnorm(N, 0, 0.1), c = rnorm(N, 0, 1), d = rbinom(N, 1, inv_logit(-1 + c)), z = rbinom(N, 1, inv_logit(-2 + 2*b + c)), x = rbinom(N, 1, inv_logit(a + z)), w = W(x), y = Y(d, w, z) )
id | b | a | c | d | z | x | w | y |
---|---|---|---|---|---|---|---|---|
1 | 0.3641297 | 0.3917626 | 1.0369530 | 1 | 1 | 0 | 0 | 1 |
2 | 0.0287563 | 0.0397299 | 0.5736271 | 0 | 0 | 1 | 1 | 0 |
3 | -0.7727052 | -0.5993870 | -0.5179657 | 0 | 1 | 0 | 1 | 1 |
4 | 0.4107888 | 0.5737898 | 1.2586840 | 0 | 1 | 1 | 0 | 1 |
5 | 2.3512417 | 2.1631719 | 0.6746523 | 1 | 1 | 1 | 0 | 1 |
Intervention
In order to simulate an intervention, we assign values to X randomly, then assign new values for all its descendents. After intervention, the causal effect of X on Y is simply P(Y∣X).
intervention <- df %>% # intervene on x mutate( x = rbinom(n(), 1, 0.5), w = W(x), y = Y(d, w, z) ) %>% # model P(y | do(x)) glm( formula = y ~ x, family = binomial(), data = . ) %>% # predict augment( newdata = tibble(x = 0:1), type.predict = 'response' )
x | .fitted | .se.fit |
---|---|---|
0 | 0.4637566 | 0.0022239 |
1 | 0.5072714 | 0.0022422 |
We can compare this causal effect to the simple statistical effect to see the difference.
noncausal <- df %>% # model P(y | x) glm( formula = y ~ x, family = binomial(), data = . ) %>% # predict augment( newdata = tibble(x = 0:1), type.predict = 'response' )
x | .fitted | .se.fit |
---|---|---|
0 | 0.3797282 | 0.0022595 |
1 | 0.5759927 | 0.0021293 |
Backdoor
Since {X,Z} satisfies the backdoor criteria, we can use it to apply the backdoor adjustment. First we’ll need P(D,Z).
# P(d, z) p_d_z <- df %>% group_by(d, z) %>% count() %>% ungroup() %>% mutate(p_d_z = n / sum(n))
Now we model P(Y∣X,D,Z), multiply it by P(D,Z), then take the sum for each value of X.
backdoor <- formula(y ~ 1 + x + z + d) %>% # model P(y | x, d, z) glm( family = binomial(), data = df ) %>% # predict augment( type.predict = 'response', newdata = crossing( d = c(0, 1), x = c(0, 1), z = c(0, 1) ) ) %>% # get P(d, z) mutate(p_y_given_d_x_z = .fitted) %>% inner_join(p_d_z, by = c('d', 'z')) %>% # backdoor adjustment over d, z group_by(x) %>% summarise(p_y_given_do_x = sum(p_y_given_d_x_z * p_d_z))
x | p_y_given_do_x |
---|---|
0 | 0.4681530 |
1 | 0.5033398 |
Note that the backdoor adjusted estimates are similar to the estimates from intervention.
Frontdoor
To apply the frontdoor adjustment with W, we’ll need P(W∣X), P(X′), and P(Y∣X,W).
p_w_given_x <- df %>% group_by(x, w) %>% count() %>% group_by(x) %>% mutate(p_w_given_x = n / sum(n)) %>% ungroup() p_xprime <- df %>% group_by(xprime = x) %>% count() %>% ungroup() %>% mutate(p_xprime = n / sum(n)) p_y_given_xprime_w <- formula(y ~ 1 + x + w) %>% glm( family = binomial(), data = df ) %>% augment( newdata = crossing(x = 0:1, w = 0:1), type.predict = 'response' ) %>% transmute( xprime = x, w, p_y_given_xprime_w = .fitted )
Now we apply the frontdoor adjustment:
P(Y∣do(X))=∑x′,wP(x′)⋅P(w∣x)⋅P(y∣x′,w).
frontdoor <- p_w_given_x %>% inner_join(p_y_given_xprime_w, by = 'w') %>% inner_join(p_xprime, by = 'xprime') %>% group_by(x) %>% summarise(sum(p_w_given_x * p_y_given_xprime_w * p_xprime))
x | sum(p_w_given_x * p_y_given_xprime_w * p_xprime) |
---|---|
0 | 0.4623710 |
1 | 0.5041105 |
Our frontdoor estimates of P(Y∣do(X)) are very similar to the intervention and backdoor estimates.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.