CIS Primer Question 3.4.1

[This article was first published on Brian Callander, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

CIS Primer Question 3.4.1

Here are my solutions to question 3.4.1 of Causal Inference in Statistics: a Primer (CISP).

If we can only measure one additional variable to estimate the causal effect of X on Y in figure 3.8, then we should measure W. From question 3.3.1 we see that no single variable satisfies the backdoor criteria. Moreover, visual inspection of the graph verifies that W satisfies the frontdoor criteria:

  1. it intercepts all (the only) directed paths from X to Y;
  2. there is no unblocked path from X to W; and
  3. all backdoor paths from W to Y are blocked by X.

To illustrate this, lets simulate the causal effect in 3 separate ways:

  1. by intervention,
  2. via the backdoor, and
  3. via the frontdoor.

Here are the data. Note that we have created functions for W and Y for use later.

N <- 100000

W <- function(x) {
  N <- length(x)
  rbinom(N, 1, inv_logit(-x))
}

Y <- function(d, w, z) {
  N <- length(d)
  rbinom(N, 1, inv_logit(-d - w + 3*z))
}

df <- tibble(id = 1:N) %>% 
  mutate(
    b = rnorm(N, 0, 1),
    a = b + rnorm(N, 0, 0.1),
    c = rnorm(N, 0, 1),
    d = rbinom(N, 1, inv_logit(-1 + c)),
    z = rbinom(N, 1, inv_logit(-2 + 2*b + c)),
    x = rbinom(N, 1, inv_logit(a + z)),
    w = W(x),
    y = Y(d, w, z)
  )
Simulated data for figure 3.8
id b a c d z x w y
1 0.3641297 0.3917626 1.0369530 1 1 0 0 1
2 0.0287563 0.0397299 0.5736271 0 0 1 1 0
3 -0.7727052 -0.5993870 -0.5179657 0 1 0 1 1
4 0.4107888 0.5737898 1.2586840 0 1 1 0 1
5 2.3512417 2.1631719 0.6746523 1 1 1 0 1

Intervention

In order to simulate an intervention, we assign values to X randomly, then assign new values for all its descendents. After intervention, the causal effect of X on Y is simply P(YX).

intervention <- df %>% 
  # intervene on x
  mutate(
    x = rbinom(n(), 1, 0.5),
    w = W(x),
    y = Y(d, w, z)
  ) %>% 
  # model P(y | do(x))
  glm(
    formula = y ~ x, 
    family = binomial(), 
    data = .
  ) %>% 
  # predict
  augment(
    newdata = tibble(x = 0:1), 
    type.predict = 'response'
  ) 
P(Y | do(X))
x .fitted .se.fit
0 0.4637566 0.0022239
1 0.5072714 0.0022422

We can compare this causal effect to the simple statistical effect to see the difference.

noncausal <- df %>% 
  # model P(y | x)
  glm(
    formula = y ~ x, 
    family = binomial(), 
    data = .
  ) %>% 
  # predict
  augment(
    newdata = tibble(x = 0:1), 
    type.predict = 'response'
  ) 
P(Y | X) ≠ P(Y | do(X))
x .fitted .se.fit
0 0.3797282 0.0022595
1 0.5759927 0.0021293

Backdoor

Since {X,Z} satisfies the backdoor criteria, we can use it to apply the backdoor adjustment. First we’ll need P(D,Z).

# P(d, z)
p_d_z <- df %>% 
  group_by(d, z) %>% 
  count() %>% 
  ungroup() %>%  
  mutate(p_d_z = n / sum(n)) 

Now we model P(YX,D,Z), multiply it by P(D,Z), then take the sum for each value of X.

backdoor <- formula(y ~ 1 + x + z + d) %>% 
  # model P(y | x, d, z)
  glm(
    family = binomial(),
    data = df
  ) %>%  
  # predict
  augment(
    type.predict = 'response',
    newdata = 
      crossing(
        d = c(0, 1),
        x = c(0, 1),
        z = c(0, 1)
      )
  ) %>% 
  # get P(d, z)
  mutate(p_y_given_d_x_z = .fitted) %>% 
  inner_join(p_d_z, by = c('d', 'z')) %>% 
  # backdoor adjustment over d, z
  group_by(x) %>% 
  summarise(p_y_given_do_x = sum(p_y_given_d_x_z * p_d_z))
Backdoor estimates for P(Y | do(X))
x p_y_given_do_x
0 0.4681530
1 0.5033398

Note that the backdoor adjusted estimates are similar to the estimates from intervention.

Frontdoor

To apply the frontdoor adjustment with W, we’ll need P(WX), P(X), and P(YX,W).

p_w_given_x <- df %>% 
  group_by(x, w) %>% 
  count() %>% 
  group_by(x) %>% 
  mutate(p_w_given_x = n / sum(n)) %>% 
  ungroup()

p_xprime <- df %>% 
  group_by(xprime = x) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p_xprime = n / sum(n))

p_y_given_xprime_w <- formula(y ~ 1 + x + w) %>% 
  glm(
    family = binomial(),
    data = df
  ) %>% 
  augment(
    newdata = crossing(x = 0:1, w = 0:1),
    type.predict = 'response'
  ) %>% 
  transmute(
    xprime = x,
    w,
    p_y_given_xprime_w = .fitted
  )

Now we apply the frontdoor adjustment:

P(Ydo(X))=x,wP(x)P(wx)P(yx,w).

frontdoor <- p_w_given_x %>% 
  inner_join(p_y_given_xprime_w, by = 'w') %>% 
  inner_join(p_xprime, by = 'xprime') %>% 
  group_by(x) %>%
  summarise(sum(p_w_given_x * p_y_given_xprime_w * p_xprime))
Frontdoor estimates of P(Y | do(X))
x sum(p_w_given_x * p_y_given_xprime_w * p_xprime)
0 0.4623710
1 0.5041105

Our frontdoor estimates of P(Ydo(X)) are very similar to the intervention and backdoor estimates.

To leave a comment for the author, please follow the link and comment on their blog: Brian Callander.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)