CIS Primer Question 3.2.1

[This article was first published on Brian Callander, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

CIS Primer Question 3.2.1

Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).

Here are the parameters we’ll use. Note that they are taken from the Simpson’s revesal example of question 1.5.2.

r   <- 0.28   # fraction with syndrome

q0 <- 0.07  # P(X = 1 | Z = 0)
q1 <- 0.85  # P(X = 1 | Z = 1)
             
p00 <- 0.84 # P(Y = 1 | X = 0, Z = 0)
p10 <- 0.88 # P(Y = 1 | X = 1, Z = 0)
p01 <- 0.53 # P(Y = 1 | X = 0, Z = 1)
p11 <- 0.58 # P(Y = 1 | X = 1, Z = 1)

Part a

We can simulate the intervention by generating values for X independently of Z.

N <- 10000  # number of individuals

set.seed(53201)

part_a <- tibble(z = rbinom(N, 1, r)) %>% 
  mutate(
    x = rbinom(n(), 1, 0.5), # no Z-dependence
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(n(), 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y) %>% 
  summarise(n = n()) %>% 
  group_by(x) %>% 
  mutate(p_y_given_do_x = n / sum(n))
x y n p_y_given_do_x
0 0 1238 0.2470565
0 1 3773 0.7529435
1 0 964 0.1932251
1 1 4025 0.8067749

Part b

To simulate observational data, we need to include the dependence of X on Z.

N <- 100000  # number of individuals

set.seed(95400)

p_x_y_z <- tibble(
    id = 1:N,
    
    z = rbinom(N, 1, r),
    x = rbinom(N, 1, if_else(z == 0, q0, q1)),
    
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    
    y = rbinom(N, 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y, z) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p = n / sum(n))
x y z n p
0 0 0 10723 0.10723
0 0 1 2020 0.02020
0 1 0 56015 0.56015
0 1 1 2205 0.02205
1 0 0 624 0.00624
1 0 1 10067 0.10067
1 1 0 4470 0.04470
1 1 1 13876 0.13876

In order to apply the causal effect rule, we’ll need P(xz).

p_x_given_z <- p_x_y_z %>% 
  group_by(x, z) %>% 
  summarise(n = sum(n)) %>% 
  group_by(z) %>% 
  mutate(p = n / sum(n)) %>% 
  ungroup()
x z n p
0 0 66738 0.9290845
0 1 4225 0.1499929
1 0 5094 0.0709155
1 1 23943 0.8500071

We can then add the conditional probabilities to the joint distribution table, then sum overal all the Z variables.

p_y_given_do_x <- p_x_y_z %>% 
  inner_join(
    p_x_given_z, 
    by = c('x', 'z'), 
    suffix = c('_num', '_denom')
  ) %>% 
  mutate(p = p_num / p_denom) %>% 
  group_by(x, y) %>%
  summarise(p = sum(p)) 
x y p
0 0 0.2500877
0 1 0.7499123
1 0 0.2064264
1 1 0.7935736

The causal effect estimates are very close to the simulated intervention.

Part c

We can calculate ACE simply by taking the difference of the causal effect estimates.

ace <- p_y_given_do_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(ace = `1` - `0`) %>% 
  pull(ace)

ace
[1] 0.04366134

This is different from the overall probability differences.

p_y_given_x <- p_x_y_z %>% 
  group_by(x, y) %>% 
  summarise(n = sum(n)) %>% 
  group_by(x) %>% 
  mutate(p = n / sum(n)) %>% 
  select(-n) 

risk_difference <- p_y_given_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(rd = `1` - `0`) %>% 
  pull(rd)

risk_difference
[1] -0.188613

Making X independent of Z would minimise the disrepancy between ACE and RD, which would turn the adjustment formula into the formulat for P(yx. In other words, setting q0=q1=P(X=1) would do the trick.

Part d

Note that the desegregated causal effects

  • p1,0p0,0 is 0.04; and
  • p1,1p0,1 is 0.05,

are both consisent with our calculation for the overall causal effect, ACE = 4.37%. The generated data are an illustration of Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.

To leave a comment for the author, please follow the link and comment on their blog: Brian Callander.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)