CIS Primer Question 3.2.1

Posted on 10 February 2019 by Brian. Tags: CISP chapter 3, solutions, ACE, risk difference, Simpson's reversal. Category: causal_inference_in_statistics_primer.

Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).


Here are the parameters we’ll use. Note that they are taken from the Simpson’s reversal example of question 1.5.2.

library(tidyverse)  # tibble, dplyr, and tidyr functions used throughout

r   <- 0.28  # fraction with syndrome, P(Z = 1)
q0  <- 0.07  # P(X = 1 | Z = 0)
q1  <- 0.85  # P(X = 1 | Z = 1)

p00 <- 0.84  # P(Y = 1 | X = 0, Z = 0)
p10 <- 0.88  # P(Y = 1 | X = 1, Z = 0)
p01 <- 0.53  # P(Y = 1 | X = 0, Z = 1)
p11 <- 0.58  # P(Y = 1 | X = 1, Z = 1)

Part a

We can simulate the intervention by generating values for \(X\) independently of \(Z\).

N <- 10000  # number of individuals

set.seed(53201)

part_a <- tibble(z = rbinom(N, 1, r)) %>% 
  mutate(
    x = rbinom(n(), 1, 0.5), # no Z-dependence
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(n(), 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y) %>% 
  summarise(n = n()) %>% 
  group_by(x) %>% 
  mutate(p_y_given_do_x = n / sum(n))
 x  y     n  p_y_given_do_x
 0  0  1238       0.2470565
 0  1  3773       0.7529435
 1  0   964       0.1932251
 1  1  4025       0.8067749
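
As a sanity check, the interventional distribution can be computed exactly from the parameters, since \(\mathbb P(Y = 1 \mid do(X = x)) = \sum_z \mathbb P(Y = 1 \mid x, z) \, \mathbb P(z)\):

p00 * (1 - r) + p01 * r  # P(Y = 1 | do(X = 0)) = 0.7532
p10 * (1 - r) + p11 * r  # P(Y = 1 | do(X = 1)) = 0.796

These are close to the simulated estimates of 0.7529 and 0.8068 above.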

Part b

To simulate observational data, we need to include the dependence of \(X\) on \(Z\).

N <- 100000  # number of individuals

set.seed(95400)

p_x_y_z <- tibble(
    id = 1:N,
    
    z = rbinom(N, 1, r),
    x = rbinom(N, 1, if_else(z == 0, q0, q1)),
    
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    
    y = rbinom(N, 1, p_y_given_x_z)
  ) %>% 
  group_by(x, y, z) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p = n / sum(n))
 x  y  z      n        p
 0  0  0  10723  0.10723
 0  0  1   2020  0.02020
 0  1  0  56015  0.56015
 0  1  1   2205  0.02205
 1  0  0    624  0.00624
 1  0  1  10067  0.10067
 1  1  0   4470  0.04470
 1  1  1  13876  0.13876
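
These frequencies can be checked against the exact joint distribution, which in this model factorises as \(\mathbb P(x, y, z) = \mathbb P(z) \, \mathbb P(x \mid z) \, \mathbb P(y \mid x, z)\). For example:

(1 - r) * (1 - q0) * p00  # exact P(X = 0, Y = 1, Z = 0) = 0.562464, vs 0.56015 simulated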

In order to apply the causal effect rule, we’ll need \(\mathbb P(x \mid z)\).

p_x_given_z <- p_x_y_z %>% 
  group_by(x, z) %>% 
  summarise(n = sum(n)) %>% 
  group_by(z) %>% 
  mutate(p = n / sum(n)) %>% 
  ungroup()
 x  z      n          p
 0  0  66738  0.9290845
 0  1   4225  0.1499929
 1  0   5094  0.0709155
 1  1  23943  0.8500071

We can then join these conditional probabilities onto the joint distribution table and sum over all values of \(Z\).
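
The calculation below implements the causal effect rule in the form

\[
\mathbb P(y \mid do(x))
= \sum_z \mathbb P(y \mid x, z) \, \mathbb P(z)
= \sum_z \frac{\mathbb P(x, y, z)}{\mathbb P(x \mid z)},
\]

where the second equality follows from \(\mathbb P(y \mid x, z) = \mathbb P(x, y, z) / \mathbb P(x, z)\) and \(\mathbb P(x, z) = \mathbb P(x \mid z) \, \mathbb P(z)\).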

p_y_given_do_x <- p_x_y_z %>% 
  inner_join(
    p_x_given_z, 
    by = c('x', 'z'), 
    suffix = c('_num', '_denom')
  ) %>% 
  mutate(p = p_num / p_denom) %>% 
  group_by(x, y) %>%
  summarise(p = sum(p)) 
 x  y          p
 0  0  0.2500877
 0  1  0.7499123
 1  0  0.2064264
 1  1  0.7935736

These causal effect estimates are very close to those from the simulated intervention in part a.

Part c

We can calculate the ACE simply by taking the difference of the causal effect estimates.

ace <- p_y_given_do_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(ace = `1` - `0`) %>% 
  pull(ace)

ace
[1] 0.04366134
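
For comparison, the exact ACE implied by the parameters is the \(\mathbb P(z)\)-weighted average of the stratum-specific differences:

(p10 - p00) * (1 - r) + (p11 - p01) * r  # exact ACE = 0.0428

so the simulated estimate of 0.0437 is close.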

This differs from the overall (unadjusted) risk difference.

p_y_given_x <- p_x_y_z %>% 
  group_by(x, y) %>% 
  summarise(n = sum(n)) %>% 
  group_by(x) %>% 
  mutate(p = n / sum(n)) %>% 
  select(-n) 

risk_difference <- p_y_given_x %>% 
  spread(x, p) %>% 
  filter(y == 1) %>% 
  mutate(rd = `1` - `0`) %>% 
  pull(rd)

risk_difference
[1] -0.188613
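
We can again verify against the exact value, where each observational conditional weights the strata by \(\mathbb P(z \mid x)\) rather than \(\mathbb P(z)\). A quick check (the intermediate names here are just for illustration):

p_x1  <- q1 * r + q0 * (1 - r)                                         # P(X = 1) = 0.2884
py_x1 <- (p11 * q1 * r + p10 * q0 * (1 - r)) / p_x1                    # P(Y = 1 | X = 1), about 0.632
py_x0 <- (p01 * (1 - q1) * r + p00 * (1 - q0) * (1 - r)) / (1 - p_x1)  # P(Y = 1 | X = 0), about 0.822
py_x1 - py_x0                                                          # exact RD, about -0.189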

Making \(X\) independent of \(Z\) would remove the discrepancy between the ACE and the risk difference, since the adjustment formula would reduce to the formula for \(\mathbb P(y \mid x)\). In other words, setting \(q_0 = q_1 = \mathbb P(X = 1)\) would do the trick.
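
To spell this out:

\[
\mathbb P(y \mid x)
= \sum_z \mathbb P(y \mid x, z) \, \mathbb P(z \mid x)
= \sum_z \mathbb P(y \mid x, z) \, \mathbb P(z)
= \mathbb P(y \mid do(x)),
\]

where the middle equality uses \(\mathbb P(z \mid x) = \mathbb P(z)\) when \(X\) and \(Z\) are independent.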

Part d

Note that the disaggregated causal effects,

\(p_{10} - p_{00} = 0.88 - 0.84 = 0.04\) and \(p_{11} - p_{01} = 0.58 - 0.53 = 0.05\),

are both consistent with our calculation for the overall causal effect, ACE = 4.37%. The generated data are an illustration of Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.
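
The same stratum-specific differences can be recovered (up to sampling error) from the simulated observational table:

p_x_y_z %>% 
  group_by(x, z) %>% 
  mutate(p = n / sum(n)) %>%  # P(y | x, z)
  filter(y == 1) %>% 
  group_by(z) %>% 
  summarise(rd = p[x == 1] - p[x == 0])  # roughly 0.038 (z = 0) and 0.058 (z = 1)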
