CIS Primer Question 3.2.1
Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).
Here are the parameters we’ll use. Note that they are taken from the Simpson’s reversal example of question 1.5.2.

    library(tidyverse)  # tibble, dplyr and tidyr are used below

    r   <- 0.28  # fraction with syndrome, P(Z = 1)
    q0  <- 0.07  # P(X = 1 | Z = 0)
    q1  <- 0.85  # P(X = 1 | Z = 1)
    p00 <- 0.84  # P(Y = 1 | X = 0, Z = 0)
    p10 <- 0.88  # P(Y = 1 | X = 1, Z = 0)
    p01 <- 0.53  # P(Y = 1 | X = 0, Z = 1)
    p11 <- 0.58  # P(Y = 1 | X = 1, Z = 1)
Part a
We can simulate the intervention by generating values for X independently of Z.

    N <- 10000  # number of individuals
    set.seed(53201)

    part_a <- tibble(z = rbinom(N, 1, r)) %>%
      mutate(
        x = rbinom(n(), 1, 0.5),  # no Z-dependence
        p_y_given_x_z = case_when(
          x == 0 & z == 0 ~ p00,
          x == 0 & z == 1 ~ p01,
          x == 1 & z == 0 ~ p10,
          x == 1 & z == 1 ~ p11
        ),
        y = rbinom(n(), 1, p_y_given_x_z)
      ) %>%
      group_by(x, y) %>%
      summarise(n = n()) %>%
      group_by(x) %>%
      mutate(p_y_given_do_x = n / sum(n))
| x | y | n | p_y_given_do_x |
|---|---|---|---|
| 0 | 0 | 1238 | 0.2470565 |
| 0 | 1 | 3773 | 0.7529435 |
| 1 | 0 | 964 | 0.1932251 |
| 1 | 1 | 4025 | 0.8067749 |
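As a quick cross-check (a sketch, not part of the original solution), the adjustment formula P(y∣do(x)) = Σ_z P(y∣x, z)P(z) can be evaluated in closed form from the parameters above:

```r
# Closed-form intervention probabilities from the generating parameters,
# as a sanity check against the simulated table above.
p_y1_do_x0 <- p00 * (1 - r) + p01 * r  # 0.84 * 0.72 + 0.53 * 0.28 = 0.7532
p_y1_do_x1 <- p10 * (1 - r) + p11 * r  # 0.88 * 0.72 + 0.58 * 0.28 = 0.7960
```

Both are close to the simulated estimates in the table (0.753 and 0.807).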
Part b
To simulate observational data, we need to include the dependence of X on Z.

    N <- 100000  # number of individuals
    set.seed(95400)

    p_x_y_z <- tibble(
        id = 1:N,
        z = rbinom(N, 1, r),
        x = rbinom(N, 1, if_else(z == 0, q0, q1)),
        p_y_given_x_z = case_when(
          x == 0 & z == 0 ~ p00,
          x == 0 & z == 1 ~ p01,
          x == 1 & z == 0 ~ p10,
          x == 1 & z == 1 ~ p11
        ),
        y = rbinom(N, 1, p_y_given_x_z)
      ) %>%
      group_by(x, y, z) %>%
      count() %>%
      ungroup() %>%
      mutate(p = n / sum(n))
| x | y | z | n | p |
|---|---|---|---|---|
| 0 | 0 | 0 | 10723 | 0.10723 |
| 0 | 0 | 1 | 2020 | 0.02020 |
| 0 | 1 | 0 | 56015 | 0.56015 |
| 0 | 1 | 1 | 2205 | 0.02205 |
| 1 | 0 | 0 | 624 | 0.00624 |
| 1 | 0 | 1 | 10067 | 0.10067 |
| 1 | 1 | 0 | 4470 | 0.04470 |
| 1 | 1 | 1 | 13876 | 0.13876 |
In order to apply the causal effect rule, P(y∣do(x)) = Σ_z P(x, y, z)/P(x∣z), we’ll need P(x∣z).

    p_x_given_z <- p_x_y_z %>%
      group_by(x, z) %>%
      summarise(n = sum(n)) %>%
      group_by(z) %>%
      mutate(p = n / sum(n)) %>%
      ungroup()
| x | z | n | p |
|---|---|---|---|
| 0 | 0 | 66738 | 0.9290845 |
| 0 | 1 | 4225 | 0.1499929 |
| 1 | 0 | 5094 | 0.0709155 |
| 1 | 1 | 23943 | 0.8500071 |
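As a small sanity check (a sketch, not in the original solution), the empirical conditionals should recover the generating parameters q0 and q1:

```r
# P(X = 1 | Z = 0) should be near q0 = 0.07,
# and P(X = 1 | Z = 1) near q1 = 0.85.
p_x_given_z %>%
  filter(x == 1) %>%
  select(z, p)
```

The table above shows 0.071 and 0.850, as expected.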
We can then join these conditional probabilities onto the joint distribution table, divide, and sum over the values of Z.

    p_y_given_do_x <- p_x_y_z %>%
      inner_join(
        p_x_given_z,
        by = c('x', 'z'),
        suffix = c('_num', '_denom')
      ) %>%
      mutate(p = p_num / p_denom) %>%
      group_by(x, y) %>%
      summarise(p = sum(p))
| x | y | p |
|---|---|---|
| 0 | 0 | 0.2500877 |
| 0 | 1 | 0.7499123 |
| 1 | 0 | 0.2064264 |
| 1 | 1 | 0.7935736 |
The causal effect estimates are very close to the simulated intervention.
Part c
We can calculate the ACE = P(Y = 1∣do(X = 1)) − P(Y = 1∣do(X = 0)) simply by taking the difference of the causal effect estimates.

    ace <- p_y_given_do_x %>%
      spread(x, p) %>%
      filter(y == 1) %>%
      mutate(ace = `1` - `0`) %>%
      pull(ace)

    ace
    [1] 0.04366134
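For comparison (a sketch from the generating parameters, not part of the original code), the true ACE is a weighted average of the stratum-specific differences:

```r
# ACE = sum_z [P(Y = 1 | X = 1, z) - P(Y = 1 | X = 0, z)] * P(Z = z)
(p10 - p00) * (1 - r) + (p11 - p01) * r
# 0.04 * 0.72 + 0.05 * 0.28 = 0.0428
```

The simulated estimate of 0.0437 is close to this value.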
This differs from the naive risk difference computed from the observational conditionals P(y∣x):

    p_y_given_x <- p_x_y_z %>%
      group_by(x, y) %>%
      summarise(n = sum(n)) %>%
      group_by(x) %>%
      mutate(p = n / sum(n)) %>%
      select(-n)

    risk_difference <- p_y_given_x %>%
      spread(x, p) %>%
      filter(y == 1) %>%
      mutate(rd = `1` - `0`) %>%
      pull(rd)

    risk_difference
    [1] -0.188613
Making X independent of Z would remove the discrepancy between the ACE and the risk difference: if P(x∣z) = P(x), the adjustment formula becomes Σ_z P(y∣x, z)P(z) = Σ_z P(y∣x, z)P(z∣x) = P(y∣x), so conditioning and intervening coincide. In other words, setting q0 = q1 = P(X = 1) would do the trick.
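To see this numerically, here is a quick sketch (not part of the original post) that re-runs the observational simulation with q0 = q1 = 0.5, an arbitrary common value, and recomputes the risk difference:

```r
# With q0 = q1 = 0.5, X is independent of Z, so the naive risk difference
# should agree with the ACE (about 0.04) up to sampling error.
set.seed(1)  # arbitrary seed for this check
tibble(z = rbinom(N, 1, r)) %>%
  mutate(
    x = rbinom(n(), 1, 0.5),  # q0 = q1 = 0.5
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(n(), 1, p_y_given_x_z)
  ) %>%
  group_by(x) %>%
  summarise(p_y1 = mean(y)) %>%
  spread(x, p_y1) %>%
  mutate(rd = `1` - `0`) %>%
  pull(rd)
```

With X independent of Z, the risk difference should land near the ACE rather than near −0.19.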
Part d
Note that the disaggregated (stratum-specific) causal effects

- p10 − p00 = 0.04 (the Z = 0 stratum); and
- p11 − p01 = 0.05 (the Z = 1 stratum)

are both consistent with our calculation of the overall causal effect, ACE = 4.37%. The generated data illustrate Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.