CIS Primer Question 3.2.1
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
CIS Primer Question 3.2.1
Here are my solutions to question 3.2.1 of Causal Inference in Statistics: a Primer (CISP).
Here are the parameters we’ll use. Note that they are taken from the Simpson’s revesal example of question 1.5.2.
r <- 0.28 # fraction with syndrome q0 <- 0.07 # P(X = 1 | Z = 0) q1 <- 0.85 # P(X = 1 | Z = 1) p00 <- 0.84 # P(Y = 1 | X = 0, Z = 0) p10 <- 0.88 # P(Y = 1 | X = 1, Z = 0) p01 <- 0.53 # P(Y = 1 | X = 0, Z = 1) p11 <- 0.58 # P(Y = 1 | X = 1, Z = 1)
Part a
We can simulate the intervention by generating values for \(X\) independently of \(Z\).
N <- 10000 # number of individuals set.seed(53201) part_a <- tibble(z = rbinom(N, 1, r)) %>% mutate( x = rbinom(n(), 1, 0.5), # no Z-dependence p_y_given_x_z = case_when( x == 0 & z == 0 ~ p00, x == 0 & z == 1 ~ p01, x == 1 & z == 0 ~ p10, x == 1 & z == 1 ~ p11 ), y = rbinom(n(), 1, p_y_given_x_z) ) %>% group_by(x, y) %>% summarise(n = n()) %>% group_by(x) %>% mutate(p_y_given_do_x = n / sum(n))
x | y | n | p_y_given_do_x |
---|---|---|---|
0 | 0 | 1238 | 0.2470565 |
0 | 1 | 3773 | 0.7529435 |
1 | 0 | 964 | 0.1932251 |
1 | 1 | 4025 | 0.8067749 |
Part b
To simulate observational data, we need to include the dependence of \(X\) on \(Z\).
N <- 100000 # number of individuals set.seed(95400) p_x_y_z <- tibble( id = 1:N, z = rbinom(N, 1, r), x = rbinom(N, 1, if_else(z == 0, q0, q1)), p_y_given_x_z = case_when( x == 0 & z == 0 ~ p00, x == 0 & z == 1 ~ p01, x == 1 & z == 0 ~ p10, x == 1 & z == 1 ~ p11 ), y = rbinom(N, 1, p_y_given_x_z) ) %>% group_by(x, y, z) %>% count() %>% ungroup() %>% mutate(p = n / sum(n))
x | y | z | n | p |
---|---|---|---|---|
0 | 0 | 0 | 10723 | 0.10723 |
0 | 0 | 1 | 2020 | 0.02020 |
0 | 1 | 0 | 56015 | 0.56015 |
0 | 1 | 1 | 2205 | 0.02205 |
1 | 0 | 0 | 624 | 0.00624 |
1 | 0 | 1 | 10067 | 0.10067 |
1 | 1 | 0 | 4470 | 0.04470 |
1 | 1 | 1 | 13876 | 0.13876 |
In order to apply the causal effect rule, we’ll need \(\mathbb P(x \mid z)\).
p_x_given_z <- p_x_y_z %>% group_by(x, z) %>% summarise(n = sum(n)) %>% group_by(z) %>% mutate(p = n / sum(n)) %>% ungroup()
x | z | n | p |
---|---|---|---|
0 | 0 | 66738 | 0.9290845 |
0 | 1 | 4225 | 0.1499929 |
1 | 0 | 5094 | 0.0709155 |
1 | 1 | 23943 | 0.8500071 |
We can then add the conditional probabilities to the joint distribution table, then sum overal all the \(Z\) variables.
p_y_given_do_x <- p_x_y_z %>% inner_join( p_x_given_z, by = c('x', 'z'), suffix = c('_num', '_denom') ) %>% mutate(p = p_num / p_denom) %>% group_by(x, y) %>% summarise(p = sum(p))
x | y | p |
---|---|---|
0 | 0 | 0.2500877 |
0 | 1 | 0.7499123 |
1 | 0 | 0.2064264 |
1 | 1 | 0.7935736 |
The causal effect estimates are very close to the simulated intervention.
Part c
We can calculate ACE simply by taking the difference of the causal effect estimates.
ace <- p_y_given_do_x %>% spread(x, p) %>% filter(y == 1) %>% mutate(ace = `1` - `0`) %>% pull(ace) ace [1] 0.04366134
This is different from the overall probability differences.
p_y_given_x <- p_x_y_z %>% group_by(x, y) %>% summarise(n = sum(n)) %>% group_by(x) %>% mutate(p = n / sum(n)) %>% select(-n) risk_difference <- p_y_given_x %>% spread(x, p) %>% filter(y == 1) %>% mutate(rd = `1` - `0`) %>% pull(rd) risk_difference [1] -0.188613
Making \(X\) independent of \(Z\) would minimise the disrepancy between ACE and RD, which would turn the adjustment formula into the formulat for \(\mathbb P(y \mid x\). In other words, setting \(q_0 = q_1 = \mathbb P(X = 1)\) would do the trick.
Part d
Note that the desegregated causal effects
- \(p_{1, 0} - p_{0, 0}\) is 0.04; and
- \(p_{1, 1} - p_{0, 1}\) is 0.05,
are both consisent with our calculation for the overall causal effect, ACE = 4.37%. The generated data are an illustration of Simpson’s reversal because the risk difference, -18.9%, has the opposite sign.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.