CIS Primer Question 1.5.2
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
CIS Primer Question 1.5.2
Here are my solutions to question 1.5.2 of Causal Inference in Statistics: a Primer (CISP).
I’ll use different indexing to make the notation clearer. In particular, the indices will match the values of the conditioning variables.
Part a
The full joint probability is
P(x,y,z)=P(z)⋅P(x∣z)⋅P(y∣x,z)
using the decomposition formula. Each factor is given by
P(z)=zr+(1–z)(1–r)P(x∣z)=xqz+(1–x)(1–qz)P(y∣x,z)=ypx,z+(1–y)(1–px,z)
where each parameter is assumed to have support on {0,1}.
The marginal distributions are given by
P(x,z)=P(x∣z)⋅P(z)P(y,z)=P(0,y,z)+P(1,y,z)P(x,y)=P(x,y,0)+P(x,y,1)=ypx,0+(1–y)(1–px,0)+ypx,1+(1–y)(1–px,1)=y(px,0+px,1)+(1–y)(2–px,0–px,1).
Furthermore,
P(x)=∑zP(x∣z)P(z)=∑z(xqz+(1–x)(1–qz))(zr+(1–z)(1–r))
so that
P(X=0)=(1–q0)(1–r)+(1–q1)rP(X=1)=q0(1–r)+q1r
Part b
The increase in probability from taking the drug in each sub-population is:
- P(y=1∣x=1,z=0)–P(y=1∣x=0,z=0)=p1,0–p0,0; and
- P(y=1∣x=1,z=1)–P(y=1∣x=0,z=1)=p1,1–p0,1.
In the whole population, the increase is P(Y=1∣X=1)–P(Y=1∣X=0), calcualted via
1∑z=0P(Y=1,Z=z∣X=1)–P(Y=1,Z=z∣X=0)=1∑z=0P(X=1,Y=1,Z=z)P(X=1)–P(X=0,Y=1,Z=z)P(X=0)=(1–r)q0p1,0+rq1p1,1q0(1–r)+q1r–(1–r)(1–q0)p0,0+r(1–q1)p0,1(1–q0)(1–r)+(1–q1)r
Part c
There’s no need to be smart about this. Let’s just simulate lots of values and find some combination with a Simpson’s reversal. We’ll generate a dataset with a positive probability difference in each sub-population, then filter out anything that also has a non-negative population difference.
set.seed(8168) N <- 10000 part_c <- tibble( id = 1:N %>% as.integer(), r = rbeta(N, 2, 2), # P(Z = 1) q0 = rbeta(N, 2, 2), # P(X = 1 | Z = 0) q1 = rbeta(N, 2, 2), # P(X = 1 | Z = 1) p00 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 0) p10 = rbeta(N, 2, 2) * (p00 - 1) + 1, # P(Y = 1 | X = 1, Z = 0) p01 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 1) p11 = rbeta(N, 2, 2) * (p01 - 1) + 1, # P(Y = 1 | X = 1, Z = 1) diff_pop = (p10 * q0 * (1 - r) + p11 * q1 * r) / (q0 * (1 - r) + q1 * r) - (p00 * (1 - q0) * (1 - r) + p01 * (1 - q1) * r) / ((1 - q0) * (1 - r) + (1 - q1) * r), diff_z0 = p10 - p00, diff_z1 = p11 - p01 )
As a check, there should be no rows with a non-positive difference.
check <- part_c %>% filter(diff_z0 <= 0 | diff_z1 <= 0) %>% nrow() # throw error if there are rows stopifnot(check == 0) check [1] 0
Now we simply throw away any rows with a non-negative population difference. Here is one combination of parameters exhibiting Simpson’s reversal.
simpsons_reversal <- part_c %>% filter(diff_pop < -0.05) %>% head(1) %>% gather(term, value)
term | value |
---|---|
id | 109.0000000 |
r | 0.2837123 |
q0 | 0.0664811 |
q1 | 0.8468126 |
p00 | 0.8441892 |
p10 | 0.8827558 |
p01 | 0.5273831 |
p11 | 0.5816885 |
diff_pop | -0.1933634 |
diff_z0 | 0.0385666 |
diff_z1 | 0.0543054 |
As a final check, let’s generate a dataset for this set of parameters.
df <- simpsons_reversal %>% spread(term, value) %>% crossing(unit = 1:N) %>% mutate( z = rbinom(N, 1, r), x = rbinom(N, 1, if_else(z == 0, q0, q1)), p_y_given_x_z = case_when( x == 0 & z == 0 ~ p00, x == 0 & z == 1 ~ p01, x == 1 & z == 0 ~ p10, x == 1 & z == 1 ~ p11 ), y = rbinom(N, 1, p_y_given_x_z) ) %>% select(unit, x, y, z)
The empirical joint probability distribution is as follows.
p_x_y_z <- df %>% group_by(x, y, z) %>% count() %>% ungroup() %>% mutate(p = n / sum(n))
x | y | z | n | p |
---|---|---|---|---|
0 | 0 | 0 | 1068 | 0.1068 |
0 | 0 | 1 | 197 | 0.0197 |
0 | 1 | 0 | 5609 | 0.5609 |
0 | 1 | 1 | 224 | 0.0224 |
1 | 0 | 0 | 52 | 0.0052 |
1 | 0 | 1 | 1016 | 0.1016 |
1 | 1 | 0 | 400 | 0.0400 |
1 | 1 | 1 | 1434 | 0.1434 |
The population-level probability difference is given by:
diff_pop <- p_x_y_z %>% group_by(x) %>% summarise(p = sum(n * y) / sum(n)) %>% spread(x, p) %>% mutate(diff = `1` - `0`)
0 | 1 | diff |
---|---|---|
0.8217808 | 0.6319779 | -0.1898028 |
which is close to the theoretical value.
Similarly, the sub-population differences are
diff_z <- p_x_y_z %>% group_by(x, z) %>% summarise(p = sum(n * y) / sum(n)) %>% spread(x, p) %>% mutate(diff = `1` - `0`)
z | 0 | 1 | diff |
---|---|---|---|
0 | 0.8400479 | 0.8849558 | 0.0449078 |
1 | 0.5320665 | 0.5853061 | 0.0532396 |
which are also close to the theoretical values we calculated. More importantly, they have a different sign to the population difference, confiming that we have case of Simpson’s reversal.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.