CIS Primer Question 1.5.2

[This article was first published on Brian Callander, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

CIS Primer Question 1.5.2

Here are my solutions to question 1.5.2 of Causal Inference in Statistics: a Primer (CISP).

I’ll use different indexing to make the notation clearer. In particular, the indices will match the values of the conditioning variables.

Part a

The full joint probability is

P(x,y,z)=P(z)P(xz)P(yx,z)

using the decomposition formula. Each factor is given by

P(z)=zr+(1z)(1r)P(xz)=xqz+(1x)(1qz)P(yx,z)=ypx,z+(1y)(1px,z)

where each parameter is assumed to have support on {0,1}.

The marginal distributions are given by

P(x,z)=P(xz)P(z)P(y,z)=P(0,y,z)+P(1,y,z)P(x,y)=P(x,y,0)+P(x,y,1)=ypx,0+(1y)(1px,0)+ypx,1+(1y)(1px,1)=y(px,0+px,1)+(1y)(2px,0px,1).

Furthermore,

P(x)=zP(xz)P(z)=z(xqz+(1x)(1qz))(zr+(1z)(1r))

so that

P(X=0)=(1q0)(1r)+(1q1)rP(X=1)=q0(1r)+q1r

Part b

The increase in probability from taking the drug in each sub-population is:

  • P(y=1x=1,z=0)P(y=1x=0,z=0)=p1,0p0,0; and
  • P(y=1x=1,z=1)P(y=1x=0,z=1)=p1,1p0,1.

In the whole population, the increase is P(Y=1X=1)P(Y=1X=0), calcualted via

1z=0P(Y=1,Z=zX=1)P(Y=1,Z=zX=0)=1z=0P(X=1,Y=1,Z=z)P(X=1)P(X=0,Y=1,Z=z)P(X=0)=(1r)q0p1,0+rq1p1,1q0(1r)+q1r(1r)(1q0)p0,0+r(1q1)p0,1(1q0)(1r)+(1q1)r

Part c

There’s no need to be smart about this. Let’s just simulate lots of values and find some combination with a Simpson’s reversal. We’ll generate a dataset with a positive probability difference in each sub-population, then filter out anything that also has a non-negative population difference.

set.seed(8168)

N <- 10000

part_c <- tibble(
  id = 1:N %>% as.integer(),
  
  r = rbeta(N, 2, 2),   # P(Z = 1)
  
  q0 = rbeta(N, 2, 2),  # P(X = 1 | Z = 0)
  q1 = rbeta(N, 2, 2),  # P(X = 1 | Z = 1)
  
  p00 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 0)
  p10 = rbeta(N, 2, 2) * (p00 - 1) + 1, # P(Y = 1 | X = 1, Z = 0)
  p01 = rbeta(N, 2, 2), # P(Y = 1 | X = 0, Z = 1)
  p11 = rbeta(N, 2, 2) * (p01 - 1) + 1, # P(Y = 1 | X = 1, Z = 1)
  
  diff_pop = (p10 * q0 * (1 - r) + p11 * q1 * r) / (q0 * (1 - r) + q1 * r) - (p00 * (1 - q0) * (1 - r) + p01 * (1 - q1) * r) / ((1 - q0) * (1 - r) + (1 - q1) * r),
  diff_z0 = p10 - p00,
  diff_z1 = p11 - p01
) 

As a check, there should be no rows with a non-positive difference.

check <- part_c %>% 
  filter(diff_z0 <= 0 | diff_z1 <= 0) %>% 
  nrow()

# throw error if there are rows
stopifnot(check == 0)

check
[1] 0

Now we simply throw away any rows with a non-negative population difference. Here is one combination of parameters exhibiting Simpson’s reversal.

simpsons_reversal <- part_c %>% 
  filter(diff_pop < -0.05) %>% 
  head(1) %>% 
  gather(term, value)
Parameters leading to Simpson’s reversal
term value
id 109.0000000
r 0.2837123
q0 0.0664811
q1 0.8468126
p00 0.8441892
p10 0.8827558
p01 0.5273831
p11 0.5816885
diff_pop -0.1933634
diff_z0 0.0385666
diff_z1 0.0543054

As a final check, let’s generate a dataset for this set of parameters.

df <- simpsons_reversal %>% 
  spread(term, value) %>% 
  crossing(unit = 1:N) %>% 
  mutate(
    z = rbinom(N, 1, r),
    x = rbinom(N, 1, if_else(z == 0, q0, q1)),
    p_y_given_x_z = case_when(
      x == 0 & z == 0 ~ p00,
      x == 0 & z == 1 ~ p01,
      x == 1 & z == 0 ~ p10,
      x == 1 & z == 1 ~ p11
    ),
    y = rbinom(N, 1, p_y_given_x_z)
  ) %>% 
  select(unit, x, y, z)

The empirical joint probability distribution is as follows.

p_x_y_z <- df %>% 
  group_by(x, y, z) %>% 
  count() %>% 
  ungroup() %>% 
  mutate(p = n / sum(n))
x y z n p
0 0 0 1068 0.1068
0 0 1 197 0.0197
0 1 0 5609 0.5609
0 1 1 224 0.0224
1 0 0 52 0.0052
1 0 1 1016 0.1016
1 1 0 400 0.0400
1 1 1 1434 0.1434

The population-level probability difference is given by:

diff_pop <- p_x_y_z %>% 
  group_by(x) %>% 
  summarise(p = sum(n * y) / sum(n)) %>% 
  spread(x, p) %>%
  mutate(diff = `1` - `0`)
0 1 diff
0.8217808 0.6319779 -0.1898028

which is close to the theoretical value.

Similarly, the sub-population differences are

diff_z <- p_x_y_z %>% 
  group_by(x, z) %>% 
  summarise(p = sum(n * y) / sum(n)) %>% 
  spread(x, p) %>% 
  mutate(diff = `1` - `0`)
z 0 1 diff
0 0.8400479 0.8849558 0.0449078
1 0.5320665 0.5853061 0.0532396

which are also close to the theoretical values we calculated. More importantly, they have a different sign to the population difference, confiming that we have case of Simpson’s reversal.

To leave a comment for the author, please follow the link and comment on their blog: Brian Callander.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)