Considering sensitivity to unmeasured confounding: part 2

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In part 1 of this 2-part series, I introduced the notion of sensitivity to unmeasured confounding in the context of an observational data analysis. I argued that an estimate of an association between an observed exposure D and outcome Y is sensitive to unmeasured confounding if we can conceive of a reasonable alternative data generating process (DGP) that includes some unmeasured confounder that will generate the same observed distribution the observed data. I further argued that reasonableness can be quantified or parameterized by the two correlation coefficients ρUD and ρUY, which measure the strength of the relationship of the unmeasured confounder U with each of the observed measures. Alternative DGPs that are characterized by high correlation coefficients can be viewed as less realistic, and the observed data could be considered less sensitive to unmeasured confounding. On the other hand, DGPs characterized by lower correlation coefficients would be considered more sensitive.

I need to pause here for a moment to point out that something similar has been described much more thoroughly by a group at NYU’s PRIISM (see Carnegie, Harada & Hill and Dorie et al). In fact, this group of researchers has actually created an R package called treatSens to facilitate sensitivity analysis. I believe the discussion in these posts here is consistent with the PRIISM methodology, except treatSens is far more flexible (e.g. it can handle binary exposures) and provides more informative output than what I am describing. I am hoping that the examples and derivation of an equivalent DGP that I show here provide some additional insight into what sensitivity means.

I’ve been wrestling with these issues for a while, but the ideas for the derivation of an alternative DGP were actually motivated by this recent note by Fei Wan on an unrelated topic. (Wan shows how a valid instrumental variable may appear to violate a key assumption even though it does not.) The key element of Wan’s argument for my purposes is how the coefficient estimates of an observed model relate to the coefficients of an alternative (possibly true) data generation process/model.

OK – now we are ready to walk through the derivation of alternative DGPs for an observed data set.

Two DGPs, same data

Recall from Part 1 that we have an observed data model

Y=k0+k1D+ϵY

where ϵYN(0,σ2Y). We are wondering if there is another DGP that could have generated the data that we have actually observed:

D=α0+α1U+ϵDY=β0+β1D+β2U+ϵY,

where U is some unmeasured confounder, and ϵDN(0,σ2D) and ϵYN(0,σ2Y). Can we go even further and find an alternative DGP where D has no direct effect on Y at all?

D=α0+α1U+ϵDY=β0+β2U+ϵY,

α1 (and σ2ϵD) derived from ρUD

In a simple linear regression model with a single predictor, the coefficient α1 can be specified directly in terms ρUD, the correlation between U and D:

α1=ρUDσDσU

We can estimate σD from the observed data set, and we can reasonably assume that σU=1 (since we could always normalize the original measurement of U). Finally, we can specify a range of ρUD (I am keeping everything positive here for simplicity), such that 0<ρUD<0.90 (where I assume a correlation of 0.90 is at or beyond the realm of reasonableness). By plugging these three parameters into the formula, we can generate a range of α1’s.

Furthermore, we can derive an estimate of the variance for ϵD ( σ2ϵD) at each level of ρUD:

Var(D)=Var(α0+α1U+ϵD)σ2D=α21σ2U+σ2ϵDσ2ϵD=σ2Dα21(since σ2U=1)

So, for each value of ρUD that we generated, there is a corresponding pair (α1,σ2ϵD).

Determine β2

In the addendum I go through a bit of an elaborate derivation of β2, the coefficient of U in the alternative outcome model. Here is the bottom line:

β2=α11σ2ϵDσ2D(k1β1)

In the equation, we have σ2D and k1, which are both estimated from the observed data and the pair of derived parameters α1 and σ2ϵD based on ρUD. β1, the coefficient of D in the outcome model is a free parameter, set externally. That is, we can choose to evaluate all β2’s the are generated when β1=0. More generally, we can set β1=pk1, where 0p1. (We could go negative if we want, though I won’t do that here.) If p=1 , β1=k1 and β2=0; we end up with the original model with no confounding.

So, once we specify ρUD and p, we get the corresponding triplet (α1,σ2ϵD,β2).

Determine ρUY

In this last step, we can identify the correlation of U and Y, ρUY, that is associated with all the observed, specified, and derived parameters up until this point. We start by writing the alternative outcome model, and then replace D with the alternative exposure model, and do some algebraic manipulation to end up with a re-parameterized alternative outcome model that has a single predictor:

Y=β0+β1D+β2U+ϵY=β0+β1(α0+α1U+ϵD)+β2U+ϵY=β0+β1α0+β1α1U+β1ϵD+β2U+ϵY=β0+(β1α1+β2)U+ϵ+Y=β0+β1U+ϵ+Y,

where β0=β0+β1α0, β1=(β1α1+β2), and ϵ+Y=β1ϵD+ϵY.

As before, the coefficient in a simple linear regression model with a single predictor is related to the correlation of the two variables as follows:

β1=ρUYσYσU

Since β1=(β1α1+β2),

β1α1+β2=ρUYσYσUρUY=σUσY(β1α1+β2)=(β1α1+β2)σY

Determine σ2Y

In order to simulate data from the alternative DGPs, we need to derive the variation for the noise of the alternative model. That is, we need an estimate of σ2Y.

Var(Y)=Var(β0+β1D+β2U+ϵY)=β21Var(D)+β22Var(U)+2β1β2Cov(D,U)+Var(ϵy)=β21σ2D+β22+2β1β2ρUDσD+σ2Y

So,

σ2Y=Var(Y)(β21σ2D+β22+2β1β2ρUDσD),

where Var(Y) is the variation of Y from the observed data. Now we are ready to implement this in R.

Implementing in R

If we have an observed data set with observed D and Y, and some target β1 determined by p, we can calculate/generate all the quantities that we just derived.

Before getting to the function, I want to make a brief point about what we do if we have other measured confounders. We can essentially eliminate measured confounders by regressing the exposure D on the confounders and conducting the entire sensitivity analysis with the residual exposure measurements derived from this initial regression model. I won’t be doing this here, but if anyone wants to see an example of this, let me know, and I can do a short post.

OK – here is the function, which essentially follows the path of the derivation above:

altDGP <- function(dd, p) {
  
  # Create values of rhoUD
  
  dp <- data.table(p = p, rhoUD = seq(0.0, 0.9, length = 1000))
  
  # Parameters estimated from data
  
  dp[, `:=`(sdD = sd(dd$D), s2D = var(dd$D), sdY = sd(dd$Y))] 
  dp[, k1:= coef(lm(Y ~ D, data = dd))[2]]
  
  # Generate b1 based on p
  
  dp[, b1 := p * k1]
  
  # Determine a1
  
  dp[, a1 := rhoUD * sdD ]
  
  # Determine s2ed
  
  dp[, s2ed := s2D - (a1^2)]
  
  # Determine b2
  
  dp[, g:= s2ed/s2D]
  dp <- dp[g != 1]
  dp[, b2 := (a1 / (1 - g) ) * ( k1 - b1 )]
  
  # Determine rhoUY
  
  dp[, rhoUY := ( (b1 * a1) + b2 ) / sdY ]
  
  # Eliminate impossible correlations
  
  dp <- dp[rhoUY > 0 & rhoUY <= .9]
  
  # Determine s2eyx
  
  dp[, s2eyx := sdY^2 - (b1^2 * s2D + b2^2 + 2 * b1 * b2 * rhoUD * sdD)]
  dp <- dp[s2eyx > 0]
  
  # Determine standard deviations
  
  dp[, sdeyx := sqrt(s2eyx)]
  dp[, sdedx := sqrt(s2ed)]
  
  # Finished
  
  dp[]
  
}

Assessing sensitivity

If we generate the same data set we started out with last post, we can use the function to assess the sensitivity of this association.

defO <- defData(varname = "D", formula = 0, variance = 1)
defO <- defData(defO, varname = "Y", formula = "1.5 * D", variance = 25)

set.seed(20181201)
dtO <- genData(1200, defO)

In this first example, I am looking for the DGP with β1=0, which is implemented as p=0 in the call to function altDGP. Each row of output represents an alternative set of parameters that will result in a DGP with β1=0.

dp <- altDGP(dtO, p = 0)
dp[, .(rhoUD, rhoUY, k1, b1, a1, s2ed, b2, s2eyx)]
##      rhoUD rhoUY   k1 b1    a1  s2ed   b2 s2eyx
##   1: 0.295 0.898 1.41  0 0.294 0.904 4.74  5.36
##   2: 0.296 0.896 1.41  0 0.295 0.903 4.72  5.50
##   3: 0.297 0.893 1.41  0 0.296 0.903 4.71  5.63
##   4: 0.298 0.890 1.41  0 0.297 0.902 4.69  5.76
##   5: 0.299 0.888 1.41  0 0.298 0.902 4.68  5.90
##  ---                                           
## 668: 0.896 0.296 1.41  0 0.892 0.195 1.56 25.35
## 669: 0.897 0.296 1.41  0 0.893 0.193 1.56 25.35
## 670: 0.898 0.296 1.41  0 0.894 0.191 1.56 25.36
## 671: 0.899 0.295 1.41  0 0.895 0.190 1.56 25.36
## 672: 0.900 0.295 1.41  0 0.896 0.188 1.55 25.37

Now, I am creating a data set that will be based on four levels of β1. I do this by creating a vector p=<0.0,0.2,0.5,0.8>. The idea is to create a plot that shows the curve for each value of p. The most extreme curve (in this case, the curve all the way to the right, since we are dealing with positive associations only) represents the scenario where p=0 (i.e. β1=0). The curves moving to the left reflect increasing sensitivity as p increases.

dsenO <- rbindlist(lapply(c(0.0, 0.2, 0.5, 0.8), 
                     function(x) altDGP(dtO, x)))

I would say that in this first case the observed association is moderately sensitive to unmeasured confounding, as correlations as low as 0.5 would enough to erase the association.

In the next case, if the association remains unchanged but the variation of Y is considerably reduced, the observed association is much less sensitive. However, it is still quite possible that the observed overestimation is at least partially overstated, as relatively low levels of correlation could reduce the estimated association.

defA1 <- updateDef(defO, changevar = "Y", newvariance = 4)

In this last scenario, variance is the same as the initial scenario, but the association is considerably weaker. Here, we see that the estimate of the association is extremely sensitive to unmeasured confounding, as low levels of correlation are required to entirely erase the association.

defA2 <- updateDef(defO, changevar = "Y", newformula = "0.25 * D")

treatSens package

I want to show output generated by the treatSens package I referenced earlier. treatSens requires a formula that includes an outcome vector Y, an exposure vector Z, and at least one vector of measured of confounders X. In my examples, I have included no measured confounders, so I generate a vector of independent noise that is not related to the outcome.

library(treatSens)

X <- rnorm(1200)
Y <- dtO$Y
Z <- dtO$D

testsens <- treatSens(Y ~ Z + X, nsim = 5)
sensPlot(testsens)

Once treatSens has been executed, it is possible to generate a sensitivity plot, which looks substantively similar to the ones I have created. The package uses sensitivity parameters ζZ and ζY, which represent the coefficients of U, the unmeasured confounder. Since treatSens normalizes the data (in the default setting), these coefficients are actually equivalent to the correlations ρUD and ρUY that are the basis of my sensitivity analysis. A important difference in the output is that treatSens provides uncertainty bands, and extends into regions of negative correlation. (And of course, a more significant difference is that treatSens is flexible enough to handle binary exposures, whereas I have not yet extended my analytic approach in that direction, and I suspect it is no possible for me to do so due to non-collapsibility of logistic regression estimands - I hope to revisit this in the future.)

Observed data scenario 1: YN(1.50Z,25)

Observed data scenario 2: YN(1.50Z,4)

Observed data scenario 3: YN(0.25Z,25)

Addendum: Derivation of β2

In case you want more detail on how we derive β2 from the observed data model and assumed correlation parameters, here it is. We start by specifying the simple observed outcome model:

Y=k0+k1D+ϵY

We can estimate the parameters k0 and k1 using this standard matrix solution:

\[ \; = (W^TW)^{-1}W^TY,\]

where W is the n×2 design matrix:

W=[1,D]n×2.

We can replace Y with the alternative outcome model:

\[ \begin{aligned} \; &= (W^TW)^{-1}W^T(\beta_0 + \beta_1 D + \beta_2 U + \epsilon_Y^*) \\ &= \;<\beta_0, 0> + <0, \beta_1> +\; (W^TW)^{-1}W^T(\beta_2U) + \mathbf{0} \\ &= \;<\beta_0, \beta_1> +\; (W^TW)^{-1}W^T(\beta_2U) \end{aligned} \]

Note that

(WTW)1WT(β0)=<β0,0>and(WTW)1WT(β1D)=<0,β1>.

Now, we need to figure out what (WTW)1WT(β2U) is. First, we rearrange the alternate exposure model: D=α0+α1U+ϵDα1U=Dα0ϵDU=1α1(Dα0ϵD)β2U=β2α1(Dα0ϵD)

We can replace β2U:

(WTW)1WT(β2U)=(WTW)1WT[β2α1(Dα0ϵD)]=<β2α1α0,0>+<0,β2α1>β2α1(WTW)1WTϵD=<β2α1α0,β2α1>β2α1(WTW)1WTϵD

And now we get back to \(\) :

\[ \begin{aligned} \; &= \;<\beta_0,\; \beta_1> +\; (W^TW)^{-1}W^T(\beta_2U) \\ &= \;<\beta_0-\frac{\beta_2}{\alpha_1}\alpha_0, \; \beta_1 + \frac{\beta_2}{\alpha_1}>-\;\frac{\beta_2}{\alpha_1}(W^TW)^{-1}W^T \epsilon_D \\ &= \;<\beta_0-\frac{\beta_2}{\alpha_1}\alpha_0, \; \beta_1 + \frac{\beta_2}{\alpha_1}>-\;\frac{\beta_2}{\alpha_1}<\gamma_0, \; \gamma_1> \end{aligned} \]

where γ0 and γ1 come from regressing ϵD on D:

ϵD=γ0+γ1D

so,

\[ \begin{aligned} \; &= \;<\beta_0-\frac{\beta_2}{\alpha_1}\alpha_0 - \frac{\beta_2}{\alpha_1}\gamma_0, \; \beta_1 + \frac{\beta_2}{\alpha_1} - \frac{\beta_2}{\alpha_1}\gamma_1 > \\ &= \;<\beta_0-\frac{\beta_2}{\alpha_1}\left(\alpha_0 + \gamma_0\right), \; \beta_1 + \frac{\beta_2}{\alpha_1}\left(1 - \gamma_1 \right) > \end{aligned} \]

Since we can center all the observed data, we can easily assume that k0=0. All we need to worry about is k1:

k1=β1+β2α1(1γ1)β2α1(1γ1)=k1β1β2=α11γ1(k1β1)

We have generated α1 based on ρUD, k1 is a estimated from the data, and β1 is fixed based on some p,0p1 such that β1=pk1. All that remains is γ1:

γ1=ρϵDDσϵDσD

Since D=α0+α1U+ϵD (and ϵDU)

ρϵDD=Cov(ϵD,D)σϵDσD=Cov(ϵD,α0+α1U+ϵD)σϵDσD=σ2ϵDσϵDσD=σϵDσD

It follows that

γ1=ρϵDDσϵDσD=σϵDσD×σϵDσD=σ2ϵDσ2D

So, now, we have all the elements to generate β2 for a range of α1’s and σ2ϵD’s:

β2=α11σ2ϵDσ2D(k1β1)

To leave a comment for the author, please follow the link and comment on their blog: ouR data generation.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)