Considering sensitivity to unmeasured confounding: part 2
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In part 1 of this 2-part series, I introduced the notion of sensitivity to unmeasured confounding in the context of an observational data analysis. I argued that an estimate of an association between an observed exposure D and outcome Y is sensitive to unmeasured confounding if we can conceive of a reasonable alternative data generating process (DGP) that includes some unmeasured confounder that will generate the same observed distribution the observed data. I further argued that reasonableness can be quantified or parameterized by the two correlation coefficients ρUD and ρUY, which measure the strength of the relationship of the unmeasured confounder U with each of the observed measures. Alternative DGPs that are characterized by high correlation coefficients can be viewed as less realistic, and the observed data could be considered less sensitive to unmeasured confounding. On the other hand, DGPs characterized by lower correlation coefficients would be considered more sensitive.
I need to pause here for a moment to point out that something similar has been described much more thoroughly by a group at NYU’s PRIISM (see Carnegie, Harada & Hill and Dorie et al). In fact, this group of researchers has actually created an R
package called treatSens to facilitate sensitivity analysis. I believe the discussion in these posts here is consistent with the PRIISM methodology, except treatSens
is far more flexible (e.g. it can handle binary exposures) and provides more informative output than what I am describing. I am hoping that the examples and derivation of an equivalent DGP that I show here provide some additional insight into what sensitivity means.
I’ve been wrestling with these issues for a while, but the ideas for the derivation of an alternative DGP were actually motivated by this recent note by Fei Wan on an unrelated topic. (Wan shows how a valid instrumental variable may appear to violate a key assumption even though it does not.) The key element of Wan’s argument for my purposes is how the coefficient estimates of an observed model relate to the coefficients of an alternative (possibly true) data generation process/model.
OK – now we are ready to walk through the derivation of alternative DGPs for an observed data set.
Two DGPs, same data
Recall from Part 1 that we have an observed data model
Y=k0+k1D+ϵY
D=α0+α1U+ϵDY=β0+β1D+β2U+ϵY∗,
where U is some unmeasured confounder, and ϵD∼N(0,σ2D) and ϵY∗∼N(0,σ2Y∗). Can we go even further and find an alternative DGP where D has no direct effect on Y at all?
D=α0+α1U+ϵDY=β0+β2U+ϵY∗,
α1 (and σ2ϵD) derived from ρUD
In a simple linear regression model with a single predictor, the coefficient α1 can be specified directly in terms ρUD, the correlation between U and D:
α1=ρUDσDσU
Furthermore, we can derive an estimate of the variance for ϵD ( σ2ϵD) at each level of ρUD:
Var(D)=Var(α0+α1U+ϵD)σ2D=α21σ2U+σ2ϵDσ2ϵD=σ2D–α21(since σ2U=1)
So, for each value of ρUD that we generated, there is a corresponding pair (α1,σ2ϵD).
Determine β2
In the addendum I go through a bit of an elaborate derivation of β2, the coefficient of U in the alternative outcome model. Here is the bottom line:
β2=α11−σ2ϵDσ2D(k1–β1)
In the equation, we have σ2D and k1, which are both estimated from the observed data and the pair of derived parameters α1 and σ2ϵD based on ρUD. β1, the coefficient of D in the outcome model is a free parameter, set externally. That is, we can choose to evaluate all β2’s the are generated when β1=0. More generally, we can set β1=pk1, where 0≤p≤1. (We could go negative if we want, though I won’t do that here.) If p=1 , β1=k1 and β2=0; we end up with the original model with no confounding.
So, once we specify ρUD and p, we get the corresponding triplet (α1,σ2ϵD,β2).
Determine ρUY
In this last step, we can identify the correlation of U and Y, ρUY, that is associated with all the observed, specified, and derived parameters up until this point. We start by writing the alternative outcome model, and then replace D with the alternative exposure model, and do some algebraic manipulation to end up with a re-parameterized alternative outcome model that has a single predictor:
Y=β0+β1D+β2U+ϵ∗Y=β0+β1(α0+α1U+ϵD)+β2U+ϵ∗Y=β0+β1α0+β1α1U+β1ϵD+β2U+ϵ∗Y=β∗0+(β1α1+β2)U+ϵ+Y=β∗0+β∗1U+ϵ+Y,
where β∗0=β0+β1α0, β∗1=(β1α1+β2), and ϵ+Y=β1ϵD+ϵY∗.
As before, the coefficient in a simple linear regression model with a single predictor is related to the correlation of the two variables as follows:
β∗1=ρUYσYσU
Since β∗1=(β1α1+β2),
β1α1+β2=ρUYσYσUρUY=σUσY(β1α1+β2)=(β1α1+β2)σY
Determine σ2Y∗
In order to simulate data from the alternative DGPs, we need to derive the variation for the noise of the alternative model. That is, we need an estimate of σ2Y∗.
Var(Y)=Var(β0+β1D+β2U+ϵY∗)=β21Var(D)+β22Var(U)+2β1β2Cov(D,U)+Var(ϵy∗)=β21σ2D+β22+2β1β2ρUDσD+σ2Y∗
So,
σ2Y∗=Var(Y)–(β21σ2D+β22+2β1β2ρUDσD),
where Var(Y) is the variation of Y from the observed data. Now we are ready to implement this in R.
Implementing in R
If we have an observed data set with observed D and Y, and some target β1 determined by p, we can calculate/generate all the quantities that we just derived.
Before getting to the function, I want to make a brief point about what we do if we have other measured confounders. We can essentially eliminate measured confounders by regressing the exposure D on the confounders and conducting the entire sensitivity analysis with the residual exposure measurements derived from this initial regression model. I won’t be doing this here, but if anyone wants to see an example of this, let me know, and I can do a short post.
OK – here is the function, which essentially follows the path of the derivation above:
altDGP <- function(dd, p) { # Create values of rhoUD dp <- data.table(p = p, rhoUD = seq(0.0, 0.9, length = 1000)) # Parameters estimated from data dp[, `:=`(sdD = sd(dd$D), s2D = var(dd$D), sdY = sd(dd$Y))] dp[, k1:= coef(lm(Y ~ D, data = dd))[2]] # Generate b1 based on p dp[, b1 := p * k1] # Determine a1 dp[, a1 := rhoUD * sdD ] # Determine s2ed dp[, s2ed := s2D - (a1^2)] # Determine b2 dp[, g:= s2ed/s2D] dp <- dp[g != 1] dp[, b2 := (a1 / (1 - g) ) * ( k1 - b1 )] # Determine rhoUY dp[, rhoUY := ( (b1 * a1) + b2 ) / sdY ] # Eliminate impossible correlations dp <- dp[rhoUY > 0 & rhoUY <= .9] # Determine s2eyx dp[, s2eyx := sdY^2 - (b1^2 * s2D + b2^2 + 2 * b1 * b2 * rhoUD * sdD)] dp <- dp[s2eyx > 0] # Determine standard deviations dp[, sdeyx := sqrt(s2eyx)] dp[, sdedx := sqrt(s2ed)] # Finished dp[] }
Assessing sensitivity
If we generate the same data set we started out with last post, we can use the function to assess the sensitivity of this association.
defO <- defData(varname = "D", formula = 0, variance = 1) defO <- defData(defO, varname = "Y", formula = "1.5 * D", variance = 25) set.seed(20181201) dtO <- genData(1200, defO)
In this first example, I am looking for the DGP with β1=0, which is implemented as p=0 in the call to function altDGP
. Each row of output represents an alternative set of parameters that will result in a DGP with β1=0.
dp <- altDGP(dtO, p = 0) dp[, .(rhoUD, rhoUY, k1, b1, a1, s2ed, b2, s2eyx)] ## rhoUD rhoUY k1 b1 a1 s2ed b2 s2eyx ## 1: 0.295 0.898 1.41 0 0.294 0.904 4.74 5.36 ## 2: 0.296 0.896 1.41 0 0.295 0.903 4.72 5.50 ## 3: 0.297 0.893 1.41 0 0.296 0.903 4.71 5.63 ## 4: 0.298 0.890 1.41 0 0.297 0.902 4.69 5.76 ## 5: 0.299 0.888 1.41 0 0.298 0.902 4.68 5.90 ## --- ## 668: 0.896 0.296 1.41 0 0.892 0.195 1.56 25.35 ## 669: 0.897 0.296 1.41 0 0.893 0.193 1.56 25.35 ## 670: 0.898 0.296 1.41 0 0.894 0.191 1.56 25.36 ## 671: 0.899 0.295 1.41 0 0.895 0.190 1.56 25.36 ## 672: 0.900 0.295 1.41 0 0.896 0.188 1.55 25.37
Now, I am creating a data set that will be based on four levels of β1. I do this by creating a vector p=<0.0,0.2,0.5,0.8>. The idea is to create a plot that shows the curve for each value of p. The most extreme curve (in this case, the curve all the way to the right, since we are dealing with positive associations only) represents the scenario where p=0 (i.e. β1=0). The curves moving to the left reflect increasing sensitivity as p increases.
dsenO <- rbindlist(lapply(c(0.0, 0.2, 0.5, 0.8), function(x) altDGP(dtO, x)))
I would say that in this first case the observed association is moderately sensitive to unmeasured confounding, as correlations as low as 0.5 would enough to erase the association.
In the next case, if the association remains unchanged but the variation of Y is considerably reduced, the observed association is much less sensitive. However, it is still quite possible that the observed overestimation is at least partially overstated, as relatively low levels of correlation could reduce the estimated association.
defA1 <- updateDef(defO, changevar = "Y", newvariance = 4)
In this last scenario, variance is the same as the initial scenario, but the association is considerably weaker. Here, we see that the estimate of the association is extremely sensitive to unmeasured confounding, as low levels of correlation are required to entirely erase the association.
defA2 <- updateDef(defO, changevar = "Y", newformula = "0.25 * D")
treatSens
package
I want to show output generated by the treatSens
package I referenced earlier. treatSens
requires a formula that includes an outcome vector Y, an exposure vector Z, and at least one vector of measured of confounders X. In my examples, I have included no measured confounders, so I generate a vector of independent noise that is not related to the outcome.
library(treatSens) X <- rnorm(1200) Y <- dtO$Y Z <- dtO$D testsens <- treatSens(Y ~ Z + X, nsim = 5) sensPlot(testsens)
Once treatSens
has been executed, it is possible to generate a sensitivity plot, which looks substantively similar to the ones I have created. The package uses sensitivity parameters ζZ and ζY, which represent the coefficients of U, the unmeasured confounder. Since treatSens
normalizes the data (in the default setting), these coefficients are actually equivalent to the correlations ρUD and ρUY that are the basis of my sensitivity analysis. A important difference in the output is that treatSens
provides uncertainty bands, and extends into regions of negative correlation. (And of course, a more significant difference is that treatSens
is flexible enough to handle binary exposures, whereas I have not yet extended my analytic approach in that direction, and I suspect it is no possible for me to do so due to non-collapsibility of logistic regression estimands - I hope to revisit this in the future.)
Observed data scenario 1: Y∼N(1.50Z,25)
Observed data scenario 2: Y∼N(1.50Z,4)
Addendum: Derivation of β2
In case you want more detail on how we derive β2 from the observed data model and assumed correlation parameters, here it is. We start by specifying the simple observed outcome model:
Y=k0+k1D+ϵY
We can estimate the parameters k0 and k1 using this standard matrix solution:
\[
where W is the n×2 design matrix:
W=[1,D]n×2.
We can replace Y with the alternative outcome model:
\[
\begin{aligned}
Note that
(WTW)−1WT(β0)=<β0,0>and(WTW)−1WT(β1D)=<0,β1>.
Now, we need to figure out what (WTW)−1WT(β2U) is. First, we rearrange the alternate exposure model:
D=α0+α1U+ϵDα1U=D−α0−ϵDU=1α1(D−α0−ϵD)β2U=β2α1(D−α0−ϵD)
We can replace β2U:
(WTW)−1WT(β2U)=(WTW)−1WT[β2α1(D−α0−ϵD)]=<−β2α1α0,0>+<0,β2α1>−β2α1(WTW)−1WTϵD=<−β2α1α0,β2α1>−β2α1(WTW)−1WTϵD
And now we get back to \(
\[
\begin{aligned}
where γ0 and γ1 come from regressing ϵD on D:
ϵD=γ0+γ1D
\[
\begin{aligned}
Since we can center all the observed data, we can easily assume that k0=0. All we need to worry about is k1:
k1=β1+β2α1(1−γ1)β2α1(1−γ1)=k1−β1β2=α11−γ1(k1−β1)
We have generated α1 based on ρUD, k1 is a estimated from the data, and β1 is fixed based on some p,0≤p≤1 such that β1=pk1. All that remains is γ1:
γ1=ρϵDDσϵDσD
Since D=α0+α1U+ϵD (and ϵD⊥⊥U)
ρϵDD=Cov(ϵD,D)σϵDσD=Cov(ϵD,α0+α1U+ϵD)σϵDσD=σ2ϵDσϵDσD=σϵDσD
It follows that
γ1=ρϵDDσϵDσD=σϵDσD×σϵDσD=σ2ϵDσ2D
So, now, we have all the elements to generate β2 for a range of α1’s and σ2ϵD’s:
β2=α11−σ2ϵDσ2D(k1−β1)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.