How the odds ratio confounds: a brief study in a few colorful figures
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The odds ratio always confounds: while it may be constant across different groups or clusters, the risk ratios or risk differences across those groups may vary quite substantially. This makes it really hard to interpret an effect. And then there is inconsistency between marginal and conditional odds ratios, a topic I seem to be visiting frequently, most recently last month.
My aim here is to generate a few figures that might highlight some of these issues.
Assume that there is some exposure (indicated by the use of a 1 or 0 subscript) applied across a number of different groups or clusters of people (think different regions, hospitals, schools, etc.) – indicated by some number or letter i. Furthermore, assume that the total number exposed at each location is the same as the number unexposed: Ni0=Ni1=N=100.
The number of folks with exposure at a particular location i who have a poor outcome is ni1 and the number with a good outcome is N−ni1. Likewise, the corresponding measures for folks not exposed are ni0 and N−ni0. The probabilities of a poor outcome for exposed and non-exposed are ni1/N and ni0/N. The relative risk of a poor outcome for those exposed compared to those non exposed is
RRi=ni1/Nni0/N=ni1ni0,
RDi=ni1N−ni0N=ni1–ni0N,
ORi=[ni1/N]/[(N–ni1)/N][ni0/N]/[(N–ni0)/N]
The simple conditional logistic regression model that includes a group-level random effect bi assumes a constant odds ratio between exposed and unexposed individuals across the different clusters:
logit(Yij)=β0+β1Eij+bi,
The point of all of this is to illustrate that although the odds-ratio is the same across all groups/clusters (i.e., there is no i subscript in β1 and ORi=OR), the risk ratios and risk differences can vary greatly across groups, particularly if the b’s vary considerably.
Constant odds ratio, different risk ratios and differences
If the odds ratio is constant and we know ni1, we can perform a little algebraic maneuvering on the OR formula above to find ni0:
ni0=N×ni1OR×(N–ni1)+ni1
If we assume that the ni1’s can range from 2 to 98 (out of 100), we can see how the risk ratios and risk differences vary considerably even though we fix the odds ratio fixed at a value of 3 (don’t pay too close attention to the fact the n0 is not an integer – this is just an illustration that makes a few violations – if I had used N=1000, we could have called this rounding error):
N <- 100 trueOddsRatio <- 3 n1 <- seq(2:98) n0 <- (N * n1)/(trueOddsRatio * (N - n1) + n1) oddsRatio <- ((n1 / (N - n1) ) / (n0 / (N - n0) )) riskRatio <- n1 / n0 riskDiff <- (n1 - n0) / N dn <- data.table(n1 = as.double(n1), n0, oddsRatio, riskRatio, riskDiff = round(riskDiff,3)) dn[1:6] ## n1 n0 oddsRatio riskRatio riskDiff ## 1: 1 0.3355705 3 2.98 0.007 ## 2: 2 0.6756757 3 2.96 0.013 ## 3: 3 1.0204082 3 2.94 0.020 ## 4: 4 1.3698630 3 2.92 0.026 ## 5: 5 1.7241379 3 2.90 0.033 ## 6: 6 2.0833333 3 2.88 0.039
With a constant odds ratio of 3, the risk ratios range from 1 to 3, and the risk differences range from almost 0 to just below 0.3. The odds ratio is not exactly informative with respect to these other two measures. The plots - two takes on the same data - tell a better story:
Another look at contrasting marginal vs conditional odds ratios
Using this same simple framework, I thought I’d see if I can illustrate the relationship between marginal and conditional odds ratios.
In this case, we have two groups/clusters where the conditional odds ratios are equivalent, yet when we combine the groups into a single entity, the combined (marginal) odds ratio is less than each of the conditional odds ratios.
In this scenario each cluster has 100 people who are exposed and 100 who are not, as before. a1 and a0 represent the number of folks with a poor outcome for the exposed and unexposed in the first cluster, respectively; b1 and b0 represent the analogous quantities in the second cluster. As before a0 and b0 are derived as a function of a1 and b1, respectively, and the constant odds ratio.
constantOR <- function(n1, N, OR) { return(N*n1 / (OR*(N-n1) + n1)) } # Cluster A a1 <- 55 a0 <- constantOR(a1, N = 100, OR = 3) (a1*(100 - a0)) / (a0 * (100 - a1)) ## [1] 3 # Cluster B b1 <- 35 b0 <- constantOR(b1, N = 100, OR = 3) (b1*(100 - b0)) / (b0 * (100 - b1)) ## [1] 3 # Marginal OR tot0 <- a0 + b0 tot1 <- a1 + b1 (tot1*(200 - tot0)) / (tot0 * (200 - tot1)) ## [1] 2.886952
For this example, the marginal odds ratio is less than the conditional odds ratio. How does this contrast between the marginal and conditional odds ratio play out with a range of possible outcomes - all meeting the requirement of a constant conditional odds ratio? (Note we are talking about odds ratio larger than 1; everything is flipped if the OR is < 1.) The plot below shows possible combinations of sums a1+b1 and a0+b0, where the constant conditional odds ratio condition holds within each group. The red line shows all points where the marginal odds ratio equals the conditional odds ratio (which happens to be 3 in this case):
Here is the same plot, but a yellow line is drawn in all cases where a1=b1 (hence a0=b0). This line is the directly over the earlier line where the marginal odds ratios equal 3. So, sort of proof by plotting. The marginal odds ratio appears to equal the conditional odds ratio when the proportions of each group are equal.
But are the marginal odds ratios not on the colored lines higher or lower than 3? To check this, look at the next figure. In this plot, the odds ratio is plotted as a function of a1+b1, which represents the total number of poor outcomes in the combined exposed groups. Each line represents the marginal odds ratio for a specific value of a1.
If you notice, the odds ratio reaches the constant conditional odds ratio (which is 3) only when a1+b1=2a1, or when a1=b1. It appears then, when a1≠b1, the marginal odds ratio lies below the conditional odds ratio. Another “proof” by figure. OK, not a proof, but colorful nonetheless.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.