V is for (Meta-Analysis) Variance
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today, I’ll talk about the concept of variance in meta-analysis and discuss how metafor gives you that information. But first, we should set up our data frames with study information and data. I’ll be reusing the two data frames I created for the E post – one to compute standardized mean difference (Cohen’s d) and one to compute log odds ratio.
smd_meta<-data.frame( id = c("005","005","029","031","038","041","041","058","058","067","067"), study = c(1,2,3,1,1,1,2,1,2,1,2), author_year = c("Ruva 2007","Ruva 2007","Chrzanowski 2006","Studebaker 2000", "Ruva 2008","Bradshaw 2007","Bradshaw 2007","Wilson 1998", "Wilson 1998","Locatelli 2011","Locatelli 2011"), n1 = c(138,140,144,21,54,78,92,31,29,90,181), n2 = c(138,142,234,21,52,20,18,15,13,29,53), m1 = c(5.29,5.05,1.97,5.95,5.07,6.22,5.47,6.13,5.69,4.81,4.83), m2 = c(4.08,3.89,2.45,3.67,3.96,5.75,4.89,3.80,3.61,4.61,4.51), sd1 = c(1.65,1.50,1.08,1.02,1.65,2.53,2.31,2.51,2.51,1.20,1.19), sd2 = c(1.67,1.61,1.22,1.20,1.76,2.17,2.59,2.68,2.78,1.39,1.34) ) or_meta<-data.frame( id = c("001","003","005","005","011","016","025","025","035","039","045","064","064"), study = c(1,5,1,2,1,1,1,2,1,1,1,1,2), author_year = c("Bruschke 1999","Finkelstein 1995","Ruva 2007","Ruva 2007", "Freedman 1996","Keelen 1979","Davis 1986","Davis 1986", "Padawer-Singer 1974","Eimermann 1971","Jacquin 2001", "Ruva 2006","Ruva 2006"), tg = c(58,26,67,90,36,37,17,17,47,15,133,68,53), cg = c(49,39,22,50,12,33,19,17,33,11,207,29,44), tn = c(72,60,138,140,99,120,60,55,60,40,136,87,74), cn = c(62,90,138,142,54,120,52,57,60,44,228,83,73) )
Now we'll rerun the code from that previous post that we used to generate effect sizes. We reference the original data frame when we run this code if we want the computed variables to be appended to the existing data frame.
library(metafor) ## Loading required package: Matrix ## Loading 'metafor' package (version 2.0-0). For an overview ## and introduction to the package please type: help(metafor). smd_meta <- escalc(measure="SMD", m1i=m1, m2i=m2, sd1i=sd1, sd2i=sd2, n1i=n1, n2i=n2,data=smd_meta) or_meta <- escalc(measure="OR", ai=tg, bi=(tn-tg), ci=cg, di=(cn-cg), data=or_meta)
Let's take a quick look at the updated data frame, starting with smd_meta.
smd_meta ## id study author_year n1 n2 m1 m2 sd1 sd2 yi vi ## 1 005 1 Ruva 2007 138 138 5.29 4.08 1.65 1.67 0.7269 0.0154 ## 2 005 2 Ruva 2007 140 142 5.05 3.89 1.50 1.61 0.7433 0.0152 ## 3 029 3 Chrzanowski 2006 144 234 1.97 2.45 1.08 1.22 -0.4099 0.0114 ## 4 031 1 Studebaker 2000 21 21 5.95 3.67 1.02 1.20 2.0087 0.1433 ## 5 038 1 Ruva 2008 54 52 5.07 3.96 1.65 1.76 0.6464 0.0397 ## 6 041 1 Bradshaw 2007 78 20 6.22 5.75 2.53 2.17 0.1893 0.0630 ## 7 041 2 Bradshaw 2007 92 18 5.47 4.89 2.31 2.59 0.2444 0.0667 ## 8 058 1 Wilson 1998 31 15 6.13 3.80 2.51 2.68 0.8927 0.1076 ## 9 058 2 Wilson 1998 29 13 5.69 3.61 2.51 2.78 0.7867 0.1188 ## 10 067 1 Locatelli 2011 90 29 4.81 4.61 1.20 1.39 0.1592 0.0457 ## 11 067 2 Locatelli 2011 181 53 4.83 4.51 1.19 1.34 0.2603 0.0245
You'll notice that, in addition to the information from the data frame creation code above, I have two new variables: yi and vi. yi refers to the effect size for that study. vi is the variance associated with the effect size for that study. This information is important for computing the study weight, which I'll talk about tomorrow, as well as generating confidence intervals around the overall effect size, which I'll talk about Sunday. But you might be wondering where this variance comes from. For that, stand back - I'm about to use math.
Meta-analysis uses weights based on sample size. The reason for that is because meta-analysis is attempting to take information from multiple studies on a topic and estimate the underlying effect size that these studies are trying to estimate. We weight on sample size because studies done on more people are expected to be better able to estimate the true population value, so larger studies get more weight. On the flip side, larger studies are expected to have more precision in their estimate, so larger studies have smaller variance. In meta-analysis, the variance is estimated based on sample size. For standardized mean difference, the variance is computed as:
vi = ((n1+n2)/(n1*n2)) + (yi2/(2*(n1+n2)))
So for the first effect size, which is 0.7269, we can recreate the computed variance (within some rounding error):
((138+138)/(138*138)) + (0.7269/(2*(138+138))) ## [1] 0.0158096
That calculation is done for each effect size, and fortunately, metafor will do it for you. And since sample size is part of the calculation, you'll note that variances are larger for smaller studies and smaller for larger studies.
library(ggplot2) ggplot(smd_meta, aes(x=vi, y=(n1+n2))) + geom_point() + geom_text(aes(label=id),hjust=0, vjust=0) + scale_x_continuous(breaks=seq(0,0.15,0.01)) + scale_y_continuous(breaks=seq(0,400,25)) + labs(x = "Variance", y = "Total Sample Size") + theme_bw()
I did the same calculations for the or_meta data frame. Variance for odds ratio is calculated based on the sample size in each of the four cells (treatment-outcome1 (a), treatment-outcome2 (b), control-outcome1 (c), and control-outcome2 (d)):
vi = (1/na) + (1/nb) + (1/nc) + (1/nd)
So again, larger studies will have smaller variances, though imbalance among those four cells (for instance, having a lot more people in one group than another) will inflate variance somewhat. We can see that for study 045, which has a lot more people in the control group than treatment group.
or_meta ## id study author_year tg cg tn cn yi vi ## 1 001 1 Bruschke 1999 58 49 72 62 0.0945 0.1860 ## 2 003 5 Finkelstein 1995 26 39 60 90 0.0000 0.1131 ## 3 005 1 Ruva 2007 67 22 138 138 1.6046 0.0831 ## 4 005 2 Ruva 2007 90 50 140 142 1.1976 0.0620 ## 5 011 1 Freedman 1996 36 12 99 54 0.6931 0.1508 ## 6 016 1 Keelen 1979 37 33 120 120 0.1615 0.0809 ## 7 025 1 Davis 1986 17 19 60 52 -0.3759 0.1650 ## 8 025 2 Davis 1986 17 17 55 57 0.0513 0.1690 ## 9 035 1 Padawer-Singer 1974 47 33 60 60 1.0845 0.1655 ## 10 039 1 Eimermann 1971 15 11 40 44 0.5878 0.2279 ## 11 045 1 Jacquin 2001 133 207 136 228 1.5035 0.3933 ## 12 064 1 Ruva 2006 68 29 87 83 1.8968 0.1203 ## 13 064 2 Ruva 2006 53 44 74 73 0.5089 0.1237 ggplot(or_meta, aes(x=vi, y=(tn+cn))) + geom_point() + geom_text(aes(label=id),hjust=0, vjust=0) + scale_x_continuous(breaks=seq(0,0.4,0.05)) + scale_y_continuous(breaks=seq(0,375,25)) + labs(x = "Variance", y = "Total Sample Size") + theme_bw()