[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today, I’ll talk about the concept of variance in meta-analysis and discuss how metafor gives you that information. But first, we should set up our data frames with study information and data. I’ll be reusing the two data frames I created for the E post – one to compute standardized mean difference (Cohen’s d) and one to compute log odds ratio.
smd_meta<-data.frame( id = c("005","005","029","031","038","041","041","058","058","067","067"), study = c(1,2,3,1,1,1,2,1,2,1,2), author_year = c("Ruva 2007","Ruva 2007","Chrzanowski 2006","Studebaker 2000", "Ruva 2008","Bradshaw 2007","Bradshaw 2007","Wilson 1998", "Wilson 1998","Locatelli 2011","Locatelli 2011"), n1 = c(138,140,144,21,54,78,92,31,29,90,181), n2 = c(138,142,234,21,52,20,18,15,13,29,53), m1 = c(5.29,5.05,1.97,5.95,5.07,6.22,5.47,6.13,5.69,4.81,4.83), m2 = c(4.08,3.89,2.45,3.67,3.96,5.75,4.89,3.80,3.61,4.61,4.51), sd1 = c(1.65,1.50,1.08,1.02,1.65,2.53,2.31,2.51,2.51,1.20,1.19), sd2 = c(1.67,1.61,1.22,1.20,1.76,2.17,2.59,2.68,2.78,1.39,1.34) ) or_meta<-data.frame( id = c("001","003","005","005","011","016","025","025","035","039","045","064","064"), study = c(1,5,1,2,1,1,1,2,1,1,1,1,2), author_year = c("Bruschke 1999","Finkelstein 1995","Ruva 2007","Ruva 2007", "Freedman 1996","Keelen 1979","Davis 1986","Davis 1986", "Padawer-Singer 1974","Eimermann 1971","Jacquin 2001", "Ruva 2006","Ruva 2006"), tg = c(58,26,67,90,36,37,17,17,47,15,133,68,53), cg = c(49,39,22,50,12,33,19,17,33,11,207,29,44), tn = c(72,60,138,140,99,120,60,55,60,40,136,87,74), cn = c(62,90,138,142,54,120,52,57,60,44,228,83,73) )
Now we’ll rerun the code from that previous post that we used to generate effect sizes. We reference the original data frame when we run this code if we want the computed variables to be appended to the existing data frame.
library(metafor) ## Loading required package: Matrix ## Loading 'metafor' package (version 2.0-0). For an overview ## and introduction to the package please type: help(metafor). smd_<- escalc(measure="SMD", m1i=m1, m2i=m2, sd1i=sd1, sd2i=sd2, n1i=n1, n2i=n2,data=smd_meta) or_<- escalc(measure="OR", ai=tg, bi=(tn-tg), ci=cg, di=(cn-cg), data=or_meta)
Let’s take a quick look at the updated data frame, starting with smd_meta.
smd_meta ## id study author_year n1 n2 m1 m2 sd1 sd2 yi vi ## 1 005 1 Ruva 2007 138 138 5.29 4.08 1.65 1.67 0.7269 0.0154 ## 2 005 2 Ruva 2007 140 142 5.05 3.89 1.50 1.61 0.7433 0.0152 ## 3 029 3 Chrzanowski 2006 144 234 1.97 2.45 1.08 1.22 -0.4099 0.0114 ## 4 031 1 Studebaker 2000 21 21 5.95 3.67 1.02 1.20 2.0087 0.1433 ## 5 038 1 Ruva 2008 54 52 5.07 3.96 1.65 1.76 0.6464 0.0397 ## 6 041 1 Bradshaw 2007 78 20 6.22 5.75 2.53 2.17 0.1893 0.0630 ## 7 041 2 Bradshaw 2007 92 18 5.47 4.89 2.31 2.59 0.2444 0.0667 ## 8 058 1 Wilson 1998 31 15 6.13 3.80 2.51 2.68 0.8927 0.1076 ## 9 058 2 Wilson 1998 29 13 5.69 3.61 2.51 2.78 0.7867 0.1188 ## 10 067 1 Locatelli 2011 90 29 4.81 4.61 1.20 1.39 0.1592 0.0457 ## 11 067 2 Locatelli 2011 181 53 4.83 4.51 1.19 1.34 0.2603 0.0245
You’ll notice that, in addition to the information from the data frame creation code above, I have two new variables: yi and vi. yi refers to the effect size for that study. vi is the variance associated with the effect size for that study. This information is important for computing the study weight, which I’ll talk about tomorrow, as well as generating confidence intervals around the overall effect size, which I’ll talk about Sunday. But you might be wondering where this variance comes from. For that, stand back – I’m about to use math.
Meta-analysis uses weights based on sample size. The reason for that is because meta-analysis is attempting to take information from multiple studies on a topic and estimate the underlying effect size that these studies are trying to estimate. We weight on sample size because studies done on more people are expected to be better able to estimate the true population value, so larger studies get more weight. On the flip side, larger studies are expected to have more precision in their estimate, so larger studies have smaller variance. In meta-analysis, the variance is estimated based on sample size. For standardized mean difference, the variance is computed as:
vi = ((n1+n2)/(n1*n2)) + (yi2/(2*(n1+n2)))
So for the first effect size, which is 0.7269, we can recreate the computed variance (within some rounding error):
((138+138)/(138*138)) + (0.7269/(2*(138+138))) ## [1] 0.0158096
That calculation is done for each effect size, and fortunately, metafor will do it for you. And since sample size is part of the calculation, you’ll note that variances are larger for smaller studies and smaller for larger studies.
library(ggplot2) ggplot(smd_meta, aes(x=vi, y=(n1+n2))) + geom_point() + geom_text(aes(label=id),hjust=0, vjust=0) + scale_x_continuous(breaks=seq(0,0.15,0.01)) + scale_y_continuous(breaks=seq(0,400,25)) + labs(x = "Variance", y = "Total Sample Size") + theme_bw()