I decided to start blogging the R code used for some of my statistical posts, so I will start with the meta-analysis posts and move on to more difficult stuff.
As stated previously (here and here), the problem is to convert a reported relative risk (RR), its 95% confidence interval $(L, U)$ and the associated p-value $p$ into an estimate of the treatment effect on the log scale, $\hat{\theta}$, and its standard error $\hat{\sigma}$. Writing $\Phi^{-1}$ for the quantile function of the standard normal distribution (qnorm in R), the algebraic approach proceeds in the following steps (a worked numerical example follows the list):
- First we obtain an estimate for $\hat{\theta}$ from the reported RR: $\hat{\theta}_1 = \log(\mathrm{RR})$.
- Secondly we transform the confidence interval to estimates for $\hat{\theta}$ and $\hat{\sigma}$: $\hat{\theta}_2 = \log\sqrt{U \cdot L}$ and $\hat{\sigma}_1 = \log\sqrt{U/L}\,/\,\Phi^{-1}(0.975)$.
- Averaging over $\hat{\theta}_1$ and $\hat{\theta}_2$ yields $\hat{\theta} = (\hat{\theta}_1 + \hat{\theta}_2)/2$.
- A second estimate for the standard error is obtained from the p-value using the quantile function of the normal distribution: $\hat{\sigma}_2 = \left|\hat{\theta}/\Phi^{-1}(p/2)\right|$.
- Averaging over $\hat{\sigma}_1$ and $\hat{\sigma}_2$ yields $\hat{\sigma} = (\hat{\sigma}_1 + \hat{\sigma}_2)/2$.
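As a quick worked example, take the rounded values used in the toy example later in this post: RR = 1.25, 95% CI (0.73, 2.14) and p = 0.424. Then $\hat{\theta}_1 = \log 1.25 \approx 0.2231$ and $\hat{\theta}_2 = \log\sqrt{2.14 \times 0.73} \approx 0.2231$, so $\hat{\theta} \approx 0.2231$; likewise $\hat{\sigma}_1 = \log\sqrt{2.14/0.73}\,/\,1.96 \approx 0.2744$ and $\hat{\sigma}_2 = |0.2231/\Phi^{-1}(0.212)| \approx 0.279$, giving $\hat{\sigma} \approx 0.2767$.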
The following R function can be used to carry out these transformations:
## Algebraic extraction of the treatment effect (log scale) and its standard
## error from the reported (exponentiated) effect X, the limits L, U of the
## 95% CI and the two-sided p-value p
algSTE <- function(X, L, U, p) {
  ## two estimates of the log treatment effect: from X and from the CI midpoint
  v1 <- c(log(X), log(sqrt(U * L)))
  alg.X <- mean(v1[is.finite(v1)])
  ## two estimates of the SE: from the CI width and from the p-value
  v2 <- c(log(sqrt(U / L)) / qnorm(0.975), abs(alg.X / qnorm(p / 2)))
  alg.Y <- mean(v2[is.finite(v2)])
  return(list("TE" = alg.X, "se" = alg.Y))
}
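For instance, plugging the same rounded values into the function reproduces the hand calculation above (the exact call is repeated in the full R session further below):

algSTE(X = 1.25, L = 0.73, U = 2.14, p = 0.424)
## returns TE = 0.2230955 and se = 0.2767075 (see the session below)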
A potentially more accurate approach is to acknowledge the finite precision inherent in the medical journal table for all quantities of interest (treatment effect, 95% confidence intervals and p-values). In a previous post I gave a Bayesian solution to this problem which obeys the constraints implied by the rounding of the reported figures; the R code implementing it follows:
## Function that extracts the treatment effect (usually on the log scale) and
## its standard error from a reported treatment effect (usually an exponentiated
## log-odds ratio or log-relative risk), the associated pci% CI and the p-value,
## all rounded to d (TE, 95% CI) and dp (p-value) significant digits.
## It is based on a statistical simulation algorithm that draws
## N samples from the posterior distribution of these parameters,
## followed by a rejection step. Due to the latter, the final number
## of samples can be <<N, which may influence the accuracy of the results.
## The algorithm monitors the acceptance probability and increases the
## number of sampled points (up to Nmax*N) to ensure that N points are
## actually returned for both the SE and the treatment effect.
## X:   exponentiated treatment effect
## L,U: lower and upper limits of the pci (usually 95%) CI
## p:   p-value
## d:   precision of X, L, U
## dp:  precision of the p-value
## N:   number of Monte Carlo samples
extractTSE <- function(X, L, U, p, d, dp, pci = 0.95, N = 1000000, Nmax = 10) {
  ## compute the bounds of the belief functions (half-widths of the rounding intervals)
  a  <- 5 * (10^(-d - 1))
  ap <- 5 * (10^(-dp - 1))
  Xl <- X - a
  Xu <- X + a
  Ll <- L - a ## Ll may turn negative; impossible (negative) draws for a positive quantity are rejected in simTSE
  Lu <- L + a
  Ul <- U - a
  Uu <- U + a
  pn <- (p - ap) / 2
  pp <- (p + ap) / 2
  pn <- max(0, pn)
  pp <- min(1, pp)
  pl <- abs(qnorm(pn))
  pu <- abs(qnorm(pp))
  pci <- abs(qnorm((1 - pci) / 2))
  ret <- list("TE" = rep(0, N), "se" = rep(0, N), "prej" = 0, "Nact" = 0, "N" = 0)
  Nest <- 0
  Nsample <- 0     ## how many samples have been drawn so far
  Nget <- N
  Nmax <- Nmax * N ## maximum number of samples to draw
  while (Nest < N & Nsample <= Nmax & Nget > 0) {
    ret0 <- simTSE(Nget, Xl, Xu, Ll, Ul, Lu, Uu, pl, pu, pci)
    ## take as many draws as needed to fill the N slots
    ntake <- min((N - Nest), ret0$Nact)
    if (ntake > 0) { ## guard against a batch with no accepted draws
      ret$TE[(Nest + 1):(Nest + ntake)] <- ret0$TE[1:ntake]
      ret$se[(Nest + 1):(Nest + ntake)] <- ret0$se[1:ntake]
      Nest <- Nest + ntake
    }
    Nsample <- Nsample + Nget
    Nget <- min(N, floor((N - Nest) / ret0$prej)) ## only obtain as many as anticipated
  }
  ret$N <- Nsample
  ret$Nact <- Nest
  ret$prej <- ret$Nact / ret$N
  ## drop the slots of TE/se that were never filled
  ret$TE <- head(ret$TE, ret$Nact)
  ret$se <- head(ret$se, ret$Nact)
  ret
}

simTSE <- function(N, Xl, Xu, Ll, Ul, Lu, Uu, pl, pu, pci) {
  l <- runif(N, Ll, Lu) ## belief about the lower limit of the CI
  u <- runif(N, Ul, Uu) ## belief about the upper limit of the CI
  x1 <- sqrt(u * l)     ## implied point estimate (geometric mean of the limits)
  y1 <- sqrt(u / l)
  x1.t <- log(x1)       ## implied treatment effect on the log scale
  y1.t <- log(y1) / pci ## implied standard error
  z <- abs(x1.t / y1.t) ## implied z-statistic
  ## select only points satisfying the point estimate constraint
  sel <- I(x1 >= Xl) * I(x1 <= Xu)
  ## and those consistent with the rounded p-value
  if (!is.finite(pl)) { zlsel <- rep(1, length(z)) } else { zlsel <- I(z <= pl) }
  if (!is.finite(pu)) { zusel <- rep(1, length(z)) } else { zusel <- I(z >= pu) }
  psel <- (zusel * zlsel) == 1
  ## select points satisfying both criteria
  selAll <- (psel * sel) == 1
  Nact <- sum(selAll, na.rm = TRUE)
  ## the next test removes impossible sampled values, i.e. negative draws
  ## from the joint distribution of the CI limits or of the RR itself
  selAll <- selAll & !is.na(selAll)
  return(list("TE" = x1.t[selAll], "se" = y1.t[selAll], "prej" = Nact / N, "Nact" = Nact, "N" = N))
}
The implementation is based on two functions, of which the user only needs to call the first one (extractTSE). Note that for some combinations of input values a large number of random draws (up to Nmax*N) has to be generated in order for the Monte Carlo algorithm to retain a given level of precision (controlled by the number of samples N).
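Because the rejection step can discard many candidate draws, it is worth inspecting the diagnostic fields that extractTSE returns alongside the posterior samples. A minimal sketch, reusing the toy values from the session below:

res <- extractTSE(X = 1.25, L = 0.73, U = 2.14, p = 0.424, d = 2, dp = 3, N = 10000)
res$prej ## realised acceptance probability of the rejection step
res$Nact ## number of posterior draws actually retained
res$N    ## total number of candidate draws generated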
Let's now test the two approaches from within R on a toy example:
> d<-2  ## precision of TE, 95% CI (all exponentiated)
> dp<-3 ## precision of p-value
> mn<-0.2209039 ## true treatment effect (log scale)
> se<-0.2761299 ## true SE
>
> X<-round(exp(mn),d); L<-round(exp(mn-qnorm(0.975)*se),d); U<-round(exp(mn+qnorm(0.975)*se),d)
> p<-round(2*(1-pnorm(abs(mn/se))),dp)
>
> X ## rounded/exponentiated treatment effect
[1] 1.25
> L ## lower 95% CI (rounded/exponentiated)
[1] 0.73
> U ## upper 95% CI (rounded/exponentiated)
[1] 2.14
> p
[1] 0.424
>
> moko<-extractTSE(X,L,U,p,d,dp,N=10000)
> mean(moko$TE) ## Bayesian TE
[1] 0.2204252
> mean(moko$se) ## Bayesian SE
[1] 0.2757051
>
> moko2<-algSTE(X,L,U,p)
> moko2$TE ## algebraic TE (log scale)
[1] 0.2230955
> moko2$se ## algebraic se
[1] 0.2767075
The performance of the two approaches appears similar in this particular case!
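Since the Bayesian procedure returns posterior draws rather than a single point estimate, one can also summarise the uncertainty in the extracted quantities directly; a quick sketch using the moko object from the session above:

quantile(moko$TE, c(0.025, 0.5, 0.975)) ## posterior median and 95% interval for the TE
quantile(moko$se, c(0.025, 0.5, 0.975)) ## and for the SE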
However, as we show here, there will be occasions on which the simple(-minded) algebraic approach fails and the (big) Bayesian guns will be required. Posting the source code ensures that the Bayesian procedure is available to everyone who needs to carry out a meta-analysis!