Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Continuing the previous post concerning linear regression analysis with non-informative priors in R, I will show how to derive numerical summaries for the regression parameters without Monte Carlo integration. The theoretical background for this post is contained in Chapter 14 of Bayesian Data Analysis which should be consulted for more information.
The Residual Standard Deviation
The residual standard deviation
where
No closed form appears to exist for the quantiles (median, lower and upper limits of a symmetric credible interval, CrI) so that the equation above needs to be solved numerically. Since the marginal
The mode can be obtained by direct differentiation of the PDF and is given by the closed form expression:
The Regression Coefficients
The marginal distribution of the entire coefficient vector
Since this is a symmetric unimodal distribution, the mean, mode and median coincide these are all equal to the maximum likelihood estimates, while its standard deviation is available in closed form:
The Gory Details
Two R functions are used to carry out the aforementioned tasks, i.e. one function that provides the integrand for the mean of the scaled inverse chi square distribution:
## integrand for the mean of the scaled inverse ##chisq2 with df=n, scale=t2 mn.scIX2.sqrt<-function(x,n,t2) { s2<-x^2 n.2<-n/2.0 lx<-log(x) 2.0*x*x*exp(-n*t2/(2.0*s2)-lgamma(n.2)+ n.2*(log(t2)+log(n.2))-(n+2)*lx) }
and a second (much longer!) function that receives the fitted lm object and calculates the numerical summaries for the regression coefficients and the residual standard deviation:
bayesfitAnal<-function(lmfit){ ## extract coefficients, dfs and variance ## matrix from lm QR<-lmfit$qr df<-lmfit$df R<-qr.R(QR) ## R component coef<-lmfit$coef Vb<-chol2inv(R) ## variance(unscaled) s2<-(t(lmfit$residuals)%*%lmfit$residuals) s2<-s2[1,1]/df scale<-sqrt(s2*diag(Vb)) ## standard errors of the univariate t se=scale*ifelse(df>2, sqrt(df/(df-2)), ifelse(df<1,NA,Inf)) ## dataframe for returned values ret<-data.frame(coef=coef,se=se,t=coef/se, mode=coef,median=coef, "CrI.2.5%"=qt(0.025,df=df), "CrI.97.5%"=qt(0.975,df=df)) ## CrI limits for t-distributed quantities ret[,6:7]<-ret[,6:7]*se+coef ## make extra space for sigma ret<-rbind(ret,rep(0,7)) rownames(ret)[3]<-"sigma" ## mean of scale M1<-integrate(mn.scIX2.sqrt,n=df, t2=s2,lower=0,upper=Inf)$val ## expectation of the square of the scale ## (which is equal to the expectation of the ## initial inverse chi square distribution S1<-ifelse(df<=2,NA, df*s2/(df-2)) ret[3,1]<-M1 ## mean ret[3,2]<-sqrt(S1-M1^2) ## sd ret[3,3]<-ret[3,1]/ret[3,2] ## t ret[3,4]<-sqrt(s2*df/(df+1)) ## mode ## calculate quantiles - these take ## place in the scale of the precision ## median ret[3,5]<-uniroot(function(x) pgamma(x, shape=df/2,scale=2/(s2*df), lower.tail=FALSE)-0.5, lower=0,upper=1/s2)$root ## lower limit of 95% CrI ret[3,6]<-uniroot(function(x) pgamma(x, shape=df/2,scale=2/(s2*df), lower.tail=FALSE)-0.025,lower=0, upper=1/(M1-3*ret[3,2])^2)$root ## upper limit of 95% CrI ret[3,7]<-uniroot(function(x) pgamma(x, shape=df/2,scale=2/(s2*df), lower.tail=FALSE)-0.975,lower=0, upper=1/s2)$root ## raise to -1/2 to change from ## precision to standard deviations ret[3,5:7]<-sqrt(1/ret[3,5:7]) ret }
To see some action I will revisit the linear regression example introduced in the first post and compare frequentist (lm) and Bayesian estimates obtained with Monte Carlo and non-Monte Carlo approaches:
summary(lmfit) ## Frequentist Call: lm(formula = weight ~ group) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.0320 0.2202 22.850 9.55e-15 *** groupTrt -0.3710 0.3114 -1.191 0.249 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.07308, Adjusted R-squared: 0.02158 F-statistic: 1.419 on 1 and 18 DF, p-value: 0.249 > t(apply(bf,2,Bayes.sum)) ## Bayesian (MC) mean se t median CrI.2.5% CrI.97.5% (Intercept) 5.0321144 0.2338693 21.516785 5.0326123 4.5709622 5.4947022 groupTrt -0.3687563 0.3305785 -1.115488 -0.3706862 -1.0210569 0.2862806 sigma 0.7270983 0.1290384 5.634745 0.7095935 0.5270569 1.0299920 > bayesfitAnal(lmfit) ## Bayesian (non-MC) coef se t mode median CrI.2.5. CrI.97.5. (Intercept) 5.0320000 0.2335761 21.543296 5.0320000 5.0320000 4.5412747 5.5227253 groupTrt -0.3710000 0.3303265 -1.123131 -0.3710000 -0.3710000 -1.0649903 0.3229903 sigma 0.7271885 0.1295182 5.614566 0.6778158 0.7095623 0.5262021 1.0298379
The conservative, skeptical nature of the Bayesian inference (wider confidence intervals, due to proper acknowledgement of uncertainty) is evident no matter the numerical approach (Monte Carlo v.s. numerical integration) one uses for the inference. Although the numerical estimates of the regression coefficients agree (up to three decimal points) among the three numerical approaches, residual variance estimates don’t for this relatively small data set.
Monte Carlo estimates are almost as precise as the numerical integration for things like the mean and standard error of the estimated parameters, yet they lose precision for extreme quantilies. Monte Carlo is also slower, which may become an issue when fitting larger models.
That was all folks! In subsequent posts Iwill explore features of the Bayesian solution that will (hopefully) convince you that it is a much better alternative (especially when it doesn’t bring tears to your eyes!) relative to the frequentist one. If you use any of the code in these posts, I’d appreciate if you can drop a line 🙂
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.