Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve been working through Gelman et al.’s otherwise excellent Bayesian Data Analysis.
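The troublemaker is the SAT problem in section 5.5. The authors give values for two variables, the estimated effect y and its standard error sigma, for each of eight schools, but the summary statistics I calculated from them didn’t match the ones in the text.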
I puzzled over this for quite a while, thinking that maybe I’d missed a prior/posterior distinction somewhere and my estimates were supposed to be subtly shifted. But no: when I went and checked the original data source, I found that the values were reported to 4 sig figs. Repeating the calculation with the new values gives the expected results. Grr…
Here’s the data and R code if anyone’s interested.
```r
### Packages (assumed: the post does not show its library calls)
library(plyr)     # ddply(), summarize, .()
library(reshape)  # melt(), colsplit(), cast()

### Load SAT data from the Gelman et al book and the original Rubin paper
df <- data.frame(school=LETTERS[1:8],
                 book.y=c(28,8,-3,7,-1,1,18,12),
                 rubin.y=c(28.39,7.94,-2.75,6.82,-0.64,0.63,18.01,12.16),
                 book.sigma=c(15,10,16,11,9,11,10,18),
                 rubin.sigma=c(14.9,10.2,16.3,11,9.4,11.4,10.4,17.6))

### Rearrange data into a handy form
df <- melt(df, id="school")
vals <- colsplit(df$variable, "\\.", c("source","metric"))
df <- cbind(df, vals)
df <- df[,-2]   # drop the now-redundant "variable" column
df <- cast(df, school+source ~ metric)

### Calculate the summary statistics
### y is the precision-weighted mean; sig is the variance of that estimate
ddply(df, .(source), summarize,
      y=sum(y/sigma^2)/sum(1/sigma^2),
      sig=1/sum(1/sigma^2))
```
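As a cross-check that doesn't depend on any packages, here is a minimal base-R sketch of the same calculation. It applies the precision-weighted mean and its variance directly to each pair of columns. The helper name `pooled_summary` is my own invention for illustration and is not part of the original code; the data values are copied from above.

```r
# Precision-weighted (pooled) mean and its variance for independent estimates.
# `pooled_summary` is a hypothetical helper name, not from the original post.
pooled_summary <- function(y, sigma) {
  w <- 1 / sigma^2               # precision (inverse variance) of each estimate
  c(y   = sum(w * y) / sum(w),   # pooled estimate
    var = 1 / sum(w))            # variance of the pooled estimate
}

# Rounded values as printed in the book
book.y      <- c(28, 8, -3, 7, -1, 1, 18, 12)
book.sigma  <- c(15, 10, 16, 11, 9, 11, 10, 18)

# More precise values from the original Rubin paper
rubin.y     <- c(28.39, 7.94, -2.75, 6.82, -0.64, 0.63, 18.01, 12.16)
rubin.sigma <- c(14.9, 10.2, 16.3, 11, 9.4, 11.4, 10.4, 17.6)

# One row per data source, one column per summary statistic
result <- rbind(book  = pooled_summary(book.y, book.sigma),
                rubin = pooled_summary(rubin.y, rubin.sigma))
print(round(result, 2))
```

The two rows should differ visibly: the rounded book values give a pooled estimate of roughly 7.7 with variance about 16.6, while the Rubin values give roughly 7.9 with variance about 17.4. That is a big enough gap to send you hunting for a mistake that isn't there.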