While playing around with Bayesian methods for random effects models, it occurred to me that inverse-Wishart priors can really bite you in the bum. Inverse-Wishart priors are popular priors over covariance matrices. People like them because they are conjugate to a Gaussian likelihood, i.e., if you have data $y_1, \ldots, y_n \sim \mathcal{N}(0, \Sigma)$ and a prior $\Sigma \sim \mathrm{IW}(\nu_0, \Lambda_0)$, then the posterior is inverse-Wishart as well: $\Sigma \mid y \sim \mathrm{IW}\left(\nu_0 + n, \; \Lambda_0 + \sum_{i=1}^n y_i y_i^T\right)$, so that the update is available in closed form.
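Concretely, the update just adds the sample scatter matrix to the prior scale and the sample size to the degrees of freedom. A minimal sketch in base R (the data and the prior parameters here are made up purely for illustration):

```r
set.seed(1)
n <- 50
Y <- matrix(rnorm(2 * n), n, 2)   # n stand-in draws from N(0, I)

nu0     <- 3                       # prior degrees of freedom
Lambda0 <- diag(2)                 # prior scale matrix

# Posterior: Sigma | Y ~ IW(nu0 + n, Lambda0 + t(Y) %*% Y)
nu.post     <- nu0 + n
Lambda.post <- Lambda0 + t(Y) %*% Y
```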
The inverse-Wishart works like this: a Wishart distribution $\mathrm{W}(\nu, R)$ describes the scatter matrix of $\nu$ independent Gaussian vectors $x_1, \ldots, x_\nu \sim \mathcal{N}(0, R)$; the Wishart sample is $X = \sum_{i=1}^{\nu} x_i x_i^T$, which is why for large $\nu$ the draws concentrate around the expectation $\nu R$. A matrix $S$ has an inverse-Wishart distribution $\mathrm{IW}(\nu, \Lambda)$ if its inverse is Wishart-distributed, $S^{-1} \sim \mathrm{W}(\nu, \Lambda^{-1})$.
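To make the definition concrete, here is a quick Monte Carlo check using base R's rWishart (the particular $R$ and $\nu$ below are arbitrary choices for illustration): averaging many Wishart draws should give roughly $\nu R$, and inverting a draw yields an inverse-Wishart sample.

```r
set.seed(1)
R  <- matrix(c(1, 0.5, 0.5, 2), 2, 2)      # arbitrary scale matrix
nu <- 5                                     # degrees of freedom
W  <- rWishart(10000, df = nu, Sigma = R)   # 2 x 2 x 10000 array of draws

W.mean <- apply(W, c(1, 2), mean)           # should be close to nu * R
Sigma1 <- solve(W[, , 1])                   # one draw from IW(nu, solve(R))
```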
Let’s say we want to do Bayesian inference for the correlation of two Gaussian variables. Essentially, that’s a special case of what we started out with: the covariance matrix is now

$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho \sigma_1 \sigma_2 \\ \rho \sigma_1 \sigma_2 & \sigma_2^2 \end{pmatrix}$

and the correlation coefficient $\rho$ is the quantity of interest.
What are generally reasonable assumptions about $\Sigma$? We usually don’t know the scale of the variances ahead of time, so the prior shouldn’t commit to a narrow range for them. A good default choice might then be to take $\Sigma \sim \mathrm{IW}(3, I)$, whose implied marginal prior for the variances is suitably spread over several orders of magnitude. The marginal distribution for the correlation coefficient, however, is uniform over $(-1, 1)$ – not what we wanted (we’d like very high absolute correlations to be less probable). That’s not the main problem with this prior, though. The big problem is this:
For each sample drawn from the prior, plot the variance of the first component against the correlation coefficient. What we see is a strong dependence between variance and correlation: the prior says that high variance implies high correlation, and low variance implies low-to-moderate correlation. This is a disaster for inference, because it means that the correlation will tend to be exaggerated if the variance is higher than expected, which is the opposite of the shrinkage behaviour we’d like to see.
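The same pattern shows up without any plotting. Splitting prior draws into low- and high-variance groups (using the same >1 thresholds as the code below) gives a noticeably larger average |correlation| in the high-variance group. A sketch using base R's rWishart in place of the helper functions below:

```r
set.seed(1)
W <- rWishart(20000, df = 3, Sigma = diag(2))   # Wishart(3, I) draws
stats <- t(apply(W, 3, function(w) {
  S <- solve(w)                                 # a draw from IW(3, I)
  c(var1 = S[1, 1], var2 = S[2, 2],
    cor  = S[2, 1] / sqrt(S[1, 1] * S[2, 2]))
}))
high <- stats[, "var1"] > 1 & stats[, "var2"] > 1
mean(abs(stats[high, "cor"]))    # mean |correlation|, high-variance draws
mean(abs(stats[!high, "cor"]))   # smaller in the low-variance draws
```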
There are better priors for covariance matrices out there, but sometimes you might be stuck with the inverse-Wishart for computational reasons (for example, it’s the only choice you have in INLA for random effects). An option is to estimate the variances first, then tweak the inverse-Wishart prior to have the right scale. Increasing the value of the degrees-of-freedom parameter $\nu$ also helps: it concentrates the prior around its mean, which weakens the variance–correlation dependence, at the price of a more informative prior.
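For instance, one way to set the scale (a sketch, with made-up variance estimates v1 and v2): the inverse-Wishart mean is $\Lambda / (\nu - p - 1)$ for $\nu > p + 1$, so choosing $\Lambda = (\nu - p - 1)\,\mathrm{diag}(v_1, v_2)$ centres the prior on the estimated variances.

```r
set.seed(1)
v1 <- 4; v2 <- 0.25    # hypothetical variance estimates
p  <- 2; nu <- 6       # nu must exceed p + 1 for the prior mean to exist
Lambda <- (nu - p - 1) * diag(c(v1, v2))

# Monte Carlo check: draw Sigma ~ IW(nu, Lambda) by inverting
# Wishart(nu, Lambda^-1) samples, then average the variances
W    <- rWishart(20000, df = nu, Sigma = solve(Lambda))
vars <- apply(W, 3, function(w) diag(solve(w)))
rowMeans(vars)         # close to c(v1, v2)
```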
Below, some R code to reproduce the plots:
require(plyr)
require(ggplot2)
require(mvtnorm)

# Given a 2x2 covariance matrix, compute the correlation coefficient
ccm <- function(M) {
    M[2, 1] / prod(sqrt(diag(M)))
}

# Generate n samples from the inverse-Wishart prior IW(r, M)
rprior <- function(n = 1, r = 3, M = diag(rep(1, 2))) {
    Minv <- solve(M)
    rlply(n, chol2inv(chol(rwish(r, Minv))))
}

# Wishart sample: scatter matrix of r draws from N(0, R)
rwish <- function(r, R) {
    X <- rmvnorm(r, sigma = R)
    t(X) %*% X
}

plot.marginal.variance <- function(df) {
    p <- ggplot(df, aes(var1)) + geom_histogram(aes(y = ..density..)) + scale_x_log10()
    p + theme_bw() + labs(x = "\nVariance of the first component", y = "Density\n")
}

plot.marginal.correlation <- function(df) {
    p <- ggplot(df, aes(cor)) + geom_histogram(aes(y = ..density..))
    p + theme_bw() + labs(x = "\nCorrelation coefficient", y = "Density\n") + scale_x_continuous(limits = c(-1, 1))
}

plot.corr.var <- function(df) {
    p <- ggplot(df, aes(var1, cor)) + geom_point() + scale_x_log10()
    p + theme_bw() + labs(x = "\nVariance of the first component", y = "Correlation coefficient\n")
}

plot.conditional.corr <- function(df) {
    df$high.var <- factor(with(df, var1 > 1 & var2 > 1),
                          labels = c("Low variance", "High variance"))
    p <- ggplot(df, aes(cor)) + geom_histogram(aes(y = ..density..)) + facet_wrap(~ high.var)
    p + theme_bw() + labs(x = "\nCorrelation coefficient", y = "Density\n") + scale_x_continuous(limits = c(-1, 1))
}

df <- ldply(rprior(2000), function(M) data.frame(var1 = M[1, 1], var2 = M[2, 2], cor = ccm(M)))
plot.marginal.variance(df)
plot.marginal.correlation(df)
plot.corr.var(df)
plot.conditional.corr(df)