debunking a (minor and personal) myth
For quite a while, I entertained the idea that Beta and Dirichlet proposals were more adequate than (log-)normal random-walk proposals for parameters on (0,1) and on simplices, respectively, when running an MCMC. For instance, if p in (0,1) is the value of the Markov chain at time t-1, the proposal at time t could be a Be(εp,ε(1-p)) generator, since its mean is equal to p and its variance is proportional to 1/(1+ε). (Although I cannot find track of this notion in my books.) The parameter ε can then be calibrated towards a given acceptance rate, like the golden number 0.234 of Gelman, Gilks and Roberts (1996).

However, when using this proposal on a mixture model, Kaniav Kamari and I realised today that there is a catch, namely that pushing ε down to achieve an acceptance rate near 0.234 may end in disaster, since the parameters of the Beta or of the Dirichlet may fall below 1, which makes the proposal density explode to infinity on some boundaries of the parameter space. An explosion that gets more and more serious as ε decreases to zero, hence is more and more likely to decrease the acceptance rate, thus to reduce ε, which in turn concentrates the support even more on the boundary and leads to a vicious circle with no convergence to the target acceptance rate… This is what happened when I tried to implement this approach to illustrate my point:
target=function(x,a=1,b=1){dbeta(x,a,b)}
T=1e4 #MCMC runs
bsch=50 #batsch size
mkv=rep(runif(1),T) #MCMC chain
eps=rep(1/runif(1,10,100),T) #adapted proposal scale
ace=rep(0,T) #acceptance rate
for (t in 2:T){
  batsch=mkv[t-1]
  for (d in 1:bsch){
    # Beta proposal
    prop=rbeta(1,eps[t-1]*batsch,eps[t-1]*(1-batsch))
    mh=target(prop)*dbeta(batsch,eps[t-1]*prop,eps[t-1]*(1-prop))/
      dbeta(prop,eps[t-1]*batsch,eps[t-1]*(1-batsch))/
      target(batsch) #Metropolis-Hastings ratio
    if (runif(1)<mh){ace[t-1]=ace[t-1]+1;batsch=prop}}
  ace[t-1]=ace[t-1]/bsch
  mkv[t]=batsch #subsampled Markov chain
  # update of epsilon
  if (ace[t-1]<.456){eps[t]=eps[t-1]*1.01
    }else{eps[t]=eps[t-1]/1.01}
  } #end of the MCMC loop
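As a quick sanity check on the mean and variance of this proposal, here is a small simulation from a Be(εp,ε(1-p)) for fixed (and arbitrary) values of p and ε:

p=.3; epsl=20 #arbitrary illustration values
x=rbeta(1e6,epsl*p,epsl*(1-p))
mean(x) #close to p
var(x) #close to p*(1-p)/(1+epsl), i.e. proportional to 1/(1+epsl)
p*(1-p)/(1+epsl) #theoretical variance, equal to .01 here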
Indeed, when using a Be(a,b) target with either a<1 or b<1, the above MCMC algorithm inevitably runs into difficulties, because the Beta generator eventually produces an exact 0 or 1, thanks to rounding effects, hence an NA value for the Metropolis-Hastings ratio. For a Be(a,b) target with both coefficients no smaller than one, there was no convergence issue and no difficulty with the behaviour of ε, as shown by the picture above when the target is uniform, or by the one below when it is a Be(4,8).
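The rounding effect is easy to reproduce outside the MCMC loop: once ε is small enough for both Beta parameters to fall below one, the proposal density diverges at the boundaries and rbeta() may return an exact 0 or 1, after which the Metropolis-Hastings ratio is no longer defined. A minimal illustration, with arbitrary values ε=.1, a current value of .5, and a Be(1,.5) target:

epsl=.1; batsch=.5 #illustrative scale and current value
prop=rbeta(1e5,epsl*batsch,epsl*(1-batsch))
mean(prop==0|prop==1) #a positive fraction of exact boundary values
dbeta(1,epsl*batsch,epsl*(1-batsch)) #proposal density diverges at the boundary
prop=1 #an exact boundary proposal
mh=dbeta(prop,1,.5)*dbeta(batsch,epsl*prop,epsl*(1-prop))/
 dbeta(prop,epsl*batsch,epsl*(1-batsch))/
 dbeta(batsch,1,.5) #Metropolis-Hastings ratio for the boundary proposal
mh #NaN, hence runif(1)<mh is NA and the if () test fails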
Robin Ryder was sitting in front of me in our CREST office as I was experimenting with this, and he suggested using the current value of the Markov chain as the mode of the proposal rather than as its mean. Which amounts to adding one (1) to both Beta coefficients. And which solves the problem altogether. Thanks, Robin!
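In terms of the above code, Robin's fix only modifies the proposal and the Metropolis-Hastings ratio inside the inner loop; keeping the same variable names, the amended lines could read:

# Beta proposal with the current value as mode rather than mean:
# both parameters are now no smaller than one, so the proposal
# density stays bounded at 0 and 1
prop=rbeta(1,eps[t-1]*batsch+1,eps[t-1]*(1-batsch)+1)
mh=target(prop)*dbeta(batsch,eps[t-1]*prop+1,eps[t-1]*(1-prop)+1)/
 dbeta(prop,eps[t-1]*batsch+1,eps[t-1]*(1-batsch)+1)/
 target(batsch) #Metropolis-Hastings ratio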