Parallel JAGS RNGs
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As a matter of convention, we usually run 3 or 4 chains in JAGS. By default, this gives rise to chains that draw samples from 3 or 4 distinct pseudorandom number generators. I didn’t go and check whether it does things 111,222,333 or 123,123,123, but in any event the “parallel chains” in JAGS are samples drawn from distinct RNGs computed on a single processor core.
But we all have multiple cores now, or we’re computing on a cluster or the cloud! So the behavior we’d like from rjags is to use the foreach package with each JAGS chain using a parallel-safe RNG. The default behavior with n.chain=1
will be that each parallel instance will use .RNG.name[1]
, the Wichmann-Hill RNG.
JAGS 2.2.0 includes a new lecuyer
module (along with the glm
module, which everyone should probably always use, and doesn’t have many undocumented tricks that I know of). But lecuyer is completely undocumented! I tried .RNG.name="lecuyer::Lecuyer"
, .RNG.name="lecuyer::lecuyer"
, and .RNG.name="lecuyer::LEcuyer"
all to no avail. It ought to be .RNG.name="lecuyer::Lecuyer"
to be consistent with the other .RNG.name values! I looked around in the source to find where it checks its name from the inits, to discover that in fact it is
.Rng.name="lecuyer::RngStream"
So here’s how I set up 4 chains now:
library(doMC); registerDoMC() library(rjags); load.module("glm"); load.module("lecuyer") library(random) jinits <- function() { ### all the other params ### .Rng.name="lecuyer::RngStream", .Rng.seed=randomNumbers(n = 1, min = 1, max = 1e+06,col=1) } jags.parsamples <- foreach(i=1:getDoParWorkers()) %dopar% { model.jags <- jags.model(model, forJAGS, inits=jinits, n.chain=1, n.adapt=1000) result <- coda.samples(model.jags,params,1000) return(result) }
I would just as soon initialize them to the same state and use sequential substreams, but I think there is no way to do this. Four long separately-seeded streams should be more than fine; a quick look suggests that if you did n.chain>1
(on each core) you’d get sequential substreams.
I should also probably write a better .combine
so that it’s an mcmc.list
and not just a list, but whatever. This works, almost 4 times (yeah yeah overhead blah blah) faster than the usual n.chain=4
would be!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.