How to set library path on a {parallel} R cluster
[This article was first published on R – Mark van der Loo, and kindly contributed to R-bloggers.]
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In R you can add extra library locations (directories where your packages are installed) with the .libPaths() function. For example, to add "~/my/lib", you can do:
libs <- c("~/my/lib", .libPaths())
.libPaths(new = libs)
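One detail is worth checking here, because it also matters on cluster workers: .libPaths() only keeps directories that exist and silently drops the rest (the dir.exists() filter in its source). A misspelled path, or a path that does not exist on a worker's machine, therefore produces no error. The minimal check below uses temporary directories as hypothetical stand-ins for ~/my/lib.

```r
old <- .libPaths()

# Hypothetical library directories: one that exists, one that does not
mylib <- file.path(tempdir(), "my-lib")
dir.create(mylib, showWarnings = FALSE)
missing_lib <- file.path(tempdir(), "does-not-exist")

.libPaths(c(mylib, missing_lib, old))
added <- .libPaths()
print(added)  # mylib comes first; the missing directory is not listed

# Restore the original library paths
.libPaths(old)
```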
If you want to set library locations for all workers in a cluster using the parallel package, the intuitive way of doing this is as follows.
libs <- c("~/my/lib", .libPaths())
cluster <- parallel::makeCluster(2)
parallel::clusterCall(cluster, .libPaths, new=libs)
However, this does not work. I have not spent any time figuring out why, but presumably the side effect caused by .libPaths() is sent to the wrong place. Here are the internals of .libPaths().
> .libPaths
function (new)
{
    if (!missing(new)) {
        new <- Sys.glob(path.expand(new))
        paths <- c(new, .Library.site, .Library)
        paths <- paths[dir.exists(paths)]
        .lib.loc <<- unique(normalizePath(paths, "/"))
    }
    else .lib.loc
}
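The side effect is the <<- assignment to .lib.loc, a variable that lives in the environment enclosing .libPaths itself. When clusterCall() ships .libPaths to the workers, the function is serialized together with that enclosing environment, so each worker most likely modifies a copy of .lib.loc instead of the one its own .libPaths() reads.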
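The suspected mechanism can be reproduced without a cluster. This sketch assumes that clusterCall() transfers the function to the workers with serialize(), which is how the socket-based clusters in {parallel} send calls. An ordinary (non-namespace) environment enclosing a closure is copied by value during serialization. The counter closure below is a hypothetical stand-in with the same structure as .libPaths: a function that updates a variable in its enclosing environment with <<-.

```r
# .lib.loc is stored in the environment enclosing .libPaths
lib_loc_is_local <- exists(".lib.loc", envir = environment(.libPaths),
                           inherits = FALSE)

# Hypothetical closure that, like .libPaths, updates its enclosing
# environment with `<<-`
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()

# Round-trip through serialize(), as happens when a function is sent to a
# worker: the enclosing environment is copied along with the function
copy <- unserialize(serialize(counter, NULL))
copy()
copy_count <- copy()         # the copy has been called twice: 2

original_count <- counter()  # the original still starts from 0: 1
```

The copy's state advances while the original's does not. This is the analogue of a worker updating a copied .lib.loc that its own .libPaths() never reads.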
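In any case, the following approach does work. We export the libs variable to the workers and then call .libPaths() on each worker using clusterEvalQ().
library(parallel)
# Keep the paths in a dedicated environment so clusterExport() can find them
e <- new.env()
e$libs <- c("~/my/lib", .libPaths())
cluster <- makeCluster(2)
# Copy 'libs' into the global environment of each worker
clusterExport(cluster, "libs", envir=e)
# Evaluate on the workers, so each worker's own .libPaths() is modified
clusterEvalQ(cluster, .libPaths(libs))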
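The same steps can be wrapped in a small helper that also returns each worker's resulting library paths, so you can confirm the change took effect. The name set_cluster_libpaths() is hypothetical and not part of {parallel}. The example uses a temporary directory instead of ~/my/lib and a local two-worker cluster.

```r
library(parallel)

# Hypothetical helper (not part of {parallel}): prepend `libs` to the library
# paths of every worker, using the export + clusterEvalQ() approach
set_cluster_libpaths <- function(cl, libs) {
  e <- new.env()
  e$libs <- c(libs, .libPaths())
  clusterExport(cl, "libs", envir = e)
  # Evaluated on the workers, so each worker's own .libPaths() is called
  clusterEvalQ(cl, .libPaths(libs))
  # Return every worker's current library paths for inspection
  invisible(clusterEvalQ(cl, .libPaths()))
}

mylib <- file.path(tempdir(), "my-lib")
dir.create(mylib, showWarnings = FALSE)

cl <- makeCluster(2)
worker_paths <- set_cluster_libpaths(cl, mylib)
stopCluster(cl)

print(worker_paths)  # one character vector per worker, mylib first
```

Because clusterExport() assigns libs in each worker's global environment, that variable remains there afterwards. This is harmless, but worth knowing if code running on the workers also uses the name libs.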