Using R and snow on Ohio Supercomputer Center’s Glenn cluster
Over the last several days, I have had the “pleasure” of getting parallel processing with R running on the Ohio Supercomputer Center’s (OSC) Glenn cluster. I am working on a project that uses GenMatch from Sekhon’s Matching package, which uses the snow library to manage parallel processing. Getting snow to run properly on a single machine, or even with a cluster of machines via ssh connections, is fairly trivial. But using it on the OSC cluster turned out to be a bit more difficult. Well, difficult in relative terms: once you know the steps to take, it’s not all that bad. While I am still not completely sure I’ve done everything correctly, I thought I would post this short guide in hopes that it could save someone else a few days of headaches. I’ll update the post if I discover something is incorrect.
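For context, here is a minimal sketch of the kind of call involved, using the lalonde example data that ships with Matching; the covariates, pop.size, and the two-worker socket cluster are illustrative stand-ins, not the setup from my actual project:

library(Matching)
library(snow)

# Example data shipped with the Matching package; a stand-in for real
# project data.
data(lalonde)
X <- cbind(lalonde$age, lalonde$educ, lalonde$re74)

# GenMatch's cluster argument accepts a snow cluster object (or a
# vector of machine names); here, a two-worker socket cluster.
cl <- makeCluster(c("localhost", "localhost"), type = "SOCK")
gen <- GenMatch(Tr = lalonde$treat, X = X, pop.size = 16, cluster = cl)
stopCluster(cl)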
Step 1: Compile Rmpi
In order to utilize more than one node on the Glenn cluster, you need to have Rmpi installed and, importantly, linked to the appropriate MPI libraries provided by OSC. To do so, you first need to create a .R/Makevars file in your home directory that will instruct R to use mpicc instead of gcc to compile the Rmpi library.
$ mkdir ~/.R
$ nano ~/.R/Makevars
And this is what you should place in Makevars:
CC=mpicc
SHLIB_LD=mpicc
Next, you will need to swap out the default mpi module for an alternative. If the R module isn’t yet loaded, you will need to load that as well.
$ module swap mpi mvapich2-1.0.2p1-gnu
$ module load R-2.8.0
If you aren’t sure which version of MPI you should load, you can use the module avail command to see what’s available. Or, better yet, you could email the excellent support staff at OSC. Note that I was not able to get Rmpi to install correctly with R-2.11.1. Since I had 2.8 working, I didn’t do much further investigation.
Now it’s time to compile and install Rmpi. Download the most recent version and place it in your working directory. You can do that either through your browser or with wget; e.g.,
$ wget http://cran.r-project.org/src/contrib/Rmpi_0.5-9.tar.gz
Just be sure to replace the Rmpi package version above with the most recent one. After doing so, the following command should correctly install the package.
$ R CMD INSTALL --configure-vars="CPPFLAGS=-I${MPICH_HOME}/include LDFLAGS=-L${MPICH_HOME}/lib" \
  --configure-args="--with-Rmpi-include=${MPICH_HOME}/include --with-Rmpi-libpath=${MPICH_HOME}/lib --with-Rmpi-type=MPICH2" \
  Rmpi_0.5-9.tar.gz
Note that the line breaks in the command above should come immediately after each \, which marks a continuation. In other words, the whole command spans three lines, with the first two ending in \.
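To check that the install worked, a quick sanity test (run from a compute node or a small batch job, not the login node) is to load the library and ask MPI what it sees; this assumes an R session with the modules above loaded:

# If Rmpi linked correctly against OSC's MPI libraries, this should
# load without errors and report the number of available MPI slots.
library(Rmpi)
mpi.universe.size()
mpi.quit()  # cleanly shuts down MPI (and exits R)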
Step 2: Setting up your PBS job script
Successfully processing a job across multiple nodes with R and snow requires some small changes to your PBS script. If you aren’t yet familiar with PBS scripts, good places to start are here and here. First, you should create a directory to hold all of the files associated with your batch job. Here I create one called Test in my home directory:
$ mkdir ~/Test
Now create a PBS script file.
$ nano SnowTest.job
And add something like this:
#PBS -l walltime=00:10:00
#PBS -l nodes=2:ppn=8
#PBS -N SnowTest
#PBS -S /bin/bash
#PBS -j oe
#PBS -m abe
#PBS -M [email protected]

# Echo commands as they run (the bash equivalent of csh's "set echo").
set -x

export TEST=${HOME}/Test

# Stage input files to the compute nodes' local scratch space.
pbsdcp -r ${TEST}/* $TMPDIR
cd $TMPDIR

module swap mpi mvapich2-1.0.2p1-gnu
module load R-2.8.0

# Launch 16 MPI processes running the snow-aware R wrapper.
mpiexec -n 16 RMPISNOW < SnowTest.R

# Gather results back from the compute nodes.
pbsdcp -g -r '*' ${TEST}/
exit
This will run whatever you put in SnowTest.R across 16 cores on two nodes for 10 minutes. To make sure everything is working, put something like the following in SnowTest.R.
# Test snow on the OSC cluster.
# First, get the cluster object; under RMPISNOW, makeCluster() picks up
# the workers started by mpiexec.
cl <- makeCluster()

# Now generate some random variables on all of the workers. Note:
# because we haven't set a different seed for each of the processes, you
# may get back duplicates. GenMatch and rgenoud take care of this
# for you, but other R libraries may not. See the snow documentation
# for more details.
print(clusterCall(cl, rnorm, 1000))
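If you do need independent random streams, snow’s clusterSetupRNG() is one way to get them; a minimal sketch, assuming the rlecuyer package (its default backend) is installed on the cluster:

# Give each worker its own L'Ecuyer random-number stream so that
# parallel draws don't duplicate one another.
cl <- makeCluster()
clusterSetupRNG(cl)
print(clusterCall(cl, rnorm, 1000))
stopCluster(cl)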
Now you can submit the job as you normally would:
$ qsub SnowTest.job
When the results come back, if everything worked you should see a list of 1,000 random numbers from each worker in your log. If you want to verify that a longer-running job is using all of the cores and nodes you requested, you can use qstat -f and compare wall and CPU time. For example, for a job I have running right now, when I run qstat -f, I get:
$ qstat -f 1234567

Job Id: 1234567.opt-batch.osc.edu
    Job_Name = Job-20110326-1
    Job_Owner = [email protected]
    resources_used.cput = 411:42:28
    resources_used.mem = 2998464kb
    resources_used.vmem = 6312984kb
    resources_used.walltime = 26:14:35
    [... snip ...]
Above, cput / walltime is approximately 16, which indicates that I am using all of the processors I requested.
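If you want to check the math yourself, a quick R helper (the to_hours function is mine, purely for illustration) converts those HH:MM:SS strings to hours and takes the ratio:

# Convert a PBS HH:MM:SS duration to hours, then check the ratio.
to_hours <- function(x) {
  p <- as.numeric(strsplit(x, ":")[[1]])
  p[1] + p[2] / 60 + p[3] / 3600
}
to_hours("411:42:28") / to_hours("26:14:35")  # ~15.7, i.e. close to 16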
Anyway, hopefully someone finds this useful. And please let me know if you see any fatal errors in the above steps.