R and MPI on Ohio Supercomputer Center’s Oakley cluster
A few years ago, I wrote a short guide to Using R and snow on the Ohio Supercomputer Center's Glenn cluster. Several things have changed in the world of R since then (namely, the inclusion of the parallel package into the base system) and I have moved to using the Oakley cluster, so I thought it was time to write an update to that older post.
Installing Rmpi
To use parallel on the cluster, you will need to install Rmpi. First, log in to the Oakley cluster with ssh and the supplied username and password:
$ ssh [email protected]
Once logged in, you will need to set up R to use Rmpi. You will also need to install snow, which is used behind the scenes by parallel for MPI jobs. You will want to install Rmpi manually, as there are a few extra parameters that need to be set. The newest version of Rmpi, as of this writing, is 0.6-5. You can download it using wget:
$ wget http://cran.r-project.org/src/contrib/Rmpi_0.6-5.tar.gz
With Rmpi downloaded, you can prepare to install it. Because you have to set some custom flags, the easiest way to do this is to create a short executable script:
$ nano compile_Rmpi.sh
This will open nano, a text editor. Paste the following into the editor (modifying the Rmpi version for the one you are installing):
#!/bin/sh
R CMD INSTALL \
  --configure-vars="CPPFLAGS=-I${MPICH_HOME}/include LDFLAGS=-L${MPICH_HOME}/lib" \
  --configure-args="--with-Rmpi-include=${MPICH_HOME}/include --with-Rmpi-libpath=${MPICH_HOME}/lib --with-Rmpi-type=MPICH2" \
  Rmpi_0.6-5.tar.gz
Once done, hit ctrl-X, then Y, then enter to save the file. Now make it executable:
$ chmod +x compile_Rmpi.sh
Now you will need to create a .R/Makevars file indicating which compiler to use.
$ mkdir .R
$ nano .R/Makevars
In the Makevars file, enter the following, then save and exit:
CC=mpicc
SHLIB_LD=mpicc
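Before compiling, you can optionally confirm that the MPICH compiler wrapper these settings rely on is actually on your path (a quick check; mpicc -show is a standard MPICH wrapper flag, though availability may depend on which modules your session has loaded):
$ which mpicc
$ mpicc -show    # prints the underlying compiler and flags the wrapper will use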
Now create the .Renviron file that will direct R to use a local package library:
$ echo 'R_LIBS_USER="~/R/library"' > .Renviron
$ mkdir -p R/library
Now you are ready to load the R module and compile Rmpi.
$ module load R
$ ./compile_Rmpi.sh
Now that Rmpi is installed, you can start R and install snow and rlecuyer as you would normally.
$ R
R> install.packages(c("snow", "rlecuyer"))
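As a quick, optional sanity check, you can confirm that everything landed in your local library (a sketch using base R's .libPaths() and installed.packages(); the exact output will vary):
R> .libPaths()    # "~/R/library" should be listed first
R> rownames(installed.packages(lib.loc="~/R/library"))    # should include "Rmpi", "rlecuyer", and "snow"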
Making bash more pleasant
As an unrelated aside for those new to bash and the command line: if you want a better experience using bash (the shell that is used by default), you can create a file called .bashrc and enter the following (updating the username in the export PATH line to reflect your username) to get color highlighting and a meaningful prompt:
$ nano .bashrc
### ------------------------------------------------------------------------
### default environment variables
### ------------------------------------------------------------------------
export PATH=.:$PATH:/home/osuXXXX/bin

# UTF-8
export LANG="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
export MM_CHARSET=UTF-8

### ------------------------------------------------------------------------
### aliases
### ------------------------------------------------------------------------
alias ls="ls --color=auto"
alias ll="ls -l -A --color=auto -h"

### ------------------------------------------------------------------------
### ui
### ------------------------------------------------------------------------
# username colors, green for non-root, red for root
USER_COLOR='1;32m'
if [ ${UID} -eq 0 ]; then
    USER_COLOR='1;31m'
fi

# components
USER_NAME='\[\033[${USER_COLOR}\]\u\[\033[0m\]'
DIR_NAME='\[\033[1m\]\w\[\033[0m\]'
MACHINE='\[\033[1;32m\]\h\[\033[0m\]'
AT_SYM='\[\033[0;34m\]@\[\033[0m\]'

# prompt
export PS1="${TITLEBAR}${USER_NAME}${AT_SYM}${MACHINE}:${DIR_NAME} > "
And create a .bash_profile file to make sure .bashrc is run at login:
$ nano .bash_profile
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
Using MPI with the Cluster
Create a directory for the test files.
$ mkdir test
Now, change to that directory and create the batch job submission script as well as the R script.
$ cd test
$ nano test.job
#PBS -l walltime=00:01:00
#PBS -l nodes=1:ppn=12
#PBS -N test
#PBS -S /bin/bash
#PBS -j oe
#PBS -m abe
#PBS -M [email protected]
#PBS -A PAA0014

set echo
export TEST=${HOME}/test
pbsdcp -r ${TEST}/test.R $TMPDIR
cd $TMPDIR
module load R
mpiexec ~/R/library/snow/RMPISNOW < test.R
pbsdcp -g -r result.RData ${TEST}/
exit
The two pbsdcp calls stage test.R to the compute node's temporary directory ($TMPDIR) and gather result.RData back into the test directory when the run finishes.
Now create the R script, test.R:
$ nano test.R
library(Rmpi)
library(parallel)

fn <- function(n) {
    sample(1:10, n, replace=TRUE)
}

cl <- makeCluster(type="MPI")
clusterSetupRNG(cl)
result <- parLapply(cl, 1:12, fn)
save(result, file="result.RData")
stopCluster(cl)    # shut the MPI workers down cleanly
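If you want to see what the script computes without involving the cluster at all, a serial equivalent (a sketch using base R's lapply; nothing here is Oakley-specific) is:
R> fn <- function(n) sample(1:10, n, replace=TRUE)
R> result <- lapply(1:12, fn)    # same work as the parLapply call, one element at a time
R> str(result)                   # twelve integer vectors, of lengths 1 through 12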
Submit your job using qsub:
$ qsub test.job
3542961.oak-batch.osc.edu
The prefix, 3542961, is a unique job number and will be different each time you submit a job. To check the status of the job, as well as any others you may have submitted, use qstat -u with your username:
$ qstat -u osu6738

oak-batch.osc.edu:15001:
                                                                        Req'd  Req'd   Elap
Job ID               Username    Queue    Jobname  SessID  NDS   TSK   Memory  Time  S  Time
-------------------- ----------- -------- -------- ------ ----- ------ ------ ----- - -----
3542961.oak-batc     osu6738     serial   test      13555     1     12   48gb  00:01 R   --
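For full details on a single job (an optional step; qstat -f is standard PBS), you can query it by ID:
$ qstat -f 3542961    # resource usage, node assignment, and job state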
When the job has run, the result.RData file will be in the test directory. Open it with R to verify that everything worked.
$ cd ~/test
$ module load R
$ R
R> load("result.RData")
R> result
[[1]]
[1] 2

[[2]]
[1]  8 10

[[3]]
[1]  8 10 10

[[4]]
[1] 1 7 3 9

[[5]]
[1] 10  4  8  2  1

[[6]]
[1] 4 2 7 3 1 5

[[7]]
[1] 5 5 1 4 6 1 3

[[8]]
[1] 10  3  7  8  3  2  7  2

[[9]]
[1]  9  7  6  9  1 10  9  8  8

[[10]]
 [1]  2  1  7  8  7  1  9 10  9  9

[[11]]
 [1]  3  4  3  1  2  2 10  6  9  9  2

[[12]]
 [1] 10  2  5  4  1  9  7  9 10  7  5  8
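As a final check, each list element i should contain i draws; a one-liner with base R's sapply confirms the shape:
R> sapply(result, length)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12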
If anything went wrong, details will be contained in the output file, test.o3542961. The file name will change for each job. You can view the file using less.
$ less test.o3542961
Press q to exit. Much more detail on how to create job scripts and submit them can be found on the OSC site: OSC batch processing.