Running an R-based ABM in parallel on a Multicore Desktop
[This article was first published on EvolvingSpaces, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve been running for the last couple of months a lot of simulations written in R on legion cluster here at UCL. But I still do many things on our quad-core server here at the institute, and parallelising the simulations is extremely handy.
So here’s my recipe:
1)Firstly, the main ingredient, you need an R function for your model where you have a set of parameters to sweep, something like:
Rsim(InitialPopulation=100,K=5000,ReproductiveRate=0.2,TransmissionRate=0.4,MutationRate=0.1)
{
#Put Something amazing and cool here
# …
return(list(pop=pop,fitness=fitness,diversity=diversity))
}
The function can return everything, in my case this will be a list with three vectors, say $pop, $fitness and $diversity.
2)For the actual parallelisation, you need the following two packages: foreach and doMC and their dependencies. After loading both packages you specify the number of cores you want to use, as follows:
library(foreach)
library(doMC)
cores=4
registerDoMC(cores)
3)Now, at this point I create a sweepSpace, which is basically a data.frame with all the combinations of parameters that I want to explore, repeated n times, with n being the number of runs. So in my case I want to sweep the MutationRate, the TransmissionRate and ReproductiveRate for 100 runs.
runs=50
mutationSweep=c(0.0001,0.001,0.01,0.1)
reproductionSweep=c(0.1,0.2,0.3)
transmissionSweep=c(0,0.5,1.0)
sweepSpace=expand.grid(runs=1:runs,m= mutationSweep,r= reproductionSweep,z= transmissionSweep)
4)Now we are ready to run the simulation with the following code
res=foreach(a=sweepSpace$runs,b=sweepSpace$m,c=sweepSpace$r,d=weepSpace$z)%dopar%{Rsim(InitialPopulation=100,K=5000,ReproductiveRate=c,TransmissionRate=d, TransmissionRate =b)}
The foreach() function will read each row of sweepSpace and assign the values of each column to the letters, and then use these for the Rsim() function. The resulting object is a list with a length equal to the number of rows of sweepSpace (thus in this case 500), containing the results of Rsim() in each of them. The parallelisation allows the simultaneous but independent run for all combinations and the storage of the results which follows the order in the sweepSpace. So if you have a specific parameter combination which takes longer to compute, the foreach() function allows to carry on, in the meanwhile, all the other combinations and put the results in the correct place.
5)Get and plot your results!
The sweepSpace can also be used as map to go through the resulting list.
For instance, if you want to create a function which allows the plotting of all the population dynamics on a single plot, you can write something as the following code:
plotSimPop=function(res,sweepSpace, ReproductiveRate, TransmissionRate, TransmissionRate)
{
#First we want to retrieve the index of list having the specific parameters that we’ve requested:
tmp=res[which(sweepSpace$r==ReproductiveRate&sweepSpace$m== MutationRate&sweepSpace$z==TransmissionRate)]
#we then retrieve the number of runs, and create a matrix with our results, with row corresponding to the number of runs, and columns to the number of timesteps (here set to 1000):
nruns=length(tmp)
resMat=matrix(0,nruns,1000)
#We then loop through the results collecting and storing the population dynamics in our matrix
for (x in 1:nruns)
{
resMat[x,]=res[[x]]$pop
}
#And we can now plot the results, with the average dynamics in red:
average=apply(resMat,2,mean,na.rm=TRUE)
plot(resMat[1,],type=”n”,xlim=c(1,1000),ylim=c(min(resMat),max(resMat),xlab=”Time”,ylab=”Population”)
for (x in 1:nruns)
{
lines(resMat[x,],col=”grey”)
}
lines(average,col=”red”)
}
So here’s my recipe:
1)Firstly, the main ingredient, you need an R function for your model where you have a set of parameters to sweep, something like:
Rsim(InitialPopulation=100,K=5000,ReproductiveRate=0.2,TransmissionRate=0.4,MutationRate=0.1)
{
#Put Something amazing and cool here
# …
return(list(pop=pop,fitness=fitness,diversity=diversity))
}
The function can return everything, in my case this will be a list with three vectors, say $pop, $fitness and $diversity.
2)For the actual parallelisation, you need the following two packages: foreach and doMC and their dependencies. After loading both packages you specify the number of cores you want to use, as follows:
library(foreach)
library(doMC)
cores=4
registerDoMC(cores)
3)Now, at this point I create a sweepSpace, which is basically a data.frame with all the combinations of parameters that I want to explore, repeated n times, with n being the number of runs. So in my case I want to sweep the MutationRate, the TransmissionRate and ReproductiveRate for 100 runs.
runs=50
mutationSweep=c(0.0001,0.001,0.01,0.1)
reproductionSweep=c(0.1,0.2,0.3)
transmissionSweep=c(0,0.5,1.0)
sweepSpace=expand.grid(runs=1:runs,m= mutationSweep,r= reproductionSweep,z= transmissionSweep)
4)Now we are ready to run the simulation with the following code
res=foreach(a=sweepSpace$runs,b=sweepSpace$m,c=sweepSpace$r,d=weepSpace$z)%dopar%{Rsim(InitialPopulation=100,K=5000,ReproductiveRate=c,TransmissionRate=d, TransmissionRate =b)}
The foreach() function will read each row of sweepSpace and assign the values of each column to the letters, and then use these for the Rsim() function. The resulting object is a list with a length equal to the number of rows of sweepSpace (thus in this case 500), containing the results of Rsim() in each of them. The parallelisation allows the simultaneous but independent run for all combinations and the storage of the results which follows the order in the sweepSpace. So if you have a specific parameter combination which takes longer to compute, the foreach() function allows to carry on, in the meanwhile, all the other combinations and put the results in the correct place.
5)Get and plot your results!
The sweepSpace can also be used as map to go through the resulting list.
For instance, if you want to create a function which allows the plotting of all the population dynamics on a single plot, you can write something as the following code:
plotSimPop=function(res,sweepSpace, ReproductiveRate, TransmissionRate, TransmissionRate)
{
#First we want to retrieve the index of list having the specific parameters that we’ve requested:
tmp=res[which(sweepSpace$r==ReproductiveRate&sweepSpace$m== MutationRate&sweepSpace$z==TransmissionRate)]
#we then retrieve the number of runs, and create a matrix with our results, with row corresponding to the number of runs, and columns to the number of timesteps (here set to 1000):
nruns=length(tmp)
resMat=matrix(0,nruns,1000)
#We then loop through the results collecting and storing the population dynamics in our matrix
for (x in 1:nruns)
{
resMat[x,]=res[[x]]$pop
}
#And we can now plot the results, with the average dynamics in red:
average=apply(resMat,2,mean,na.rm=TRUE)
plot(resMat[1,],type=”n”,xlim=c(1,1000),ylim=c(min(resMat),max(resMat),xlab=”Time”,ylab=”Population”)
for (x in 1:nruns)
{
lines(resMat[x,],col=”grey”)
}
lines(average,col=”red”)
}
To leave a comment for the author, please follow the link and comment on their blog: EvolvingSpaces.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.