Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested in what role Pacific sea surface temperatures play with regard to rainfall in East Africa. I will spare you the details, as I am currently writing all this up into a paper which you can have a look at once it is published.
For this analysis, however, I am processing quite a large number of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.
Here's a quick example of how to read in raster data (in this case 460 global sea surface temperature files of 1° x 1° resolution) using parallel.
First, let's do it the conventional way and see how long that takes:
```r
library(raster)
library(rgdal)

### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

### list files in directory ##################################################
fnames_sst_r2 <- list.files(inputpath, pattern = glob2rx(ptrn), recursive = TRUE)

### read into raster format ##################################################
system.time({
  sst.global <- lapply(seq_along(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
  })
})
##    user  system elapsed
##  61.584   0.412  68.104
```
Now the same thing using library(parallel):
```r
library(parallel)

system.time({
  ### set up cluster call ######################################################
  cl <- makePSOCKcluster(4)
  clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"), envir = environment())
  junk <- clusterEvalQ(cl, c(library(raster), library(rgdal)))

  ### read into raster format using parallel version of lapply #################
  sst.global.p <- parLapply(cl, seq_along(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
  })

  ### stop the cluster #########################################################
  stopCluster(cl)
})
##    user  system elapsed
##   0.152   0.080  25.670
```
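One thing to keep in mind with the cluster set-up above: if the call to parLapply() fails, stopCluster() is never reached and the worker processes stay alive. As a minimal sketch of one way around this, the cluster can be wrapped in a function that registers the shutdown with on.exit(). Note that par_apply() and read_one() are hypothetical helpers, and the small text files are made-up stand-ins for the raster files, so that the example runs with base R alone.

```r
library(parallel)

# Hypothetical helper (not from the original analysis): applies `fun` to each
# element of `x` on a temporary PSOCK cluster and always shuts the cluster down.
par_apply <- function(x, fun, n_workers = 2, ...) {
  cl <- makePSOCKcluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)  # runs even if parLapply() errors
  parLapply(cl, x, fun, ...)
}

# Toy input: four small text files standing in for the raster files.
tmpdir <- tempfile("par_demo_")
dir.create(tmpdir)
for (i in 1:4) {
  writeLines(as.character(i * 1:3),
             file.path(tmpdir, sprintf("file_%02d.txt", i)))
}
fnames <- list.files(tmpdir, pattern = glob2rx("file_*.txt"), full.names = TRUE)

# Hypothetical reader: returns the numbers stored in one file.
read_one <- function(f) as.numeric(readLines(f))

res_seq <- lapply(fnames, read_one)   # conventional way
res_par <- par_apply(fnames, read_one) # parallel way, cluster cleaned up for us

identical(res_seq, res_par)
```

Using full.names = TRUE in list.files() also removes the need to paste the input path onto every file name and to export it to the workers.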
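Not a crazy speed enhancement (roughly 2.7 times faster on four workers), but we need to keep in mind that the raster() function does not read the cell values into memory; it only creates a reference to the file on disk. Hence, the speed improvements should be a lot higher once we start doing calculations or plotting.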
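To illustrate the idea of moving the actual calculations into the workers, here is a minimal sketch using base R only. slow_summary() is a hypothetical stand-in for a per-layer calculation, and the random 180 x 360 matrix is a made-up substitute for one 1° x 1° global grid; neither is part of the real analysis. The point is that each worker does the heavy lifting and sends back only a single number.

```r
library(parallel)

# Hypothetical per-layer calculation: build a fake 1-degree global grid
# (180 rows x 360 columns) and reduce it to its mean.
slow_summary <- function(seed) {
  set.seed(seed)                              # reproducible fake data
  m <- matrix(rnorm(180 * 360), nrow = 180)
  mean(m)
}

layers <- 1:8  # eight pretend layers, identified by their seed

# Conventional way
t_seq <- system.time(res_seq <- lapply(layers, slow_summary))

# Parallel way: the calculation runs on the workers, only the means come back
cl <- makePSOCKcluster(2)
t_par <- system.time(res_par <- parLapply(cl, layers, slow_summary))
stopCluster(cl)

identical(res_seq, res_par)
rbind(sequential = t_seq, parallel = t_par)[, "elapsed"]
```

With such a small toy workload the cluster overhead may well outweigh the gain, so the timings here only show how to measure, not what to expect for real raster calculations.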
Finally, let's test whether the two methods produce identical results.
```r
identical(sst.global.p, sst.global)
## [1] TRUE
```
to be continued…
```r
sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
##
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
##  [7] LC_PAPER=C                 LC_NAME=C
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods
## [8] base
##
## other attached packages:
## [1] rgdal_0.8-5   raster_2.0-41 sp_1.0-5      knitr_1.1
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.3    evaluate_0.4.3  formatR_0.7     grid_2.15.3
## [5] lattice_0.20-13 stringr_0.6.2   tools_2.15.3
```