Large correlation in parallel

[This article was first published on Brain Chronicle, and kindly contributed to R-bloggers].

This is a small improvement to the bigcor function proposed on Rmazing for computing huge correlation matrices in R: I made the function run in parallel, using all the CPU cores available on the machine. The code is here.
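
To make the idea concrete, here is a minimal sketch of block-wise correlation spread over several worker processes. It is not the bigcorPar code itself: the names block_cor_par and cor_pair are made up for this illustration, it relies on the base parallel package, and it stores the result in an ordinary in-memory matrix rather than an ff matrix, so it only suits sizes that fit in RAM. It also assumes the matrix has at least as many columns as blocks.

```r
library(parallel)

# Hypothetical worker: correlation between the columns of two sub-matrices.
cor_pair <- function(task) cor(task$a, task$b)

# Hypothetical helper (not the author's bigcorPar): split the columns into
# blocks, correlate every pair of blocks on a worker, then reassemble.
block_cor_par <- function(mat, nblocks = 4, ncores = 2) {
  p <- ncol(mat)
  # Assign every column to one of `nblocks` contiguous groups.
  groups <- split(seq_len(p), cut(seq_len(p), nblocks, labels = FALSE))
  # All unordered pairs of blocks, including each block with itself.
  pairs <- expand.grid(i = seq_len(nblocks), j = seq_len(nblocks))
  pairs <- pairs[pairs$i <= pairs$j, ]

  # One task per pair, carrying only the columns that pair needs.
  tasks <- lapply(seq_len(nrow(pairs)), function(k) {
    list(a = mat[, groups[[pairs$i[k]]], drop = FALSE],
         b = mat[, groups[[pairs$j[k]]], drop = FALSE])
  })

  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl))
  blocks <- parLapply(cl, tasks, cor_pair)

  # Write each block and its transpose into the full symmetric matrix.
  out <- matrix(NA_real_, p, p)
  for (k in seq_len(nrow(pairs))) {
    gi <- groups[[pairs$i[k]]]
    gj <- groups[[pairs$j[k]]]
    out[gi, gj] <- blocks[[k]]
    out[gj, gi] <- t(blocks[[k]])
  }
  out
}

set.seed(1)
MAT <- matrix(rnorm(200 * 10), nrow = 10)
res <- block_cor_par(MAT, nblocks = 4, ncores = 2)
# The block-wise result should agree with a direct call to cor().
stopifnot(isTRUE(all.equal(res, cor(MAT))))
```

To use every core, as described above, ncores could be set to parallel::detectCores(). Only the upper triangle of block pairs is computed, since the correlation matrix is symmetric.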

Here is a benchmark of bigcor and bigcorPar on my machine, which has 8 cores:


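# bigcor() and bigcorPar() must already be defined in the session.
library(ggplot2)

# Number of columns in each test matrix; the result is an R[i] x R[i] correlation matrix.
R <- c(2000, 5000, 10000, 20000, 40000)
## I hit the limit at ~50000: the ff function refuses to create the matrix.
## (A 50000 x 50000 matrix has 2.5e9 cells, more than .Machine$integer.max = 2147483647.)
# Error in if (length < 0 || length > .Machine$integer.max) stop("length must be between 1 and .Machine$integer.max") :
#   missing value where TRUE/FALSE needed
# http://www.bytemining.com/2010/05/hitting-the-big-data-ceiling-in-r/

# Sequential version: elapsed time of bigcor() for each size.
normal <- numeric(length = length(R))
for (i in seq_along(R)) {
  nblocks <- if (R[i] <= 20000) 10 else 20  # more blocks for the largest matrix
  MAT <- matrix(rnorm(R[i] * 10), nrow = 10)
  normal[i] <- system.time(res <- bigcor(MAT, nblocks = nblocks, verbose = FALSE))[["elapsed"]]
}

# Parallel version: elapsed time of bigcorPar() for the same sizes.
parallel <- numeric(length = length(R))
for (i in seq_along(R)) {
  nblocks <- if (R[i] <= 20000) 10 else 20
  MAT <- matrix(rnorm(R[i] * 10), nrow = 10)
  parallel[i] <- system.time(res <- bigcorPar(MAT, nblocks = nblocks, verbose = FALSE))[["elapsed"]]
}

# Long-format data frame: one row per (function, size) timing.
d <- data.frame(time = c(normal, parallel),
                type = rep(c("normal", "parallel"), each = length(R)),
                size = rep(R, 2))

# Plot both timing curves and save them to a PDF.
pdf("bigcor_benchmark.pdf", height = 7, width = 7)
p <- ggplot(d, aes(x = size, y = time, group = type, colour = type)) +
  geom_point() +
  geom_path() +
  labs(x = "Matrix size", y = "Time in sec.",
       title = "Speed comparison bigcor / bigcorPar")
print(p)  # explicit print() so the plot is drawn even when the script is source()d
dev.off()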
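
The ceiling at roughly 50000 columns mentioned in the benchmark comments can be checked with a little arithmetic. The sketch below assumes the limit comes from the total number of cells having to fit in a 32-bit integer, which is what the error message suggests. The variable names are made up for this illustration. The NA in the last step may explain the "missing value where TRUE/FALSE needed" part of the error, but that is a guess about ff's internals.

```r
# Largest cell count that fits in R's 32-bit integer type.
max_cells <- .Machine$integer.max

# Largest square matrix side whose cell count stays under that limit.
max_side <- floor(sqrt(max_cells))

# A 50000 x 50000 result needs 2.5e9 cells, which is over the limit.
too_big <- 50000^2 > max_cells

# Integer arithmetic overflows to NA (with a warning) rather than wrapping.
overflow <- suppressWarnings(50000L * 50000L)

stopifnot(max_side == 46340, too_big, is.na(overflow))
```

So under this assumption the largest square correlation matrix would be about 46340 columns wide, in line with the observed limit.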
