Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Today Scott Chamberlain tweeted asking for a better/faster solution to building an m x n matrix with randomly assigned 0/1. He already had a working version:
r <- 1000 c <- 1000 m0 <- matrix(0, r, c) apply(m0, c(1,2), function(x) sample(c(0,1),1))
Now, I’m the first to acknowledge that I’ve never got the ‘apply’ family of functions and that—thanks Hadley—if I need to do something like that I go for the plyr package. Nevertheless, if you think about it, the starting point is a sugary version of a loop; I like loops because they are explicit and clear, but they can be very slow in R. The loop could be written as:
m00 <- matrix(0, r, c) for(i in 1:r){ for(j in 1:c){ m00[i, j] <- sample(c(0,1),1) } }
In contrast, my first idea was to generate a bunch of uniformly distributed [0, 1) random numbers and round them to the closest integer, which is a more ‘matricy’ way of thinking:
m1 <- round(matrix(runif(r*c), r, c)))
and it happens to be a lot faster. My second idea, and Edmund Hart beat me to that one, was to use something like:
m2 <- matrix(rbinom(r*c,1,0.5),r,c)
which is a nice option, generating r*c random numbers following a binomial distribution, which also has the advantage of allowing for different probabilities, rather than 0.5 as in m1. This is also faster than m1, although probably one would not notice the difference until working with fairly big matrices. When Scott pointed out the speed bump he got me thinking about order of the operations; would this be better?
m3 <- matrix(round(runif(r*c)), r, c)
In terms of speed, m3 fits between m1 and m2, so the order can make a difference.
David Smith and Rafael Maia came up with a different approach, using sample() which I had not considered at all. m4 has the advantage that one could use any number of values to randomize.
m4 <- matrix(sample(0:1,r*c, replace=TRUE),r,c)
Any of these options can be timed using the system.time() function as in system.time(m3 < - matrix(round(runif(r*c)), r, c))
. It's interesting how different people come up with alternative strategies (reflecting different ways of thinking about the problem) using exactly the same language. I'm not sure what should be the 'obvious' solution; but for almost any purpose any of them (with the exception of the explicit loop) will work just fine.
P.S. 2012-08-29, 22:33. Another way that would work (although 4 times slower than m1) would be m5 < - matrix(ifelse(runif(r*c) < 0.5, 0, 1))
. Still, it takes less than 0.5 seconds for a 1,000,000 elements matrix.
P.S. 2012-09-04. Scott Chamberlain wrote his own blog post discussing the different suggestions. Even cooler, Dirk Eddelbuettel implemented even faster creation of random 1/0 matrices using Rcpp, with inline C++ code. Much faster in a crazy overkill sort of way.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.