Variable probability Bernoulli outcomes – Fast and Slow
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I am working on a project that requires the generation of Bernoulli outcomes. Typically, I would go about this using the built in sample() function like so:
sample(1:0,n,prob=c(p,1-p),replace=TRUE)
This works great and is fast, even for large n. Problem is, I want to generate each sample with its own unique probability. Seems straight forward enough, I just wrapped the function and vectorized to allow the passing of a vector of p.
binomial_sampler<-function(p){ return(sample(1:0,1,prob=c(p,1-p))) } bs<-Vectorize(binomial_sampler)
Naming this function bs() turned out to be rather prophetic. Nevertheless, I can call this function by passing my unique vector of outcome probabilities. And indeed I get the result I’m looking for.
bs(my_p_vec)
Problem is, this turns out to be very slow. It would seem that there is quite a bit of overhead to calling sample() for one sample at a time. R’s RNGs are very fast for generating many iid samples, so I started thinking like my old c++ programming self and tried a different approach.
Nbs<-function(p) { U<-runif(length(p),0,1) outcomes<-U<p return(U) }
I call the new version Nbs for “New Bernoulli Sampler”, or “Not Bull Shit”. And what a difference it made indeed!
library(rbenchmark) p<-runif(1000) res <- benchmark(bs(p), Nbs(p)) print(res) test replications elapsed relative user.self sys.self user.child sys.child 2 Nbs(p) 100 0.007 1 0.008 0.000 0 0 1 bs(p) 100 1.099 157 1.080 0.016 0 0
157x faster! Now that’s a speedup to write home about.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.