Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In native R, the user sets the seed for random number generation (RNG) with set.seed()
. Random number generators exist in C and C++ too; these need their own seeds, which are not obviously settable by set.seed()
. Good news! It can be done.
pacman::p_load(inline, purrr)
rbernoulli
Base R (or technically the stats
package) provides no rbernoulli()
. It’s a pretty gaping hole in the pantheon of rbeta()
, rbinom()
, rcauchy
, rchisq()
, rexp()
, rf()
, rgamma()
, etc. Thankfully, Hadley Wickham noticed this and gave us purrr::rbernoulli()
.
set.seed(1) rbernoulli(5, 0.7) #> [1] FALSE TRUE TRUE TRUE FALSE set.seed(1) rbernoulli(5, 0.7) #> [1] FALSE TRUE TRUE TRUE FALSE
So it seems like Hadley managed to get set.seed()
to work with rbernoulli()
. How did he do this? Let’s take a closer look at purrr::rbernoulli()
.
purrr::rbernoulli #> function (n, p = 0.5) #> { #> stats::runif(n) > (1 - p) #> } #> <bytecode: 0x7fb23720c540> #> <environment: namespace:purrr>
Ah, it seems Hadley just wrapped runif()
; hence, because set.seed()
works with runif()
, it works with his implementation of purrr::rbernoulli()
.
C++ RNG
The C++ standard library provides the <random>
header file, which includes Bernoulli RNG. Let’s give that a whirl.
cpp_rbernoulli <- rcpp(c(n = "integer", p = "numeric", seed = "integer"), ' int n_ = as<int>(n), seed_ = as<int>(seed); double p_ = as<double>(p); std::default_random_engine generator(seed_); std::bernoulli_distribution distribution(p_); IntegerVector out(n_); for (std::size_t i = 0; i != n_; ++i) { out[i] = distribution(generator); } return out; ', includes = "#include <random>") cpp_rbernoulli(6, 0.7, seed = 1) #> [1] 1 0 1 1 0 1 cpp_rbernoulli(6, 0.7, seed = 1) #> [1] 1 0 1 1 0 1 cpp_rbernoulli(6, 0.7, seed = 2) #> [1] 1 0 1 0 1 1
OK, so now we have cpp_rbernoulli()
which is working, but the user has to pass the seed as an argument of the function, there’s no option to use set.seed()
.
get_seed()
If only there was a get_seed()
function that we could use. Well, here it is!
get_seed <- function() { sample.int(.Machine$integer.max, 1) }
This gets a positive number in the unsigned 32-bit integer range (which is always a safe bet for a seed) and it is completely determined by set.seed()
. Therefore, it’s fine to use as a seed itself. Let’s take a look.
set.seed(1) replicate(6, get_seed()) #> [1] 570175513 799129990 1230193230 1950361378 433108649 1929277158 set.seed(1) replicate(6, get_seed()) #> [1] 570175513 799129990 1230193230 1950361378 433108649 1929277158 set.seed(2) replicate(6, get_seed()) #> [1] 397031630 1508336757 1231208929 360888751 2026879546 2026097046 set.seed(2) replicate(6, get_seed()) #> [1] 397031630 1508336757 1231208929 360888751 2026879546 2026097046
So as we can see, setting a seed via set.seed()
determines the seeds that subsequently come out of get_seed()
, so all is well with the world. get_seed()
can now be used to create a version of cpp_rbernoulli()
which uses set.seed()
. For the sake of inflating my own ego, I’ll name this version after myself.
rorybernoulli <- function(n, p) { cpp_rbernoulli(n, p, get_seed()) }
Let’s check that it’s in working order.
set.seed(1) rorybernoulli(6, 0.7) #> [1] 1 1 0 0 1 0 set.seed(1) rorybernoulli(6, 0.7) #> [1] 1 1 0 0 1 0 set.seed(2) rorybernoulli(6, 0.7) #> [1] 0 1 1 1 1 1 set.seed(2) rorybernoulli(6, 0.7) #> [1] 0 1 1 1 1 1
Everything is awesome.
Benchmarking
Lastly, let’s compare the two Bernoulli RNGs that we have now by asking them both to give us a million Bernoulli random numbers with p = 0.5
.
bench::mark(purrr::rbernoulli(1e6, p = 0.5), rorybernoulli(1e6, p = 0.5), check = FALSE) #> # A tibble: 2 x 10 #> expression min mean median max `itr/sec` mem_alloc n_gc n_itr #> <chr> <bch:t> <bch:t> <bch:t> <bch:> <dbl> <bch:byt> <dbl> <int> #> 1 purrr::rb… 26.9ms 29.45ms 28.57ms 32.7ms 34.0 11.45MB 4 13 #> 2 roryberno… 8.55ms 9.61ms 9.38ms 12.7ms 104. 3.82MB 4 46 #> # ... with 1 more variable: total_time <bch:tm>
Wow, rorybernoulli()
is three times faster! I wasn’t expecting that. Perhaps it’s because there’s a quicker way of generating a Bernoulli random number than by going through a uniform random number (as purrr::rbernoulli()
does). It’s also three times as efficient with memory, probably related to the time speedup. The point of me writing this post was to share this get_seed()
thing with people so that the can use set.seed()
with Rcpp
and the like; purrr::rbernoulli()
was just a cool example of a non-base RNG that popped into my head. Maybe I should submit a pull request to purrr
!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.