Choosing a new default RNG for dqrng
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Currently the dqrng package supports only xoroshiro128+ and xoshiro256+ from https://prng.di.unimi.it/ (see also Blackman and Vigna 2021). These RNGs should only be used for creating floating point numbers, which was the case for dqrng originally. However, dqsample
and dqrrademacher
make use of the full bit pattern. So it would be better to support the **
and/or ++
variants for both RNGs and make one of them the default. This would be a breaking change, of course. In #57 I have added these additional 4 RNGs to xoshiro.h
so now is the time to do some benchmarking first by generating some random numbers:
#include <Rcpp.h> // [[Rcpp::depends(dqrng, BH)]] #include <dqrng_distribution.h> #include <xoshiro.h> // [[Rcpp::plugins(cpp11)]] template<typename RNG> double sum_rng(int n) { auto rng = dqrng::generator<RNG>(42); dqrng::uniform_distribution dist; double result = 0.0; for (int i = 0; i < n; ++i) { result += dist(*rng); } return result; } // [[Rcpp::export]] double sum_128plus(int n) { return sum_rng<dqrng::xoroshiro128plus>(n); } // [[Rcpp::export]] double sum_256plus(int n) { return sum_rng<dqrng::xoshiro256plus>(n); } // [[Rcpp::export]] double sum_128starstar(int n) { return sum_rng<dqrng::xoroshiro128starstar>(n); } // [[Rcpp::export]] double sum_256starstar(int n) { return sum_rng<dqrng::xoshiro256starstar>(n); } // [[Rcpp::export]] double sum_128plusplus(int n) { return sum_rng<dqrng::xoroshiro128plusplus>(n); } // [[Rcpp::export]] double sum_256plusplus(int n) { return sum_rng<dqrng::xoshiro256plusplus>(n); } N <- 1e5 bm <- bench::mark( sum_128plus(N), sum_128starstar(N), sum_128plusplus(N), sum_256plus(N), sum_256starstar(N), sum_256plusplus(N), check = FALSE, min_time = 1 ) knitr::kable(bm[, 1:6])
expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
sum_128plus(N) | 314µs | 318µs | 2989.530 | 2.49KB | 0 |
sum_128starstar(N) | 340µs | 342µs | 2817.644 | 2.49KB | 0 |
sum_128plusplus(N) | 343µs | 347µs | 2784.206 | 2.49KB | 0 |
sum_256plus(N) | 348µs | 351µs | 2731.439 | 2.49KB | 0 |
sum_256starstar(N) | 348µs | 351µs | 2736.009 | 2.49KB | 0 |
sum_256plusplus(N) | 358µs | 360µs | 2650.948 | 2.49KB | 0 |
plot(bm)
The current default xoroshiro128+ is fastest in this comparison with the other generators being very similar. Let’s try a more realistic usecase: generating many uniformaly distributed random numbers:
#include <Rcpp.h> // [[Rcpp::depends(dqrng, BH)]] #include <dqrng_distribution.h> #include <xoshiro.h> // [[Rcpp::plugins(cpp11)]] template<typename RNG> Rcpp::NumericVector runif_rng(int n) { auto rng = dqrng::generator<RNG>(42); dqrng::uniform_distribution dist; Rcpp::NumericVector result(Rcpp::no_init(n)); std::generate(result.begin(), result.end(), [rng, dist] () {return dist(*rng);}); return result; } // [[Rcpp::export]] Rcpp::NumericVector runif_128plus(int n) { return runif_rng<dqrng::xoroshiro128plus>(n); } // [[Rcpp::export]] Rcpp::NumericVector runif_256plus(int n) { return runif_rng<dqrng::xoshiro256plus>(n); } // [[Rcpp::export]] Rcpp::NumericVector runif_128starstar(int n) { return runif_rng<dqrng::xoroshiro128starstar>(n); } // [[Rcpp::export]] Rcpp::NumericVector runif_256starstar(int n) { return runif_rng<dqrng::xoshiro256starstar>(n); } // [[Rcpp::export]] Rcpp::NumericVector runif_128plusplus(int n) { return runif_rng<dqrng::xoroshiro128plusplus>(n); } // [[Rcpp::export]] Rcpp::NumericVector runif_256plusplus(int n) { return runif_rng<dqrng::xoshiro256plusplus>(n); } N <- 1e5 bm <- bench::mark( runif(N), runif_128plus(N), runif_128starstar(N), runif_128plusplus(N), runif_256plus(N), runif_256starstar(N), runif_256plusplus(N), check = FALSE, min_time = 1 ) knitr::kable(bm[, 1:6])
expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
runif(N) | 2.95ms | 3.32ms | 295.9138 | 786KB | 5.228159 |
runif_128plus(N) | 336.28µs | 423.06µs | 1914.5290 | 784KB | 38.338503 |
runif_128starstar(N) | 346.3µs | 452.96µs | 1857.5411 | 784KB | 36.422376 |
runif_128plusplus(N) | 346.99µs | 444.55µs | 1818.4020 | 784KB | 30.229189 |
runif_256plus(N) | 343.13µs | 417.64µs | 1902.2133 | 784KB | 31.278781 |
runif_256starstar(N) | 348.68µs | 378.25µs | 2203.4952 | 784KB | 37.466444 |
runif_256plusplus(N) | 356.95µs | 446.44µs | 1832.3717 | 784KB | 31.572455 |
plot(bm)
Here all six generators are very similar, with all of them clearly faster than R’s built in runif
. How about sampling with replacement, which is also mostly governed by the speed of generating random numbers:
#include <Rcpp.h> // [[Rcpp::depends(dqrng, BH)]] #include <dqrng_sample.h> #include <xoshiro.h> // [[Rcpp::plugins(cpp11)]] // [[Rcpp::export]] Rcpp::IntegerVector sample_128plus(int m, int n) { auto rng = dqrng::generator<dqrng::xoroshiro128plus>(42); return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0); } // [[Rcpp::export]] Rcpp::IntegerVector sample_128starstar(int m, int n) { auto rng = dqrng::generator<dqrng::xoroshiro128starstar>(42); return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0); } // [[Rcpp::export]] Rcpp::IntegerVector sample_128plusplus(int m, int n) { auto rng = dqrng::generator<dqrng::xoroshiro128plusplus>(42); return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0); } // [[Rcpp::export]] Rcpp::IntegerVector sample_256plus(int m, int n) { auto rng = dqrng::generator<dqrng::xoshiro256plus>(42); return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0); } // [[Rcpp::export]] Rcpp::IntegerVector sample_256starstar(int m, int n) { auto rng = dqrng::generator<dqrng::xoshiro256starstar>(42); return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0); } // [[Rcpp::export]] Rcpp::IntegerVector sample_256plusplus(int m, int n) { auto rng = dqrng::generator<dqrng::xoshiro256plusplus>(42); return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0); } N <- 1e5 M <- 1e3 bm <- bench::mark( sample.int(M, N, replace = TRUE), sample_128plus(M, N), sample_128starstar(M, N), sample_128plusplus(M, N), sample_256plus(M, N), sample_256starstar(M, N), sample_256plusplus(M, N), check = FALSE, min_time = 1 ) knitr::kable(bm[, 1:6])
expression | min | median | itr/sec | mem_alloc | gc/sec |
---|---|---|---|---|---|
sample.int(M, N, replace = TRUE) | 3.22ms | 3.45ms | 286.6898 | 401KB | 2.033261 |
sample_128plus(M, N) | 350.22µs | 417.51µs | 2165.7012 | 393KB | 20.307182 |
sample_128starstar(M, N) | 402.46µs | 488.56µs | 1873.1944 | 393KB | 15.019886 |
sample_128plusplus(M, N) | 350.28µs | 380.14µs | 2288.3859 | 393KB | 18.472251 |
sample_256plus(M, N) | 367.89µs | 392.56µs | 2221.3853 | 393KB | 18.484361 |
sample_256starstar(M, N) | 367.38µs | 414.71µs | 2127.8934 | 393KB | 17.388302 |
sample_256plusplus(M, N) | 419.19µs | 493.82µs | 1832.8341 | 393KB | 15.023230 |
plot(bm)
Again nothing really conclusive. All six RNGs are similar and much faster than R’s build in sample.int
.
The speed comparisons between these generators is inconclusive to me. The xoroshiro128 seem to be slightly faster than the xoshiro256 variants. So I am leaning towards one of those as the new default while still making the corresponding xoshiro256 variant available in case a longer period or a higher degree of parallelisation is needed. Comparing the ++
and **
variants, I am leaning towards ++
, but that is not set in stone.
References
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.