Site icon R-bloggers

Choosing a new default RNG for dqrng

[This article was first published on R on Ralf Stubner, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Currently the dqrng package supports only xoroshiro128+ and xoshiro256+ from https://prng.di.unimi.it/ (see also Blackman and Vigna 2021). These RNGs should only be used for creating floating point numbers, which was the case for dqrng originally. However, dqsample and dqrrademacher make use of the full bit pattern. So it would be better to support the ** and/or ++ variants for both RNGs and make one of them the default. This would be a breaking change, of course. In #57 I have added these additional 4 RNGs to xoshiro.h so now is the time to do some benchmarking first by generating some random numbers:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_distribution.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

template<typename RNG>
double sum_rng(int n) {
  auto rng = dqrng::generator<RNG>(42);
  dqrng::uniform_distribution dist;
  double result = 0.0;
  for (int i = 0; i < n; ++i) {
    result += dist(*rng);
  }
  return result;
}

// [[Rcpp::export]]
double sum_128plus(int n) {
  return sum_rng<dqrng::xoroshiro128plus>(n);
}
// [[Rcpp::export]]
double sum_256plus(int n) {
  return sum_rng<dqrng::xoshiro256plus>(n);
}
// [[Rcpp::export]]
double sum_128starstar(int n) {
  return sum_rng<dqrng::xoroshiro128starstar>(n);
}
// [[Rcpp::export]]
double sum_256starstar(int n) {
  return sum_rng<dqrng::xoshiro256starstar>(n);
}
// [[Rcpp::export]]
double sum_128plusplus(int n) {
  return sum_rng<dqrng::xoroshiro128plusplus>(n);
}
// [[Rcpp::export]]
double sum_256plusplus(int n) {
  return sum_rng<dqrng::xoshiro256plusplus>(n);
}
N <- 1e5
bm <- bench::mark(
  sum_128plus(N),
  sum_128starstar(N),
  sum_128plusplus(N),
  sum_256plus(N),
  sum_256starstar(N),
  sum_256plusplus(N),
  check = FALSE,
  min_time = 1
)
knitr::kable(bm[, 1:6])
expression min median itr/sec mem_alloc gc/sec
sum_128plus(N) 314µs 318µs 2989.530 2.49KB 0
sum_128starstar(N) 340µs 342µs 2817.644 2.49KB 0
sum_128plusplus(N) 343µs 347µs 2784.206 2.49KB 0
sum_256plus(N) 348µs 351µs 2731.439 2.49KB 0
sum_256starstar(N) 348µs 351µs 2736.009 2.49KB 0
sum_256plusplus(N) 358µs 360µs 2650.948 2.49KB 0
plot(bm)

The current default xoroshiro128+ is fastest in this comparison with the other generators being very similar. Let’s try a more realistic usecase: generating many uniformaly distributed random numbers:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_distribution.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

template<typename RNG>
Rcpp::NumericVector runif_rng(int n) {
  auto rng = dqrng::generator<RNG>(42);
  dqrng::uniform_distribution dist;
  Rcpp::NumericVector result(Rcpp::no_init(n));
  std::generate(result.begin(), result.end(), [rng, dist] () {return dist(*rng);});
  return result;
}

// [[Rcpp::export]]
Rcpp::NumericVector runif_128plus(int n) {
  return runif_rng<dqrng::xoroshiro128plus>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_256plus(int n) {
  return runif_rng<dqrng::xoshiro256plus>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_128starstar(int n) {
  return runif_rng<dqrng::xoroshiro128starstar>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_256starstar(int n) {
  return runif_rng<dqrng::xoshiro256starstar>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_128plusplus(int n) {
  return runif_rng<dqrng::xoroshiro128plusplus>(n);
}
// [[Rcpp::export]]
Rcpp::NumericVector runif_256plusplus(int n) {
  return runif_rng<dqrng::xoshiro256plusplus>(n);
}
N <- 1e5
bm <- bench::mark(
  runif(N),
  runif_128plus(N),
  runif_128starstar(N),
  runif_128plusplus(N),
  runif_256plus(N),
  runif_256starstar(N),
  runif_256plusplus(N),
  check = FALSE,
  min_time = 1
)
knitr::kable(bm[, 1:6])
expression min median itr/sec mem_alloc gc/sec
runif(N) 2.95ms 3.32ms 295.9138 786KB 5.228159
runif_128plus(N) 336.28µs 423.06µs 1914.5290 784KB 38.338503
runif_128starstar(N) 346.3µs 452.96µs 1857.5411 784KB 36.422376
runif_128plusplus(N) 346.99µs 444.55µs 1818.4020 784KB 30.229189
runif_256plus(N) 343.13µs 417.64µs 1902.2133 784KB 31.278781
runif_256starstar(N) 348.68µs 378.25µs 2203.4952 784KB 37.466444
runif_256plusplus(N) 356.95µs 446.44µs 1832.3717 784KB 31.572455
plot(bm)

Here all six generators are very similar, with all of them clearly faster than R’s built in runif. How about sampling with replacement, which is also mostly governed by the speed of generating random numbers:

#include <Rcpp.h>
// [[Rcpp::depends(dqrng, BH)]]
#include <dqrng_sample.h>
#include <xoshiro.h>
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
Rcpp::IntegerVector sample_128plus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoroshiro128plus>(42);
  return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_128starstar(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoroshiro128starstar>(42);
  return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_128plusplus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoroshiro128plusplus>(42);
  return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_256plus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoshiro256plus>(42);
  return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_256starstar(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoshiro256starstar>(42);
  return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0);
}
// [[Rcpp::export]]
Rcpp::IntegerVector sample_256plusplus(int m, int n) {
  auto rng = dqrng::generator<dqrng::xoshiro256plusplus>(42);
  return dqrng::sample::sample<INTSXP, uint32_t>(rng, uint32_t(m), uint32_t(n), true, 0);
}
N <- 1e5
M <- 1e3
bm <- bench::mark(
  sample.int(M, N, replace = TRUE),
  sample_128plus(M, N),
  sample_128starstar(M, N),
  sample_128plusplus(M, N),
  sample_256plus(M, N),
  sample_256starstar(M, N),
  sample_256plusplus(M, N),
  check = FALSE,
  min_time = 1
)
knitr::kable(bm[, 1:6])
expression min median itr/sec mem_alloc gc/sec
sample.int(M, N, replace = TRUE) 3.22ms 3.45ms 286.6898 401KB 2.033261
sample_128plus(M, N) 350.22µs 417.51µs 2165.7012 393KB 20.307182
sample_128starstar(M, N) 402.46µs 488.56µs 1873.1944 393KB 15.019886
sample_128plusplus(M, N) 350.28µs 380.14µs 2288.3859 393KB 18.472251
sample_256plus(M, N) 367.89µs 392.56µs 2221.3853 393KB 18.484361
sample_256starstar(M, N) 367.38µs 414.71µs 2127.8934 393KB 17.388302
sample_256plusplus(M, N) 419.19µs 493.82µs 1832.8341 393KB 15.023230
plot(bm)

Again nothing really conclusive. All six RNGs are similar and much faster than R’s build in sample.int.

The speed comparisons between these generators is inconclusive to me. The xoroshiro128 seem to be slightly faster than the xoshiro256 variants. So I am leaning towards one of those as the new default while still making the corresponding xoshiro256 variant available in case a longer period or a higher degree of parallelisation is needed. Comparing the ++ and ** variants, I am leaning towards ++, but that is not set in stone.

References

Blackman, David, and Sebastiano Vigna. 2021. “Scrambled Linear Pseudorandom Number Generators.” ACM Transactions on Mathematical Software 47 (4): 1–32. https://doi.org/10.1145/3460772.
To leave a comment for the author, please follow the link and comment on their blog: R on Ralf Stubner.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version