Recreating rmultinom and rpois with Rcpp
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Sometimes one needs to mimic the exact behavior of R’s Distributions
within C++
code. The incredible Rcpp team has provided access to these distributions through
Rmath.h
(in the R::
namespace), as well as through the Rcpp::
namespace where there
can be two forms: scalar as in R, and vectorized via Rcpp sugar. The behavior of these
functions may not always exactly match what the user expects from the standard R behavior,
particularly if attempting to use the functions in Rmath.h
. In particular, the functions
in Rmath.h
are not vectorized. In what follows, I will use Rcpp to mimic the
behavior of both the rmultinom
and rpois
functions available in base R so that this
functionality and behavior is provided in native C++.
The multinomial distribution
generalizes the binomial distribution to k
discrete outcomes instead of 2; consequently,
it is parameterized in terms of k
probabilities that must sum to 1. The base R function
rmultinom
used for generating multinomial data takes three arguments: n
the number of
simulated data sets to produce, size
, the number of multinomial outcomes to sample for
each data set, and prob
a numeric vector of probabilities. The function returns a k
$\times$ n
integer matrix.
The following C++ code uses the R::rmultinom
function available in Rmath.h
to generate
size
multinomial outcomes. The R::rmultinom
function relies on referencing a pointer
to an IntegerVector
to store the results. We create a helper function, rmultinom_1
,
that draws size
multinomial outcomes from the multinomial distribution based on the
probabilities in prob
. We then do this n
independent times in the function
rmultinom_rcpp
. To match the base R functionality, rmultinom_rcpp
returns a k
$\times$ n
IntegerMatrix
.
#include <Rcpp.h>
using namespace Rcpp;
IntegerVector rmultinom_1(unsigned int &size, NumericVector &probs, unsigned int &N) {
IntegerVector outcome(N);
rmultinom(size, probs.begin(), N, outcome.begin());
return outcome;
}
// [[Rcpp::export]]
IntegerMatrix rmultinom_rcpp(unsigned int &n, unsigned int &size, NumericVector &probs) {
unsigned int N = probs.length();
IntegerMatrix sim(N, n);
for (unsigned int i = 0; i < n; i++) {
sim(_,i) = rmultinom_1(size, probs, N);
}
return sim;
}
We now check if the rmultinom
and rmultinom_rcpp
functions produce the same results.
We generate a vector of 200 probabilities that sum to 1. We will sample 500 multinomial
outcomes and do this independently 20 times.
prob <- runif(200)
prob <- prob/sum(prob) # standardize the probabilities
size <- 500
n <- 20
set.seed(10)
sim_r <- rmultinom(n, size, prob)
set.seed(10)
sim_rcpp <- rmultinom_rcpp(n, size, prob)
all.equal(sim_r, sim_rcpp)
[1] TRUE
A benchmark of the functions suggests that the rmultinom_rcpp
function is very slightly
slower than the rmultinom
function, but that is not really a concern for our purposes.
microbenchmark::microbenchmark(
rmultinom(1000, size, prob),
rmultinom_rcpp(1000, size, prob)
)
Unit: milliseconds expr min lq mean median uq max neval cld rmultinom(1000, size, prob) 10.8676 10.9925 11.1737 11.0826 11.1898 13.9595 100 a rmultinom_rcpp(1000, size, prob) 11.0879 11.2072 11.4897 11.3341 11.6203 13.9920 100 b
The poisson distribution is a
non-negative discrete distribution characterized by having identical mean and
variance. The base R function rpois
used for generating Poisson data takes two
arguments: n
the number of simulated values to produce, and lambda
, a positive numeric
vector. The rpois
function cycles (and recycles) through the values in lambda
for each
successive value simulated. The function produces an integer vector of length n
. We
provide similar functionality using the R::rpois
function available in Rmath.h
. Note
that we cycle through the values of lambda
so that if the end of the lambda
vector is
reached before we have generated n
values, then we restart at the beginning of the
lambda
vector.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector rpois_rcpp(unsigned int &n, NumericVector &lambda) {
unsigned int lambda_i = 0;
IntegerVector sim(n);
for (unsigned int i = 0; i < n; i++) {
sim[i] = R::rpois(lambda[lambda_i]);
// update lambda_i to match next realized value with correct mean
lambda_i++;
// restart lambda_i at 0 if end of lambda reached
if (lambda_i == lambda.length()) {
lambda_i = 0;
}
}
return sim;
}
We now evaluate whether the rpois
and rpois
functions produce the same results. We
generate a positive vector with 200 values for lambda
and draw length(lambda) + 5
independent Poisson values.
lambda <- runif(200, 0.5, 3)
set.seed(10)
pois_sim_r <- rpois(length(lambda) + 5, lambda)
set.seed(10)
pois_sim_rcpp <- rpois_rcpp(length(lambda) + 5, lambda)
all.equal(pois_sim_r, pois_sim_rcpp)
[1] TRUE
A benchmark of the two functions suggests the rpois_rcpp
function may be slightly faster,
but once again, that is not our primary concern here.
microbenchmark::microbenchmark(
rpois(length(lambda) + 5, lambda),
rpois_rcpp(length(lambda) + 5, lambda)
)
Unit: microseconds expr min lq mean median uq max neval cld rpois(length(lambda) + 5, lambda) 7.412 7.7780 8.50529 7.9495 8.2165 60.014 100 b rpois_rcpp(length(lambda) + 5, lambda) 6.721 6.9965 7.37430 7.1645 7.4950 21.263 100 a
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.