Generating a multivariate gaussian distribution using RcppArmadillo
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There are many ways to simulate a multivariate gaussian distribution assuming that you can simulate from independent univariate normal distributions. One of the most popular method is based on the Cholesky decomposition. Let’s see how Rcpp
and Armadillo
perform on this task.
#include <RcppArmadillo.h> // [[Rcpp::depends(RcppArmadillo)]] using namespace Rcpp; // [[Rcpp::export]] arma::mat mvrnormArma(int n, arma::vec mu, arma::mat sigma) { int ncols = sigma.n_cols; arma::mat Y = arma::randn(n, ncols); return arma::repmat(mu, 1, n).t() + Y * arma::chol(sigma); }
The easiest way to perform a Cholesky distribution in R is to use the chol
function in R
which interface some fast LAPACK
routines.
### naive implementation in R mvrnormR <- function(n, mu, sigma) { ncols <- ncol(sigma) mu <- rep(mu, each = n) ## not obliged to use a matrix (recycling) mu + matrix(rnorm(n * ncols), ncol = ncols) %*% chol(sigma) }
We will also use MASS:mvrnorm
which implements it differently:
require(MASS) Loading required package: MASS ### Covariance matrix and mean vector sigma <- matrix(c(1, 0.9, -0.3, 0.9, 1, -0.4, -0.3, -0.4, 1), ncol = 3) mu <- c(10, 5, -3) require(MASS) ### checking variance set.seed(123) cor(mvrnormR(100, mu, sigma)) [,1] [,2] [,3] [1,] 1.0000 0.8851 -0.3830 [2,] 0.8851 1.0000 -0.4675 [3,] -0.3830 -0.4675 1.0000 cor(MASS::mvrnorm(100, mu, sigma)) [,1] [,2] [,3] [1,] 1.0000 0.9106 -0.3016 [2,] 0.9106 1.0000 -0.4599 [3,] -0.3016 -0.4599 1.0000 cor(mvrnormArma(100, mu, sigma)) [,1] [,2] [,3] [1,] 1.000 0.9020 -0.3530 [2,] 0.902 1.0000 -0.4889 [3,] -0.353 -0.4889 1.0000 ## checking means colMeans(mvrnormR(100, mu, sigma)) [1] 9.850 4.911 -2.902 colMeans(MASS::mvrnorm(100, mu, sigma)) [1] 10.051 5.046 -2.914 colMeans(mvrnormArma(100, mu, sigma)) [1] 9.825 4.854 -2.873
Now, let’s benchmark the different versions:
require(rbenchmark) Loading required package: rbenchmark benchmark(mvrnormR(1e4, mu, sigma), MASS::mvrnorm(1e4, mu, sigma), mvrnormArma(1e4, mu, sigma), columns = c('test', 'replications', 'relative', 'elapsed'), order = 'elapsed') test replications relative elapsed 3 mvrnormArma(10000, mu, sigma) 100 1.000 0.219 1 mvrnormR(10000, mu, sigma) 100 1.913 0.419 2 MASS::mvrnorm(10000, mu, sigma) 100 2.046 0.448
The RcppArmadillo
function outperforms the MASS implementation and the naive R code, but more surprisinugly mvrnormR
is slightly faster than mvrnorm
in this benchmark.
To be fair, while digging into the MASS::mvrnorm
code it appears that there are few code sanity checks ( such as the positive definiteness of Sigma
).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.