Sampling for Monte Carlo simulations with R

James Keirstead

10 years ago

[This article was first published on James Keirstead » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When doing Monte Carlo simulation, it’s important to pick your parameter values efficiently especially if your model is computationally expensive to run. If the model takes two days to run, and a parameter ranges from 0 to 10, it doesn’t make much sense to run it once at and again at if hasn’t been explored at all.

To get around this problem, one can use quasi-random low-discrepancy sequences which are designed to fill a parameter space efficiently. The R package, randtoolbox, provides implementations of common sequences, like the Halton or Sobol’, but the process involves a couple of steps that beg to be automated. The general process is:

Generate a n x p matrix of uniformly distributed quasi-random values, where n is the number of simulations you wish to run and p is the number of parameters.
For each column of the matrix, convert the quasi-random value to the parameter’s actual distribution by inverting the cdf curve. So if you have a uniformly distributed value of 0.5, and you want to convert it to a normal distribution with mean you would get a parameter value of for use in your simulation.
Run your simulation with these parameter values, and analyse the results

I’ve written a little R function to make this process easier. You simply pass it the number of simulations you want to run, and a list describing each parameter, and it will return the Monte Carlo sample as a data frame. At the moment, it’s pretty rudimentry, and each parameter is described by a name, a distribution name (matching the R abbreviations, e.g. “unif” for the uniform distribution, “norm” for the normal), and two parameters to describe the distribution.

# Generate a Monte Carlo sample
generateMCSample <- function(n, vals) {
  # Packages to generate quasi-random sequences
  # and rearrange the data
  require(randtoolbox)
  require(plyr)
 
  # Generate a Sobol' sequence
  sob <- sobol(n, length(vals))
 
  # Fill a matrix with the values
  # inverted from uniform values to
  # distributions of choice
  samp <- matrix(rep(0,n*(length(vals)+1)), nrow=n)
  samp[,1] <- 1:n
  for (i in 1:length(vals)) {
    l <- vals[[i]]
    dist <- l$dist
    params <- l$params
    samp[,i+1] <- eval(call(paste("q",dist,sep=""),sob[,i],params[1],params[2]))
  }
 
  # Convert matrix to data frame and label
  samp <- as.data.frame(samp)
  names(samp) <- c("n",laply(vals, function(l) l$var))
  return(samp)
}

Here’s a simple example to show how it can be used.

n <- 1000  # number of simulations to run
 
# List described the distribution of each variable
vals <- list(list(var="Uniform",
               dist="unif",
               params=c(0,1)),
          list(var="Normal",
               dist="norm",
               params=c(0,1)),
          list(var="Weibull",
               dist="weibull",
               params=c(2,1)))
 
# Generate the sample
samp <- generateMCSample(n,vals)
 
# Plot with ggplot2
library(ggplot2)
samp.mt <- melt(samp,id="n")
gg <- ggplot(samp.mt,aes(x=value)) +
  geom_histogram(binwidth=0.1) +
  theme_bw() +
  facet_wrap(~variable, ncol=3,scale="free")

And here’s the resulting picture:

Histogram of three parameters in a Monte Carlo sample

Any suggestions on how to improve this function so that it has a more generic description of a distribution would be appreciated (e.g. for distributions with n!=2 parameters).

To leave a comment for the author, please follow the link and comment on their blog: James Keirstead » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.