Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When doing Monte Carlo simulation, it’s important to pick your parameter values efficiently especially if your model is computationally expensive to run. If the model takes two days to run, and a parameter
To get around this problem, one can use quasi-random low-discrepancy sequences which are designed to fill a parameter space efficiently. The R package, randtoolbox, provides implementations of common sequences, like the Halton or Sobol’, but the process involves a couple of steps that beg to be automated. The general process is:
- Generate a n x p matrix of uniformly distributed quasi-random values, where n is the number of simulations you wish to run and p is the number of parameters.
- For each column of the matrix, convert the quasi-random value to the parameter’s actual distribution by inverting the cdf curve. So if you have a uniformly distributed value of 0.5, and you want to convert it to a normal distribution with mean
you would get a parameter value of for use in your simulation. - Run your simulation with these parameter values, and analyse the results
I’ve written a little R function to make this process easier. You simply pass it the number of simulations you want to run, and a list describing each parameter, and it will return the Monte Carlo sample as a data frame. At the moment, it’s pretty rudimentry, and each parameter is described by a name, a distribution name (matching the R abbreviations, e.g. “unif” for the uniform distribution, “norm” for the normal), and two parameters to describe the distribution.
# Generate a Monte Carlo sample generateMCSample <- function(n, vals) { # Packages to generate quasi-random sequences # and rearrange the data require(randtoolbox) require(plyr) # Generate a Sobol' sequence sob <- sobol(n, length(vals)) # Fill a matrix with the values # inverted from uniform values to # distributions of choice samp <- matrix(rep(0,n*(length(vals)+1)), nrow=n) samp[,1] <- 1:n for (i in 1:length(vals)) { l <- vals[[i]] dist <- l$dist params <- l$params samp[,i+1] <- eval(call(paste("q",dist,sep=""),sob[,i],params[1],params[2])) } # Convert matrix to data frame and label samp <- as.data.frame(samp) names(samp) <- c("n",laply(vals, function(l) l$var)) return(samp) }
Here’s a simple example to show how it can be used.
n <- 1000 # number of simulations to run # List described the distribution of each variable vals <- list(list(var="Uniform", dist="unif", params=c(0,1)), list(var="Normal", dist="norm", params=c(0,1)), list(var="Weibull", dist="weibull", params=c(2,1))) # Generate the sample samp <- generateMCSample(n,vals) # Plot with ggplot2 library(ggplot2) samp.mt <- melt(samp,id="n") gg <- ggplot(samp.mt,aes(x=value)) + geom_histogram(binwidth=0.1) + theme_bw() + facet_wrap(~variable, ncol=3,scale="free")
And here’s the resulting picture:
Any suggestions on how to improve this function so that it has a more generic description of a distribution would be appreciated (e.g. for distributions with n!=2 parameters).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.