Animating a Monte Carlo Simulation
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
Oftentimes, I run into difficulty trying to explain some of the concepts of statistical sampling with audiences that either have very limited or no understanding of statistics. Given that the majority of communication of analysis has to be digested in a 1-2 hour meeting, data visualization typically is the best route for communicating my methods.
One such case is unboxing how sampling works based on your presumed population and using that sampling to build a monte carlo simulation. I’m a big fan of interactive visualizations or animations that break down in detail the underpinnings of an analysis. The basis of this post is to demonstrate using ggplot2 to construct the frames of the animation and ImageMagick to combine the frames into a gif of a simple monte carlo simulation.
Generate the Data
First, I’m going to use base R’s random sampling functions for the Poisson and the Negative Binomial to generate samples given the presumed parameters. The numbers are then added together to show a very basic monte carlo simulation. In addition, a “slice” of the data is taken that I’m going to use later to break down each step of the simulation.
library(ggplot2) library(trstyles) library(dplyr) library(tidyr) set.seed(82) n <- 10000 mcHist <- data_frame(Poisson = rpois(n, 3), NegBinom = rnbinom(n, 5, .5)) %>% mutate(Simulation = Poisson + NegBinom) %>% gather(Distribution, Value) %>% mutate(Distribution = sub("NegBinom", "Negative Binomial", Distribution)) mcSample <- mcHist %>% group_by(Distribution) %>% slice(1:200) %>% gather(Distribution, Value) %>% group_by(Distribution) %>% mutate(rowNum = row_number(Distribution)) lapply(split(mcHist[['Value']], mcHist[['Distribution']]), summary) $`Negative Binomial` Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 3.000 4.000 5.017 7.000 25.000 $Poisson Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 2.000 3.000 2.991 4.000 12.000 $Simulation Min. 1st Qu. Median Mean 3rd Qu. Max. 0.000 5.000 8.000 8.008 10.000 29.000
Histogram of Expected Results
The background image that I want to show is what is the theoretical model that we are moving towards. I created a large sample of 10,000 so that random number generators will give a close approximation of the population (the presumed distributions). I chose discrete distributions above so that it will be easy to plot histograms without having to worry about the bin widths. With a little tweaking using facet_grid
and geom_text
, I can lay out each distribution and annotate the calculation.
g <- ggplot(mcHist) + geom_histogram(aes(Value, ..density.., color = Distribution), binwidth = 1, alpha = .3, fill = "transparent") + scale_color_tr(guide = FALSE) + facet_grid(~Distribution) + theme_tr() + coord_cartesian(ylim = c(0, .23)) + labs(x = "", y = "Frequency") + theme(panel.grid = element_blank(), strip.text = element_text(size = 16, hjust = 0.1), axis.text.x = element_blank()) + geom_text(x = rep(21, 3*n), y = rep(.1, 3*n), label = rep(c("+", "=", ""), each = n), size = 24) g
Create Frames for Animation
Now that I have my starting frame to show each individual sample, I am going to use geom_dotplot
to plot over the histogram. To help track the simulation moving through time, the latest sample will be colored red. The for
loop iterates over each row number, filters to the current state of the simulation, and updates what is to be plotted. Then the handy ggsave
saves each frame for the animated gif.
for (i in 1:length(unique(mcSample$rowNum))) { dataUpdate <- mcSample %>% group_by(Distribution) %>% filter(rowNum %in% 1:i) %>% group_by(Distribution) %>% mutate(Last = rowNum == i) gUpdate <- g + geom_dotplot(data = dataUpdate, aes(Value, fill = Last), color = NA, binwidth = 1, method = "histodot", dotsize = .6) + scale_fill_manual(guide = FALSE, values = c("black", "red")) #gUpdate ggsave(filename = paste0('frames/plot', sprintf("%03d",i) , '.jpg'), plot = gUpdate, device = 'jpeg') }
Convert Using ImageMagick
All the heavy lifting is over now. I just need to use ImageMagick
to process the frames and convert to a gif. You can use a call to the command line here, but my personal preference is to just switch to the command line. ImageMagick
is a standalone utility software that you will need to install separately from R.
convert -delay 20 -loop 0 *.jpg ../mcsim_animation.gif
I now have an animation that I can use to explain my distribution assumptions and show how each sample updates the model. It’s pretty clear over time that the sampling will eventually converge to the theoretical model.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.