Explaining the ABC-Rejection Algorithm in R
Approximate Bayesian Computation (ABC) is an umbrella term for a class of algorithms and ideas that allow an approximate estimation of the likelihood / posterior for stochastic simulation models when the likelihood cannot be calculated explicitly (intractable likelihood). The idea in a nutshell: to approximate the likelihood, note that for a stochastic simulation with a given parameter, the probability that the outcome is CLOSE to the data is approximately proportional to the probability that it is IDENTICAL to the data, the latter being the definition of the likelihood. Armed with this insight, one can use ABC and related methods to generate approximate likelihoods / posteriors for practically any stochastic simulation, including IBMs/ABMs, stochastic movement models etc., that could not have been fit with conventional means. We give a more detailed introduction to this topic in our 2011 EL review.
I have previously blogged about estimating models with the ABC-MCMC algorithm proposed by Marjoram (2003), as well as about alternatives to the ABC approximation, such as the use of synthetic likelihoods. All of those were based on MCMC methods. However, some recent work made me appreciate the usefulness of the “father” of all ABC algorithms, the ABC rejection sampler first proposed by Tavaré (1997) (see also Beaumont 2010 for a review), and I thought a small demonstration in R would be useful.
The stochastic model, data, and the summary statistics
Assume you have a stochastic model with output xsim, and data xobs that you want to use to fit the model. ABC-Rejection, like practically all other simulation-based inference methods, works better if the dimensionality of the data is low. Therefore, one should test whether it’s possible to find summary statistics for xsim/xobs that reduce the dimensionality of the data without losing information (see the section on summary statistics in our review).
Let’s look at this in an example. For convenience, I will take a very simple stochastic model, a Weibull distribution. Of course, there would be other ways to fit this model, but as a demonstration, we’ll use ABC. I create a set of observed data (10 draws of rweibull with parameters 2, 5), and as summary statistics I use the mean and the standard deviation. So, we have our observed summary statistics, and for convenience I define the stochastic model that is to be fit as a function that takes two parameters, creates 10 stochastic draws from rweibull, and directly returns the simulated summary statistics.
observedData = rweibull(10, 2, 5)
observedSummary = c(mean(observedData), sd(observedData))

# The summary statistics are used because simulation-based methods are typically
# more efficient when the dimensionality of the data is low. In general, one has
# to check whether information is lost by such a reduction (sufficiency). I'm
# actually not sure whether that is the case with this choice, it might depend
# on the parameters, but we will see that the summary is good enough.

# Defining a stochastic model with Weibull output.
# For convenience, I calculate the summary statistics in the same step.
model <- function(par){
  simulatedData <- rweibull(10, par[1,1], par[1,2])
  simulatedSummary <- c(mean(simulatedData), sd(simulatedData))
  return(simulatedSummary)
}
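As a quick sanity check (my addition, not part of the original script), one can run the model once at the true parameters and compare against the observed summaries. The output is stochastic, so the values will vary between runs, but they should be in the same ballpark.

# One simulation at the true parameters (2, 5); compare with observedSummary.
# Results are stochastic and will differ between runs.
model(data.frame(shape = 2, scale = 5))
observedSummary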
Calculating the simulation outputs
Now, once that is done, the ABC-Rejection algorithm is embarrassingly simple: draw a large number of different parameter values from your prior (I work Bayesian and assume uniform priors), or from uniform distributions if you work with an MLE mindset, run the simulation for each, and record the output. The interesting point about this approach is that, while maybe not as efficient as ABC-MCMC, it scales linearly on any parallel cluster that is available to you. So if you can get your hands on a large cluster, you can run MANY simulations in parallel.
It may appear a bit unorthodox to people who have coded normal Bayesian rejection algorithms that we don’t reject directly, but instead calculate the (Euclidean) distance between observed and simulated summary statistics and record everything; the reason will become clear later. There is no particular reason to use Euclidean distances, but it’s a standard choice. The only catch is that if the model outputs differ in the range of their variance, one would usually weight each dimension of the summary statistics by this variance when calculating the distance. For this example, however, we’ll be fine with equal weights; a sketch of how such a weighting could look follows below.
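For reference, here is a minimal sketch of such a variance-weighted distance (my addition, not used in the fit below): each summary dimension is divided by an estimate of its spread before taking the Euclidean distance, so that no single statistic dominates.

# Hypothetical variance-weighted distance (not used in the fit below).
# scales could, e.g., be the standard deviations of the simulated summaries
# across the prior draws.
weightedDistance <- function(simulated, observed, scales){
  sqrt(sum(((simulated - observed) / scales)^2))
}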
n = 20000
fit = data.frame(shape = runif(n, 1, 6),
                 scale = runif(n, 1, 10),
                 summary1 = rep(NA, n),
                 summary2 = rep(NA, n),
                 distance = rep(NA, n))

for (i in 1:n){
  prediction <- model(fit[i, 1:2])
  deviation = sqrt(sum((prediction - observedSummary)^2))
  fit[i, 3:5] = c(prediction, deviation)
}
Surely there are more efficient ways to get this result than a for-loop, but I think it’s easier to read this way. If speed matters, the simulations can also be distributed across cores, as sketched below.
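A minimal sketch of such a parallel version (my addition; runOne is a hypothetical helper, and mclapply from the parallel package relies on forking, so it will not parallelize on Windows):

# Distribute the simulations across cores with the parallel package.
# mclapply forks the R process and therefore runs serially on Windows.
library(parallel)

runOne <- function(i){
  prediction <- model(fit[i, 1:2])
  deviation <- sqrt(sum((prediction - observedSummary)^2))
  c(prediction, deviation)
}

results <- mclapply(1:n, runOne, mc.cores = detectCores())
fit[, 3:5] <- do.call(rbind, results)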
Analyzing the results
OK, so far, we have a set with a lot of parameters drawn from the prior, and we have the distance between observed and simulated data for these parameters. The trick now is to remember the ABC idea: the probability of the stochastic output being CLOSE to the data is approximately proportional to the probability of the output being EQUAL to the data (= the likelihood). Thus, if we keep only those simulation runs that have led to an output close to the data, we filter (approximately) proportional to the likelihood. In practice, this is done by choosing an acceptance interval epsilon, like this:
plot(fit[fit[,5] < 1.5, 1:2], xlim = c(1, 6), ylim = c(1, 10),
     col = "lightgrey",
     main = "Accepted parameters for different values of epsilon")
points(fit[fit[,5] < 1, 1:2], pch = 18, col = "gray")
points(fit[fit[,5] < 0.5, 1:2], pch = 8, col = "red")
legend("topright", c("< 1.5", "< 1", "< 0.5"),
       pch = c(1, 18, 8), col = c("lightgrey", "gray", "red"))
abline(v = 2)
abline(h = 5)

# Alternatively, you can use the abc package (library(abc)) to calculate
# the acceptance and do some plots.
The result should look something like this.
Basically, the more points we have in an area, the higher the approximated likelihood / posterior density (i.e. the model fit). Note that most points concentrate in the area of the true parameters, visualized by the two lines across the plot.
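If you prefer a smoothed picture over the scatter plot, you could also estimate a 2-D density of the accepted parameters, e.g. with kde2d from the MASS package (a sketch and my addition, not part of the original post):

# Smoothed 2-D density of the accepted parameters (epsilon = 1),
# using kde2d from the MASS package (shipped with standard R installations).
library(MASS)
accepted <- fit[fit[,5] < 1, 1:2]
dens <- kde2d(accepted$shape, accepted$scale, n = 50,
              lims = c(1, 6, 1, 10))
image(dens, xlab = "shape", ylab = "scale",
      main = "Approximate posterior density")
contour(dens, add = TRUE)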
I plotted the accepted parameters for 3 values of epsilon, the maximum distance between observed and simulated data that is still accepted. That, by the way, is the reason why we don’t reject directly: by recording the distance, we can play with epsilon later. Note that, as epsilon gets narrower, the number of accepted parameters decreases. In the limit epsilon -> 0, n -> infinity, the sample would approach the true posterior, but we would also get fewer and fewer points. For larger epsilon, we typically get more points and therefore a better stochastic approximation of the shape of the posterior, but at the cost that the posterior is typically somewhat wider. In practice, having more accepted samples is usually the more important consideration, also because, with some limitations, it is possible to estimate the error that is made by this approximation and correct for it in the samples, basically moving the points to where they would have been if epsilon was 0 (see the last box in our review; a sketch of such a correction follows below).
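The best-known variant of such a correction is the local linear regression adjustment of Beaumont et al. (2002). A minimal sketch (my addition, simplified and unweighted, whereas the original method uses kernel weights): regress the accepted parameters on the simulated summaries and project each point to where it would sit if its summaries had matched the observed ones exactly.

# Simplified, unweighted regression adjustment in the spirit of
# Beaumont et al. (2002), applied to the accepted sample (epsilon = 1.5).
acc <- fit[fit[,5] < 1.5, ]
adjusted <- acc[, 1:2]
for (p in 1:2){
  reg <- lm(acc[, p] ~ acc$summary1 + acc$summary2)
  adjusted[, p] <- acc[, p] -
    coef(reg)[2] * (acc$summary1 - observedSummary[1]) -
    coef(reg)[3] * (acc$summary2 - observedSummary[2])
}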
You can interpret the final sample like any other MCMC output, for example by computing posterior means and credible intervals, as sketched below. The complete R script for this post is here. Disclaimer: in a real application, you should thoroughly test whether the summary statistics chosen are sufficient for your model/data combination.
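As a small illustration (my addition; the exact numbers depend on the run), posterior summaries can be read off the accepted sample directly:

# Posterior summaries from the accepted sample (epsilon = 1).
posteriorSample <- fit[fit[,5] < 1, 1:2]
colMeans(posteriorSample)                      # posterior means
apply(posteriorSample, 2, quantile,
      probs = c(0.025, 0.975))                 # 95% credible intervals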