Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre L'Ecuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and myself acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February, unless I am confused!) and consists of the following papers:

* Massive parallelization of serial inference algorithms for a complex generalized linear model
MARC A. SUCHARD, IVAN ZORYCH, PATRICK RYAN, and DAVID MADIGAN
* Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm
SYLVAIN LE CORFF and GERSENDE FORT
* Efficient MCMC for Binomial Logit Models
AGNES FUSSL, SYLVIA FRÜHWIRTH-SCHNATTER, and RUDOLF FRÜHWIRTH
* Adaptive Equi-Energy Sampler: Convergence and Illustration
AMANDINE SCHRECK, GERSENDE FORT, and ERIC MOULINES
* Particle algorithms for optimization on binary spaces
CHRISTIAN SCHÄFER
* Posterior expectation of regularly paved random histograms
RAAZESH SAINUDIIN, GLORIA TENG, JENNIFER HARLOW, and DOMINIC LEE
* Small variance estimators for rare event probabilities
MICHEL BRONIATOWSKI and VIRGILE CARON
* Self-Avoiding Random Dynamics on Integer Complex Systems
FIRAS HAMZE, ZIYU WANG, and NANDO DE FREITAS
* Bayesian learning of noisy Markov decision processes
SUMEETPAL S. SINGH, NICOLAS CHOPIN, and NICK WHITELEY

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!)

While Monte Carlo methods, whose use began with particle physics in the 1940s, are employed in a wide range of domains, statistics has a particular connection with them in that it both relies on them to handle complex models and validates their convergence by providing assessment tools. Both the bootstrap and the Markov chain Monte Carlo (MCMC) revolutions of the 1980s and 1990s have changed for good the way Monte Carlo methods are perceived by statisticians, moving them from a peripheral tool to an essential component of statistical analysis. We are thus pleased to have been given the opportunity of editing this special issue of ACM TOMACS and of handling a fine collection of submissions.

The accepted papers in this issue cover almost the whole range of uses of simulation methods in statistics, from optimisation (Le Corff and Fort, Schäfer) to posterior simulation (Fussl et al., Hamze et al., Sainudiin et al., Singh et al.), to rare event inference (Broniatowski and Caron), to parallelisation (Suchard et al.), with a collection of Monte Carlo techniques, from particle systems (Le Corff and Fort, Schäfer) to Markov chain Monte Carlo (Fussl et al., Hamze et al., Sainudiin et al., Singh et al., Suchard et al.), to importance sampling (Broniatowski and Caron).

The paper by Le Corff and Fort furthermore offers insights on the “workhorse” of computational statistics, namely the Expectation–Maximisation (EM) algorithm introduced by Dempster, Laird, and Rubin (1977). It indeed characterises the convergence speed of some online (sequential Monte Carlo) versions of the EM algorithm, thus helping to quantify the folklore that “EM converges fast”. In the same area of missing variable models, Fussl et al. reassess the classical (Bayesian) logit model and propose a new completion scheme that aggregates the missing variables, leading to a Metropolis-Hastings sampler that is much more efficient than the existing schemes. The paper by Singh et al. can also be connected to this theme, as it studies Bayesian inverse reinforcement learning problems involving latent variables that are estimated and used in prediction, thanks to an efficient MCMC sampler.
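For readers less familiar with EM, here is a minimal R sketch of the vanilla (batch) algorithm on a two-component Gaussian mixture with unit variances; it only illustrates the alternation of E- and M-steps on a toy model and is in no way the online, particle-based version analysed by Le Corff and Fort:

set.seed(1)
x <- c(rnorm(200, -2), rnorm(300, 2))      # simulated two-component data
mu <- c(-1, 1); p <- 0.5                   # crude starting values
for (iter in 1:50) {
  # E-step: posterior probability that each point belongs to component 1
  d1 <- p * dnorm(x, mu[1]); d2 <- (1 - p) * dnorm(x, mu[2])
  w  <- d1 / (d1 + d2)
  # M-step: weighted updates of the mixture weight and of the two means
  p  <- mean(w)
  mu <- c(sum(w * x) / sum(w), sum((1 - w) * x) / sum(1 - w))
}
round(c(p = p, mu1 = mu[1], mu2 = mu[2]), 3)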

The paper by Schreck et al. on the (MCMC) equi-energy sampler expands on a state-of-the-art sampler by constructing and fully validating an adaptive version of the algorithm. As this area is currently very active, the paper represents a major step for the field. Another paper, by Sainudiin et al., is also concerned with theoretical aspects, namely the construction and validation of an MCMC algorithm on an unusual space of tree-based histograms. This is the paper in this issue closest to non-parametric statistical estimation, one significant domain missing here, even though simulation in functional spaces raises highly topical idiosyncrasies of its own. The paper by Broniatowski and Caron also remains on a rather theoretical plane, looking at large and moderate deviations in connection with importance sampling and cross-entropy techniques, and aiming at some degree of optimality in the long run.
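To make the importance sampling connection concrete, here is a toy R example, far simpler than the estimators studied by Broniatowski and Caron: it estimates the rare-event probability P(X > 5) for a standard normal X by shifting the proposal to a N(5,1) distribution and reweighting.

set.seed(2)
n <- 1e5
y <- rnorm(n, mean = 5)              # draws from the shifted proposal g
w <- dnorm(y) / dnorm(y, mean = 5)   # importance weights f(y)/g(y)
c(IS = mean((y > 5) * w), exact = pnorm(5, lower.tail = FALSE))

A naive Monte Carlo estimate based on the same number of N(0,1) draws would almost always see no exceedance at all, which is precisely the difficulty that rare event techniques address.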

As mentioned above, two papers specifically address statistical problems of optimisation over binary-valued systems: the particle algorithm of Schäfer, which builds specially designed parametric families on binary spaces and brings significant improvements over the existing schemes, and the paper by Hamze et al. on self-avoiding random walks coupled with Bayesian optimisation, which handles complex models remarkably well.
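To give a flavour of simulation-based optimisation over binary spaces, here is a toy cross-entropy-style search in R, where an independent-Bernoulli family is refitted at each round to the elite samples of a random quadratic pseudo-Boolean objective. This is only a sketch of the parametric-family principle, not Schäfer's algorithm:

set.seed(3)
d <- 20; n <- 200; elite <- 20
A <- matrix(rnorm(d^2), d, d); A <- (A + t(A)) / 2    # random symmetric objective
f <- function(z) as.numeric(t(z) %*% A %*% z)
p <- rep(0.5, d)                                      # initial product-Bernoulli family
for (it in 1:30) {
  Z <- matrix(rbinom(n * d, 1, rep(p, each = n)), n, d)   # n candidate binary vectors
  vals <- apply(Z, 1, f)
  top  <- order(vals, decreasing = TRUE)[1:elite]         # keep the best samples
  p <- 0.9 * colMeans(Z[top, , drop = FALSE]) + 0.1 * p   # smoothed parameter update
}
f(round(p))    # objective value at the candidate maximiser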

A last area in rapid development represented in this issue is parallelisation. As discussed in Suchard et al., more and more models require parallel implementation to be handled properly and, once more, specific statistical methodologies can and must be devised to answer such challenges. The paper by Suchard et al. handles Bayesian maximum a posteriori estimation in generalized linear models for massive datasets using GPUs (graphical processing units), despite the serial nature of their cyclic coordinate descent algorithm. It can be seen as an outlier in this special issue in that it deals more with statistical computing than with computational statistics, but we think it fully has its place in the field, as it reaches the implementation levels necessary to face the “big data” challenges.
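For readers who would like to experiment with parallel Monte Carlo from within R, on CPUs rather than the GPUs of Suchard et al., the base parallel package is an easy entry point; this toy example spreads independent replications of a pi-estimating experiment across forked workers:

library(parallel)
one_rep <- function(i) {
  u <- runif(1e5); v <- runif(1e5)
  4 * mean(u^2 + v^2 < 1)        # one crude Monte Carlo estimate of pi
}
ests <- mclapply(1:8, one_rep, mc.cores = 2)   # forking; set mc.cores = 1 on Windows
mean(unlist(ests))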


Filed under: Books, R, Statistics, University life Tagged: ACM Transactions on Modeling and Computer Simulation, Berlin, EM algorithm, importance sampling, integer valued functions, MCMC, Monte Carlo Statistical Methods, optimisation, parallelisation, particle filters, rare events, simulation, WSC 2012
