Visualization of probabilistic forecasts
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well-worth reading.
It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this:
It is sometimes preferable to use a 50% and a 95% interval, rather like a boxplot:
In some circles (especially macroeconomic forecasting), fan charts are popular:
Personally, I don’t like these at all as they lose any specific probabilistic interpretability. What does the darker shaded region actually refer to? At least in the previous version, it is clear that the dark region contains 50% of the probability.
The above three examples are easily produced using the forecast package:
fit <- ets(hsales) plot(forecast(fit),include=120) plot(forecast(fit,level=c(50,95)),include=120) plot(forecast(fit,fan=TRUE),include=120) |
For multi-modal distributions I like to use highest density regions. Here is an example applied to Nicholson’s blowfly data using a threshold model:
The dark region has 50% coverage and the light region has 95% coverage. The forecast distributions become bimodal after the first ten iterations, and so the 50% region is split in two to show that. This graphic was taken from a J. Forecasting paper I wrote in 1996, so these ideas have been around for a while!
It is easy enough to produce forecast HDR with time series. Here is some R code to do it:
#HDR for time series object # Assumes that forecasts can be computed and futures simulated from object forecasthdr <- function(object, h = ifelse(object$m > 1, 2 * object$m, 10), nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...) { require(hdrcde) # Compute forecasts fc <- forecast(object) ft <- time(fc$mean) # Simulate future sample paths sim <- matrix(0,nrow=h,ncol=nsim) for(i in 1:nsim) sim[,i] <- simulate(object, nsim=h) # Compute HDRs nl <- length(level) hd <- array(0, c(h,nl,10)) mode <- numeric(h) for(k in 1:h) { hdtmp <- hdr(sim[k,], prob=level) hd[k,,1:ncol(hdtmp$hdr)] <- hdtmp$hdr mode[k] <- hdtmp$mode } # Remove unnecessary sections of HDRs nz <- apply(abs(hd),3,sum) > 0 hd <- hd[,,nz] dimnames(hd)[[1]] <- 1:h dimnames(hd)[[2]] <- level # Produce plot if required if(plot) { if(is.null(xlim)) xlim <- range(time(object$x),ft) if(is.null(ylim)) ylim <- range(object$x, hd) plot(object$x,xlim=xlim, ylim=ylim, ...) # Show HDRs cols <- rev(colorspace::sequential_hcl(52))[level - 49] for(k in 1:h) { for(j in 1:nl) { hdtmp <- hd[k,j,] nint <- length(hdtmp)/2 for(l in 1:nint) { polygon(ft[k]+c(-1,-1,1,1)/object$m/2, c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]), col=cols[j], border=FALSE) } } points(ft[k], mode[k], pch=19, col="blue",cex=0.8) } #lines(fc$mean,col='blue',lwd=2) } # Return HDRs return(list(hdr=hd,mode=mode,level=level)) } |
We can apply it using the example I started with:
z <- forecasthdr(fit,xlim=c(1986,1998),nsim=5000, xlab="Year",ylab="US monthly housing sales") |
The dots are modes of the forecast distributions, and the 50% and 95% highest density regions are also shown. In this case, the distributions are unimodal, and so all the regions are intervals.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.