Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This week my research group discussed Adrian Raftery’s recent paper on “Use and Communication of Probabilistic Forecasts” which provides a fascinating but brief survey of some of his work on modelling and communicating uncertain futures. Coincidentally, today I was also sent a copy of David Spiegelhalter’s paper on “Visualizing Uncertainty About the Future”. Both are well-worth reading.
It made me think about my own efforts to communicate future uncertainty through graphics. Of course, for time series forecasts I normally show prediction intervals. I prefer to use more than one interval at a time because it helps convey a little more information. The default in the forecast package for R is to show both an 80% and a 95% interval like this:
The above three examples are easily produced using the forecast package:
fit <- ets(hsales) plot(forecast(fit),include=120) plot(forecast(fit,level=c(50,95)),include=120) plot(forecast(fit,fan=TRUE),include=120) |
For multi-modal distributions I like to use highest density regions. Here is an example applied to Nicholson’s blowfly data using a threshold model:
The dark region has 50% coverage and the light region has 95% coverage. The forecast distributions become bimodal after the first ten iterations, and so the 50% region is split in two to show that. This graphic was taken from a J. Forecasting paper I wrote in 1996, so these ideas have been around for a while!
It is easy enough to produce forecast HDR with time series. Here is some R code to do it:
#HDR for time series object # Assumes that forecasts can be computed and futures simulated from object forecasthdr <- function(object, h = ifelse(object$m > 1, 2 * object$m, 10), nsim=2000, plot=TRUE, level=c(50,95), xlim=NULL, ylim=NULL, ...) { require(hdrcde) # Compute forecasts fc <- forecast(object) ft <- time(fc$mean) # Simulate future sample paths sim <- matrix(0,nrow=h,ncol=nsim) for(i in 1:nsim) sim[,i] <- simulate(object, nsim=h) # Compute HDRs nl <- length(level) hd <- array(0, c(h,nl,10)) mode <- numeric(h) for(k in 1:h) { hdtmp <- hdr(sim[k,], prob=level) hd[k,,1:ncol(hdtmp$hdr)] <- hdtmp$hdr mode[k] <- hdtmp$mode } # Remove unnecessary sections of HDRs nz <- apply(abs(hd),3,sum) > 0 hd <- hd[,,nz] dimnames(hd)[[1]] <- 1:h dimnames(hd)[[2]] <- level # Produce plot if required if(plot) { if(is.null(xlim)) xlim <- range(time(object$x),ft) if(is.null(ylim)) ylim <- range(object$x, hd) plot(object$x,xlim=xlim, ylim=ylim, ...) # Show HDRs cols <- rev(colorspace::sequential_hcl(52))[level - 49] for(k in 1:h) { for(j in 1:nl) { hdtmp <- hd[k,j,] nint <- length(hdtmp)/2 for(l in 1:nint) { polygon(ft[k]+c(-1,-1,1,1)/object$m/2, c(hdtmp[2*l-1],hdtmp[2*l],hdtmp[2*l],hdtmp[2*l-1]), col=cols[j], border=FALSE) } } points(ft[k], mode[k], pch=19, col="blue",cex=0.8) } #lines(fc$mean,col='blue',lwd=2) } # Return HDRs return(list(hdr=hd,mode=mode,level=level)) } |
We can apply it using the example I started with:
z <- forecasthdr(fit,xlim=c(1986,1998),nsim=5000, xlab="Year",ylab="US monthly housing sales") |
The dots are modes of the forecast distributions, and the 50% and 95% highest density regions are also shown. In this case, the distributions are unimodal, and so all the regions are intervals.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.