Time series cross validation of effective reproduction number nowcasts by @ellis2013nz
For a few weeks now I’ve been publishing estimates of the time-varying effective reproduction number for Covid-19 in Victoria (and as bonuses, NSW and Australia as a whole). Reproduction number is the average number of people each infected person infects in turn. My modelling includes nowcasts up to “today” of that number (and forecasts for another 14 days ahead), a risky enterprise because the evidence of today’s number of infections will obviously only turn up in 7-14 days’ time, after incubation, presentation, testing and making its way into the official statistics.
At the time of writing, those estimates look like this:
It’s been interesting watching the estimates of reproduction number change as new data comes in each morning, not least because of a personal interest (as a Melbourne-dweller) in the numbers coming down as quickly as possible. The recent outbreak here has come under control and incidence has rapidly declined. The timing and speed of this certainly seem a dramatic endorsement of the successful strategy of strict lockdowns.
Are those nowcasts any good?
In a comment on my website, Ross Gayler made this excellent suggestion:
On the effective R plot, I would be interested to see the points showing best point estimate as a function of day as estimated that day plus a smoothed fit to those points. That would (presumably) demonstrate that the dispersion of best point estimates as read by someone following the daily updates is indeed in line with the width of the credible interval as seen at the nowcast. I also presume that the nowcasts will be systematically below the historical curve on the rising segments and systematically above the historical curve on the falling segments because the nowcast estimates seem to be increasingly biased towards being constant as they approach “now”.
I have now done this. As I haven’t kept a record of all my past projections, I had to approach the problem by creating new sets of nowcasts, using the data available only up to the day being nowcast. This isn’t quite as good as an actual set of published results, because the data gets minor revisions over time and I have made minor improvements to my method; but it is good enough for my purposes.
So this is a form of time series cross-validation, a very good way of checking the usefulness of a forecasting (or nowcasting) method. For every day from 1 June 2020 to 15 August 2020 I took the data up to that day and fitted my usual model to it, using the EpiNow2
R and Stan package. Then I compared the 80 or so “nowcasts today” from those models to the best estimate of the reproduction number I can make today (29 August), with the benefit of all the subsequent data. I stopped this exercise at 14 days before the present deliberately, as more recently than that the “best hindsight” estimate of reproduction number is probably still unstable.
The code that does all this is on GitHub. It’s embedded with a few supporting scripts in a sort of sub-project of my usual “blog source” project, but if you’re interested and scan the victoria-cv.R
script it should all be pretty clear how it works. The first 75 lines are just importing the data (via The Guardian), and the final 80 lines are just about organising data for some plots; the actual fitting of all those models to the different subsets of the data takes place in 50 lines of code starting at about line 80.
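To give a flavour of what that fitting loop does, here is a minimal sketch (not the actual script) of re-fitting the model to successive subsets of the data with EpiNow2. The `all_cases`, `gen_time` and `delays` objects are placeholders for the prepared case data and the delay distributions, and the exact argument names and return structure of `epinow()` depend on the version of EpiNow2 you have installed:

```r
# Sketch only: refit the model for each cutoff date, using data up to that date.
# `all_cases` is assumed to be a data frame with `date` and `confirm` columns;
# `gen_time` and `delays` are assumed to be the generation time and delay
# distributions defined elsewhere. Check argument names against your EpiNow2 version.
library(dplyr)
library(EpiNow2)

cutoff_dates <- seq(as.Date("2020-06-01"), as.Date("2020-08-15"), by = "day")

nowcasts <- lapply(cutoff_dates, function(the_day) {
  cases_so_far <- filter(all_cases, date <= the_day)

  fit <- epinow(reported_cases = cases_so_far,
                generation_time = gen_time,
                delays = delays)

  # Pull out the estimate of R for the cutoff date itself ("today" as at the_day).
  # The structure of the returned object varies by version; estimates$summarised
  # is what recent versions provide.
  fit$estimates$summarised %>%
    filter(variable == "R", date == the_day) %>%
    mutate(nowcast_date = the_day)
})

nowcasts <- bind_rows(nowcasts)
```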
Here are the results. The red line shows where I currently estimate reproduction number to be. The blue line and the blue shaded area show the nowcast estimates made with the data available at the time:
Obviously, there’s a big lag effect here. If you look at the original graphic of the model results, it’s clear why. In the brown shaded “nowcast” period, the model is strongly inclined to suggest reproduction number is flat, i.e. stays about the same as in the last good data, two weeks before today. Of course, if infection rates went down during that fortnight, we will have over-estimated the reproduction number. In fact, this is going to be the case almost by definition – any nowcast or forecast is going to be wrong in the direction that reality changes (relative to your model) after the last good data.
58% of my 50% credibility intervals contained what is now the best estimate of the true value. Of my 90% credibility intervals, 89% contained the best value. Those are pretty fair success rates that make me think the widths of the intervals are set about right. In an ideal world, your 50% credibility interval has a 50% chance of containing the correct value; but in this case our best (hindsight) estimate of “correct” is itself going to be correlated with the original (nowcast) estimates, and 58% agreement feels just right.
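The coverage figures themselves are simple to calculate once the nowcasts and the hindsight estimates are joined up. This is a sketch only, assuming column names such as `lower_50`, `upper_50` and `r_best` rather than the ones used in the actual script:

```r
# Sketch only: coverage of the 50% and 90% credibility intervals, assuming
# `nowcasts` has one row per day with the interval bounds estimated at the time,
# and `hindsight` has today's best estimate of R for each date in `r_best`.
library(dplyr)

coverage <- nowcasts %>%
  inner_join(hindsight, by = "date") %>%
  summarise(
    coverage_50 = mean(r_best >= lower_50 & r_best <= upper_50),
    coverage_90 = mean(r_best >= lower_90 & r_best <= upper_90)
  )

coverage
# In the exercise described above this came out at roughly 0.58 and 0.89.
```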
Here’s an alternative visualisation, as a connected scatterplot:
Note that this scatterplot only has a crude approximation of the actual credibility intervals for each day’s estimate (which I think are too complex to try to show on this particular chart), and it centres the uncertainty on the diagonal line representing agreement between the nowcast and subsequent hindsight. That’s why you can’t read the 58% and 89% coverage figures for the credibility intervals directly off it.
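A connected scatterplot along these lines is straightforward to draw with ggplot2. This sketch assumes a joined data frame `comparison` with hypothetical columns `r_best` (hindsight estimate) and `nowcast_median`; it is not the original plotting code:

```r
# Sketch only: connected scatterplot of each day's nowcast against today's
# hindsight estimate of R for the same day.
library(ggplot2)

ggplot(comparison, aes(x = r_best, y = nowcast_median)) +
  geom_abline(slope = 1, intercept = 0, colour = "grey50") +  # line of perfect agreement
  geom_path(colour = "steelblue") +                           # connect successive days
  geom_point(aes(colour = date)) +
  labs(x = "Best estimate of R with hindsight",
       y = "Nowcast of R made on the day",
       colour = "Date") +
  coord_equal()
```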
Is this ok?
Yes, I think this is ok. The lag is a very important thing to remember. On any day that we look at my nowcasts and forecasts, we should remind ourselves “really, we only have good information on infections taking place up to 7-14 days ago. If they have slowed down (or sped up) in the intervening time, I’m likely to be overestimating (or underestimating) the value of reproduction number today, and the cases to come in the forecast period.” That’s a sensible thing to do.
The challenge of course is that we don’t know if infections are slowing down or speeding up in those past 7-14 days. It would be a bold step indeed to be confident that “momentum” was taking them in either direction; we just don’t know when we are hitting the floor (or ceiling) until we start getting data. A more sophisticated model would use some more timely external information – e.g. mobility data from Facebook or Apple – to help with this, but I haven’t got around to that yet. And hopefully this particular outbreak will be over before I get time to do it.
On average, my credibility intervals for the nowcast of reproduction number are performing pretty much as advertised, as far as we can tell.
So this was a good exercise, and coming out of it I feel a bit more confident that my nowcasts and forecasts are doing useful work.