Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
With multiple reports of influenza H5N1 cases that have no clear animal exposure, it is useful to consider what kinds of analysis would be required, and how easily this could be performed. As a starting point, this post reflects on some of the real-time work my colleagues and I contributed to inform the UK response to COVID-19.
< section id="reflections-on-covid-19-contributions" class="level2">Reflections on COVID-19 Contributions
During COVID-19, academic participants like myself in the UK contributed analysis to the SPI-M-O advisory group, which focused on epidemiology and modelling. This was a subgroup of the Scientific Advisory Group for Emergencies.
These contributions typically fell into two main categories:
- Reports in response to specific questions from the Secretariat (e.g. exploring implications of policy options).
- Reports or preliminary results detailing broader epidemiological insights about COVID-19 my colleagues and I thought were noteworthy (e.g. unusual patterns with novel variants).
This post focuses on the analysis reports that I made a major contribution to as a member of SPI-M-O in the first 18 months after COVID was identified as a threat, i.e. between Jan 2020 and July 2021 (a full list has previously been published). Narrative reports or analysis that did not involve substantial analytics or modelling (e.g. just direct plots of data) are not included here. I also include some major early piece of analysis that did not form SPI-M-O reports, but were published and informed subsequent analysis and modelling.
If another pandemic were to hit, how easily and quickly could we do these analyses again? To document where we currently are, it is important to understand where potential gaps and bottlenecks are. For each report, I therefore review three main criteria:
- Code availability: is the original analysis code public? (With link if relevant, or context if not.)
- Package availability: is the analysis code or underlying method currently packaged up for easy reuse?
- Task readiness time: roughly how long would it take to get the code or package into a rough state where it could re-run an equivalent analysis using the characteristics of a future transmissible H5N1 influenza? And how long to do so while also following robust best practice Epiverse-TRACE development principles, so others can easily build on the analysis? Would it take minutes (i.e. possible to run immediately), hours, days or weeks to get the basic functionality working?
Criteria (1) and (2) are either marked as available (✅), not available (❌), or partially available (⏳). Time taken is also divided into ‘rough’ (rapid, imperfect code to deliver a task) and ‘robust’ (i.e. best practice development for future re-use) to reflect wider discussions about what constitutes ‘good enough’ work in a pandemic. I also suggest some areas for potential further development, or links to ongoing development that will enable easier completion of tasks in future. The focus of the post tends to be packages that sit on the CMMID, Epiverse-TRACE, Epiforecasts or my GitHub repositories, because these were most directly related to the tasks being discussed, but some wider packages are also signposted.
This is not an exclusive list of work performed by myself and colleagues in the Centre for Mathematical Modelling of Infectious Diseases at LSHTM; there is a larger CMMID repository of real-time work, as well as a large volume of published papers in academic journals and reports on the gov.uk website. However, I hope this initial post can provide a useful summary of tasks that were performed, and framework for evaluating wider efforts required for a pandemic.
The overall effectiveness of the UK response, and which areas could be strengthened, are topics currently being examined by the COVID Inquiry, and will not be covered here. If readers are interested, there are some broader reflections from members of the UK modelling community, and recommendations for improvements, in Sherratt et al, 2024, Wellcome Open Research.
< section id="jan-2020-early-estimation-of-transmissibility-and-control" class="level3">28 Jan 2020: Early estimation of transmissibility and control
Code ✅ | Package ⏳ | Days (rough), Weeks (robust)
This analysis in early 2020 focused on estimation of transmissiblity and subsequent effect of lockdown control measures, and brought together reported cases in China, exported cases identified internationally, and infections detected on evacuation flights. A stochastic SEIR model was fitted with sequential Monte Carlo to estimate how
The code used to generate estimates for the probability of a large outbreak given
Suggested or ongoing development: There are now much more efficient methods for performing the main real-time inference analysis, particularly the {dust} toolkit in combination with {mcstate}. As future tools will have to be flexible enough to be applicable in a wide range of modelling scenarios with different data sources, the main ongoing task is to ensure these are well documented and have been tested with relevant examples that can serve as templates for future work. The {seir} package is collating a library of simple model fitting implementations and a branch has implemented estimation of a fixed
2 Feb 2020: Early analysis of contact tracing effectiveness
Code ✅ | Package ✅ | Hours (rough), Days (robust)
Another early analysis was a paper for SPI-M-O: Feasibility of controlling 2019-nCoV outbreaks by isolation of cases and contacts, later published as Hellewell et al, Lancet Global Health (2020). It used a branching process model to explore how transmission (e.g.
Suggested or ongoing development: There are several issues on the {ringbp} package repo that, once complete, would allow for faster implementation for new pathogens, especially in combination with {epiparameter}. The {epi.branch.sim} package, which is based on the Hellewell et al paper, also offers an arguably more developed package for plug-and-play analysis in the meantime. The {epichains} package also allows for estimation of simpler branching processes (i.e. without targeted control like contact tracing).
< section id="mar-2020-early-estimation-of-severity" class="level3">3 Mar 2020: Early estimation of severity
Code ✅ | Package ✅ | Hours (rough), Days (robust)
This analysis estimated the infection and case fatality ratio by age for COVID-19 using age-adjusted data from the outbreak on the Diamond Princess cruise ship. Later published as Russell et al, Eurosurveillance, 2020.
The methods used for this analysis are now implemented as the Epiverse-TRACE {cfr} package. There is also a ‘how to’ example for age-specific CFR estimation.
Suggested or ongoing development: There are several GitHub issues open that aim to strengthen {cfr}, especially for edge cases (e.g. very high CFR) or uncertainty when estimating underascertainment.
< section id="mar-2020-pre-covid-social-mixing-patterns" class="level3">11 Mar 2020: pre-COVID social mixing patterns
Data ✅ | Package ❌ | Hours (rough), Days (robust)
Another early paper for SPI-M-O focused on social mixing patterns in the UK from the 2017/18 BBC public science project: Some results from the BBC project on contact rates by context and age, later expanded into Klepac et al, MedRxiv, 2020. The underlying contact matrices were made available alongside the paper.
Suggested or ongoing development: It would be useful to incorporate data into {socialmixr} to enable easy re-use in R. There is also an issue to add eigenvector calculation, alongside the
3 Mar 2020 onwards: Early population-level scenarios
Code ✅ | Package ✅ | Hours (rough), Days (robust)
A collection of population-level scenario modelling reports generated between February and April 2020 was later published as a summary paper, Davies et al, Lancet Public Health, 2020. The original model, known as covidm
, had a code base that would be reused for multiple epidemic waves, including novel variants. However, many the basic scenarios can be now be explored using the Epiverse-TRACE {epidemics}. In particular, {epidemics} can simulate scenarios with multiple overlapping interventions targeting different age groups, and reflect uncertainty in
Some further reflections on specific pieces of COVID analysis for SPI-M-O in 2020 are listed below:
- 3 March. Effect of social distancing measures on deaths & peak demand for hospitals in England. Suggested or ongoing development: Although {epidemics} is set up to simulate infections, there is currently no database for burden to draw on (e.g. easy conversion to hospitalisations and deaths). The WHO Hub simex model has some functions for converting COVID cases to disease outcomes, but this could be a useful feature to include for a wider range of pathogens. Burden estimation is also planned in the Epiverse-TRACE late stage tutorials.
- 8 March. Estimating the impact of regional triggers for COVID-19 non-pharmaceutical interventions.
- 11 March. Impact of agressively managing peak incidence.. Suggested or ongoing development: This analysis included epidemic-dependent triggers (e.g. close venues once hospitalisations hit a certain level), which are not yet included in {epidemics}.
- 11 March. Impact of banning sporting events and other leisure activities on Covid-19 epidemic.
- 17 March. Impact of adding school closure to other social distance measures.
- 1 April. Impact of relaxing lockdown measures.
- 3 April. Modelling scenarios for relaxation of lockdown in England.
Suggested or ongoing development: There is work in progress with {epidemics} to allow a more flexible and editable {odin} back end, in case different features are required in future. This will have the advantage of combining the plug-and-play ability to rapidly define age-specific contact structure, demography, parameter uncertainty and overlapping interventions using {epidemics} syntax with a fast and adaptable {odin} simulation model.
< section id="detailed-contact-tracing-analysis" class="level3">Detailed contact tracing analysis
Code ✅ | Package ✅ | Days (rough), Weeks (robust)
This collection of individual-level testing and contact tracing modelling reports, which made use of the BBC social mixing data, was later published as a summary paper, Kucharski et al, Lancet Inf Dis, 2020. The original model was in R, and was later converted into a Python library as part of the Royal Society Delve initiative, feeding in to follow up analysis. This was a rapidly developed bespoke model with multiple types of contact (e.g. home, school, work, other) and an approximated transmission dynamic rather than full simulation (i.e. if 50% of infectious contacts are traced half-way through their likely infectious period, it would cut
Suggested or ongoing development: For a future epidemic, it may be more useful to merge these concepts into two types of tool, building on succesful outputs for COVID: 1) a ‘ready reckoners’ method that shows very intuitively how contact changes influence overall
Some further reflections on specific analysis reports for SPI-M-O are below:
- 16 April. Using BBC Pandemic data to model the impact of isolation, testing, contact tracing and physical distancing on reducing transmission of COVID-19 in different settings.
- 20 April. Effectiveness of isolation, testing, contact tracing and physical distancing on reducing transmission of COVID-19 in different settings.
- 21 April. The possible impact of mask wearing outside the home on the transmission of COVID-19. Suggested or ongoing development: This blanket measure could be explored using {ringbp} or {epichains} in their current form, e.g. this interventions vignette.
- 26 April. Estimated impact of delay from isolation of symptomatic case to test result and quarantine of contacts.
- 30 April. Estimated impact of testing quarantined contacts at different points in time. Suggested or ongoing development: Certain key aspects of this could be explored using {ringbp}, or {epinetwork}, particularly in terms of proportion of population quarantined at a given point in time.
3 June 2020: Analysis of superspreading
Code ❌ | Package ✅ | Minutes (robust)
This paper for SPI-M-O, Analysis of SARS-CoV-2 transmission clusters and superspreading events, provided different metrics to summarise the superspreading features of SARS-CoV-2. These functions are now in the {superspreading} package, with examples in the Epiverse-TRACE training.
< section id="june-2020-analysis-of-forwards-and-backwards-tracing" class="level3">10 June 2020: Analysis of forwards and backwards tracing
Code ✅ | Package ❌ | Hours (rough), Days (robust)
This paper for SPI-M-O, Branching process modelling of effectiveness of forward and backward tracing for SARS-CoV-2 control was later published as Endo et al, Wellcome Open Res, 2020.
Suggested or ongoing development: Although the underlying model isn’t in a package, the analysis is featured in the Epiverse-TRACE training. The core insight is also a relatively simple equation, i.e. that backward tracing would be expected to identify
14 Oct 2020: Testing and contact tracing in a real-world network
Code ✅ | Package ✅ | Days (rough), Weeks (robust)
This paper for SPI-M-O, Modelling effectiveness of TTI and physical distancing in controlling SARS-CoV-2 in high and low prevalence communities, based on UK contact network data built on the Firth et al, Nature Med, 2020. This used BBC contact network data from Haslemere to investigate interventions in clustered networks, leading to a package {covidhm} that built on {ringbp}. This package was subsequently also used for analysis of outbreak dynamics on Singapore test cruises.
Suggested or ongoing development: Once {ringbp} is stable, there is scope to expand to include the above functionality with the placeholder {epinetwork} package, which is a fork of the more specific {covidhm} implementationn.
< section id="oct-dec-2020-strategies-for-pcr-and-lateral-flow-testing" class="level3">Oct-Dec 2020: Strategies for PCR and lateral flow testing
The below papers for SPI-M-O used data on PCR and lateral flow performance to investigate different testing strategies.
21 Oct. Modelling frequent testing using PCR and lateral flow based on detection probabilities estimated from regular testing of health care workers. This paper used testing data from UCLH to infer the probability of test positivity post infection. It would later be published as Hellewell et al, BMC Medicine, 2021.
Code ✅ | Package ⏳ | Days (rough), Weeks (robust)
Suggested or ongoing development: This analysis focused on test positivity as an outcome, tailored to data available from the UCLH study (PCR + paired serology) but a more detailed framework could use Ct data as well, such as the codebase for LEGACY Ct modelling and {epikinetics} package currently in progress for antibody kinetics (which could also be adapted to other biological timescales.
2 Dec. Estimating detection of infection among household gathering attendees based on one-off pre-gathering lateral flow tests. This paper used posteriors from the above analysis to explore different testing scenarios for family gatherings.
Code ❌ | Package ❌ | Minutes (rough), Days (robust)
Suggested or ongoing development: The underlying equations in this analysis are quite simple (i.e. no more than a few lines of code), but could form a useful helper package. James Hay also built a somewhat related Shiny app (code here) linked to an accompanying paper about intution behind testing performance.
< section id="mar-2021-potential-for-herd-immunity-against-the-alpha-variant" class="level3">9 Mar 2021: Potential for herd immunity against the Alpha variant
Code ✅ | Package ❌ | Minutes (rough), Hours (robust)
This paper for SPI-M-O looked at the potential for vaccination-induced herd immunity against SARS-CoV-2, based on
Suggested or ongoing development: The basic calculation was relatively simple (
May-Jun 2021: Transmission dynamics of the Delta variant
Code ✅ | Package ❌ | Days (rough), Weeks (robust)
This collection of reports for SPI-M-O/SAGE analysed the transmission dynamics of the B.1.617.2 (Delta) variant in the UK, untangling imported infections from community transmission. The real-time model was coded in R, with fitting via MLE and then MCMC (as parameter space increased), with a stan prototype of the model also developed by Sam Abbott.
- 12 May. Modelling importations and local transmission of B.1.617.2 in the UK.
- 18 May. Dynamics of B.1.617.2 in the UK from importations, traveller-linked and non-traveller-linked transmission, 18 May 2021.
- 25 May. Dynamics of B.1.617.2 in the UK from importations, traveller-linked and non-traveller linked transmission, 25 May 2021.
- 01 Jun. Dynamics of B.1.617.2 in England NHS regions from importations, traveller-linked and non-traveller-linked transmission.
- 08 Jun. Dynamics of Delta variant in England NHS regions from importations, traveller-linked and non-traveller-linked transmission.
Suggested or ongoing development: The current code base is tailored to COVID-19, but the broader issue of distinguishing external importations (which may be known to some extent, e.g. based on timing of travel ban) from domestic human-to-human transmission also comes up for infections like avian influenza and mpox. If cases can be disaggregated into imported or domestic origin, then {EpiEstim} can calculate domestic transmission based on these data (Thompson et al, Epidemics, 2019). However,
1 Jun 2021: Analysis of social contact data during reopening
Code ✅ | Package ⏳ | Hours (rough), Weeks (robust)
Two papers for SPI-M-O/SAGE looked at social contact dynamics during reopening:
- Analysis of individuals with a high number of contacts in the CoMix study.
- Reconstructing the secondary case distribution of SARS-CoV-2 from heterogeneity in viral load trajectories and social contacts.
CoMix data are now available as part of the {socialmixr} package, with code for the secondary case distribution released with the Chapman et al pre-print.
Suggested or ongoing development: The methods used for ‘first principles’ reconstruction of
Concluding thoughts
Code was generally made available alongside public reports by LSHTM during COVID-19, except for very simple calculations. Several key functions are now available in R packages developed since the emergence of COVID-19, turning tasks that would have taken days into tasks that require only hours, but there are still some remaining bottlenecks to ensure that the methods would be applicable easily to H5N1, meaning some customised tasks for pandemic influenza would take hours when – with some further refinement – they could take minutes.
Reuse
< section class="quarto-appendix-contents" id="quarto-citation">Citation
@online{kucharski2024, author = {Kucharski, Adam}, title = {How Well Prepared Are We to Rapidly Analyse a New Influenza Pandemic? {A} Brief Perspective on Analysis Conducted for {UK} Government Advisory Groups During {COVID-19}}, date = {2024-12-23}, url = {https://epiverse-trace.github.io/posts/covid-analysis/}, langid = {en} }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.