Site icon R-bloggers

New Package — {cdccovidview} — To Work with the U.S. CDC’s New COVID-19 Trackers: COVIDView and COVID-NET

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The United States Centers for Disease Control (CDC from now on) has setup two new public surveillance resources for COVID-19. Together, COVIDView and COVID-NET provide similar weekly surveillance data as FluView does for influenza-like illnesses (ILI).

The COVIDView resources are HTML tables (O_O) and, while the COVID-NET interface provides a “download” button, there is no exposed API to make it easier for the epidemiological community to work with these datasets.

Enter {cdccovidview} — https://cinc.rud.is/web/packages/cdccovidview/ — which scrapes the tables and uses the hidden API in the same way {cdcfluview}(https://cran.rstudio.com/web/packages/cdcfluview/index.html) does for the FluView data.

Weekly case, hospitalization, and mortality data is available at the national, state and regional levels (where provided) and I tried to normalize the fields across each of the tables/datasets (I hate to pick on them when they’re down, but these two sites are seriously sub-optimal from a UX and just general usage perspective).

After you follow the above URL for information on how to install the package, it should “just work”. No API keys are needed, but the CDC may change the layout of tables and fields structure of the hidden API at any time, so keep an eye out for updates.

Using it is pretty simple, just use one of the functions to grab the data you want and then work with it.

library(cdccovidview)
library(hrbrthemes)
library(tidyverse)

hosp <- laboratory_confirmed_hospitalizations()

hosp
## # A tibble: 4,590 x 8
##    catchment      network   year  mmwr_year mmwr_week age_category cumulative_rate weekly_rate
##    <chr>          <chr>     <chr> <chr>     <chr>     <chr>                  <dbl>       <dbl>
##  1 Entire Network COVID-NET 2020  2020      10        0-4 yr                   0           0  
##  2 Entire Network COVID-NET 2020  2020      11        0-4 yr                   0           0  
##  3 Entire Network COVID-NET 2020  2020      12        0-4 yr                   0           0  
##  4 Entire Network COVID-NET 2020  2020      13        0-4 yr                   0.3         0.3
##  5 Entire Network COVID-NET 2020  2020      14        0-4 yr                   0.6         0.3
##  6 Entire Network COVID-NET 2020  2020      15        0-4 yr                  NA          NA  
##  7 Entire Network COVID-NET 2020  2020      16        0-4 yr                  NA          NA  
##  8 Entire Network COVID-NET 2020  2020      17        0-4 yr                  NA          NA  
##  9 Entire Network COVID-NET 2020  2020      18        0-4 yr                  NA          NA  
## 10 Entire Network COVID-NET 2020  2020      19        0-4 yr                  NA          NA  
## # … with 4,580 more rows

c(
  "0-4 yr", "5-17 yr", "18-49 yr", "50-64 yr", "65+ yr", "65-74 yr", "75-84 yr", "85+"
) -> age_f

mutate(hosp, start = mmwr_week_to_date(mmwr_year, mmwr_week)) %>%
  filter(!is.na(weekly_rate)) %>%
  filter(catchment == "Entire Network") %>%
  select(start, network, age_category, weekly_rate) %>%
  filter(age_category != "Overall") %>%
  mutate(age_category = factor(age_category, levels = age_f)) %>%
  ggplot() +
  geom_line(
    aes(start, weekly_rate)
  ) +
  scale_x_date(
    date_breaks = "2 weeks", date_labels = "%b\n%d"
  ) +
  facet_grid(network~age_category) +
  labs(
    x = NULL, y = "Rates per 100,000 pop",
    title = "COVID-NET Weekly Rates by Network and Age Group",
    caption = sprintf("Source: COVID-NET: COVID-19-Associated Hospitalization Surveillance Network, Centers for Disease Control and Prevention.\n<https://gis.cdc.gov/grasp/COVIDNet/COVID19_3.html>; Accessed on %s", Sys.Date())
  ) +
  theme_ipsum_es(grid="XY")

FIN

This is brand new and — as noted — things may change or break due to CDC site changes. I may have also missed a table or two (it’s a truly terrible site).

If you notice things are missing or would like a different interface to various data endpoints, drop an issue or PR wherever you’re most comfortable.

Stay safe!

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.