Removing Personal Bias From Flu Severity Estimation (a.k.a. Misery Loves Data)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The family got hit pretty hard with the flu right as the Christmas festivities started and we were all pretty much bed-ridden zombies up until today (2017-01-02). When in the throes of a very bad ILI it’s easy to imagine that you’re a victim of a severe outbreak, especially with ancillary data from others that they, too, either just had/have the flu or know others who do. Thankfully, I didn’t have to accept this emotional opine and could turn to the cdcfluview
package to see just how this year is measuring up.
Influenza cases are cyclical, and that’s super-easy to see with a longer-view of the CDC national data:
library(cdcfluview) library(tidyverse) library(stringi) flu <- get_flu_data("national", years=2010:2016) mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>% select(-`AGE 25-64`) %>% gather(age_group, count, starts_with("AGE")) %>% mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>% mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>% ggplot(aes(week, count, group=age_group)) + geom_line(aes(color=age_group)) + scale_y_continuous(label=scales::comma, limits=c(0,20000)) + facet_wrap(~age_group, scales="free") + labs(x=NULL, y="Count of reported ILI cases", title="U.S. National ILI Case Counts by Age Group (2010:2011 flu season through 2016:2017)", caption="Source: CDC ILInet via CRAN cdcfluview pacakge") + ggthemes::scale_color_tableau(name=NULL) + hrbrmisc::theme_hrbrmstr(grid="XY") + theme(legend.position="none")
We can use the same data to zoom in on this season:
mutate(flu, week=as.Date(sprintf("%s %s 1", YEAR, WEEK), format="%Y %U %u")) %>% select(-`AGE 25-64`) %>% gather(age_group, count, starts_with("AGE")) %>% mutate(age_group=stri_replace_all_fixed(age_group, "AGE", "Ages")) %>% mutate(age_group=factor(age_group, levels=c("Ages 0-4", "Ages 5-24", "Ages 25-49", "Ages 50-64", "Ages 65"))) %>% filter(week >= as.Date("2016-07-01")) %>% ggplot(aes(week, count, group=age_group)) + geom_line(aes(color=age_group)) + scale_y_continuous(label=scales::comma, limits=c(0,20000)) + facet_wrap(~age_group, scales="free") + labs(x=NULL, y="Count of reported ILI cases", title="U.S. National ILI Case Counts by Age Group (2016:2017 flu season)", caption="Source: CDC ILInet via CRAN cdcfluview pacakge") + ggthemes::scale_color_tableau(name=NULL) + hrbrmisc::theme_hrbrmstr(grid="XY") + theme(legend.position="none")
So, things are trending up, but how severe is this year compared to others? While looking at the number/percentage of ILI cases is one way to understand severity, another is to look at the mortality rate. The cdcfluview
package has a get_mortality_surveillance_data()
function, but it’s region-based and I’m really only looking at national data in this post. A helpful individual pointed out a new CSV file at https://www.cdc.gov/flu/weekly/index.htm#MS which we can reproducibly programmatically target (so we don’t have to track filename changes by hand) with:
library(rvest) pg <- read_html("https://www.cdc.gov/flu/weekly/index.htm#MS") html_nodes(pg, xpath=".//a[contains(@href, 'csv') and contains(@href, 'NCHS')]") %>% html_attr("href") -> mort_ref mort_url <- sprintf("https://www.cdc.gov%s", mort_ref) df <- readr::read_csv(mort_url)
We can, then, take a look at the current “outbreak” status (when real-world mortality events exceed the model threshold):
mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>% select(week, Expected, Threshold, `Percent of Deaths Due to Pneumonia and Influenza`) %>% gather(category, percent, -week) %>% mutate(percent=percent/100) %>% ggplot() + geom_line(aes(week, percent, group=category, color=category)) + scale_x_date(date_labels="%Y-%U") + scale_y_continuous(label=scales::percent) + ggthemes::scale_color_tableau(name=NULL) + labs(x=NULL, y=NULL, title="U.S. Pneumonia & Influenza Mortality", subtitle="Data through week ending December 10, 2016 as of December 28, 2016", caption="Source: National Center for Health Statistics Mortality Surveillance System") + hrbrmisc::theme_hrbrmstr(grid="XY") + theme(legend.position="bottom")
That view is for all mortality events from both influenza and pneumonia. We can look at the counts for just influenza as well:
mutate(df, week=as.Date(sprintf("%s %s 1", Year, Week), format="%Y %U %u")) %>% select(week, `Influenza Deaths`) %>% ggplot() + geom_line(aes(week, `Influenza Deaths`), color=ggthemes::tableau_color_pal()(1)) + scale_x_date(date_labels="%Y-%U") + scale_y_continuous(label=scales::comma) + ggthemes::scale_color_tableau(name=NULL) + labs(x=NULL, y=NULL, title="U.S. Influenza Mortality (count of mortality events)", subtitle="Data through week ending December 10, 2016 as of December 28, 2016", caption="Source: National Center for Health Statistics Mortality Surveillance System") + hrbrmisc::theme_hrbrmstr(grid="XY") + theme(legend.position="bottom")
It’s encouraging that the overall combined mortality rate is trending downwards and that the mortality rate for influenza is very low. Go. Science.
I’ll be adding a function to cdcfluview
to retrieve this new data set a bit later in the year.
Hopefully you’ll avoid the flu and enjoy a healthy and prosperous 2017.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.