Introducing the CDCPLACES Package

Brenden Smith

4 months ago

[This article was first published on Brenden Smith, and kindly contributed to R-bloggers].


This post was updated on March 19, 2024 to reflect updates introduced in CDCPLACES 1.1.5.


Introduction

To begin, we can install the package from CRAN or from GitHub, then load the packages we need.


# Install from CRAN
# install.packages("CDCPLACES")

# Install from GitHub
# devtools::install_github("brendensm/CDCPLACES")

library(CDCPLACES)
library(dplyr)
library(ggplot2)


Function: `get_dictionary`

Our first function lets us easily view which measures we can query, via ‘measureid’, along with a brief definition of each measure. Running get_dictionary() returns a data frame. We can browse it in RStudio with View(), which is the preferred method for exploring the available measures.

For our example here, I will print the names of the variables in this data frame.


# To open a viewer
# get_dictionary() %>% View()

get_dictionary() %>% names()

 [1] "measureid"                "measure_full_name"       
 [3] "measure_short_name"       "categoryid"              
 [5] "category_name"            "places_release_2023"     
 [7] "places_release_2022"      "places_release_2021"     
 [9] "places_release_2020"      "_500_cities_release_2019"
[11] "_500_cities_release_2018" "_500_cities_release_2017"
[13] "_500_cities_release_2016" "frequency_brfss_year"

This data frame is useful for several reasons. It lists the available measures for each release of the CDC PLACES data, along with the date each variable was collected, all in a single place. Remember to use the measureid when querying your data.
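Since queries are keyed on measureid, it can help to search the dictionary by keyword rather than scanning it by eye. The following is a minimal sketch, not part of CDCPLACES: `find_measures()` is a hypothetical helper, and `toy_dict` is a made-up stand-in that only borrows column names from the output above, so the example runs without calling the API.

```r
# Hypothetical helper (not part of CDCPLACES): return the rows of a
# dictionary data frame whose full measure name matches a pattern.
find_measures <- function(dict, pattern) {
  hits <- grepl(pattern, dict$measure_full_name, ignore.case = TRUE)
  dict[hits, c("measureid", "measure_short_name", "category_name")]
}

# Toy stand-in for get_dictionary(); the rows are illustrative only.
toy_dict <- data.frame(
  measureid = c("ACCESS2", "EXAMPLE1"),
  measure_full_name = c(
    "Current lack of health insurance among adults aged 18-64 years",
    "A made-up measure for illustration"
  ),
  measure_short_name = c("Placeholder short name", "Example"),
  category_name = c("Prevention", "Example"),
  stringsAsFactors = FALSE
)

insurance <- find_measures(toy_dict, "insurance")
print(insurance$measureid)
```

With the real package, the same helper could presumably be called as find_measures(get_dictionary(), "insurance"), assuming the dictionary keeps the column names printed above.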


Function: `get_places`

This function lets us query the data we specify. In the example below, I request the measure ACCESS2 (the current lack of health insurance among adults aged 18-64 years) for the state of Arizona. Each of these arguments accepts multiple values.


az_access <- get_places(state = "AZ", 
                        measure = "ACCESS2") 
head(az_access)

# A tibble: 6 × 21
  year  stateabbr statedesc locationname datasource category   measure          
  <chr> <chr>     <chr>     <chr>        <chr>      <chr>      <chr>            
1 2021  AZ        Arizona   Yuma         BRFSS      Prevention Current lack of …
2 2021  AZ        Arizona   Graham       BRFSS      Prevention Current lack of …
3 2021  AZ        Arizona   Apache       BRFSS      Prevention Current lack of …
4 2021  AZ        Arizona   La Paz       BRFSS      Prevention Current lack of …
5 2021  AZ        Arizona   Coconino     BRFSS      Prevention Current lack of …
6 2021  AZ        Arizona   Cochise      BRFSS      Prevention Current lack of …
# ℹ 14 more variables: data_value_unit <chr>, data_value_type <chr>,
#   data_value <dbl>, low_confidence_limit <dbl>, high_confidence_limit <dbl>,
#   totalpopulation <chr>, locationid <chr>, categoryid <chr>, measureid <chr>,
#   datavaluetypeid <chr>, short_question_text <chr>, type <chr>, lon <dbl>,
#   lat <dbl>

It is also worth noting that the geography argument is set to “county” by default. If instead we want to examine census tracts, we can set geography = "census". Likewise, release is set to “2023” by default.

The county argument can be used to filter results to specific counties. This is extremely useful for examining census tract level data for specific areas of a state. Additionally, geometry = TRUE can be set to include a shapefile in the query. For further examples of plotting with shapefiles, see this dedicated blog post.


cap_counties <- get_places(geography = "census",
                           state = "MI",
                           measure = "ACCESS2",
                           county = c("Ingham", "Eaton", "Clinton"),
                           geometry = TRUE)


Use Case

From here, we can start to have fun. It is fairly straightforward to begin exploring the data. Here I first filter the data so that I can plot the age-adjusted rates of lack of health insurance in Arizona counties.

Notice that the data provide you with confidence limits, so I have chosen to plot them here with error bars.


az_access %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value))) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(title = "Lack of health insurance among adults aged 18-64 years In Arizona Counties",
       y = "", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")
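When reading a plot like this, a common rough check is whether two counties' error bars overlap. The sketch below shows one way to compute that. `ci_overlap()` is a hypothetical helper and the `toy` values are invented; only the column names follow the get_places() output shown earlier. Treat it as a visual aid rather than a formal statistical test, since overlapping intervals do not by themselves rule out a real difference.

```r
# Hypothetical helper: TRUE when the intervals [low1, high1] and
# [low2, high2] share at least one value.
ci_overlap <- function(low1, high1, low2, high2) {
  low1 <= high2 & low2 <= high1
}

# Invented values; column names mirror the get_places() output.
toy <- data.frame(
  locationname = c("County A", "County B", "County C"),
  data_value = c(10.0, 12.0, 20.0),
  low_confidence_limit = c(8.0, 10.5, 17.0),
  high_confidence_limit = c(12.0, 13.5, 23.0)
)

# Compare every county against the one with the highest estimate.
top <- toy[which.max(toy$data_value), ]
toy$overlaps_top <- ci_overlap(
  toy$low_confidence_limit, toy$high_confidence_limit,
  top$low_confidence_limit, top$high_confidence_limit
)
print(toy[, c("locationname", "overlaps_top")])
```

Here only the top county overlaps with itself, which matches what the eye would pick out as a clearly separated error bar.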

You can also extend this to compare multiple states. Simply pass two (or more) state abbreviations to the query and plot the result. Arizona seems to have a couple of counties with a much higher rate than the others.


# multi state comparison
two <- get_places(state = c("AZ", "NV"), 
                  measure = "ACCESS2")

two %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(title = 
         "Lack of health insurance among adults aged 18-64 years In Arizona and Nevada",
       y = "Counties", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")

We can go even further by comparing more states in the region. Here I have taken the average of the county rates within each state to make the comparison easier. Texas appears to be far above the average.


multi <- get_places(state = c("AZ", "NV", "NM", "TX", "CA"),
                    measure = "ACCESS2") %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  summarise(.by = "stateabbr",
            mean_val = mean(data_value),
            mean_low = mean(low_confidence_limit),
            mean_high = mean(high_confidence_limit))

multi %>%
  ggplot(aes(mean_val, reorder(stateabbr, mean_val), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = mean_low, xmax = mean_high)) +
  labs(title = "Mean lack of health insurance among adults aged 18-64 years In Southwest States",
       y = "", x = "Percent") +
  theme_minimal() +
  theme(plot.title.position = "plot")
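The state averages above give every county equal weight, regardless of its size. As a point of comparison, here is a minimal sketch of a population-weighted average in base R. This is an alternative I am illustrating, not the method used in this post. The `toy` data are invented, and the sketch assumes totalpopulation arrives as character, as in the tibble printed earlier, so it is converted to numeric first.

```r
# Invented values; column names mirror the get_places() output.
toy <- data.frame(
  stateabbr = c("AA", "AA", "BB", "BB"),
  data_value = c(10, 20, 30, 50),
  totalpopulation = c("1000", "3000", "500", "500"),
  stringsAsFactors = FALSE
)

# totalpopulation is stored as character, so convert before weighting.
toy$pop <- as.numeric(toy$totalpopulation)

# Split by state, then compute both kinds of average per state.
by_state <- split(toy, toy$stateabbr)
simple   <- sapply(by_state, function(d) mean(d$data_value))
weighted <- sapply(by_state, function(d) weighted.mean(d$data_value, d$pop))

print(rbind(simple, weighted))
```

In the first toy state the larger county pulls the weighted average toward its own rate, while in the second the two counties are the same size and both averages agree. Which one is appropriate depends on whether the question is about the typical county or the typical resident.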
