Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post was updated on March 19, 2024 to reflect updates introduced in CDCPLACES 1.1.5.
Introduction
To begin, we can install the package from CRAN or from GitHub, then load our packages.
# Install from CRAN
# install.packages("CDCPLACES")

# Install from GitHub
# devtools::install_github("brendensm/CDCPLACES")

library(CDCPLACES)
library(dplyr)
library(ggplot2)
Function: get_dictionary
Our first function allows us to easily view which measures we can query, via ‘measureid’, along with a brief definition of each measure. If we run get_dictionary(), a data frame is returned. We can browse it in RStudio with View(). This is the preferred method for exploring the available measures.
For our example here, I will print the names of the variables in this data frame.
# To open a viewer
# get_dictionary() %>% View()

get_dictionary() %>% names()
 [1] "measureid"                "measure_full_name"
 [3] "measure_short_name"       "categoryid"
 [5] "category_name"            "places_release_2023"
 [7] "places_release_2022"      "places_release_2021"
 [9] "places_release_2020"      "_500_cities_release_2019"
[11] "_500_cities_release_2018" "_500_cities_release_2017"
[13] "_500_cities_release_2016" "frequency_brfss_year"
This data frame is useful for several reasons. It lists the available measures for each year of the CDC PLACES data, along with the date each variable was collected, all in a single place. Remember to use the measureid when querying your data.
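Because the dictionary is an ordinary data frame, the usual dplyr verbs are enough to search it for a measure. The following is a minimal sketch that runs offline: `dict` is a hand-built stand-in with three of the column names printed above and three rows typed in for illustration, not the real output. With the package loaded, `get_dictionary()` would take the place of `dict`.

```r
library(dplyr)

# Hand-built stand-in for get_dictionary(): real column names, but only
# three columns and three illustrative rows (an assumption for this sketch).
dict <- data.frame(
  measureid = c("ACCESS2", "BINGE", "DIABETES"),
  measure_full_name = c(
    "Current lack of health insurance among adults aged 18-64 years",
    "Binge drinking among adults aged >=18 years",
    "Diagnosed diabetes among adults aged >=18 years"
  ),
  category_name = c("Prevention", "Health Risk Behaviors", "Health Outcomes")
)

# Search the full names for a keyword and keep the IDs needed for querying
insurance_ids <- dict %>%
  filter(grepl("insurance", measure_full_name, ignore.case = TRUE)) %>%
  pull(measureid)

insurance_ids
```

The resulting character vector is the kind of value that the measure argument of get_places expects.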
Function: get_places
This function allows us to easily query the data that we specify. In the example below, I will get the measure ACCESS2 (the current lack of health insurance among adults aged 18-64 years) for the state of Arizona. Each of these arguments also accepts multiple values.
az_access <- get_places(state = "AZ", measure = "ACCESS2")

head(az_access)
# A tibble: 6 × 21
  year  stateabbr statedesc locationname datasource category   measure
  <chr> <chr>     <chr>     <chr>        <chr>      <chr>      <chr>
1 2021  AZ        Arizona   Yuma         BRFSS      Prevention Current lack of …
2 2021  AZ        Arizona   Graham       BRFSS      Prevention Current lack of …
3 2021  AZ        Arizona   Apache       BRFSS      Prevention Current lack of …
4 2021  AZ        Arizona   La Paz       BRFSS      Prevention Current lack of …
5 2021  AZ        Arizona   Coconino     BRFSS      Prevention Current lack of …
6 2021  AZ        Arizona   Cochise      BRFSS      Prevention Current lack of …
# ℹ 14 more variables: data_value_unit <chr>, data_value_type <chr>,
#   data_value <dbl>, low_confidence_limit <dbl>, high_confidence_limit <dbl>,
#   totalpopulation <chr>, locationid <chr>, categoryid <chr>, measureid <chr>,
#   datavaluetypeid <chr>, short_question_text <chr>, type <chr>, lon <dbl>,
#   lat <dbl>
It is also worth noting that, by default, geography is set to “county”. If we instead want to examine census tracts, we can set geography = "census". Likewise, release is set to “2023” by default.
The argument county can be used to filter results to specific counties. This is extremely useful for examining census-tract-level data for specific areas of a state. Additionally, geometry = TRUE can be set to include a shapefile in the query. For further examples of plotting with shapefiles, see this dedicated blog post.
cap_counties <- get_places(
  geography = "census",
  state = "MI",
  measure = "ACCESS2",
  county = c("Ingham", "Eaton", "Clinton"),
  geometry = TRUE
)
Use Case
From here, we can start to have fun. It is fairly straightforward to begin exploring the data. Here I will first filter the data so that I can plot the age-adjusted rates of lack of health insurance in Arizona.
Notice that the data provide you with confidence limits, so I have chosen to plot them here with error bars.
az_access %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value))) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(
    title = "Lack of health insurance among adults aged 18-64 years In Arizona Counties",
    y = "",
    x = "Percent"
  ) +
  theme_minimal() +
  theme(plot.title.position = "plot")
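The filter() step in the plot above keeps a single value type per county. A minimal sketch of why that matters, assuming the county data carry one row per value type for each county: the numbers below are made up for illustration, and "CrdPrv" as the ID for crude prevalence is an assumption, since only "AgeAdjPrv" appears in this post.

```r
library(dplyr)

# Made-up values in the shape of get_places() output; not real estimates.
# "CrdPrv" (crude prevalence) is an assumed ID, used here for illustration.
mock <- data.frame(
  locationname = c("Yuma", "Yuma", "Graham", "Graham"),
  datavaluetypeid = c("CrdPrv", "AgeAdjPrv", "CrdPrv", "AgeAdjPrv"),
  data_value = c(20.1, 20.9, 15.2, 15.8)
)

# Check which value types are present before filtering
count(mock, datavaluetypeid)

# Keep one row per county so that each county is plotted only once
age_adj <- filter(mock, datavaluetypeid == "AgeAdjPrv")
age_adj
```

Running count() on the real query result is a quick way to confirm which value types it actually contains before choosing one to plot.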
You can also extend this to compare multiple states. You can easily query two (or more) state abbreviations and plot them together. Arizona seems to have a couple of counties with a much higher rate than the others.
# multi state comparison
two <- get_places(state = c("AZ", "NV"), measure = "ACCESS2")

two %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  ggplot(aes(data_value, reorder(locationname, data_value), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = low_confidence_limit, xmax = high_confidence_limit)) +
  labs(
    title = "Lack of health insurance among adults aged 18-64 years In Arizona and Nevada",
    y = "Counties",
    x = "Percent"
  ) +
  theme_minimal() +
  theme(plot.title.position = "plot")
We can go even further by comparing more states in the region. Here I have taken the unweighted mean of the county-level rates in each state to make the comparison easy. Texas appears to be far above the average.
multi <- get_places(state = c("AZ", "NV", "NM", "TX", "CA"), measure = "ACCESS2") %>%
  filter(datavaluetypeid == "AgeAdjPrv") %>%
  summarise(
    .by = "stateabbr",
    mean_val = mean(data_value),
    mean_low = mean(low_confidence_limit),
    mean_high = mean(high_confidence_limit)
  )

multi %>%
  ggplot(aes(mean_val, reorder(stateabbr, mean_val), color = stateabbr)) +
  geom_point(size = 2) +
  geom_errorbar(aes(xmin = mean_low, xmax = mean_high)) +
  labs(
    title = "Mean lack of health insurance among adults aged 18-64 years In Southwest States",
    y = "",
    x = "Percent"
  ) +
  theme_minimal() +
  theme(plot.title.position = "plot")
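The summary above gives every county equal weight, so a small rural county counts as much as a large urban one. One possible alternative, sketched here under stated assumptions rather than taken from the package, is to weight each county by its population. The rows below are made up so the example runs offline; totalpopulation is stored as character, as in the tibble printed earlier, so it is converted first. The `.by` argument requires dplyr 1.1.0 or later.

```r
library(dplyr)

# Made-up county rows in the shape of get_places() output; not real estimates.
mock <- data.frame(
  stateabbr = c("AZ", "AZ", "NV", "NV"),
  data_value = c(10, 20, 12, 18),
  totalpopulation = c("900000", "100000", "500000", "500000")
)

state_means <- mock %>%
  # totalpopulation arrives as character, so convert before using it as a weight
  mutate(totalpopulation = as.numeric(totalpopulation)) %>%
  summarise(
    .by = stateabbr,
    mean_unweighted = mean(data_value),
    mean_weighted = weighted.mean(data_value, w = totalpopulation)
  )

state_means
```

In this toy data the two means agree for NV, where the counties are the same size, and diverge for AZ, where one county holds most of the population.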