Getting data for every Census tract in the US with purrr and tidycensus
Last week, I published the development version of my new R package, tidycensus. You can read through the documentation and some examples at https://walkerke.github.io/tidycensus/. I’m working on getting the package CRAN-ready with better error handling; in the meantime, I’m sharing a few examples to demonstrate its functionality.
If you are working on a national project that includes demographic data as a component, you might be interested in acquiring Census tract data for the entire United States. However, Census tract data are commonly available by state (with the exception of NHGIS, which is a wonderful resource), meaning that an analyst would have to spend time piecing the data together.
tidycensus solves this problem directly within R with help from the purrr package, a member of the tidyverse. In tidycensus, there is a built-in data frame named fips_codes that includes US state and county IDs; tidycensus uses this data frame to handle translations between state/county names and FIPS codes. However, this data frame can also be used to generate a vector of state codes to be fed to the map_df function in purrr. As such, this is all it takes to get a tibble of total population estimates for all US Census tracts from the 2011-2015 ACS:
```r
library(tidycensus)
library(purrr)

# Un-comment below and set your API key
# census_api_key("YOUR KEY GOES HERE")

us <- unique(fips_codes$state)[1:51]

totalpop <- map_df(us, function(x) {
  get_acs(geography = "tract", variables = "B01003_001",
          state = x)
})

str(totalpop)
## Classes 'tbl_df', 'tbl' and 'data.frame':    73056 obs. of  5 variables:
##  $ GEOID   : chr  "01001020100" "01001020200" "01001020300" "01001020400" ...
##  $ NAME    : chr  "Census Tract 201, Autauga County, Alabama" "Census Tract 202, Autauga County, Alabama" "Census Tract 203, Autauga County, Alabama" "Census Tract 204, Autauga County, Alabama" ...
##  $ variable: chr  "B01003_001" "B01003_001" "B01003_001" "B01003_001" ...
##  $ estimate: num  1948 2156 2968 4423 10763 ...
##  $ moe     : num  203 268 404 493 624 478 436 281 1000 535 ...
```
You can get any ACS or decennial Census data in this way.
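As a rough sketch of the decennial case, the same loop can wrap tidycensus's get_decennial() instead of get_acs(). The helper name get_tract_decennial is hypothetical, and the variable code "P001001" (total population in the 2010 SF1) and the year are assumptions that should be checked with load_variables(2010, "sf1") before use. The requests only run if a Census API key is present in the CENSUS_API_KEY environment variable.

```r
library(tidycensus)
library(purrr)

# Same state vector as in the post: the first 51 unique entries of
# fips_codes$state (the 50 states plus DC)
us <- unique(fips_codes$state)[1:51]

# Hypothetical helper, not part of tidycensus.
# Assumption: "P001001" is total population in the 2010 SF1.
get_tract_decennial <- function(states, variable = "P001001", year = 2010) {
  map_df(states, function(x) {
    get_decennial(geography = "tract", variables = variable,
                  year = year, state = x)
  })
}

# Only hit the API when a key is available in the environment
if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  totalpop_2010 <- get_tract_decennial(us)
  str(totalpop_2010)
}
```

Note that purrr 1.0.0 and later mark map_df() as superseded in favor of map() followed by list_rbind(); map_df() is kept here to match the code in the post.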
However, what if you also want tract geometry for mapping? This only requires a few small modifications. map_df in purrr uses the bind_rows function under the hood, which doesn’t work with simple features objects (yet). However, sf does have an rbind method that works for sf objects and can be fed to purrr’s reduce function.
```r
library(sf)
options(tigris_use_cache = TRUE)

totalpop_sf <- reduce(
  map(us, function(x) {
    get_acs(geography = "tract", variables = "B01003_001",
            state = x, geometry = TRUE)
  }),
  rbind
)

str(totalpop_sf)
## Classes 'sf' and 'data.frame':   72843 obs. of  6 variables:
##  $ GEOID   : chr  "01003010500" "01003011501" "01009050500" "01015981901" ...
##  $ NAME    : chr  "Census Tract 105, Baldwin County, Alabama" "Census Tract 115.01, Baldwin County, Alabama" "Census Tract 505, Blount County, Alabama" "Census Tract 9819.01, Calhoun County, Alabama" ...
##  $ variable: chr  "B01003_001" "B01003_001" "B01003_001" "B01003_001" ...
##  $ estimate: num  5321 5771 7007 4 1607 ...
##  $ moe     : num  452 825 556 6 235 309 506 386 425 310 ...
##  $ geometry:sfc_GEOMETRY of length 72843; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:55, 1:2] -87.8 -87.8 -87.8 -87.8 -87.8 ...
##   ..- attr(*, "class")= chr  "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA
##   ..- attr(*, "names")= chr  "GEOID" "NAME" "variable" "estimate" ...
##  - attr(*, "sf_column")= chr "geometry"
```
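To see the reduce() and rbind pattern in isolation, here is a minimal sketch that runs without the Census API. The make_piece() function is a hypothetical stand-in for a per-state get_acs() call; it builds a one-row sf object with a made-up point geometry and estimate, so only the combining step is being illustrated.

```r
library(purrr)
library(sf)

# Hypothetical stand-in for one per-state request: returns a one-row
# sf object with a point geometry instead of real tract polygons
make_piece <- function(id) {
  st_sf(
    GEOID = id,
    estimate = 100,  # made-up value, for illustration only
    geometry = st_sfc(st_point(c(0, 0)), crs = 4269)
  )
}

ids <- c("01", "02", "04")

# map() returns a list of sf objects; reduce() folds the list together
# by calling rbind on successive pairs, which dispatches to sf's method
combined <- reduce(map(ids, make_piece), rbind)

nrow(combined)
class(combined)
```

Because each rbind call builds on the accumulated result, the output stays an sf object with its geometry column and coordinate reference system intact.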
By declaring geometry = TRUE, tidycensus fetches tract feature geometry using the tigris package and merges it to the ACS data automatically for you. I recommend using the caching feature in the tigris package if you plan to use this workflow multiple times. You might note the discrepancy in tracts between the geometry-enabled and regular data frames (72,843 versus 73,056 rows); this is due to the removal of some water-only tracts in the cartographic boundary shapefiles used by tidycensus.
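One way to see which tracts drop out is an anti-join on GEOID. The sketch below uses two small stand-in tables rather than the real results; the GEOID "99999999999" is a made-up placeholder for a tract that has data but no geometry.

```r
library(dplyr)

# Stand-ins for the tabular and geometry-enabled results.
# "99999999999" is a made-up GEOID used only for illustration.
tabular <- tibble(
  GEOID = c("01001020100", "01003010500", "99999999999"),
  estimate = c(1948, 5321, 0)
)
with_geometry <- tibble(
  GEOID = c("01001020100", "01003010500")
)

# Rows of `tabular` with no matching GEOID in `with_geometry`
missing_tracts <- anti_join(tabular, with_geometry, by = "GEOID")
missing_tracts
```

With the objects from this post, the equivalent call would be anti_join(totalpop, st_drop_geometry(totalpop_sf), by = "GEOID"), where st_drop_geometry() from sf converts the sf object back to a plain data frame first.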