Margin of Error by Geography in the American Community Survey (ACS)

Ari Lamstein

6 years ago

[This article was first published on R – AriLamstein.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Today I will demonstrate how the margin of error in American Community Survey (ACS) estimates grow as the size of the geography decreases. The final chart that we’ll create is this:

The way I interpret the above chart is this: The ACS is very confident about its state-level estimates. It’s a bit less confident about county-level estimates. But when you get down to tract and ZIP Code (ZCTA) level estimates, though, the margin of error sometimes increases sharply.

Why is the American Community Survey (ACS) Important?

I spend quite a bit of time on this blog talking about ACS data. The reasons are:

Availability: The census bureau has an API for the data, and Ezra Glenn has written an R package (acs) for getting the data into R.
Interest: The survey tells you all sorts of interesting information about Americans (race, income, education, etc.).
Importance: Over $400 billion in federal and state funds are allocated each year based on the survey (link).

So if you want to analyze an important, contemporary dataset for free, I think that exploring ACS data in R is a good place to start. Also, I have my own mapping project in R (choroplethr). And because ACS data has a geographic component, I can easily use it in my examples.

What does Margin of Error Mean?

Wikipedia does a better job of explaining it than me (link). But the gist is this: The ACS only interviews about 1% of US households each year. Because they are sampling the population, they do not have complete knowledge about the population. As a result, each reported value has both an estimated value and a margin of error. In this case, they are 90% sure that the estimated value lies within the reported estimate +- the marign of error.

The above boxplot shows that the margins of error for states and counties is low. But for tracts and zip codes there are sometimes outliers that are extremely high.

Implications for Mapping

My mapping package choroplethr is popular in part because of its support for ZIP Code maps maps. In fact, in my free course Learn to Map Census Data in R I teach people how to make maps like this:

library(choroplethrZip) # on github

data("df_zip_demographics")

df_zip_demographics$value = df_zip_demographics$per_capita_income
zip_choropleth(df_zip_demographics, 
               "San Francisco Metropolitan Statistical Area", 
               "Per-capita Income", 
               msa_zoom = "San Francisco-Oakland-Hayward, CA")

Like most choropleth maps, the colors are based on estimates alone. That is, the margins of error are thrown out. This might or might not be problematic – it really depends on how large the margins of error in these particular ZCTAs are.

New Version of Choroplethr

The latest version of choroplethr (3.5.0) makes it easier to get the 90% margin of error of ACS estimates. Function ?get_acs_data now has an optional parameter, include_moe, that defaults to FALSE. If set to TRUE, the returned data frame will include a column with the 90% margin of error:

> df_zip = get_acs_data("B19301", "zip", endyear = 2012, span = 5, include_moe = TRUE)[[1]]
Warning message:
NAs introduced by coercion 
> 
> head(df_zip)
  region value margin.of.error
1  00601  7283             561
2  00602  8247             750
3  00603  8913             489
4  00606  6025             692
5  00610  7786             398
6  00612 10203             614

Making the Boxplot

Now that we know how to get the margin of error of estimates, we can create the boxplot that began this post:

library(choroplethr) # version 3.5.0

# B19301 - Per Capita Income: Per capita income in the past 12 months (in 2012 inflation-adjusted dollars)

df_state = get_acs_data("B19301", "state", endyear = 2012, span = 5, include_moe = TRUE)[[1]]
df_state$region = NULL
df_state$geography = "State"

df_county = get_acs_data("B19301", "county", endyear = 2012, span = 5, include_moe = TRUE)[[1]]
df_county$region = NULL
df_county$geography = "County"

library(choroplethrZip) # on github
df_zip = get_acs_data("B19301", "zip", endyear = 2012, span = 5, include_moe = TRUE)[[1]]
df_zip$region = NULL
df_zip$geography = "ZIP (ZCTA)"

library(choroplethrCaCensusTract) # on github
library(acs)
df_ca_census_tract = get_ca_tract_acs_data("B19301", endyear = 2012, span = 5, include_moe = TRUE)[[1]]
df_ca_census_tract$region = NULL
df_ca_census_tract$geography = "Tract"

# combine and remove NAs
df = rbind(df_state, df_county, df_zip, df_ca_census_tract)
nrow(df) # 44371
df = na.omit(df)
nrow(df) # 43753
df$geography = factor(df$geography, levels = c("State", "County", "Tract", "ZIP (ZCTA)"))

library(ggplot2)
library(scales)

ggplot(df, aes(geography, margin.of.error)) +
geom_boxplot() +
    ggtitle("Margin of Error by Geographyn2012 5-Year ACSnPer Capita Income (B19301)") +
    labs(x="Geography", y="Margin of Error") +
    scale_y_continuous(labels=comma)

The post Margin of Error by Geography in the American Community Survey (ACS) appeared first on AriLamstein.com.

To leave a comment for the author, please follow the link and comment on their blog: R – AriLamstein.com.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.