analyze the consumer expenditure survey (ce) with r

[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers].
the consumer expenditure survey (ce) is the primo data source to understand how americans spend money.  participating households keep a running diary about every little purchase over the year.  those diaries are then summed up into precise expenditure categories.  how else are you gonna know that the average american household spent $34 (±2) on bacon, $826 (±17) on cellular phones, and $13 (±2) on digital e-readers in 2011?  an integral component of the market basket calculation in the consumer price index, this survey recently became available as public-use microdata and they’re slowly releasing historical files back to 1996.  hooray!

for a taste of what’s possible with ce data, look at the quick tables listed on their main page – these tables contain approximately a bazillion different expenditure categories broken down by demographic groups.  guess what?  i just learned that americans living in households with $5,000 to $9,999 of annual income spent an average of $283 (±90) on pets, toys, hobbies, and playground equipment (pdf page 3).  you can often get close to your statistic of interest from these web tables.  but say you wanted to look at domestic pet expenditure among only households with children between 12 and 17 years old.  another one of the thirteen web tables – the consumer unit composition table – shows a few different breakouts of households with kids, but none matching that exact population of interest.  the bureau of labor statistics (bls) (the survey’s designers) and the census bureau (the survey’s administrators) have provided plenty of the major statistics and breakouts for you, but they’re not psychic.  if you want to comb through this data for specific expenditure categories broken out by a you-defined segment of the united states’ population, then let a little r into your life.  fun starts now.
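to make the "you-defined segment" idea concrete, here is a minimal sketch on toy data.  every column name and value below is made up for illustration (the real microdata use different variable names and need the assembly steps described in the bls documentation first), and it only computes a point estimate, not the ± margin.

```r
# toy consumer-unit records: all names and numbers are hypothetical
cu <- data.frame(
  id        = 1:6,
  weight    = c(100, 200, 150, 50, 300, 200),                # made-up sampling weight
  has_teen  = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE),      # any member aged 12 to 17
  pet_spend = c(120, 0, 60, 300, 40, 0)                      # annual dollars spent on pets
)

# restrict to the population of interest: households with a 12-to-17 year old
teens <- subset(cu, has_teen)

# weighted average pet spending within that segment
teen_pet_mean <- weighted.mean(teens$pet_spend, teens$weight)

# same statistic for all households, for comparison
all_pet_mean <- weighted.mean(cu$pet_spend, cu$weight)

print(c(teens = teen_pet_mean, everyone = all_pet_mean))
```

the web tables can only give you the second number for the breakouts bls chose to publish.  the first one is the kind of thing you need the microdata for.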

fair warning: only analyze the consumer expenditure survey if you are a nerd to the core.  the microdata ship with two different survey types (interview and diary), each containing five or six quarterly table formats that need to be stacked, merged, and manipulated prior to a methodologically-correct analysis.  the scripts in this repository contain examples to prepare ’em all, but be advised that magnificent data like this will never be no-assembly-required.  the folks at bls have posted an excellent summary of what’s available – read it before anything else.  after that, read the getting started guide.  don’t skim.
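the stack-then-merge pattern looks roughly like this in base r.  this is a sketch on toy data frames, not the repository's actual code: the table names, the `cu_id`, `quarter`, `weight`, and `cost` columns, and the two-quarter setup are all assumptions made for illustration.

```r
# hypothetical quarterly family files: one row per consumer unit per quarter
fmly_q1 <- data.frame(cu_id = c(1, 2), quarter = 1, weight = c(100, 200))
fmly_q2 <- data.frame(cu_id = c(2, 3), quarter = 2, weight = c(210, 150))

# hypothetical quarterly expenditure files: many rows per consumer unit
expn_q1 <- data.frame(cu_id = c(1, 1, 2), quarter = 1, cost = c(10, 5, 20))
expn_q2 <- data.frame(cu_id = c(2, 3, 3), quarter = 2, cost = c(7, 3, 4))

# stack the quarters on top of each other
fmly <- do.call(rbind, list(fmly_q1, fmly_q2))
expn <- do.call(rbind, list(expn_q1, expn_q2))

# collapse expenditures to one row per consumer unit per quarter
expn_sum <- aggregate(cost ~ cu_id + quarter, data = expn, FUN = sum)

# merge onto the family file, keeping units with no recorded expenditures
merged <- merge(fmly, expn_sum, by = c("cu_id", "quarter"), all.x = TRUE)
merged$cost[is.na(merged$cost)] <- 0

print(merged)
```

the left join (`all.x = TRUE`) matters: a consumer unit that reported nothing in a category still belongs in the denominator of any average.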

a few of the descriptions below refer to sas programs provided by the bureau of labor statistics.  you’ll find these in the C:\My Directory\CES\2011\docs directory after you run the download program.  this new github repository contains three scripts:

2009-2011 – download all microdata.R

2011 fmly intrvw – analysis examples.R

replicate integrated mean and se.R


click here to view these three scripts



for more detail about the consumer expenditure survey (ce), visit:

notes:

throughout this post, i’ve used the terms consumer unit and household interchangeably.  consumer unit is a precise definition, but household is a reasonable proxy that will make more sense to your audience.  the consumer expenditure survey is a consumer unit-level survey, meaning all weights and results generalize to the average (non-institutional) american consumer unit.  since the unit of analysis is one consumer unit rather than one person, it’s trickier to talk about your results.  instead of saying, “in 2011, the average american spent $x on y,” you’ll have to say, “in 2011, the average american household spent $x on y.”  if your boss frowns at you, blame it on me.
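since every weight sits at the consumer-unit level, both an average and its ± margin are computed over consumer units, not people.  here is a minimal sketch of a replicate-weight standard error on toy data.  the four replicate columns, their values, and the divide-by-the-number-of-replicates formula are all assumptions for illustration; check the bls documentation and sas programs for the method that actually applies to the microdata.

```r
# toy data: one row per consumer unit, all names and values hypothetical
cu <- data.frame(
  spend  = c(10, 20, 30, 40),   # made-up annual spending
  weight = c(1, 1, 1, 1)        # made-up full-sample weight
)

# made-up replicate weights, one column per replicate
reps <- cbind(
  r1 = c(2, 0, 2, 0),
  r2 = c(0, 2, 0, 2),
  r3 = c(2, 2, 0, 0),
  r4 = c(0, 0, 2, 2)
)

# point estimate from the full-sample weight
full_mean <- weighted.mean(cu$spend, cu$weight)

# re-compute the same statistic once per replicate weight
rep_means <- apply(reps, 2, function(w) weighted.mean(cu$spend, w))

# assumed formula: root of the average squared deviation from the full-sample estimate
se <- sqrt(mean((rep_means - full_mean)^2))

cat("mean per consumer unit:", full_mean, "with standard error", round(se, 2), "\n")
```

the point is only that the statistic gets recomputed under each replicate weight and the spread of those recomputed values drives the standard error.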

if you’re hard-pressed to talk about expenses at the individual-level, you could copy what the social security administration did on pdf page 11 of this report and compute per capita expenditures, but i don’t recommend it.  if you desperately need to open up that can of worms, run your analytic plan by the folks at bls and get their blessing.  otherwise, stick with household.  small price for wonderful data.

confidential to sas, spss, stata, and sudaan users: why are you still using medieval inventions?  you don’t see airplane pilots navigating by the stars.  time to transition to r.  😀
