[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
the annual march cps-asec has been supplying the statistics for the census bureau’s report on income, poverty, and health insurance coverage since 1948. wow. the us census bureau and the bureau of labor statistics (bls) tag-team on this one. until the american community survey (acs) hit the scene in the early aughts (2000s), the current population survey had the largest sample size of all the annual general demographic data sets outside of the decennial census – about two hundred thousand respondents. this provides enough sample to conduct state- and a few large metro area-level analyses. your sample size will vanish if you start investigating subgroups by state – consider pooling multiple years. county-level is a no-no.
despite the american community survey’s larger size, the cps-asec contains many more variables related to employment, sources of income, and insurance – and can be trended back to harry truman’s presidency. aside from questions specifically asked about an annual experience (like income), many of the questions in this march data set should be treated as point-in-time statistics. cps-asec generalizes to the united states non-institutional, non-active duty military population.
the national bureau of economic research (nber) provides sas, spss, and stata importation scripts to create a rectangular file (rectangular data means only person-level records; household- and family-level information gets attached to each person). to import these files into r, the parse.SAScii function (from the SAScii package) reads nber’s sas code to work out the column layout of the fixed-width file, then RSQLite puts everything into a schnazzy database. you can try reading through the nber march 2012 sas importation code yourself, but it’s a bit of a proc freak show. this new github repository contains three scripts:
2005-2012 asec – download all microdata.R
- download the fixed-width file containing household, family, and person records
- import by separating this file into three tables, then merge ’em together at the person-level
- download the fixed-width file containing the person-level replicate weights
- merge the rectangular person-level file with the replicate weights, then store it in a sql database
- create a new variable – one – in the data table
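
to make the "separate into three tables, then merge at the person-level" step concrete, here is a minimal base r sketch on a toy file. everything in it is an assumption for illustration: the column positions, the variable names (rectype, hhid, famseq, perseq, state, faminc, age), and the use of in-memory data frames instead of the sql tables the actual script builds.

```r
# toy hierarchical fixed-width file -- hypothetical layout, NOT the real asec positions:
# col 1 = record type (1 household, 2 family, 3 person), cols 2-3 = household id,
# col 4 = family sequence, col 5 = person sequence, cols 6-8 = one value per record
raw_lines <- c(
  "10100 36", "20110 52", "30111 34", "30112 31",
  "10200 06", "20210 75", "30211 58", "20220 20", "30221 23"
)
tf <- tempfile()
writeLines(raw_lines, tf)

x <- read.fwf(
  tf,
  widths = c(1, 2, 1, 1, 3),
  col.names = c("rectype", "hhid", "famseq", "perseq", "value")
)

# separate the single file into household, family, and person tables
household <- x[x$rectype == 1, c("hhid", "value")]
names(household)[2] <- "state"

family <- x[x$rectype == 2, c("hhid", "famseq", "value")]
names(family)[3] <- "faminc"

person <- x[x$rectype == 3, c("hhid", "famseq", "perseq", "value")]
names(person)[4] <- "age"

# attach family- then household-level columns to every person record
rect <- merge(person, family, by = c("hhid", "famseq"))
rect <- merge(rect, household, by = "hhid")

# the "one" column makes weighted counts easy later on
rect$one <- 1

rect
```

the result has exactly one row per person, which is what "rectangular" means here.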
2012 asec – analysis examples.R
- connect to the sql database created by the ‘download all microdata’ program
- create the complex sample survey object, using the replicate weights
- perform a boatload of analysis examples
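
a rough sketch of what "create the complex sample survey object, using the replicate weights" can look like with the survey package. the data here are made up, and the column names (marsupwt for the full-sample weight, pwwgt1 through pwwgt160 for the replicate weights, ptotval for income) plus the fay settings are assumptions on my part rather than something stated above, so check them against the actual scripts. the real program points at the sql database instead of an in-memory data frame.

```r
library(survey)  # assumes the survey package is installed

set.seed(1)
n <- 200

# made-up stand-in for the merged person-level table
toy <- data.frame(
  ptotval  = round(rlnorm(n, meanlog = 10, sdlog = 1)),  # fake income values
  marsupwt = runif(n, 500, 3000)                          # fake full-sample weight
)

# 160 fake replicate weights, each a perturbed copy of the full-sample weight
for (r in 1:160) {
  toy[[paste0("pwwgt", r)]] <- toy$marsupwt * sample(c(0.5, 1.5), n, replace = TRUE)
}

# replicate-weighted design (assumed settings: fay's method with rho = 0.5)
asec_design <- svrepdesign(
  weights = ~marsupwt,
  repweights = "pwwgt[1-9]",   # regular expression matching pwwgt1 ... pwwgt160
  type = "Fay",
  rho = 1 - 1 / sqrt(4),
  combined.weights = TRUE,
  mse = TRUE,
  data = toy
)

# a weighted mean with a replicate-based standard error
est <- svymean(~ptotval, asec_design)
est
```

once the design object exists, the other survey package functions (svytotal, svyby, svyquantile, and friends) take it as their design argument in the same way.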
replicate census estimates – 2011.R
- connect to the sql database created by the ‘download all microdata’ program
- create the complex sample survey object, using the replicate weights
- match the sas output shown in the png file below
2011 asec replicate weight sas output.png
- statistic and standard error generated from the replicate-weighted example sas script contained in this census-provided person replicate weights usage instructions document.
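
for anyone curious what sits underneath a replicate-weighted standard error, here is a small base r sketch. it assumes the variance formula is 4/160 times the sum of squared differences between each replicate estimate and the full-sample estimate, which is how i read the census usage instructions; treat that factor, and the helper name replicate_se, as assumptions to verify against the census document.

```r
# hypothetical helper: standard error from a full-sample estimate and its replicates.
# with fay_rho = 0.5 and 160 replicates the multiplier works out to 4 / 160.
replicate_se <- function(full_estimate, replicate_estimates, fay_rho = 0.5) {
  r <- length(replicate_estimates)
  sqrt(sum((replicate_estimates - full_estimate)^2) / (r * (1 - fay_rho)^2))
}

# made-up data: one analysis variable, one full-sample weight, 160 replicate weights
set.seed(42)
n <- 100
y <- rnorm(n, mean = 50, sd = 10)
full_weight <- runif(n, 1000, 2000)
rep_weights <- sapply(1:160, function(i) full_weight * sample(c(0.5, 1.5), n, replace = TRUE))

# the same statistic computed once with the full weight and once per replicate weight
theta_0 <- weighted.mean(y, full_weight)
theta_r <- apply(rep_weights, 2, function(w) weighted.mean(y, w))

replicate_se(theta_0, theta_r)
```

the point is that the statistic gets recomputed once per replicate weight, and the spread of those 160 answers around the full-sample answer is what drives the standard error.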
click here to view these three scripts
for more detail about the current population survey – annual social and economic supplement (cps-asec), visit:
- the census bureau’s current population survey page
- the bureau of labor statistics’ current population survey page
- the current population survey’s wikipedia article
notes:
interviews are conducted in march about experiences during the previous year. the file labeled 2012 includes information (income, work experience, health insurance) pertaining to 2011. when you use the current population survey to talk about america, subtract a year from the data file name.
as of the 2010 file (the interview focusing on america during 2009), the cps-asec contains exciting new medical out-of-pocket spending variables most useful for supplemental (medical spending-adjusted) poverty research.
confidential to sas, spss, stata, sudaan users: why are you still rubbing two sticks together after we’ve invented the butane lighter? time to transition to r. 😀