the american time use survey collects information about how we spend our time. it’s a pretty simple setup: sampled individuals write down everything they do for a single twenty-four-hour period, in ten-minute intervals. those diaries are averaged across all respondents, and we end up with results like this genius nyt visualization by amanda cox. most economists use atus to study uncompensated work (chores and childcare), but you can use it for all sorts of crazy stuff, like learning that even in the dead of night, one-twentieth of us are awake. or that we average 54 seconds of sex every day. i cannot think of anything i would rather be doing than analyzing this survey dataset.
before you start crosstabbing and svymeaning, it’d be smart to spend ten minutes reading exhibit 6.2 of the user’s guide so you understand how all the data tables (..that the download automation script imports for you..) work together. simpler analyses might only require the respondent and activity summary files, but by the time you want to determine who was with the respondent at soccer practice, you had better merge like a champ. of course, before any of that, you’ll need to decide which activity codes you actually want to capture. time spent calf-roping or cattle-riding? code 130121. commuting to the vet? code 180807. pumping gas? 070102. tired of me guessing for you? check out the activity coding lexicons. this new github repository contains four scripts:
download all microdata.R
- decipher the bls ftp site to download each year-specific (or multi-year) table
- unzip whatcha need, then import the microdata in a jiffy with read.csv
- save each file as an r data file (.rda) into neatly-sorted atus directories
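the gist of that download-and-import pattern, in a minimal sketch (the url and the filename inside the zip are assumptions here; the actual script deciphers the bls ftp site to find each zipped table for you):

```r
tf <- tempfile()

# year-specific respondent table (this url pattern is an assumption)
download.file(
	"https://www.bls.gov/tus/special.requests/atusresp_2012.zip" ,
	tf ,
	mode = 'wb'
)

# unzip whatcha need..
fn <- unzip( tf , exdir = tempdir() )

# ..import the comma-separated microdata in a jiffy..
atusresp <- read.csv( fn[ grep( "\\.dat$" , fn ) ] )

# ..and save it into a neatly-sorted atus directory
dir.create( "./atus/2012" , recursive = TRUE , showWarnings = FALSE )
save( atusresp , file = "./atus/2012/atusresp.rda" )
```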
2012 single-year – analysis examples.R
- load the activity, respondent, roster, and replicate weights files into working memory
- aggregate activity events to the respondent level by top-tier activity code, then reshape the result into one-record-per-person
- convert minutes to hours, merge all files into one data.frame, recode a smidgen
- create a replicate-weighted survey design object, with the bls-specified fay’s adjustment
- perform a fine slew of analysis examples, including quite a few of these bls statistics
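the design-construction step looks roughly like this, assuming the merged data.frame is named `x`, the final weight is `tufinlwgt`, and the replicate weight columns share a `finlwgt` prefix (all placeholder names for this sketch; confirm them against the data dictionary):

```r
library(survey)

# replicate-weighted design with the bls-specified
# fay's adjustment (rho = 0.5)
atus.design <-
	svrepdesign(
		weights = ~tufinlwgt ,
		repweights = "finlwgt[0-9]+" ,	# regex matching the replicate columns
		type = "Fay" ,
		rho = 0.5 ,
		data = x
	)

# average hours per day spent in some recoded activity category
# (`hours.socializing` is a made-up variable name)
svymean( ~hours.socializing , atus.design )
```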
replicate bls standard error – 2007.R
- load the activity, activity-summary, respondent, and replicate weights files into working memory
- subset the activity summary table to only the television-related events
- aggregate the activity table to the respondent-level as an example of an alternative to the previous method
- merge the minutes-spent-watching-television table with the respondent and replicate weights tables
- create a replicate-weighted survey design object, with the bls-specified fay’s adjustment
- precisely replicate the bureau of labor statistics’ standard error of hours per day spent watching the teevee
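a hedged sketch of that aggregate-then-merge step, assuming an activity table `act` and a respondent table `resp` with the column names `tucaseid`, `trcodep`, and `tuactdur24` (check those, and the lexicon codes, against the documentation before trusting this):

```r
# keep only the television-watching events
# (lexicon codes 120303 and 120304)
tv <- subset( act , trcodep %in% c( 120303 , 120304 ) )

# sum minutes-watched within each respondent..
tv.per.person <- aggregate( tuactdur24 ~ tucaseid , data = tv , sum )

# ..merge onto the respondent table, keeping non-watchers..
y <- merge( resp , tv.per.person , all.x = TRUE )

# ..and treat respondents without a television event as zero minutes
y[ is.na( y$tuactdur24 ) , 'tuactdur24' ] <- 0

# convert minutes to hours per day
y$tv.hours <- y$tuactdur24 / 60
```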
replicate bls example one – 2006.R
- load the activity and respondent data tables into working memory
- subset the activity table to only care of household children events (as prescribed by the 2006 lexicon)
- aggregate that activity table to the respondent-level, then merge those minutes onto the respondent data.
- just run a weighted.mean that skips any variance calculation but hits the bls example one on the nose
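that point-estimate-only calculation is a one-liner, sketched here with placeholder names (`z` for the merged data.frame, `childcare.min` for the aggregated childcare minutes):

```r
# weighted mean of hours per day, skipping any variance calculation
weighted.mean( z$childcare.min / 60 , w = z$tufinlwgt )
```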
click here to view these four scripts
for more detail about the american time use survey, visit:
- the questionnaire, transmogrified for public dissemination
- summary charts and tables provided by the bureau of labor statistics
notes:
just like the medical expenditure panel survey draws its sample from the national health interview survey, the american time use survey is a subsample of current population survey (cps) respondents. in fact, the microdata include a handy atus-cps mergefile. unlike the cps, it’s not a household survey: only one individual at least 15 years of age gets selected from each sampled household. another important difference from the cps: the atus should not be used to draw state-level conclusions. atus generalizes to the united states non-institutional, non-active-duty-military population aged fifteen and older, but don’t zoom in on geographies smaller than census regions.
when you see the svytotal function used in the analysis example script, you’ll notice overall sums around ninety billion. that’s because the survey weights in this data set actually generalize to person-days. divide by 365, and you’ll almost precisely hit the `sixteen and older` row of the `2010` column of table 1 on this census bureau age-by-sex table. so at ease, everybody. at ease.
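here’s the back-of-the-envelope arithmetic, with a hypothetical ninety billion person-day total:

```r
# person-days divided by days in the year gives persons
9e10 / 365
# about 246.6 million -- in the ballpark of the census count
# of americans aged sixteen and older
```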
confidential to sas, spss, stata, and sudaan users: if you want to impress people at parties with an antiquated skill, learn morse code. at least it’s rhythmic. time to transition to r. 😀