[This article was first published on asdfree by anthony damico, and kindly contributed to R-bloggers].
experimental. the behavioral risk factor surveillance system (brfss) aggregates behavioral health data from 400,000 adults via telephone every year. it’s um *clears throat* the largest telephone survey in the world and it’s gotta lotta uses, here’s a list neato. state health departments perform the actual data collection (according to a nationally-standardized protocol and a core set of questions), then forward all responses to the centers for disease control and prevention (cdc) office of surveillance, epidemiology, and laboratory services (osels) where the nationwide, annual data set gets constructed. independent administration by each state allows them to tack on their own questions that other states might not care about. that way, florida could exempt itself from all the risky frostbite behavior questions. in addition to providing the most comprehensive behavioral health data set in the united states, brfss also ekes out my worst acronym in the federal government award – onchit a close second.
annual brfss data sets have grown rapidly over the past quarter-century: the 1984 data set contained only 12,258 respondents from 15 states, all states were participating by 1994, and the 2011 file has surpassed half a million interviews. if you’re examining trends over time, do your homework and review the brfss technical documents for the years you’re looking at (plus any years in between). what might you find? well for starters, the cdc started sampling cellphones in their 2011 methodology.
unlike many u.s. government surveys, brfss is not conducted for each resident at a sampled household (phone number). only one respondent per phone number gets interviewed. did i miss anything? well if your next question is frequently asked, you’re in luck.
all brfss files are available in sas transport format so if you’re sittin’ pretty on 16 gb of ram, you could potentially read.xport a single year and create a taylor-series survey object using the survey package. cool. but hear me out: the download and importation script builds an ultra-fast monet database (click here for speed tests, installation instructions) on your local hard drive. after that, these scripts are shovel-ready. consider importing all brfss files my way – let it run overnight – and during your actual analyses, code will run a lot faster.
the brfss generalizes to the u.s. adult (18+) (non-institutionalized) population, but if you don’t have a phone, you’re probably out of scope.
this new github repository contains four scripts:
1984 – 2011 download all microdata.R
- create the batch (.bat) file needed to initiate the monet database in the future
- download, unzip, and import each year specified by the user
- create and save the taylor-series linearization complex sample designs
- create a well-documented block of code to re-initiate the monetdb server in the future
2011 single-year – analysis examples.R
- run the well-documented block of code to re-initiate the monetdb server
- load the r data file (.rda) containing the taylor-series linearization design for the single-year 2011 file
- perform the standard repertoire of analysis examples, only this time using sqlsurvey functions
2010 single-year – variable recode example.R
- run the well-documented block of code to re-initiate the monetdb server
- copy the single-year 2010 table to maintain the pristine original
- add a new drinks per month category variable by hand
- re-create then save the sqlsurvey taylor-series linearization complex sample design on this new table
- close everything, then load everything back up in a fresh instance of r
- replicate statistics from this table, pulled from the cdc’s web-enabled analysis tool
replicate cdc weat – 2010.R
- run the well-documented block of code to re-initiate the monetdb server
- load the r data file (.rda) containing the taylor-series linearization design for the single-year 2010 file
- replicate statistics from this table, pulled from the cdc’s web-enabled analysis tool
click here to view these four scripts
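the recode script adds its drinks per month category inside the monet database, but the same idea can be sketched in plain r with cut(). everything specific below is an assumption for illustration: the drinks_per_month vector, the break points, and the labels are made up, not copied from the script.

```r
# hypothetical drinks-per-month values; NA stands in for a missing response
drinks_per_month <- c(0, 3, 10, 45, NA, 120)

# assumed break points (none, 1-10, 11-30, 31+), not the script's actual cutoffs
# cut() uses right-closed intervals by default, so 10 falls in "1 to 10"
drinks_cat <- cut(
  drinks_per_month,
  breaks = c(-Inf, 0, 10, 30, Inf),
  labels = c("none", "1 to 10", "11 to 30", "31 or more")
)

# tabulate the new category, keeping missing values visible
table(drinks_cat, useNA = "ifany")
```

whatever cutoffs you pick, eyeball a table like this one before re-creating the survey design on the recoded column.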
for more detail about the behavioral risk factor surveillance system, visit:
- the centers for disease control and prevention behavioral risk factor surveillance system homepage
- the behavioral risk factor surveillance system wikipedia entry
notes:
if you’re just scroungin’ around for a few statistics, the cdc’s web-enabled analysis tool (weat) might be all your heart desires. in fact, on slides seven, eight, nine of my online query tools video, i demonstrate how to use this table creator. weat’s more advanced than most web-based survey analysis – you can run a regression. but only seven (of eighteen) years can currently be queried online.
since data types in sql are not as plentiful as they are in the r language, the definition of a monet database-backed complex design object requires a cutoff be specified between the categorical variables and the linear ones. that cut point gets defined using the check.factors argument in the sqlsurvey() and sqlrepsurvey() function calls. check.factors defaults to ten, but can be raised or lowered as needed. here’s how it works:
- if the column would be a character string or factor inside an r data frame, the sql database stores it as a varchar column.
- if the column would be numeric or integer in an r data frame, but has fewer than eleven unique values, the sql database also stores it as a varchar column.
- if the column would be numeric or integer in an r data frame, but has at least eleven unique values, the sql database stores it as a double (that’s sql-speak for numeric).
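the three rules above can be mimicked in a few lines of base r. this is only a sketch of the rule as described here: guess_sql_type() is a hypothetical helper, not a function from sqlsurvey, and the toy data frame is invented.

```r
# hypothetical helper mimicking the rule described above; not part of sqlsurvey
guess_sql_type <- function(x, check.factors = 10) {
  # character strings and factors are always treated as categorical
  if (is.character(x) || is.factor(x)) return("varchar")
  # numeric columns with check.factors or fewer distinct values are categorical too
  if (length(unique(x)) <= check.factors) return("varchar")
  # everything else is treated as linear
  "double"
}

# toy data frame: a factor, a ten-value numeric, and an eleven-value numeric
df <- data.frame(
  state = factor(c("fl", "ak", "ny", "fl", "ak", "ny", "fl", "ak", "ny", "fl", "ak")),
  ten_values = c(1:10, 10),
  eleven_values = 1:11
)

# classify every column with the default cutoff of ten
sapply(df, guess_sql_type)

# raising the cutoff to eleven flips the last column to categorical
guess_sql_type(df$eleven_values, check.factors = 11)
```

running a check like this on your own columns before building the design shows which variables would land on the categorical side of the cutoff.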
confidential to sas, spss, stata, sudaan users: when statistical languages are plotted on cartesian coordinates, what-you-paid-for vs. what-you-get are best represented as y = 1/x. time to transition to r. 😀