Many public agencies release data as fixed-width ASCII files (FWF). Because the values are packed together without separators, you need a "data dictionary" that defines the column widths (along with metadata about the variables) to make sense of them. Unfortunately, many agencies publish that information only as a SAS script, with the column layout embedded in an INPUT statement.
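To make the role of the data dictionary concrete, here is a minimal sketch using base R's read.fwf on a made-up file. The records, widths, and column names are invented for illustration and do not come from any agency's layout.

```r
# Two made-up records: three fields packed together with no separators
raw_lines <- c("0154hello23", "2034puppy00")

# Write them to a temporary file so read.fwf has something to read
fwf_file <- tempfile()
writeLines(raw_lines, con = fwf_file)

# The "data dictionary": without these widths and names the lines are opaque
field_widths <- c(4, 5, 2)
field_names  <- c("id", "word", "score")

records <- read.fwf(
  fwf_file,
  widths     = field_widths,
  col.names  = field_names,
  colClasses = c("integer", "character", "integer")
)

print(records)
```

Typing widths like these by hand is manageable for three columns, but it becomes tedious and error-prone for a survey with hundreds of variables whose layout is published only as a SAS script.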
To solve this problem, R user Anthony Damico (who's also the energetic voice behind the excellent R Twotorials series) created the SAScii package. It parses the SAS script (or even an unstructured text instructions file that includes an INPUT statement) and uses that information to read the associated fixed-format ASCII file. Just provide the name or URL of the script/instructions file and of the data file, and SAScii does the rest. The read.SAScii help file has several examples, including the R commands to read the following public data sets:
- 2009 Medical Expenditure Panel Survey Emergency Room Visits File (Medical Expenditure Panel Survey)
- 2010 National Health Interview Survey Persons file (CDC)
- IPUMS – American Community Survey Extract (Minnesota Population Center)
- 2008 Survey of Income and Program Participation Wave 1 (US Census)
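For a feel of the workflow without downloading a large survey file, the sketch below follows the pattern of the toy example in the package's help file: it writes a tiny SAS INPUT block and a matching data file to temporary files, then hands both to SAScii. The records and variable names here are made up, and the sketch assumes the SAScii package is installed.

```r
library(SAScii)

# Made-up fixed-width records: ID (4), NAME (5), SCORE (2), FLAG (2)
some_data <- c(
  "0154hello23ok",
  "2034puppy00no",
  "9900buddy44ok"
)
data_file <- tempfile()
writeLines(some_data, con = data_file)

# A made-up SAS import script using the @ (column position) style
sas_script <- "INPUT
    @1 ID 4.
    @5 NAME $ 5.
    @10 SCORE 2.
    @12 FLAG $ 2.
;"
sas_file <- tempfile()
writeLines(sas_script, con = sas_file)

# Inspect the layout SAScii extracts from the INPUT block
layout <- parse.SAScii(sas_file)
print(layout)

# Read the data file using that layout
survey <- read.SAScii(data_file, sas_file)
print(survey)
```

With real public data, the two arguments would instead be the path or URL of the agency's data file and of its SAS script.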
Many thanks to Anthony for creating this package and pointing it out to me at the useR! 2012 conference last month. It unlocks many useful public data sets for those of us without SAS licenses.
SAScii package: read.SAScii documentation