[This article was first published on SAS and R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Microsoft Excel is an awkward tool for data analysis. However, it is a reasonable environment for recording and transfering data. In our consulting practice, people frequently send us data in .xls (from Excel 97-2003) or .xlsx (from Excel 2007 or 2010) formatted files.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In order to use the data in statistical software, you have to get it out of Excel. While Excel does provide some tools for exporting data, these are not easily replicable, since they rely on menu choices through the GUI. A better approach is to read the file directly from within the statistical software.
An additional complication is that a single file may contain several sheets, each of which may have unique columns and rows. While importing from Excel into SAS is shown in section 1.1.5, we don’t discuss reading from specific sheets or show how to read an Excel file in R.
SAS
In SAS, it’s possible to use the “Import Data” wizard to gain access via the GUI (File; Import Data; etc.) but this is no better than using the GUI in Excel. However, all the wizard does is compose a proc import to read from the file. This turns out to be important, because the documentation for using proc import for Excel files is hard to find.
The documentation is buried in the on-line help at SAS Products; SAS/ACCESS; SAS/ACCESS 9.2 for PC Files: Reference; File Format-Specific Reference; Microsoft Excel Workbook Files. This is not the same material found through SAS Products; SAS Procedures; Proc Import. The code below was derived by running the wizard and using its option to save the resulting commands. The help.xlsx file can be downloaded from the book website; SAS currently cannot read an Excel file in directly from a URL– you must download the file manually and read it locally.
PROC IMPORT OUT= WORK.test DATAFILE= "C:\temp\help.xlsx" DBMS=EXCEL REPLACE; RANGE="help.csv"; GETNAMES=YES; MIXED=NO; SCANTEXT=YES; USEDATE=YES; SCANTIME=YES; RUN;
The range option can be used to specify the desired sheet. This means that you must know the name of the sheet you want to import and type it in. The trailing “$” can be used to read in just a specific range of cells, but with no additional information it implies the full sheet. Since there is only one sheet in the helpdata.xlsx file, it’s not required here.
R
The foreign package reads data from many file types. However, the .xls and .xlsx formats are not among them, as far as we know. Fortunately, the files can be read with the gdata package, using the read.xls() function. This will read files in either the .xls or the .xlsx format.
library(gdata) ffdata = read.xls("http://www.math.smith.edu/r/data/help.xlsx", sheet=1)
In this implementation, you specify the sheet by number, rather than name. This may be less precise than using the full name, but it does spare some tedious typing.
To leave a comment for the author, please follow the link and comment on their blog: SAS and R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.