[This article was first published on R in the Antipodes, and kindly contributed to R-bloggers].
When using R, we may find that our data has been saved in the format of a different statistics package. Some statistical software can export to other file types, and a .csv file is always an option, but R can also import some datasets directly from their native file type.
SPSS is one such package. SPSS datafiles have a .sav extension, and we can import these into R using the foreign package. foreign is a recommended package that ships with standard R distributions, so it normally needs no separate installation.
Ensure the foreign package is attached:
library(foreign)
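There is a nifty trick for getting the filepath of the SPSS datafile you wish to import. In an interactive session, the following opens a file browser and returns the path of the file you select: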
file.choose()
Copy and paste the filepath into this code:
dataset <- read.spss("[filepath including filename goes here]", to.data.frame = TRUE)
The to.data.frame = TRUE option returns the data as a dataframe instead of the default list, and a dataframe is the type of data object I want in R.
Note: I am using dataset as my dataset name in this example. Use whatever name is best for you, and remember to change all instances of dataset to your actual dataset name in later code.
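To see what the to.data.frame option changes, here is a minimal sketch. It assumes the sample file electric.sav that is bundled with the foreign package stands in for your own data; sav_path and as_list are names chosen only for this illustration.

```r
library(foreign)

# Assumption: a sample file shipped with the foreign package is used here
# in place of your own data. In an interactive session you could instead
# use sav_path <- file.choose() to pick your own .sav file.
sav_path <- system.file("files", "electric.sav", package = "foreign")

# Default behaviour: read.spss() returns a list of variables
as_list <- read.spss(sav_path)
class(as_list)

# With to.data.frame = TRUE the same file comes back as a dataframe
dataset <- read.spss(sav_path, to.data.frame = TRUE)
class(dataset)
```

Storing the result of file.choose() in a variable, as the comment suggests, also avoids copying and pasting the filepath by hand.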
Unfortunately, if your SPSS datafile has variable labels (e.g. “Sex of respondent”), these aren’t shown in the R dataframe; only the variable names are shown (e.g. Sex). While the name is often clear for variables such as sex, you may find that the names are less clear for other variables (e.g. for a survey containing multiple “select all that apply” type questions/responses). It is therefore very useful to have the list of variable names and their associated labels.
You can simply print the concordance to the console by using:
attr(dataset, "variable.labels")
I didn’t find this helpful, for two reasons:
- I have a lot of variables, so it takes up a sizeable amount of console space
- I am going to keep referring to the labels when I need to do analyses, and retaining the information in the console is not helpful if I have to keep scrolling back, or reissuing the command
Instead, I save the labels to their own dataframe:
dataset.labels <- as.data.frame(attr(dataset, "variable.labels"))
Voila, 5 lines of code to get my SPSS data and variable labels into R.
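The labels dataframe created this way keeps the variable names only as row names. If you would rather have the names and labels as two ordinary columns, something like the following sketch may help. It again assumes the electric.sav sample file bundled with foreign; label_lookup and get_label are hypothetical names introduced here, not part of the foreign package.

```r
library(foreign)

# Assumption: the bundled sample file stands in for your own .sav file
sav_path <- system.file("files", "electric.sav", package = "foreign")
dataset <- read.spss(sav_path, to.data.frame = TRUE)

# Named character vector: names are variable names, values are labels
var_labels <- attr(dataset, "variable.labels")

# Two-column lookup table of variable names and their labels
label_lookup <- data.frame(
  variable = as.character(names(var_labels)),
  label = as.character(unname(var_labels)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the label for a single variable name
get_label <- function(var_name) {
  label_lookup$label[label_lookup$variable == var_name]
}

head(label_lookup)
get_label(names(dataset)[1])
```

A table in this shape can be searched or filtered like any other dataframe, which is handy when there are many variables.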