How to Load SAS Files in R: Transitioning from SAS to R with Seamless Data Integration
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Using R as an alternative to SAS (Statistical Analysis System) lets you build bespoke interactive applications on top of your R routines. This supports rigorous technical work while engaging non-technical users through interactive data storytelling.
Transitioning from SAS to R can be a challenge for many data analysts and programmers, but the solution is within reach. The move is much easier if part of your SAS pipeline produces data that you then use to create reports in R.
In this article, we will explore how to integrate SAS data into your R workflow, allowing you to harness the strengths of both tools. We will focus on reading and writing SAS data files in R and overcoming common challenges. By the end of this guide, you’ll be well-equipped to bridge the gap between SAS and R, making your data analysis journey smooth and efficient.
TL;DR:
- Transitioning from SAS to R adds R's technical capabilities and makes working with data more interactive.
- Learn how to move data smoothly between SAS and R.
- Understand SAS file types:
  - Data files (.sas7bdat): hold tabular data, similar to R data frames
  - Catalog files (.sas7bcat): hold user-defined formats (value labels) for that data
- Read SAS data in R and write SAS-readable data from R using the haven package, with practical examples.
- Best practices:
  - Prioritize reproducibility
  - Use a targets pipeline for routine tasks
  - Seek guidance from the R and SAS communities and from platforms like Stack Overflow
- The haven package simplifies SAS and R data interoperability.
Understanding Different SAS File Types
SAS has many types of file objects. We will explore how R handles the following two:
- Data Files (.sas7bdat): These files store tabular data made of numeric and character variables (dates are stored as numbers with a date format). They are the most common SAS file type and are similar to R data frames.
- Catalog Files (.sas7bcat): Catalog files store auxiliary SAS objects. The ones you will meet most often hold user-defined formats (created with PROC FORMAT), which map stored values to value labels.
How-To: Reading SAS data
To read SAS files in R, we can use the {haven} package, which is part of the tidyverse. Its read_sas() function reads a SAS data set into a tibble.
Here's the basic pattern for reading a SAS file in R:
```r
# install.packages("haven")  # run once if haven is not installed yet
library(haven)

sas_data <- read_sas("file.sas7bdat")
```
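read_sas() reads .sas7bdat data files. A .sas7bcat catalog is not read on its own: pass it through the catalog_file argument, as in read_sas("file.sas7bdat", catalog_file = "file.sas7bcat"), and haven applies the user-defined formats it contains to the data as value labels.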
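To try read_sas() without a file of your own, the sketch below uses the small iris.sas7bdat example that ships with haven (located with system.file(); swap in your own path). It also shows two optional arguments, col_select and n_max, which help when you only need part of a large file. The catalog file name in the last comment is a placeholder, not a real file.

```r
library(haven)

# Example data set bundled with haven; replace with the path to your own file
path <- system.file("examples", "iris.sas7bdat", package = "haven")

# Read only the first two columns and the first 10 rows
iris_sas <- read_sas(path, col_select = 1:2, n_max = 10)
print(iris_sas)

# Applying formats from a catalog (placeholder file names):
# read_sas("file.sas7bdat", catalog_file = "file.sas7bcat")
```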
Encoding Issues
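read_sas() normally relies on the encoding recorded in the SAS file. If that information is missing or wrong, text such as accented characters can come through garbled; in that case, specify the encoding explicitly when calling read_sas():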
read_sas("file.sas7bdat", encoding = "UTF-8")
Dealing with SAS Labels
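In R, labelled categorical values are usually handled as factors. SAS works differently: a variable keeps its stored values, and a format attaches a label to each value. To preserve these semantics, haven provides the labelled S3 class, which imports labelled vectors into R without losing either the values or the labels.
The following example, adapted from haven's documentation vignette, creates two labelled vectors and combines them in a data frame: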
```r
library(haven)

x1 <- labelled(sample(1:5), c(Good = 1, Bad = 5))
x2 <- labelled(c("M", "F", "F", "F", "M"), c(Male = "M", Female = "F"))

# tibble::data_frame() is deprecated; tibble::tibble() is the current equivalent
tibble::tibble(x1, x2, z = 1:5)
```
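Once labelled data is in R, you usually either turn the labels into a factor or drop them and keep the raw codes. A minimal sketch using haven's as_factor() and zap_labels() on the x2 vector from the example above (redefined here so the snippet runs on its own):

```r
library(haven)

# A labelled character vector, as read_sas() might return it
x2 <- labelled(c("M", "F", "F", "F", "M"), c(Male = "M", Female = "F"))

# Replace each stored value with its label and return a factor
sex <- as_factor(x2)
print(sex)

# Drop the value labels and keep the raw SAS codes
codes <- zap_labels(x2)
print(codes)
```

as_factor() is the natural choice for modelling and plotting in R; zap_labels() is handy when you want to join or compare on the original SAS codes.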
How-To: Writing SAS data from R
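To write data for SAS from R, you can also use the haven package. Its write_xpt() function produces a SAS transport (XPORT) file, which SAS can import, so give the file an .xpt extension rather than .sas7bdat. (haven's older write_sas() function, which wrote .sas7bdat files, is deprecated in favor of write_xpt().)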
```r
library(haven)

my_data <- data.frame(
  ID = 1:5,
  Name = c("Bob", "Ed", "Rod", "Dav", "Eva"),
  Value = c(90, 85, 78, 92, 88)
)

# write_xpt() produces a SAS transport file, so use the .xpt extension
write_xpt(my_data, path = "output_file.xpt")
```
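To confirm that a transport file round-trips, you can read it back with read_xpt() (read_sas() expects .sas7bdat files instead). This sketch writes to R's temporary directory so it leaves nothing in your project. As a precaution, it compares column names case-insensitively and trims strings, since transport files store text in fixed-width fields.

```r
library(haven)

my_data <- data.frame(
  ID = 1:5,
  Name = c("Bob", "Ed", "Rod", "Dav", "Eva"),
  Value = c(90, 85, 78, 92, 88)
)

# Write to a temporary location instead of the working directory
path <- file.path(tempdir(), "output_file.xpt")
write_xpt(my_data, path)

# Read the transport file back; integer columns come back as doubles
round_trip <- read_xpt(path)
str(round_trip)
```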
Missing values
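Recent versions of haven translate missing values automatically: SAS missing values are read as NA, and NA values in R are written as SAS missing values.
SAS also supports special missing values (.A to .Z). If you need one of these, create it in R with tagged_na(), which takes a single letter rather than free text: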
```r
library(haven)

# tagged_na("a") plays the role of the SAS special missing value .A
my_data <- data.frame(
  ID = 1:5,
  Name = c("Bob", "Ed", "Rod", "Dav", "Eva"),
  Value = c(90, 85, tagged_na("a"), 92, 88)
)
write_xpt(my_data, path = "output_file.xpt")

# A transport file is read back with read_xpt(), not read_sas()
read_xpt("output_file.xpt")
```
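The sketch below shows how tagged missing values behave in memory: is.na() treats them like any other NA, while na_tag() recovers the letter. It also attaches a value label so a meaning such as "Not applicable" travels with the data; the label text is only an illustration.

```r
library(haven)

# One special missing value tagged "a" and one ordinary NA
x <- c(90, 85, tagged_na("a"), 92, NA)

is.na(x)            # both kinds count as missing
na_tag(x)           # only the tagged value carries a letter
print_tagged_na(x)  # prints the tagged value as NA(a)

# Label the tag so its meaning is kept alongside the values
value <- labelled(x, c("Not applicable" = tagged_na("a")))
print(value)
```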
Example
Let’s dive into a simple example using SAS datasets. For this scenario, we’ll download the CARS dataset from the SASHELP library.
Save the SASHELP dataset as a SAS data file
SASHELP is a built-in SAS library, so CARS is not stored in a .sas7bdat file that you can download directly. We must first save it as a data file in our own folder. To do this, run the following SAS program:
```sas
%let username = your_username;
libname out "/home/&username/sasuser.v94/";

data out.cars_data;
  set sashelp.cars;
run;
```
Remember to replace your_username with your own username, without quotes.
Now you can download cars_data.sas7bdat from that folder and use it in R.
Playing with data – Summary statistics
To illustrate the example in R, let’s calculate summary statistics for all numeric columns, grouped by the column Type. In SAS, this can be done with the built-in “Summary Statistics” task.
You can export the result as a PDF file.
The task also generates the corresponding SAS code:
```sas
ods noproctitle;
ods graphics / imagemap=on;

proc means data=SASHELP.CARS chartype mean std min max median n nmiss
    vardef=df qrange qmethod=os;
  var MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway
      Weight Wheelbase Length;
  class Type;
run;
```
In R, the summarytools package produces a similar report. It is rendered as HTML, which you can then save or print as a PDF file.
```r
library(haven)
library(dplyr)
library(summarytools)

data <- read_sas("cars_data.sas7bdat")
grouped_data <- data %>% group_by(Type)

view(dfSummary(grouped_data))
```
If you want the result as a data frame instead, use dplyr’s summarise() together with across():
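```r
# na.rm = TRUE keeps columns with missing values (such as Cylinders) from returning NA
summarised_data <- data %>%
  group_by(Type) %>%
  summarise(
    across(
      where(is.numeric),
      list(
        mean = ~ mean(.x, na.rm = TRUE),
        stdev = ~ sd(.x, na.rm = TRUE),
        median = ~ median(.x, na.rm = TRUE),
        min = ~ min(.x, na.rm = TRUE),
        max = ~ max(.x, na.rm = TRUE),
        iqr = ~ IQR(.x, na.rm = TRUE)
      )
    )
  )
```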
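SASHELP.CARS has a few missing values (in Cylinders, for instance), which is why every summary function needs na.rm = TRUE; without it, the affected groups return NA. The sketch below runs the same across() pattern on a tiny made-up stand-in for CARS so you can see the shape of the output: across() names each result column "{column}_{function}".

```r
library(dplyr)

# Made-up stand-in for CARS, with one missing Cylinders value
toy <- data.frame(
  Type = c("SUV", "SUV", "Sedan", "Sedan"),
  Cylinders = c(6, NA, 4, 4),
  Horsepower = c(200, 240, 150, 170)
)

toy_summary <- toy %>%
  group_by(Type) %>%
  summarise(
    across(
      where(is.numeric),
      list(mean = ~ mean(.x, na.rm = TRUE), max = ~ max(.x, na.rm = TRUE))
    )
  )

print(toy_summary)
```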
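Now all we have to do is save the data frame in a format SAS can read. write_xpt() produces a SAS transport file, so give it the .xpt extension:
write_xpt(summarised_data, "cars_summarised.xpt")
You can then upload the file back to SAS and import it there.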
Best Practices
- Reproducibility: Document the steps you take when reading and writing SAS data with R Markdown or Quarto. This is an important aspect of reproducibility.
If you need to run the workflow on a routine schedule, consider building it as a pipeline with the targets package.
- Seek Help: If you need further guidance, don’t hesitate to ask the R or SAS communities. Collaboration can often lead to quicker solutions.
Also, Stack Overflow is a great resource, and it’s quite possible that someone has already faced and shared a solution similar to yours.
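The targets pipeline suggested above could look like the minimal _targets.R sketch below. It assumes the targets package is installed and reuses the file names from the example (cars_data.sas7bdat in, cars_summarised.xpt out); the target names are hypothetical. Run it with targets::tar_make(), and targets reruns only the steps whose inputs changed.

```r
# _targets.R: hypothetical read -> summarise -> write pipeline
library(targets)

# Packages attached when each target runs
tar_option_set(packages = c("haven", "dplyr"))

list(
  # Track the SAS input file so the pipeline reruns when it changes
  tar_target(sas_file, "cars_data.sas7bdat", format = "file"),
  # Read the SAS data set into R
  tar_target(cars, read_sas(sas_file)),
  # Grouped means of every numeric column, ignoring missing values
  tar_target(
    cars_summary,
    cars %>%
      group_by(Type) %>%
      summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
  ),
  # Write the transport file and return its path so targets tracks it
  tar_target(
    xpt_file,
    {
      write_xpt(cars_summary, "cars_summarised.xpt")
      "cars_summarised.xpt"
    },
    format = "file"
  )
)
```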
Conclusion
The haven package removes much of the friction in SAS and R interoperability. This guide showed how to read SAS files into R, handle common issues such as encodings, labels, and missing values, and write data back in a format SAS can import.
Do you want to get more out of your data with custom analytics and solutions? We’re here to help.
The post appeared first on appsilon.com/blog/.