Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
You might find that loading data into R can be quite frustrating. Almost every single type of file that you want to get into R seems to require its own function, and even then you might get lost in the functions’ arguments. In short, it can be fairly easy to mix up things from time to time, whether you are a beginner or a more advanced R user…
To cover these needs, DataCamp decided to publish a comprehensive yet easy tutorial on quickly importing data into R, going from simple text files to the more advanced SPSS and SAS files. Keep on reading to find out how to easily import your files into R!
Your Data
To import data into R, you first need to have data. This data can be saved in a file on your computer (e.g. a local Excel, SPSS, or some other type of file), but it can also live on the Internet or be obtained through other sources. Where to find these data is outside the scope of this tutorial, so for now it’s enough to mention this blog post, which explains well how to find data on the internet, and DataCamp’s interactive tutorial, which deals with how to import and manipulate Quandl data sets.
Tip: before you move on and discover how to load your data into R, it might be useful to go over the following checklist that will make it easier to import the data correctly into R:
- If you work with spreadsheets, the first row is usually reserved for the header, while the first column is used to identify the sampling unit;
- Avoid names, values or fields with blank spaces, otherwise each word will be interpreted as a separate variable, resulting in errors that are related to the number of elements per line in your data set;
- If you want to concatenate words, insert a . between two words instead of a space;
- Short names are preferred over longer names;
- Try to avoid using names that contain symbols such as ?, $, %, ^, &, *, (, ), -, #, ,, <, >, /, |, [, ], {, and };
- Delete any comments that you have made in your Excel file to avoid extra columns or NAs being added to your file; and
- Make sure that any missing values in your data set are indicated with NA.
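The naming advice in the checklist can also be applied after the data has been exported. As a minimal sketch, base R's make.names() turns arbitrary strings into syntactically valid names by replacing spaces and symbols with a dot. The column names below are made up for illustration:

```r
# Hypothetical, messy column names as they might come out of a spreadsheet
raw_names <- c("patient id", "cost$", "weight(kg)")

# make.names() replaces every character that is not valid in an R name with "."
clean_names <- make.names(raw_names)
print(clean_names)
```

This does not replace cleaning the source file, but it shows which names R would consider problematic.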
Preparing Your R Workspace
Make sure to go into RStudio and see what needs to be done before you start your work there. You might have an environment that is still filled with data and values, which you can all delete using the following line of code:
rm(list=ls())
The rm() function allows you to “remove objects from a specified environment”. In this case, you pass it a list of names to remove, which is the outcome of the ls() function. This last function returns a vector of character strings that gives the names of the objects in the specified environment. Since ls() is called here without an argument, it is assumed that you mean the data sets and functions that you as a user have defined.
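To see how rm() and ls() work together without wiping your own workspace, you can try the same idea on a throwaway environment. This is only a sketch; the environment name scratch and the objects in it are invented for the example:

```r
# A scratch environment, so the sketch does not touch your own workspace
scratch <- new.env()
assign("x", 1:3, envir = scratch)
assign("y", "hello", envir = scratch)

# ls() returns the names of the objects in the environment as a character vector
before <- ls(scratch)

# rm() removes every object whose name is in that vector
rm(list = ls(scratch), envir = scratch)
remaining <- ls(scratch)

print(before)
print(length(remaining))
```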
Next, you might also find it handy to know where your working directory is set at the moment:
getwd()
And you might consider changing the path that you get as a result of this function, maybe to the folder in which you have stored your data set:
setwd("<location of your dataset>")
Getting Data From Common Sources into R
You will see that the following basic R functions focus on getting spreadsheet-like data stored in plain text files into R, rather than Excel or other types of files. If you are more interested in the latter, scroll a bit further to discover the ways of importing other files into R.
Importing TXT files
If you have a .txt or a tab-delimited text file, you can easily import it with the basic R function read.table(). In other words, your file will look similar to this
// Contents of .txt
1 6 a
2 7 b
3 8 c
4 9 d
5 10 e
and can be imported as follows:
df <- read.table("<FileName>.txt", header = FALSE)
Note that by using this function, your data from the file will become a data.frame object. Note also that the first argument isn’t always a filename, but could possibly also be a webpage that contains data. The header argument specifies whether or not you have specified column names in your data file. The final result of your importing will show in the RStudio console as:
  V1 V2 V3
1  1  6  a
2  2  7  b
3  3  8  c
4  4  9  d
5  5 10  e
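If you do not have a text file at hand, the following sketch reproduces the example end to end. It writes the sample rows to a temporary file (the use of tempfile() is an assumption made only to keep the example self-contained) and then reads them back with read.table():

```r
# Write the sample rows to a temporary .txt file
txt_path <- tempfile(fileext = ".txt")
writeLines(c("1 6 a", "2 7 b", "3 8 c", "4 9 d", "5 10 e"), txt_path)

# No header row in the file, so R assigns the names V1, V2, V3
df <- read.table(txt_path, header = FALSE)
str(df)
```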
Good to know
The read.table() function is the most important and commonly used function to import simple data files into R. It is easy and flexible. That is why you should definitely check out our previous tutorial on reading and importing Excel files into R, which explains in great detail how to use the read.table() function optimally.
For files that are not delimited by tabs, like .csv and other delimited files, you actually use variants of this basic function. These variants are almost identical to the read.table() function and differ from it in three aspects only:
- The separator symbol;
- The header argument is set to TRUE by default, which indicates that the first line of the file being read contains the header with the variable names;
- The fill argument is also set to TRUE by default, which means that if rows have unequal length, blank fields will be added implicitly.
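The three differences listed above can be checked directly. The sketch below, which uses a small temporary file invented for the example, reads the same data once with read.csv() and once with read.table() with the separator, header and fill arguments spelled out, and then compares the two results:

```r
# A small comma-separated file with a header row
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Col1,Col2,Col3", "1,2,3", "4,5,6"), csv_path)

# read.csv() relies on its defaults ...
csv_way <- read.csv(csv_path)

# ... while read.table() needs the same settings made explicit
table_way <- read.table(csv_path, sep = ",", header = TRUE, fill = TRUE)

# Compare the two data frames
same <- isTRUE(all.equal(csv_way, table_way))
print(same)
```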
Importing CSV Files
If you have a file that separates the values with a , or ;, you usually are dealing with a .csv file. It looks somewhat like this:
// Contents of .csv file
Col1,Col2,Col3
1,2,3
4,5,6
7,8,9
a,b,c
In order to successfully load this file into R, you can use the read.table() function in which you specify the separator character, or you can use the read.csv() or read.csv2() functions. The former function is used if the separator is a , and the latter if ; is used to separate the values in your data file.
Remember that the read.csv() and read.csv2() functions are almost identical to the read.table() function: apart from the default separator, the only difference is that they have the header and fill arguments set to TRUE by default.
# The sample file above starts with a header row, so header is set to TRUE
df <- read.table("<FileName>.csv", header = TRUE, sep = ",")
df <- read.csv("<FileName>.csv", header = TRUE)
df <- read.csv2("<FileName>.csv", header = TRUE)
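One detail worth knowing about read.csv2(): besides using ; as the separator, it also treats a comma as the decimal mark by default. A minimal sketch with made-up data in a temporary file:

```r
# Semicolon-separated values with decimal commas
csv2_path <- tempfile(fileext = ".csv")
writeLines(c("item;price", "apple;1,5", "pear;3,25"), csv2_path)

# read.csv2() defaults to sep = ";" and dec = ","
df2 <- read.csv2(csv2_path)
str(df2)
```

Here the price column should come back as numeric rather than as text, which is usually what you want for files exported with these conventions.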
Tip: if you want to know more about the arguments that you can use in the read.table(), read.csv() or read.csv2() functions, you can always check out our reading and importing Excel files into R tutorial, which explains in great detail how to use these functions.
Importing Files With Other Separator Characters
In case you have a file with a separator character that is different from a tab, a comma or a semicolon, you can always use the read.delim() and read.delim2() functions and pass the separator through their sep argument. These are variants of the read.table() function, just like the read.csv() function. Consequently, they have much in common with the read.table() function, except that they assume that the first line that is being read in is a header with the attribute names, and that they use a tab as the default separator instead of whitespace, a comma or a semicolon. They also have the fill argument set to TRUE, which means that blank fields will be added to rows of unequal length.
You can use the read.delim() and read.delim2() functions as follows:
df <- read.delim("<name and extension of your file>")
df <- read.delim2("<name and extension of your file>")
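For a separator other than a tab, the sep argument of read.delim() does the work. The sketch below assumes a pipe-delimited file, created in a temporary location purely for illustration:

```r
# A pipe-delimited file with a header row
pipe_path <- tempfile(fileext = ".txt")
writeLines(c("id|name", "1|a", "2|b"), pipe_path)

# Override the default tab separator through the sep argument
df_pipe <- read.delim(pipe_path, sep = "|")
str(df_pipe)
```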
Importing Excel Files Into R
To load Excel files into R, you first need to do some further prepping of your workspace in the sense that you need to install packages. Simply run the following piece of code to accomplish this:
install.packages("<name of the package>")
When you have installed the package, you can just type in the following to activate it in your workspace:
library("<name of the package>")
To check if you already installed the package or not, type in
any(grepl("<name of your package>", installed.packages()))
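Note that grepl() matches patterns, so the check above can also return TRUE when the name is only part of another package's name. A stricter alternative, sketched here with a hypothetical helper called is_installed(), relies on base R's requireNamespace():

```r
# Hypothetical helper: TRUE only if the package can actually be loaded
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

has_stats <- is_installed("stats")                      # ships with R
has_fake  <- is_installed("surelyNotARealPackage123")   # made-up name

print(c(has_stats, has_fake))
```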
Importing Excel Files With The XLConnect Package
The first way to get Excel files directly into R is by using the XLConnect package. Install the package and if you’re not sure whether or not you already have it, check if it is already there.
Next, you can start using the readWorksheetFromFile() function, as shown below:
library(XLConnect)
df <- readWorksheetFromFile("<file name and extension>", sheet = 1)
Note that you need to add the sheet argument to specify which sheet you want to load into R. You can also add more specifications. You can find these explained in our tutorial on reading and importing Excel files into R.
You can also load in a whole workbook with the loadWorkbook() function, to then read in the worksheets that you want to appear as data frames in R through readWorksheet():
wb <- loadWorkbook("<name and extension of your file>")
df <- readWorksheet(wb, sheet = 1)
Note again that the sheet argument is not the only argument that you can use in readWorksheetFromFile(). If you want more information about the package or about all the arguments that you can pass to the readWorksheetFromFile() function or to the two alternative functions that were mentioned, you can visit the package’s RDocumentation page.
Importing Excel Files With The Readxl Package
The readxl package has only recently been published and allows R users to easily read in Excel files, just like this:
library(readxl)
df <- read_excel("<name and extension of your file>")
Note that the first argument specifies the path to your .xls or .xlsx file, which you can set by using the getwd() and setwd() functions. You can also add a sheet argument, just like with the XLConnect package, and many more arguments on which you can read up here or in this blog post.
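If you want to try the sheet argument without an Excel file of your own, the readxl package ships with a few example workbooks. The sketch below assumes that readxl is installed and uses its bundled datasets.xlsx file; it is skipped otherwise:

```r
if (requireNamespace("readxl", quietly = TRUE)) {
  # Path to an example workbook that is bundled with readxl
  xl_path <- readxl::readxl_example("datasets.xlsx")

  # List the sheet names, then read the first sheet by name
  sheets <- readxl::excel_sheets(xl_path)
  df_xl <- readxl::read_excel(xl_path, sheet = sheets[1])
  print(sheets)
  print(dim(df_xl))
}
```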
Importing JavaScript Object Notation (JSON) Files Into R
To get JSON files into R, you first need to install or load the rjson package. If you want to know how to install packages or how to check if packages are already installed, scroll a bit up to the section of importing Excel files into R.
Once you have done this, you can use the fromJSON() function. Here, you have two options:
Your JSON file is stored in your working directory.
library(rjson)
JsonData <- fromJSON(file = "<filename.json>")
Your JSON file is available through a URL.
library(rjson)
JsonData <- fromJSON(file = "<URL to your JSON file>")
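The result of fromJSON() is an ordinary R list that mirrors the structure of the JSON document. As a rough sketch, assuming the rjson package is installed, you can write a small JSON file to a temporary location and inspect what comes back (the file contents are invented for the example):

```r
if (requireNamespace("rjson", quietly = TRUE)) {
  # A tiny JSON document written to a temporary file
  json_path <- tempfile(fileext = ".json")
  writeLines('{"name": "Ann", "scores": [1, 2, 3]}', json_path)

  # JSON objects become named lists, JSON arrays of numbers become vectors
  json_data <- rjson::fromJSON(file = json_path)
  str(json_data)
}
```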
Importing XML Data Into R
If you want to get XML data into R, one of the easiest ways is through the usage of the XML package. First, you make sure you install and load the XML package in your workspace, just as demonstrated above. Then, you can use the xmlTreeParse() function to parse the XML file directly from the web:
library(XML)
xmlfile <- xmlTreeParse("<Your URL to the XML data>")
Next, you can check whether R knows that xmlfile is in XML by entering:
class(xmlfile)
# Result is usually similar to this:
[1] "XMLDocument" "XMLAbstractDocument"
Tip: you can use the xmlRoot() function to access the top node:
topxml <- xmlRoot(xmlfile)
You will see that the data is presented in an unfamiliar way when you try printing out the xmlfile object. That is because the XML file is still a real XML document in R at this point. In order to put the data in a data frame, you first need to extract the XML values. You can use the xmlSApply() function to do this:
topxml <- xmlSApply(topxml, function(x) xmlSApply(x, xmlValue))
The first argument of this function will be topxml, since it is the top node on whose children you want to perform a certain function. Then, you list the function that you want to apply to each child node. In this case, you want to extract the contents of a leaf XML node. This, in combination with the first argument topxml, will make sure that you will do this for each leaf XML node.
Lastly, you put the values in a data frame! You use the data.frame() function in combination with the matrix transposition function t() to do this. Additionally, you also specify that no row names should be included:
xml_df <- data.frame(t(topxml), row.names=NULL)
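The steps above can be tried on a small inline document instead of a URL. The sketch below assumes that the XML package is installed; the XML content and the asText = TRUE shortcut are choices made only to keep the example self-contained:

```r
if (requireNamespace("XML", quietly = TRUE)) {
  # A made-up document with two records, passed as text instead of a URL
  xml_text <- paste0(
    "<people>",
    "<person><name>Ann</name><age>31</age></person>",
    "<person><name>Bob</name><age>45</age></person>",
    "</people>"
  )
  xmlfile <- XML::xmlTreeParse(xml_text, asText = TRUE)

  # Top node, then the value of every leaf node of every child
  topxml <- XML::xmlRoot(xmlfile)
  values <- XML::xmlSApply(topxml, function(x) XML::xmlSApply(x, XML::xmlValue))

  # Fields are in rows at this point, so transpose before building the data frame
  xml_df <- data.frame(t(values), row.names = NULL)
  print(xml_df)
}
```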
You can also choose not to do all the previous steps, which are a bit more complicated, and to just do the following:
url <- "<a URL with XML data>"
data_df <- xmlToDataFrame(url)
Importing Data From HTML Tables Into R
Getting data from HTML tables into R is pretty straightforward:
url <- "<a URL>"
data_df <- readHTMLTable(url, which=3)
Note that the which argument allows you to specify which tables to return from within the document.
If this gives you an error along the lines of “failed to load external entity”, don’t be confused: this error has been reported by many people and has been confirmed by the package’s author here. You can work around this by using the RCurl package in combination with the XML package to read in your data:
library(XML)
library(RCurl)
url <- "YourURL"
urldata <- getURL(url)
data <- readHTMLTable(urldata, stringsAsFactors = FALSE)
Note that you don’t want the strings to be registered as factors or categorical variables! You can also use the httr package to accomplish exactly the same thing, except for the fact that you will want to convert the raw bytes of the URL’s content to characters by using the rawToChar() function:
library(httr)
urldata <- GET(url)
data <- readHTMLTable(rawToChar(urldata$content), stringsAsFactors = FALSE)
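To experiment with the which and stringsAsFactors arguments without a network connection, you can parse an HTML snippet that is held in a string. This is only a sketch, assuming the XML package is installed; the table content is invented and htmlParse() with asText = TRUE is used here in place of a download:

```r
if (requireNamespace("XML", quietly = TRUE)) {
  # A made-up page that contains a single table
  html_text <- paste0(
    "<html><body><table>",
    "<tr><th>id</th><th>name</th></tr>",
    "<tr><td>1</td><td>a</td></tr>",
    "<tr><td>2</td><td>b</td></tr>",
    "</table></body></html>"
  )
  doc <- XML::htmlParse(html_text, asText = TRUE)

  # which = 1 returns the first table as a data frame instead of a list of tables
  html_df <- XML::readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
  str(html_df)
}
```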
Getting Data From Statistical Software Packages into R
For the following more advanced statistical software programs, there are corresponding packages that you first need to install in order to read your data files into R, just like you do with Excel or JSON.
Importing SPSS Files into R
If you’re a user of SPSS software and you are looking to import your SPSS files into R, first install the foreign package. After loading the package, run the read.spss() function that is contained within it and you should be good to go!
library(foreign)
mySPSSData <- read.spss("example.sav")
Tip: if you wish the result to be returned as a data frame, make sure to set the to.data.frame argument of the read.spss() function to TRUE. Furthermore, if you do NOT want the variables with value labels to be converted into R factors with corresponding levels, you should set the use.value.labels argument to FALSE:
library(foreign)
mySPSSData <- read.spss("example.sav", to.data.frame=TRUE, use.value.labels=FALSE)
Remember that factors are variables that can only contain a limited number of different values. As such, they are often called “categorical variables”. The different values of factors can be labeled and are therefore often called “value labels”.
Importing Stata Files into R
To import Stata files, you keep on using the foreign package:
library(foreign)
mydata <- read.dta("<Path to file>")
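If you have no Stata file to test with, the foreign package can also write one, which makes a simple round trip possible. The sketch below assumes that foreign is installed and uses a toy data frame made up for the example:

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  # Write a toy data frame to a temporary .dta file ...
  dta_path <- tempfile(fileext = ".dta")
  toy <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
  foreign::write.dta(toy, dta_path)

  # ... and read it back in with read.dta()
  stata_df <- foreign::read.dta(dta_path)
  str(stata_df)
}
```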
Importing Systat Files into R
If you want to get Systat files into R, you also want to use the foreign package, just like shown below:
library(foreign)
mydata <- read.systat("<Path to file>")
Importing SAS Files into R
For those R users that also want to import SAS files into R, it’s very simple! For starters, install the sas7bdat package. Load it, and then invoke the read.sas7bdat() function contained within the package and you are good to go!
library(sas7bdat)
mySASData <- read.sas7bdat("example.sas7bdat")
Does this function interest you and do you want to know more? Visit the Rdocumentation page.
Importing Minitab Files into R
Is your software of choice for statistical purposes Minitab? Look no further if you want to use Minitab data in R!
Importing .mtp files into R is pretty straightforward. To begin with, install the foreign package and load it. Then simply use the read.mtp() function from that package:
library(foreign)
myMTPData <- read.mtp("example2.mtp")
Importing RDA or RData Files into R
If your data file is one that you have saved in R as an .rdata file, you can read it in as follows:
load("<FileName>.RDA")
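Unlike the functions above, load() does not return the data itself: it restores the saved objects under their original names and returns those names. A minimal round trip with a made-up object and a temporary file illustrates this:

```r
# Save a made-up object to a temporary .RData file
rda_path <- tempfile(fileext = ".RData")
my_scores <- c(a = 1, b = 2, c = 3)
save(my_scores, file = rda_path)

# Remove it, then restore it from the file
rm(my_scores)
restored_names <- load(rda_path)

print(restored_names)   # names of the objects that were restored
print(my_scores)        # the object is available again under its old name
```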
Getting Data From Other Sources Into R
Since this tutorial focuses on importing data from different types of sources, it is only right to also mention that you can import data into R that comes from databases, webscraping, etc.
Importing Data From Databases
Importing Data From Relational Databases
For more information on getting data from relational databases into R, check out this tutorial for importing data from MonetDB.
If, however, you want to load data from MySQL into R, you can follow this tutorial, which uses the dplyr package to import the data into R.
If you are interested in knowing more about this last package, make sure to check out DataCamp’s interactive course, which is definitely a must for everyone that wants to use dplyr to access data stored outside of R in a database. Furthermore, the course also teaches you how to perform sophisticated data manipulation tasks using dplyr!
Importing Data From Non-Relational Databases
For more information on loading data from non-relational databases into R, like data from MongoDB, you can read this blogpost from “Yet Another Blog in Statistical Computing” for an overview on how to load data from MongoDB into R.
Importing Data Through Webscraping
You can read up on how to scrape JavaScript data with R with the use of PhantomJS and the rvest package in this DataCamp tutorial. If you want to use APIs to import your data, you can easily find one here.
Tip: you can check out this set of amazing tutorials which deal with the basics of webscraping.
Importing Data Through The TM Package
For those of you who are interested in importing textual data to start mining texts, you can read in the text file in the following way after having installed and activated the tm package:
text <- readLines("<filePath>")
Then, you have to make sure that you load these data as a corpus in order to get started correctly:
docs <- Corpus(VectorSource(text))
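Put together, the two steps look roughly like this. The sketch assumes that the tm package is installed and uses a temporary text file with two made-up lines, each of which becomes one document in the corpus:

```r
if (requireNamespace("tm", quietly = TRUE)) {
  # Two lines of made-up text in a temporary file
  text_path <- tempfile(fileext = ".txt")
  writeLines(c("R makes importing data easy.", "Text mining starts with a corpus."), text_path)

  # Read the lines, then wrap them in a corpus
  text <- readLines(text_path)
  docs <- tm::Corpus(tm::VectorSource(text))
  print(length(docs))
}
```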
You can find an accessible tutorial on text mining with R here.
This Is Just The Beginning…
Loading your data into R is just a small step in your exciting data analysis, manipulation and visualization journey. DataCamp is here to guide you through it!
If you are a beginner, make sure to check out our tutorials on machine learning and histograms.
If you are already a more advanced R user, you might be interested in reading our tutorial on 15 Easy Solutions To Your Data Frame Problems In R.
Also, don’t forget to pass by DataCamp to see whether our offer of interactive courses on R can interest you!
The post This R Data Import Tutorial Is Everything You Need appeared first on The DataCamp Blog .