Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
An xlsx is a file format used for Microsoft Excel spreadsheets. Excel can be used to store tabular data.
R has built-in functionality that makes it easy to read and write an xlsx file.
Sample xlsx File
To demonstrate how we read xlsx files in R, let's suppose we have an excel file named studentinfo.xlsx
with the following data:
We will be reading these data with the help of R's built-in functions.
Install and Load xlsx Package
In order to read, write, and format Excel files into R, we first need to install and load the xlsx
package as:
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx")
Here, we have successfully installed and loaded the xlsx package.
Now, we are able to read data from an xlsx file.
Read an xlsx File in R
In R, we use the read.xlsx()
function to read an xlsx file available in our current directory. For example,
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx") # read studentinfo.xlsx file from our current directory read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1) # display xlsx file print(read_data)
Output
Name Age Faculty State 1 Abby 24 Business Florida 2 Hazzle 23 Engineering Arizona 3 Cathy 20 Engineering Colorado 4 Paterson 22 Arts Texas 5 Sammy 20 Economics Ohio 6 Pam 21 Arts Arizona
In the above example, we have read the studentinfo.xlsx
file that is available in our current directory. Notice the code,
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)
Here,
read.xlsx()
– reads the xlsx filestudentinfo.xlsx
and creates a dataframe that is stored in the read_data variable.sheetIndex = 1
– reads specified worksheet i.e. 1
Note:
- If the file is in some other location, we have to specify the path along with the file name as:
read.xlsx("D:/folder1/studentinfo.xlsx", sheetIndex = 1)
. - We can also use the
read.xlsx2()
function if the dataset we are working on is larger.
xlsx rowIndex and colIndex Argument in R
In R, we can also read a specific range of data from excel files. We can pass the rowIndex
and colIndex
argument inside read.xlsx()
to read specific range.
rowIndex
– reads a specific range of rowscolIndex
– read a specific range of columns
Example: Read Range of Rows
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx") # read first five rows of xlsx file read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1, rowIndex = 1:5 ) # display xlsx file print(read_data)
Output
Name Age Faculty State 1 Abby 24 Business Florida 2 Hazzle 23 Engineering Arizona 3 Cathy 20 Engineering Colorado 4 Paterson 22 Arts Texas
In the above example, we have passed rowIndex = 1:5
inside read.xlsx()
so the function reads only the first five rows from the studentinfo.xlsx
file.
Example: Read Range of Columns
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx") # read first three columns of xlsx file read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1, colIndex = 1:3 ) # display xlsx file print(read_data)
Output
Name Age Faculty 1 Abby 24 Business 2 Hazzle 23 Engineering 3 Cathy 20 Engineering 4 Paterson 22 Arts 5 Sammy 20 Economics 6 Pam 21 Arts
Here, colIndex = 1:3
inside read.xlsx()
reads only the first three columns from the studentinfo.xlsx
file.
xlsx startRow Argument in R
Sometimes the excel file may contain headers at the beginning that we may not want to include. For example,
Here, the 1st Row of the excel file contains a header, and the 2nd row is empty. So we don't want to include these two rows.
To start reading data from a specific row in the excel worksheet, we pass the startRow
argument inside read.xlsx()
.
Let's take a look at an example,
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx") # start reading from 3rd row read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1, startRow = 3 ) # display xlsx file print(read_data)
Output
Name Age Faculty State 1 Abby 24 Business Florida 2 Hazzle 23 Engineering Arizona 3 Cathy 20 Engineering Colorado 4 Paterson 22 Arts Texas 5 Sammy 20 Economics Ohio 6 Pam 21 Arts Arizona
In the above example, we have used the startRow
argument inside the read.xlsx()
function to start reading from the specified row.
startRow = 3
means the first two rows are ignored and read.xlsx()
starts reading data from the 3rd row.
Write Into xlsx File in R
In R, we use the write.xlsx()
function to write into an xlsx file. We pass the data in the form of dataframe. For example,
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx") # create a data frame dataframe1 <- data.frame ( Name = c("Juan", "Alcaraz", "Simantha"), Age = c(22, 15, 19), Vote = c(TRUE, FALSE, TRUE)) # write dataframe1 into file1 xlsx file write.xlsx(dataframe1, "file1.xlsx")
In the above example, we have used the write.xlsx()
function to export a data frame named dataframe1 to a xlsx file. Notice the arguments passed inside write.xlsx()
,
write.xlsx(dataframe1, "file1.xlsx")
Here,
dataframe1
– name of the data frame we want to exportfile1.xlsx
– name of the xlsx file
Finally, the file1.xlsx
file would look like this in our directory:
Rename Current Worksheet
We can rename the current worksheet by using the sheetName
argument inside the write.xlsx()
function. For example,
# install xlsx package install.package("xlsx") # load xlsx file library("xlsx") # create a data frame dataframe1 <- data.frame ( Name = c("Juan", "Alcaraz", "Simantha"), Age = c(22, 15, 19), Vote = c(TRUE, FALSE, TRUE)) # name current worksheet write.xlsx(dataframe1, "file1.xlsx", sheetName = "Voting Eligibility" )
Here, we have passed sheetname = "Voting Eligibility"
inside write.xlsx()
, so the name of the sheet is changed to "Voting Eligibility"
.
So the file1.xlsx
looks like this:
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.