Site icon R-bloggers

Reading Data From Excel Files (xls,xlsx,csv) into R-Quick Guide

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Reading Data From Excel Files into R, so many people still saving their dataset in R but sometimes coming to data analysis facing lots of difficulties, while loading data set into R, we can make use of the power of R functions.

In this tutorial we are going to describe how to read excel data xls or xlsx file formats into R. This can be done based on using readxl, xlsx, openxlsx, or XLConnect package.

Reading Data From Excel Files into R

1. readxl package

If you are not installed readxl package then you can use below code

Repeated Measures of ANOVA in R Complete Tutorial »

install.packages("readxl")

Load readxl package into R.

library("readxl")

Reading xls and xlsx format is given below.

For xls files

data<- read_excel("file.xls")

For xlsx files

data <- read_excel("file.xlsx")

You can choose a file interactively based on file.choose() function. This is time consuming so not recommended.

data <- read_excel(file.choose())

Imagine if you have multiple sheets then you can make use of argument sheet.

You need to specify sheet by its name

data <- read_excel("my_file.xlsx", sheet = "sheetname")

You can specify sheet by its index

data <- read_excel("my_file.xlsx", sheet = 2)

Sometimes in excel sheet contains the missing values, if you are reading the file in R it will display as a blank cell, You can avoid these kinds of issues while setting na argument.

QQ-plots in R: Quantile-Quantile Plots-Quick Start Guide »

data <- read_excel("file.xlsx", na = "---")

If you want to read multiple excel files then,

library(readxl)
file.list <- list.files(pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)

If you also want to include the files in subdirectories, then

file.list <- list.files(pattern='*.xlsx', recursive = TRUE)

Suppose all the sheets have same column name then you can make use of bind_rows,

library(dplyr)
df <- bind_rows(df.list, .id = "id")

2. xlsx Package

One of the another package is xlsx,  java-based solution, for reading, writing and formatting excel files in R.

If you are not installed you can install the package based on below code.

install.packages("xlsx")

Let’s load the xlsx package in R.

library("xlsx")

How to use xlsx package?

KNN Algorithm Machine Learning » Classification & Regression »

In xlsx pakage mainly two functions read.xlsx() and read.xlsx2()

Suppose if you have bigger files then read.xlsx2() function recommended because it’s load faster than read.xlsx.

Xlsx package format is given below.

read.xlsx(file, sheetIndex, header=TRUE)
read.xlsx2(file, sheetIndex, header=TRUE)

file indicating the file path

sheetIndex indicate the index of the sheet to be read

header indicates a logical value. If header is TRUE then the first row is considered as column names.

library("xlsx")
data <- read.xlsx(file.choose(), 1)  # read first sheet
data <- read.xlsx(“file.xlsx”, 1)  # read first sheet
data <- read.xlsx(“file.xlsx”, sheetName=”Sheet1”)  # read the data contains in Sheet1

Another way of importing data is copying from Excel and import into R

If you are using windows system the,

eXtreme Gradient Boosting in R » Ultimate Guide »

data <- read.table(file = "clipboard", sep = "\t", header=TRUE)

MAC OSX system

data <- read.table(pipe("pbpaste"), sep="\t", header = TRUE)

this is not the better way of importing data into R

3. openxlsx Package

openxlsx package is an another alternative to readxl package

library(openxlsx)
read.xlsx(file_path)

or

read.xlsx(file_path, cols = 1:2, rows = 2:3)

4. XLConnect package

XLConnect is an alternative to the xlsx package

install.packages("XLConnect")
library(XLConnect)
data <- readWorksheetFromFile(file_path, sheet = "list-column",
                              startRow = 1, endRow = 10,
                              startCol = 1, endCol = 3)

If you want to read several sheets then

Reading several sheets

load <- loadWorkbook(file_path)
data <- readWorksheet(load, sheet = "list-column",
                      startRow = 1, endRow = 10,
                      startCol = 1, endCol = 3)
data2 <- readWorksheet(load, sheet = "two-row-header",
                       startRow = 1, endRow = 10,
                       startCol = 1, endCol = 4)

In this package yu can Import a named region once

data <- readNamedRegionFromFile(file, # File path
                                name, # Region name
                                ...)  # Arguments of readNamedRegion()

Reading several named regions

Naive Bayes Classification in R » Prediction Model »

load <- loadWorkbook(file_path)
data <- readNamedRegion(load, name_Region_1, ...)
data2 <- readNamedRegion(load, name_Region_2, ...)

If you have csv file then

data<-read.csv(“file.csv”,1)

Sometimes reading excel files JAVA errors can occur, you can avoid those issues while seting the java path in R

Prints the path of JAVA Home in R

Sys.getenv("JAVA_HOME")

Sets the path of JAVA

Sys.setenv(JAVA_HOME = "path_to_jre_java_folder")

jre folder contains inside the Java folder of your computer (Program Files)

Enjoyed this tutorial? Don’t forget to show your love, Please Subscribe the Newsletter and COMMENT below!

R Plot pch Symbols: Different point shapes in R »

The post Reading Data From Excel Files (xls,xlsx,csv) into R-Quick Guide appeared first on finnstats.

To leave a comment for the author, please follow the link and comment on their blog: Methods – finnstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.