Site icon R-bloggers

RObservations #33: Merging Excel Spreadsheets with Base R and openxlsx

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

I was recently asked as part of a larger task to combine multiple sheets from an excel workbook into a into a single sheet. When approached about the problem I immediately was asked if I was going to use VBA to do it. While I know my way around VBA, since VBA does not have a native way to undo its operations I was uncomfortable with the potential hazard using VBA would yield if a mistake was made or something wrong happens.

In this blog I share how its possible to combine and format sheets using the openxlsx package and base R. Since I’m limiting myself to one library and base R, I will be employing base R’s pipe operator – |>, instead of the superior magrittr pipe – %>% (my opinion only, don’t take it too seriously).

The Problem

Since I can’t share any of the data I was working with, for this blog I created some data using generatedata.com which randomly created name, phone and email fields. After copy-pasting the data into three sheets in an excel workbook which I called appropriately Three Sheets.xlsx, a working example is made!

The aim of the solution is to:

  1. Read each sheet into R.
  2. Combine them, and
  3. Create a new Excel spreadsheet with the same formatting.

Its possible to make more sophisticated formatting with openxlsx, but for simplicity, the formatting is very simple with the data having borders, a bold header and the text fitting the width of the cells they are in.

For following along I broke down the code into pieces with some commentary. If you want to copy the whole script, the last code chunk at the end of the blog has it all for you to copy.

Reading Each Sheet

To read each sheet into R we first need to load the workbook. After loading the workbook we can loop through reading each sheet by using the sheet names to get a dataframe of each sheet. The code for doing this step is:

library(openxlsx)
# Read workbook
wb<- loadWorkbook("Three Sheets.xlsx") 
# Read each sheet as a list
wb$sheet_names |>
  lapply(function(x) read.xlsx(wb,sheet=x))


## [[1]]
##             name          phone                      email
## 1  Kalia Navarro 1-273-524-8639            sed@hotmail.net
## 2 Sawyer Perkins (649) 129-6971 eu.elit.nulla@outlook.couk
## 3 Cain Gutierrez 1-711-860-5064     turpis.nec@google.couk
## 4  Wayne Whitney (718) 364-2857              fusce@aol.edu
## 5    Abbot Walls 1-154-859-6431          convallis@aol.org
## 
## [[2]]
##             name          phone                               email
## 1 Skyler Barrett 1-750-786-4095 lacus.quisque.imperdiet@google.couk
## 2    Ray Russell (514) 272-6578        phasellus.ornare@outlook.edu
## 3   Thomas Ayala (889) 800-4141               phasellus@outlook.net
## 4  Alexis Rhodes 1-615-560-3939            luctus.et@protonmail.net
## 5   Curran Ayers 1-917-874-1818                     erat@google.org
## 
## [[3]]
##                   name          phone                         email
## 1      Margaret Lowery 1-216-330-6638 scelerisque.neque@hotmail.edu
## 2          Brett Eaton 1-770-528-7545             magna@outlook.org
## 3             Ray Dunn 1-600-386-5880           a.purus@outlook.net
## 4 Jacqueline Davenport 1-668-100-3075   non.bibendum.sed@google.com
## 5         Jayme Kelley 1-267-951-0010                 eu@yahoo.couk

Combining the read data

The next step would be combining the data together. If the columns of each sheet are not the same, you will need to look into using Reduce() together with the merge() (see here). If the columns of each sheet are the same (as it is in this case), you can use do.call() and set what = rbind.

The code I used to combine the list of dataframes into single dataframe is:

# Read each sheet as a list
wb$sheet_names |>
  lapply(function(x) read.xlsx(wb,sheet=x)) |>
  # combine the sheets into a single dataframe
  do.call(what=rbind)


##                    name          phone                               email
## 1         Kalia Navarro 1-273-524-8639                     sed@hotmail.net
## 2        Sawyer Perkins (649) 129-6971          eu.elit.nulla@outlook.couk
## 3        Cain Gutierrez 1-711-860-5064              turpis.nec@google.couk
## 4         Wayne Whitney (718) 364-2857                       fusce@aol.edu
## 5           Abbot Walls 1-154-859-6431                   convallis@aol.org
## 6        Skyler Barrett 1-750-786-4095 lacus.quisque.imperdiet@google.couk
## 7           Ray Russell (514) 272-6578        phasellus.ornare@outlook.edu
## 8          Thomas Ayala (889) 800-4141               phasellus@outlook.net
## 9         Alexis Rhodes 1-615-560-3939            luctus.et@protonmail.net
## 10         Curran Ayers 1-917-874-1818                     erat@google.org
## 11      Margaret Lowery 1-216-330-6638       scelerisque.neque@hotmail.edu
## 12          Brett Eaton 1-770-528-7545                   magna@outlook.org
## 13             Ray Dunn 1-600-386-5880                 a.purus@outlook.net
## 14 Jacqueline Davenport 1-668-100-3075         non.bibendum.sed@google.com
## 15         Jayme Kelley 1-267-951-0010                       eu@yahoo.couk

Now that the data is combined, it can now be written as an excel file and formatted accordingly.

Writing the data to an Excel file and formatting the data.

openxlsx‘s write.xlsx function is really powerful with the list of options available for how you can format the data. The documentation is really well written and very easy to understand. The package makes a separate argument for the header row, so its possible to make the header dynamically different from the rest of the data. The use case I have for this blog is very simple, but you can have very intricate formatting as well.

The code I used for this is:

# Read each sheet as a list
wb$sheet_names |>
  lapply(function(x) read.xlsx(wb,sheet=x)) |>
  # combine the sheets into a single dataframe
  do.call(what=rbind)|>
  # Write into excel file and match formatting
  write.xlsx("combinedSheets.xlsx",
             colWidths="auto",
             borders = "all",
             headerStyle= createStyle(textDecoration = "Bold",
                                      border=c("top", 
                                               "bottom", 
                                               "left", 
                                               "right"))
             )

The final product looks like this:

To do this all in a single script, the code is:

library(openxlsx)
# Read workbook
wb<- loadWorkbook("Three Sheets.xlsx") 

# Read each individual sheet, combine them and format.  

# Read each sheet as a list
wb$sheet_names |>
  lapply(function(x) read.xlsx(wb,sheet=x)) |>
  # combine the sheets into a single dataframe
  do.call(what=rbind)|>
  # Write into excel file and match formatting
  write.xlsx("combinedSheets.xlsx",
             colWidths="auto",
             borders = "all",
             headerStyle= createStyle(textDecoration = "Bold",
                                      border=c("top", 
                                               "bottom", 
                                               "left", 
                                               "right"))
             )

Conclusion

The openxlsx library has opened alot of opportunities for dealing with data cleaning and formatting data in Excel files without having to leave R. I used to clean data and export it as a .csv file and do the appropriate formatting in Excel. With openxlsx everything can be streamlined directly in R! I will definitely be having this library on hand for work in the future and I think anyone who works with Excel files and R should also!

Thank you for reading!

Be sure to subscribe if you haven’t already!

Want to see more of my content?

Be sure to subscribe and never miss an update!

To leave a comment for the author, please follow the link and comment on their blog: r – bensstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.