RObservations #33: Merging Excel Spreadsheets with Base R and openxlsx
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
I was recently asked as part of a larger task to combine multiple sheets from an excel workbook into a into a single sheet. When approached about the problem I immediately was asked if I was going to use VBA to do it. While I know my way around VBA, since VBA does not have a native way to undo its operations I was uncomfortable with the potential hazard using VBA would yield if a mistake was made or something wrong happens.
In this blog I share how its possible to combine and format sheets using the openxlsx
package and base R. Since I’m limiting myself to one library and base R, I will be employing base R’s pipe operator – |>
, instead of the superior magrittr
pipe – %>%
(my opinion only, don’t take it too seriously).
The Problem
Since I can’t share any of the data I was working with, for this blog I created some data using generatedata.com which randomly created name, phone and email fields. After copy-pasting the data into three sheets in an excel workbook which I called appropriately Three Sheets.xlsx
, a working example is made!
The aim of the solution is to:
- Read each sheet into R.
- Combine them, and
- Create a new Excel spreadsheet with the same formatting.
Its possible to make more sophisticated formatting with openxlsx
, but for simplicity, the formatting is very simple with the data having borders, a bold header and the text fitting the width of the cells they are in.
For following along I broke down the code into pieces with some commentary. If you want to copy the whole script, the last code chunk at the end of the blog has it all for you to copy.
Reading Each Sheet
To read each sheet into R we first need to load the workbook. After loading the workbook we can loop through reading each sheet by using the sheet names to get a dataframe of each sheet. The code for doing this step is:
library(openxlsx) # Read workbook wb<- loadWorkbook("Three Sheets.xlsx") # Read each sheet as a list wb$sheet_names |> lapply(function(x) read.xlsx(wb,sheet=x)) ## [[1]] ## name phone email ## 1 Kalia Navarro 1-273-524-8639 [email protected] ## 2 Sawyer Perkins (649) 129-6971 [email protected] ## 3 Cain Gutierrez 1-711-860-5064 [email protected] ## 4 Wayne Whitney (718) 364-2857 [email protected] ## 5 Abbot Walls 1-154-859-6431 [email protected] ## ## [[2]] ## name phone email ## 1 Skyler Barrett 1-750-786-4095 [email protected] ## 2 Ray Russell (514) 272-6578 [email protected] ## 3 Thomas Ayala (889) 800-4141 [email protected] ## 4 Alexis Rhodes 1-615-560-3939 [email protected] ## 5 Curran Ayers 1-917-874-1818 [email protected] ## ## [[3]] ## name phone email ## 1 Margaret Lowery 1-216-330-6638 [email protected] ## 2 Brett Eaton 1-770-528-7545 [email protected] ## 3 Ray Dunn 1-600-386-5880 [email protected] ## 4 Jacqueline Davenport 1-668-100-3075 [email protected] ## 5 Jayme Kelley 1-267-951-0010 [email protected]
Combining the read data
The next step would be combining the data together. If the columns of each sheet are not the same, you will need to look into using Reduce()
together with the merge()
(see here). If the columns of each sheet are the same (as it is in this case), you can use do.call()
and set what = rbind
.
The code I used to combine the list of dataframes into single dataframe is:
# Read each sheet as a list wb$sheet_names |> lapply(function(x) read.xlsx(wb,sheet=x)) |> # combine the sheets into a single dataframe do.call(what=rbind) ## name phone email ## 1 Kalia Navarro 1-273-524-8639 [email protected] ## 2 Sawyer Perkins (649) 129-6971 [email protected] ## 3 Cain Gutierrez 1-711-860-5064 [email protected] ## 4 Wayne Whitney (718) 364-2857 [email protected] ## 5 Abbot Walls 1-154-859-6431 [email protected] ## 6 Skyler Barrett 1-750-786-4095 [email protected] ## 7 Ray Russell (514) 272-6578 [email protected] ## 8 Thomas Ayala (889) 800-4141 [email protected] ## 9 Alexis Rhodes 1-615-560-3939 [email protected] ## 10 Curran Ayers 1-917-874-1818 [email protected] ## 11 Margaret Lowery 1-216-330-6638 [email protected] ## 12 Brett Eaton 1-770-528-7545 [email protected] ## 13 Ray Dunn 1-600-386-5880 [email protected] ## 14 Jacqueline Davenport 1-668-100-3075 [email protected] ## 15 Jayme Kelley 1-267-951-0010 [email protected]
Now that the data is combined, it can now be written as an excel file and formatted accordingly.
Writing the data to an Excel file and formatting the data.
openxlsx
‘s write.xlsx
function is really powerful with the list of options available for how you can format the data. The documentation is really well written and very easy to understand. The package makes a separate argument for the header row, so its possible to make the header dynamically different from the rest of the data. The use case I have for this blog is very simple, but you can have very intricate formatting as well.
The code I used for this is:
# Read each sheet as a list wb$sheet_names |> lapply(function(x) read.xlsx(wb,sheet=x)) |> # combine the sheets into a single dataframe do.call(what=rbind)|> # Write into excel file and match formatting write.xlsx("combinedSheets.xlsx", colWidths="auto", borders = "all", headerStyle= createStyle(textDecoration = "Bold", border=c("top", "bottom", "left", "right")) )
The final product looks like this:
To do this all in a single script, the code is:
library(openxlsx) # Read workbook wb<- loadWorkbook("Three Sheets.xlsx") # Read each individual sheet, combine them and format. # Read each sheet as a list wb$sheet_names |> lapply(function(x) read.xlsx(wb,sheet=x)) |> # combine the sheets into a single dataframe do.call(what=rbind)|> # Write into excel file and match formatting write.xlsx("combinedSheets.xlsx", colWidths="auto", borders = "all", headerStyle= createStyle(textDecoration = "Bold", border=c("top", "bottom", "left", "right")) )
Conclusion
The openxlsx
library has opened alot of opportunities for dealing with data cleaning and formatting data in Excel files without having to leave R. I used to clean data and export it as a .csv file and do the appropriate formatting in Excel. With openxlsx
everything can be streamlined directly in R! I will definitely be having this library on hand for work in the future and I think anyone who works with Excel files and R should also!
Thank you for reading!
Be sure to subscribe if you haven’t already!
Want to see more of my content?
Be sure to subscribe and never miss an update!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.