Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Depending on your age, the song title ‘Word Up’ will either remind you of a bloke in a red codpiece, a Scottish rock band with a left field cover version, or possibly, Little Mix, (according to Google).
However, I urge you to put down your codpieces (yes, even you at the back) and instead gather round for tales of woe working with Microsoft Word.
Here’s the background: I’m currently working on a project where I am using parameterised rmarkdown for the first time – hooray! This will end up as 13 separate documents of about 30 pages, with lots of tables and charts. This is the first phase of many similar documents.
But, I have to render my final output to Word, which, I’m not going to lie, was a bit of a worry.
That’s a lot of Word, and a lot of scope for things to go wrong.
While I’m not adverse to Excel, I am far from a fan of Word.
Much of this is because of its infuriating layout quirks when it comes to margins,figures and tables.
The {officer}, {officedown} and {flextable} packages make working with Word easier.
You can use {officer} to create Word and Powerpoint documents when working in R, {officedown} to do the same in R Markdown, and {flextable} for, well, tables.
The advantage of {officedown} is it allows dynamic figure and table captions, cross referencing, and whole host of other features ( like multi column layouts, changing page orientation, and different formats for different sections)
{officer} is a lower level package, where you work with adding paragraphs, and other blocks. One of my favourite features is block_pour_docx
which allows you to insert an entire page, or even an entire document, into your rmarkdown file, as a one-liner.
{flextable} is not just for Word, and ranks alongside {gt} as a general purpose solution for all your table needs. If you haven’t considered it before now, it is definitely worth spending some time on.
There’s not a lot of documentation / help/ blog posts out there for working with these packages, especially for Word.
Alison Hill wrote a post detailing her friction log when working with Powerpoint, so consider this my friction log of working with Word.
(These are in no particular order)
Applying your departmental / organisational styles
You will see advice that says you need to knit a regular Word document ( from the New File menu item in RStudio), then open it (it doesn’t need any content) and then amend the styles in this document, before passing it to officer as a reference document.
My colleagues already had a .dot template with our departmental styles applied.
I found that I could use that (after I’d saved a copy as a .docx file) and the styles were picked up by {officedown}. No need to generate a blank file and amend everything from scratch (there are a lot of style elements and it would have been really tedious). If you already have a document with styles in place, you should be good to go. Here’s how my Rmd file looked. You specify rdocx_document
as the output format, and I saved my template file in the root directory so it was easy to reference :
output: officedown::rdocx_document: reference_docx: "my_report_template.docx"
Adding a row to a table to create space / improve presentation
Throughout this project I’m recreating an existing suite of tables and plots that have, until now, been produced in Excel & Word. The first table had totals at Scotland, Council and Health Board level, then a space, then a range of lower level data.
Once I’d got the data into shape, I first added the space using dplyr::add_row()
.
But, over time, this felt wrong – I didn’t like the idea of adding an unnecessary empty row to a dataframe, especially if I might need it for further analysis later on.
The solution, instead, was to add padding
to the relevant row(s)
flextable::flextable(df) %>% flextable::padding(., i = c(1,9), padding = 10, padding.top = FALSE)
Here I’m adding padding to rows 1 and 9 of my table ( flextable uses the i
and j
conventions for applying functions and modifications to rows and columns), and ensuring the padding goes at the bottom by setting padding.top
to FALSE
.
Setting row heights
Another way to improve presentation in some of my tables was to reduce row heights for some rows, and increase it for others.
out_table %>% flextable::height(., i = c(4), height = .015, part = "body") %>% flextable::height(., i = c(1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13), height = .25, part = "body")
Row 4 gets less space, while the other row heights are increased. N.B – I thought I could do something like
flextable::height(., i = c(1:3, 5:10, 12:13), height = .25, part = "body")
but it didn’t seem to like it, so each row got entered individually instead.
Amending number formats for specific columns
Here I amend the 3rd, 4th and 5th columns – because the 3rd column name is dynamic, I didn’t want to refer to it by name
flextable::flextable(table_data, cwidth = 1.05) %>% flextable::colformat_double(.,j = c(3, 4, 5), digits = 1) %>%
For other tables, where I know the column names are static, I can enter the names directly:
flextable::flextable(table_data, cwidth = 1.05) %>% flextable::colformat_double(.,j = c("Rate","Lower bound","Upper bound"), digits = 1)
Adding a partial header row above existing column names
I needed to add header row above the final two columns in my seven column table, denoting whether there was a statistically significant difference between them and a reference value.
flextable::add_header_row(.,values = c("","Significance"), colwidths = c(5,2))
The first 5 columns get passed nothing, while “Significance” spans the final two columns.
Highlighting a row
Using bg
, we can add a custom background colour to highlight a specific row.
I found that when creating the table in a function, I had to to use the which
syntax below:
flextable::bg(out_table, i = which(out_table$body$dataset$areaname == mylocation), bg = "wheat", part = "body")
Prior to this, I was using the syntax below, but it would not work inside a function.
i = ~ areaname == eval(mylocation),
Remove / blank out a column heading
Easy enough, give the column name you want to amend and pass it ""
out_table %>% flextable::set_header_labels(., areaname = "")
superscript in the chunks
Ok, technically not an {officer} / {officedown} thing, but here it is anyway. You’ll see I’m specifying the figure captions here, also
{r fig6, fig.height = 4, fig.width = 6.5, fig.cap= "Annual birth rate per 1,000 population^1^, 2002 - 2020"} {r fig8, fig.height = 3.5, fig.width = 6.5, fig.cap= "Age-sex standardised all age death rate per 100,000 population^1^, 2002 - 2020^2^,^3^", fig.cap.style = "Image Caption"}
ensuring a table returned even if no data exists
If this sounds crazy / stupid, then here’s the scenario. This particular table looks at deprivation. Not all areas have datazones within a certain deprivation decile, hence, there will not always be data to show. However, I want the table numbers to be consistent across all the documents – table 10 in doc1 should address the same topic as table 10 in doc 2 (instead of showing table 11 data under a table10 heading).
I went belt and braces here, using tibble::tribble()
and tibble::add_row()
if (my_nrows < 1) { out_data <- tibble::tribble(~`Data Zone`, ~Name, ~`Rank in Scotland (1 = most deprived area 6976 = least deprived`, ~ `National decile of deprivation`, NA_character_,NA_character_,NA_character_,NA_character_) outdata <- out_data %>% tibble::add_row(., tibble::tibble_row()) }
This returns an empty table for the edge case where currently no zones are in the specified decile.
Table Of Contents not showing when docs created in loop using purrr – getting asked to update external links
Here’s a strange one. When knitting docs individually, the resulting Word file opened and the table of contents was displayed. But, when rendering all of the docs in a loop using purrr, there was no TOC, and instead, I was askedif I’d like to update external links within the document. If I said ‘no’ , then the TOC did not update. If I said ‘yes’, then it would update, and appear as intended. I’m not sure why I only saw this when rendering with rmarkdown::render()
, but there doesn’t appear to be a way round it. This means, when the final documents are distributed, we will have to advise readers to accept any prompts to update external links, which always feels a bit uncomfortable.
The great & / and debacle
TLDR – lots of my variables had and
within their names. In some datasets, this was changed to the ampersand.
For example Skye, Lochalsh and Wester Ross
became Skye, Lochalsh & Wester Ross
I need to unify these, and let’s face it, typing “&” is easier, and takes up less space.
Big mistake. All seemed to be going well, until I tried generating a report with Skye, Lochalsh & Wester Ross
as the parameter.
I’m sure someone is laughing at this, thinking, “well of course you can’t have “&”, that’s a special character in html / xml”, but I didn’t know that, being as I’m not an actual super nerd. (Well, OK, I am, but even I have my limits).
Debugging this took me a long time. My first thought was I’d messed up my functions, and then I got some great advice on Twitter, and realised that the markdown file was rendering 100% completely and the failure point was between the .md file and the final Word output.
Reverting all ampersands back to and
( which mean re-running the entire pipeline from scratch) allowed me to generate the elusive final document, and a lesson learned.
Best layout for table widths
Depends a lot on your margins, but instead of relying on autofit()
, this seems to work well :
flextable::width(., width = dim(.)$widths * 6.5 / (flextable::flextable_dim(.)$widths))
Inserting cover images
I’m going to leave that for the next post, which I hope will be published within the next few days..so please look out for the follow up post!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.