Cleaning up tables

Posted on May 15, 2018 by R on Abhijit Dasgupta in R bloggers | 0 Comments

[This article was first published on R on Abhijit Dasgupta, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Context

One of things I have to do quite often is create tables for papers and presentations. Often the “Table 1” of a paper has descriptives about the study, broken down by subgroups. For presentation purposes, it doesn’t look good (to me, at least) that the name of each subgroup be repeated down one column of the table.

One way to deal with this is, of course, by hand. Save the table as a CSV or Excel file, open it up in your favorite spreadsheet program, and prettify things. But, of course, this being a R blog, I wanted to create a function that would fix this. I’ve created hack-y functions for this before, but a neat trick pointed out here gave me an idea for a more elegant solution. It also meant I had to use the tidyeval paradigm a bit, which I figured I should at least become familiar with.

Here’s what I want to do

Take a table like this:

Location	Gender	values
Rural	Male	32.74
Rural	Female	25.18
Urban	Male	40.48
Urban	Female	25.28

to something like this:

Location	Gender	values
Rural	Male	32.74
	Female	25.18
Urban	Male	40.48
	Female	25.28

The point is that the first column has repeating values, and I just want the first row of the cluster of rows corresponding to Rural and Urban to have text, the rest being blank. I find this a cleaner look and more typical of tables I see in papers.

This is purely for presentation purposes.I would never do this for data frames I’ll still analyze, since the blank cells screw up things. Of course this could be fixed easily using last value carried forward imputation on the column.

A solution

I created this simple function to do this for a single column within a magrittr pipeline:

clean_col <- function(x, colvar){
  require(dplyr)
  colv <- enquo(colvar)
  x %>% 
    group_by(!!colv) %>% 
      mutate(rown = row_number()) %>% 
    ungroup() %>% 
    mutate_at(vars(!!colv), funs(ifelse(rown > 1, '', .))) %>% 
    select (-rown)

The first thing to note here is that I’m using quosures and quasiquotation to allow the pipeline to work with the function’s inputs, specifically the column name, which is provided as an unquoted name. Admittedly this was done without much understanding, following examples on Edwin Thoen’s excellent blog.

The second thing was the use of the dummy rown column to identify the first row of each cluster of rows defined by the variable colvar. This was inspired by this blog I read through R-Bloggers yesterday. This trick allowed me to easily “blank out” the appropriate cells in the colvar column.

Desirable updates

There are two directions I want to take this, but I don’t understand tidyeval or functions with variable numbers of arguments well enough yet to do it. The simpler extension is to do the same process using two or more columns, rather than one column. For example, taking

Location	Gender	AgeGrp	DeathRate
Rural	Male	50-54	11.7
Rural	Male	55-59	18.1
Rural	Male	60-64	26.9
Rural	Male	65-69	41.0
Rural	Male	70-74	66.0
Rural	Female	50-54	8.7
Rural	Female	55-59	11.7
Rural	Female	60-64	20.3
Rural	Female	65-69	30.9
Rural	Female	70-74	54.3
Urban	Male	50-54	15.4
Urban	Male	55-59	24.3
Urban	Male	60-64	37.0
Urban	Male	65-69	54.6
Urban	Male	70-74	71.1
Urban	Female	50-54	8.4
Urban	Female	55-59	13.6
Urban	Female	60-64	19.3
Urban	Female	65-69	35.1
Urban	Female	70-74	50.0

Location	Gender	AgeGrp	DeathRate
Rural	Male	50-54	11.7
		55-59	18.1
		60-64	26.9
		65-69	41.0
		70-74	66.0
Rural	Female	50-54	8.7
		55-59	11.7
		60-64	20.3
		65-69	30.9
		70-74	54.3
Urban	Male	50-54	15.4
		55-59	24.3
		60-64	37.0
		65-69	54.6
		70-74	71.1
Urban	Female	50-54	8.4
		55-59	13.6
		60-64	19.3
		65-69	35.1
		70-74	50.0

The second extension would be to create truly nested row labels, like this:

Location	Gender	AgeGrp	DeathRate
Rural	Male	50-54	11.7
		55-59	18.1
		60-64	26.9
		65-69	41.0
		70-74	66.0
	Female	50-54	8.7
		55-59	11.7
		60-64	20.3
		65-69	30.9
		70-74	54.3
Urban	Male	50-54	15.4
		55-59	24.3
		60-64	37.0
		65-69	54.6
		70-74	71.1
	Female	50-54	8.4
		55-59	13.6
		60-64	19.3
		65-69	35.1
		70-74	50.0

I can create these on a case-by-case basis, but I’m not sure how to do this in a function, yet. Looking forward to comments.

Update

I can do the first example with the following code (based on Edwin Thoen’s blog, again):

clean_col <- function(x, ...){
  require(dplyr)
  colvs <- quos(...)
  x %>% group_by(!!!colvs) %>% mutate(rown = row_number()) %>% ungroup() %>% 
    mutate_at(vars(!!!colvs), funs(ifelse(rown > 1, '', .))) %>% 
    select (-rown)
}

Update 2

The second example can be solved by extracting elements of quosures, which are essentially a list.

clean_cols <- function(x, ...){
  colvs = quos(...)
  x <- x %>% 
    mutate_at(vars(!!!colv),as.character) %>% # make sure columns are characters
    arrange(!!!colv) # Ensure right order of columns
  for(i in 1:length(colvs)){
    rowvar <-  rlang::sym(paste0('rown',i)) # Create dummy
    x <- x %>% group_by(!!!colvs[1:i]) %>% mutate(!!rowvar := row_number()) %>% 
      ungroup()
  }
  for(i in 1:length(colvs)){
    rowvar <- rlang::sym(paste0('rown',i))
    x <- x %>% mutate_at(vars(!!colvs[[i]]), funs(ifelse(!!rowvar > 1, '', .)))
  }
  x <- x %>% select(-starts_with('rown')) # remove dummies
  return(x)
}

Update 3

Thanks to RAR for his comment. The clean_cols function can be greatly simplified using his suggestion:

clean_cols <- function(x, ...){
  colv = quos(...)
  x <- x %>% 
    mutate_at(vars(!!!colv),as.character) %>% # make sure columns are characters
    arrange(!!!colv) # Ensure right order of columns
  for(ii in rev(seq_along(colv))){
    indx = duplicated(select(x, !!!colv[ii:1]))
    x <- x %>% mutate_at(vars(!!colv[[ii]]), funs(ifelse(indx, '', .)))
  }
  return(x)
}

To leave a comment for the author, please follow the link and comment on their blog: R on Abhijit Dasgupta.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Cleaning up tables

Context

Here’s what I want to do

A solution

Desirable updates

Update

Update 2

Update 3

Related

Context

Here’s what I want to do

A solution

Desirable updates

Update

Update 2

Update 3

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)