The Need for paste2 (part I)

[This article was first published on TRinker's R Blog » R, and kindly contributed to R-bloggers].

This is Part I of a multi-part blog on the paste2 function…

I recently wrote a new paste function that takes a list of an unspecified number of equal-length variables (columns), or multiple columns of a data frame, and pastes them together.  First, let me thank Dason of Talk Stats for his help in the post that led to the creation of the paste2 function.  Next, let me convince you of the need for a paste2 function by showing you where the original paste falls short.  Then I’ll introduce you to the function and some basics of what it can do.  In Part II of this paste2 blog series I’ll show you a few practical applications I’ve already encountered.

The main idea behind this function is the need to pass an unknown number of columns from a data frame or list and paste them together to generate an uber column that contains all the information of the original columns.  You may say, “Well, I think paste already does that.”  Not in its home-grown state it doesn’t.  What’s that?  Prove it?  OK.  Try the following:

paste(CO2[, 1:3], sep=".")                            #1
paste(CO2[, 1:3], collapse=".")                       #2
paste(CO2[,1], CO2[, 2], CO2[, 3], sep=".")           #3
paste(list(CO2[,1], CO2[, 2], CO2[, 3]), sep=".")     #4

What do you get?  Well, the third use of paste is the only one that results in pasting the columns together row by row.  Why?  Because we passed each column to paste as its own argument, and paste is our friend.  If we try a sneak attack with an index of columns, paste becomes scared and returns gobbledygook: it treats the data frame as a single list argument and turns each whole column into one long string.  If you try to be nice and give paste a list of columns, he again gives gobbledygook.  So what’s the need for pasting an unknown number of columns (either as an indexed data frame or as a list) together?  Often in functions the number of columns passed to paste can’t be specified in advance, hence our problem (I’ll show you more of those specific applications in Part II).
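
To make the failure concrete, here is a minimal sketch on a small made-up data frame (the names df, bad and good are hypothetical, not from the original post).  It shows that paste returns one string per column when handed a data frame, and that a commonly used base R workaround, do.call, can spread the columns out as separate arguments.  This is only an illustration under those assumptions; it is not the approach paste2 itself takes.

```r
# Hypothetical toy data frame: 3 rows, 2 character columns
df <- data.frame(a = c("x", "y", "z"),
                 b = c("p", "q", "r"),
                 stringsAsFactors = FALSE)

# paste() sees the data frame as ONE list argument and converts each
# column to a single string, so the result has one element per column
bad <- paste(df, sep = ".")
length(bad)    # 2 (one per column), not 3 (one per row)

# do.call() hands each column to paste() as a separate argument,
# which is equivalent to typing paste(df$a, df$b, sep = ".")
good <- do.call(paste, c(df, sep = "."))
good           # "x.p" "y.q" "z.r"
```

The do.call route pastes row by row, but it has no switch for returning NA when a row contains a missing value, which is one of the things the function below adds.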

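paste2 <- function(multi.columns, sep=".", handle.na=TRUE, trim=TRUE){
    # Strip leading/trailing whitespace; this also coerces each column to character
    if (trim) multi.columns <- lapply(multi.columns, function(x) {
            gsub("^\\s+|\\s+$", "", x)
        }
    )
    # A plain list (not a data frame) is bound into a matrix, one column per element
    if (!is.data.frame(multi.columns) && is.list(multi.columns)) {
        multi.columns <- do.call('cbind', multi.columns)
    }
    m <- if (handle.na){
        # Return NA for any row that contains a missing value
        apply(multi.columns, 1, function(x){
            if (any(is.na(x))){
                NA
            } else {
                paste(x, collapse = sep)
            }
        })
    } else {
        # Otherwise paste every row as-is; missing values appear as the string "NA"
        apply(multi.columns, 1, paste, collapse = sep)
    }
    names(m) <- NULL
    return(m)
}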
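
The steps inside paste2 can be traced by hand on a toy input.  The sketch below assumes a made-up list called cols (hypothetical, not from the post) and repeats the trim, bind, and row-wise paste steps with the default handle.na = TRUE behaviour.

```r
# Hypothetical toy input: two equal-length vectors, one containing an NA
cols <- list(c(" a", "b ", "c"), c(1, NA, 3))

# Step 1: trim whitespace (gsub also converts every element to character)
trimmed <- lapply(cols, function(x) gsub("^\\s+|\\s+$", "", x))

# Step 2: bind the list into a character matrix, one column per list element
mat <- do.call(cbind, trimmed)

# Step 3: paste each row together, returning NA when the row has a missing value
res <- apply(mat, 1, function(x) {
    if (any(is.na(x))) NA else paste(x, collapse = ".")
})
res    # "a.1" NA "c.3"
```

The second row comes back as NA rather than "b.NA" because of the any(is.na(x)) check; dropping that check corresponds to calling paste2 with handle.na=FALSE.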

Now let’s see it in action:

paste2(CO2[, 1:3], sep=".")
paste2(CO2[, 1:3], sep=":")
paste2(list(CO2[,1], CO2[, 2], CO2[, 3]))
#shoot we can paste the whole data set if we want
paste2(CO2)
paste2(mtcars)

In Part II we’ll explore some practical uses of this new function!

Click HERE for a link to a .txt version of paste2

