Vectorizing functions in R is easy

[This article was first published on Roman Luštrik not Inc. - R, and kindly contributed to R-bloggers.]

Imagine you have a function whose argument accepts only a single value, but you would really like to pass it a vector of values. Here is a short example of how the function Vectorize() can accomplish this. Let’s say we have a data.frame

xy <- data.frame(sample = c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
                            "T_post_sample2", "NA_pre_sample1"),
                 value = runif(5))

#           sample     value
# 1  C_pre_sample1 0.3048032
# 2 C_post_sample1 0.3487163
# 3  T_pre_sample2 0.3359707
# 4 T_post_sample2 0.6698358
# 5 NA_pre_sample1 0.9490707

and you want to keep only the samples that start with C_pre or T_pre. Of course, you could construct a nice regular expression, write an anonymous function and apply it with lapply/sapply, or use one of those fancy tidyverse functions.
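For comparison, the single regular expression route mentioned above could look roughly like this. It is only a sketch: the alternation pattern "^(C|T)_pre" and the object names sample and keep are illustrative choices, not something the rest of this post relies on.

```r
# Same sample names as in the data.frame above; values omitted for brevity.
sample <- c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
            "T_post_sample2", "NA_pre_sample1")

# "^(C|T)_pre" matches names that start with either C_pre or T_pre.
keep <- grepl(pattern = "^(C|T)_pre", x = sample)

sample[keep]
# [1] "C_pre_sample1" "T_pre_sample2"
```

This works well for two similar prefixes, but the pattern becomes harder to read as the list of prefixes grows, which is where handling one pattern at a time becomes attractive.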

A long-winded way would be to find matches for each prefix using a separate regular expression, combine the results and subset. This is for pedagogical reasons, so please bear with me.

i.ind <- do.call(cbind, list(
  grepl(pattern = "^C_pre", x = xy$sample),
  grepl(pattern = "^T_pre", x = xy$sample)
))

i.ind
#       [,1]  [,2]
# [1,]  TRUE FALSE
# [2,] FALSE FALSE
# [3,] FALSE  TRUE
# [4,] FALSE FALSE
# [5,] FALSE FALSE

# Find those rows in `xy` that have at least one TRUE and use that to subset the
# data.frame.
xy[rowSums(i.ind) > 0, ]

#          sample     value
# 1 C_pre_sample1 0.3048032
# 3 T_pre_sample2 0.3359707

The same can be achieved using a vectorized version of the grepl function. We designate exactly which argument is being vectorized, in our case pattern, because that’s the argument that varies.

vgrepl <- Vectorize(grepl, vectorize.args = "pattern")

Here we call Vectorize and tell it to vectorize the argument pattern. The resulting function runs grepl once for each element of the vector we pass in as pattern, just like we did by hand for the i.ind object a few lines above.
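Vectorize() is documented as a wrapper around mapply(), so the vectorized function can be approximated by hand. The sketch below assumes the same sample names as above; the names x, patterns and by.hand are made up for illustration.

```r
x <- c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
       "T_post_sample2", "NA_pre_sample1")
patterns <- c("^C_pre", "^T_pre")

vgrepl <- Vectorize(grepl, vectorize.args = "pattern")

# Roughly what the vectorized wrapper does: call grepl() once per pattern,
# holding x fixed through MoreArgs, then simplify the results into a matrix.
by.hand <- mapply(FUN = grepl, pattern = patterns, MoreArgs = list(x = x))

identical(vgrepl(pattern = patterns, x = x), by.hand)
# [1] TRUE
```

The argument named in vectorize.args is the one mapply() iterates over; everything else is passed unchanged to every call.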

Calling the vectorized function is equivalent to using sapply with an anonymous function:

tmp <- sapply(c("^C_pre", "^T_pre"), FUN = function(pt, input) {
  grepl(pt, x = input)
}, input = xy$sample)

tmp
#      ^C_pre ^T_pre
# [1,]   TRUE  FALSE
# [2,]  FALSE  FALSE
# [3,]  FALSE   TRUE
# [4,]  FALSE  FALSE
# [5,]  FALSE  FALSE

While the sapply version can be somewhat verbose, you can use vgrepl just as you would use grepl, with the minor detail that you pass a whole vector to pattern instead of a single regular expression.

i.vec <- vgrepl(pattern = c("^C_pre", "^T_pre"), x = xy$sample)

i.vec
#      ^C_pre ^T_pre
# [1,]   TRUE  FALSE
# [2,]  FALSE  FALSE
# [3,]  FALSE   TRUE
# [4,]  FALSE  FALSE
# [5,]  FALSE  FALSE

xy[rowSums(i.vec) > 0, ]

#          sample     value
# 1 C_pre_sample1 0.3048032
# 3 T_pre_sample2 0.3359707
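The same trick applies to your own functions that only handle one value at a time, which is the situation described at the start of this post. As a rough sketch, the hypothetical helper classify() below (not part of the original example) relies on if (), which expects a single TRUE or FALSE; recent versions of R raise an error when the condition is longer than one element.

```r
# Hypothetical helper: classifies ONE sample name as "pre" or "post".
# The if () condition must be a single logical value, so the function
# cannot be called on a whole vector of names as-is.
classify <- function(name) {
  if (grepl("_pre_", name, fixed = TRUE)) "pre" else "post"
}

# By default Vectorize() vectorizes over all arguments, here just `name`.
# USE.NAMES = FALSE keeps the result an unnamed character vector.
vclassify <- Vectorize(classify, USE.NAMES = FALSE)

vclassify(c("C_pre_sample1", "C_post_sample1", "T_pre_sample2"))
# [1] "pre"  "post" "pre"
```

Keep in mind that this is a convenience, not a speed-up: the wrapped function is still called once per element.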
