Imagine you have a function that accepts only a single value for one of its arguments, but you would really like to apply it to a whole vector of values. Here is a short example of how the function Vectorize() can accomplish this. Let’s say we have a data.frame
xy <- data.frame(sample = c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
                            "T_post_sample2", "NA_pre_sample1"),
                 value = runif(5))
#           sample     value
# 1  C_pre_sample1 0.3048032
# 2 C_post_sample1 0.3487163
# 3  T_pre_sample2 0.3359707
# 4 T_post_sample2 0.6698358
# 5 NA_pre_sample1 0.9490707
and you want to keep only the samples that start with C_pre or T_pre. Of course, you could construct a nice regular expression, write an anonymous function and call it through lapply/sapply, or use one of those fancy tidyverse functions.
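For comparison, the single regular expression route might look something like the sketch below. The data are the same as in the example; the set.seed() call is an addition made here only so the random values are reproducible.

```r
# Same example data as in the post; set.seed() is an assumption added for reproducibility.
set.seed(42)
xy <- data.frame(
  sample = c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
             "T_post_sample2", "NA_pre_sample1"),
  value = runif(5)
)

# One pattern covering both prefixes: start of string, then C or T, then "_pre".
keep <- grepl(pattern = "^(C|T)_pre", x = xy$sample)
xy[keep, ]
```

This works well for two prefixes, but the pattern gets harder to read as the list of prefixes grows, which is where a vector of simple patterns becomes attractive.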
A long-winded way would be to find matches using a separate regular expression for each pattern, combine the results, and subset. This is for pedagogical reasons, so please bear with me.
i.ind <- do.call(cbind, list(
  grepl(pattern = "^C_pre", x = xy$sample),
  grepl(pattern = "^T_pre", x = xy$sample)
))

i.ind
#       [,1]  [,2]
# [1,]  TRUE FALSE
# [2,] FALSE FALSE
# [3,] FALSE  TRUE
# [4,] FALSE FALSE
# [5,] FALSE FALSE

# Find those rows in `xy` that have at least one TRUE and use that to subset the
# data.frame.
xy[rowSums(i.ind) > 0, ]
#          sample     value
# 1 C_pre_sample1 0.3048032
# 3 T_pre_sample2 0.3359707
The same can be achieved using a vectorized version of the grepl function. We designate exactly which argument is being vectorized, in our case pattern, because that’s the argument that varies.
vgrepl <- Vectorize(grepl, vectorize.args = "pattern")
Here we use the function Vectorize and tell it to vectorize the argument pattern. The resulting function runs grepl once for each element of the vector we pass in as pattern, just like we did by hand when building the i.ind object a few lines above.
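Vectorize() is documented as a wrapper built on mapply(). As a rough illustration of the idea, and not a copy of what Vectorize() does internally, a hand-written counterpart could look like the sketch below. The name vgrepl_manual is made up for this example.

```r
sample_names <- c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
                  "T_post_sample2", "NA_pre_sample1")

vgrepl <- Vectorize(grepl, vectorize.args = "pattern")

# Hypothetical hand-rolled counterpart: mapply() loops over the patterns,
# while the same x is reused in every call to grepl().
vgrepl_manual <- function(pattern, x) {
  mapply(FUN = function(p) grepl(pattern = p, x = x), pattern)
}

patterns <- c("^C_pre", "^T_pre")

# Compare the two results element by element.
all.equal(vgrepl(pattern = patterns, x = sample_names),
          vgrepl_manual(pattern = patterns, x = sample_names))
```

Both calls return one column of logical values per pattern, so either result can be fed to rowSums() in the same way.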
This is equivalent to using an anonymous function with sapply:
tmp <- sapply(c("^C_pre", "^T_pre"), FUN = function(pt, input) {
  grepl(pt, x = input)
}, input = xy$sample)

tmp
#      ^C_pre ^T_pre
# [1,]   TRUE  FALSE
# [2,]  FALSE  FALSE
# [3,]  FALSE   TRUE
# [4,]  FALSE  FALSE
# [5,]  FALSE  FALSE
While the anonymous function approach can be somewhat verbose, you can use vgrepl just as you would use grepl, with the minor difference that you pass a whole vector to pattern instead of a single regular expression.
i.vec <- vgrepl(pattern = c("^C_pre", "^T_pre"), x = xy$sample)
#      ^C_pre ^T_pre
# [1,]   TRUE  FALSE
# [2,]  FALSE  FALSE
# [3,]  FALSE   TRUE
# [4,]  FALSE  FALSE
# [5,]  FALSE  FALSE

xy[rowSums(i.vec) > 0, ]
#          sample     value
# 1 C_pre_sample1 0.3048032
# 3 T_pre_sample2 0.3359707
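The rowSums(i.vec) > 0 step asks whether at least one pattern matched in a given row. If that reads as too indirect, one alternative spelling is apply() with any(). This is only a sketch: the name any.match is made up here, and set.seed() is added for reproducibility.

```r
# Same example data as in the post; set.seed() is an assumption added for reproducibility.
set.seed(42)
xy <- data.frame(
  sample = c("C_pre_sample1", "C_post_sample1", "T_pre_sample2",
             "T_post_sample2", "NA_pre_sample1"),
  value = runif(5)
)

vgrepl <- Vectorize(grepl, vectorize.args = "pattern")
i.vec <- vgrepl(pattern = c("^C_pre", "^T_pre"), x = xy$sample)

# TRUE for every row in which at least one pattern matched.
any.match <- apply(i.vec, MARGIN = 1, FUN = any)
xy[any.match, ]
```

Both spellings select the same rows; rowSums() is usually faster on large matrices, while any() states the intent more directly.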