Site icon R-bloggers

Four Filters for Functional (Programming) Friends

[This article was first published on rstats on Irregularly Scheduled Programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’m part of a local Functional Programming Meetup group which hosts talks, but also coordinates social meetings where we discuss all sorts of FP-related topics including Haskell and other languages. We’ve started running challenges where we all solve a given problem in a language of our choosing then discuss over drinks how they compare.

This month we went with an “easy” problem with a wrinkle – we would solve the ‘strain’ exercise from Exercism (Haskell, Python – your access to these is likely conditional on you being enrolled in that language track) with an extension:

The problem is trivial; the challenge is to solve it in 4 different ways using your language of choice.

The problem itself is given as

Implement the keep and discard operation on collections. Given a collection and a predicate on the collection’s elements, keep returns a new collection containing those elements where the predicate is true, while discard returns a new collection containing those elements where the predicate is false.

For example, given the collection of numbers:

1, 2, 3, 4, 5

And the predicate:

“is the number even?”

Then your keep operation should produce:

2, 4

While your discard operation should produce:

1, 3, 5

but with a restriction:

Keep your hands off that filter/reject/whatchamacallit functionality provided by your standard library! Solve this one yourself using other basic tools instead.

I figured it’s a good opportunity to write as I solve it, so here’s my R solutions.

I’ll define a test case so I can try out things as I go

test_vec <- c(1, 2, 3, 4, 5)
test_vec
## [1] 1 2 3 4 5

and the predicates related to ‘even’ and ‘odd’ as functions which return TRUE or FALSE

is_even <- function(x) {
  x %% 2 == 0
}

is_even(7)
## [1] FALSE
is_even(8)
## [1] TRUE
is_odd <- function(x) {
  !is_even(x)
}

is_odd(7)
## [1] TRUE
is_odd(8)
## [1] FALSE

Firstly, the restriction doesn’t seem to worry me because when I think of “filter” in R I immediately think of dplyr::filter() which works on data.frame (or tibble) objects, and (given the examples) we’re aiming to work with vectors (the problem is stated the same in several languages, so “collection” is a generalisation).

What about base::Filter()? The help states

Filter extracts the elements of a vector for which a predicate (logical) function gives true.

Filter(is_even, test_vec)
## [1] 2 4
Filter(is_odd, test_vec)
## [1] 1 3 5

Yep, that works exactly as I hoped, but is also a built-in “filter” so I can’t use it.

When I think of keep and discard I do think of the purrr functions, and while these, too do exactly what I want

purrr::keep(test_vec, is_even)
## [1] 2 4
purrr::discard(test_vec, is_even)
## [1] 1 3 5

They’re in a library, so I’m going to say they don’t count.

One of the things I like about the way R does subsetting (via the square-bracket [ which is by itself a function, but requires a matching ] to satisfy the parser) is that you can use a logical vector to subset another vector,

c(3, 5, 8, 12)[c(TRUE, FALSE, FALSE, TRUE)]
## [1]  3 12

which means that if I can produce such a logical vector, say, by applying a predicate function, I can do subsetting that way

test_vec[is_even(test_vec)]
## [1] 2 4
test_vec[is_odd(test_vec)]
## [1] 1 3 5

Instead of using is_odd() I can just negate the logical vector to get the same effect

test_vec[!is_even(test_vec)]
## [1] 1 3 5

I can make those into functions that take a predicate and a vector

keep_1 <- function(f, x) {
  x[f(x)]
}

discard_1 <- function(f, x) {
  x[!f(x)]
}

Testing these

keep_1(is_even, test_vec)
## [1] 2 4
discard_1(is_even, test_vec)
## [1] 1 3 5

One down!

One thing to note with this approach is that R is vectorised – I’ve discussed this a few times on this blog (most recently) – which means that these predicate functions will gladly take a vector, not just a single value. This works for the is_even() function because inside that, the modulo operator %% is itself vectorised, so

is_even(c(2, 4, 6, 9, 11, 13))
## [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE

As I wrote in my previous post, thinking like this just becomes so natural in R that I have to force myself to remember that not every language does that.

It’s also worth mentioning that I’m passing a reference to the function is_even to our keep and discard functions – that is serving as our predicate because I need a way to state “is the number even?” which references the number, so I need a function. That doesn’t have to be a named function, though – it could be an “anonymous” function (a “lambda”) if I wanted

keep_1(function(z) z %% 2 == 0, test_vec)
## [1] 2 4
discard_1(function(z) z %% 2 == 0, test_vec)
## [1] 1 3 5

I can subset a vector with a logical vector of the same length, specifying whether or not to include that element, but I can also subset by position (keeping in mind that R is a 1-based language which means the first element is indexed by a 1 – why would any language do anything different? 😜)

c(3, 5, 8, 12)[c(1, 4)]
## [1]  3 12

The function which() takes a logical vector and returns which indices are TRUE

which(c(TRUE, FALSE, FALSE, TRUE))
## [1] 1 4

so I can use this with our predicate to keep elements

keep_2 <- function(f, x) {
  x[which(f(x))]
}

keep_2(is_even, test_vec)
## [1] 2 4

However, discarding elements by index doesn’t use a logical negation, it uses a negative sign (-)

discard_2 <- function(f, x) {
  x[-which(f(x))]
}

discard_2(is_even, test_vec)
## [1] 1 3 5

If you look at the source of Filter(), you’ll see that I wasn’t far off of exactly that

Filter
## function (f, x) 
## {
##     ind <- as.logical(unlist(lapply(x, f)))
##     x[which(ind)]
## }
## <bytecode: 0x55da1b321ad8>
## <environment: namespace:base>

but it still counts.

Another option would be to unpack the elements themselves and do some stepwise comparisons in a loop. For each element el in the vector x, test if f(el) is TRUE, and if it is, concatenate el to the end of the accumulating result vector

keep_3 <- function(f, x) {
  result <- c()
  for (el in x) {
    if (f(el)) {
      result <- c(result, el)
    }
  }
  result
}

keep_3(is_even, test_vec)
## [1] 2 4
discard_3 <- function(f, x) {
  result <- c()
  for (el in x) {
    if (!f(el)) {
      result <- c(result, el)
    }
  }
  result
}

discard_3(is_even, test_vec)
## [1] 1 3 5

Of course, this approach is a Bad Idea™ in general but I’m not optimising anything here. This approach does have the advantage that it isn’t relying on R’s vectorised capabilities, since each element is passed to the predicate function individually, so if I did have a non-vectorized predicate function, this would still work.

I really want a “weird” way to do this. R has plenty of weird to go around, but since I’ve been learning some Haskell, and the challenge originally referenced the Haskell solution, what if I code a Haskell-esque solution?

Haskell makes good use of recursive functions. Any loop can be written as a recursion (and vice-versa) so the previous solution is a good starting point. First, I define a base case; if I run out of numbers to process, return NULL. A convenient feature of R vectors is that NULLs are dropped

c(1, 2, NULL, 3, 4, NULL, 5)
## [1] 1 2 3 4 5

Otherwise, I can take the first value in the vector and test it with the predicate. If it returns TRUE I can append it to what I’ve calculated so far, and recursively call the function again with the rest of the vector. That could look like

keep_4 <- function(f, x) {
  if (!length(x)) return(NULL)
  if (f(x[1])) {
    return(c(x[1], Recall(f, x[-1])))
  } else {
    return(Recall(f, x[-1]))
  }
}

keep_4(is_even, test_vec)
## [1] 2 4

Some interesting points about this: the Recall() function is nice for defining a recursive function. I could have used keep_4 there, but the advantage of this implementation is that I can rename the function and it still works as expected

keep_4_also <- keep_4
rm("keep_4")

keep_4_also(is_even, test_vec)
## [1] 2 4

If I had explicitly referenced keep_4 inside itself, that recursion would fail with this renaming.

The negative subsetting works as described above; x[-1] means “not including the first element”. Lastly, testing if (!length(x)) works because 0 can be coerced to FALSE and any other value to TRUE, so if the length of x is not 0, this condition is met.

The discarding variant is similar, just with the two returns() around the other way, or

discard_4 <- function(f, x) {
  if (!length(x)) return(NULL)
  if (!f(x[1])) {
    return(c(x[1], Recall(f, x[-1])))
  } else {
    return(Recall(f, x[-1]))
  }
}

discard_4(is_even, test_vec)
## [1] 1 3 5

There we go; 4 hand-coded implementations of keep and discard in R.

Can you think of another that doesn’t use Filter() or an external library? Let me know in the comments below or on Mastodon. I’m looking forward to seeing how people solved this in other languages.


< details> < summary> devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2023-08-30
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.17    2023-05-16 [1] CRAN (R 4.1.2)
##  bookdown      0.29    2022-09-12 [1] CRAN (R 4.1.2)
##  bslib         0.5.0   2023-06-09 [3] CRAN (R 4.3.1)
##  cachem        1.0.8   2023-05-01 [3] CRAN (R 4.3.0)
##  callr         3.7.3   2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.6.1   2023-03-23 [3] CRAN (R 4.2.3)
##  crayon        1.5.2   2022-09-29 [3] CRAN (R 4.2.1)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.1.2)
##  digest        0.6.33  2023-07-07 [3] CRAN (R 4.3.1)
##  ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.21    2023-05-05 [3] CRAN (R 4.3.0)
##  fastmap       1.1.1   2023-02-24 [3] CRAN (R 4.2.2)
##  fs            1.6.3   2023-07-20 [3] CRAN (R 4.3.1)
##  glue          1.6.2   2022-02-24 [3] CRAN (R 4.2.0)
##  htmltools     0.5.6   2023-08-10 [3] CRAN (R 4.3.1)
##  htmlwidgets   1.5.4   2021-09-08 [1] CRAN (R 4.1.2)
##  httpuv        1.6.6   2022-09-08 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4   2021-04-26 [3] CRAN (R 4.1.2)
##  jsonlite      1.8.7   2023-06-29 [3] CRAN (R 4.3.1)
##  knitr         1.43    2023-05-25 [3] CRAN (R 4.3.0)
##  later         1.3.0   2021-08-18 [1] CRAN (R 4.1.2)
##  lifecycle     1.0.3   2022-10-07 [3] CRAN (R 4.2.1)
##  magrittr      2.0.3   2022-03-30 [3] CRAN (R 4.2.0)
##  memoise       2.0.1   2021-11-26 [3] CRAN (R 4.2.0)
##  mime          0.12    2021-09-28 [3] CRAN (R 4.2.0)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2)
##  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.1.2)
##  pkgload       1.3.0   2022-06-27 [1] CRAN (R 4.1.2)
##  prettyunits   1.1.1   2020-01-24 [3] CRAN (R 4.0.1)
##  processx      3.8.2   2023-06-30 [3] CRAN (R 4.3.1)
##  profvis       0.3.7   2020-11-02 [1] CRAN (R 4.1.2)
##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2)
##  ps            1.7.5   2023-04-18 [3] CRAN (R 4.3.0)
##  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.1.2)
##  R6            2.5.1   2021-08-19 [3] CRAN (R 4.2.0)
##  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.1.2)
##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
##  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.1.2)
##  rmarkdown     2.23    2023-07-01 [3] CRAN (R 4.3.1)
##  rstudioapi    0.15.0  2023-07-07 [3] CRAN (R 4.3.1)
##  sass          0.4.7   2023-07-15 [3] CRAN (R 4.3.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
##  shiny         1.7.2   2022-07-19 [1] CRAN (R 4.1.2)
##  stringi       1.7.12  2023-01-11 [3] CRAN (R 4.2.2)
##  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.1.2)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.1.2)
##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.1.2)
##  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.1.2)
##  xfun          0.40    2023-08-09 [3] CRAN (R 4.3.1)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.1.2)
##  yaml          2.3.7   2023-01-23 [3] CRAN (R 4.2.2)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────


To leave a comment for the author, please follow the link and comment on their blog: rstats on Irregularly Scheduled Programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version