Four Filters for Functional (Programming) Friends
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m part of a local Functional Programming Meetup group which hosts talks, but also coordinates social meetings where we discuss all sorts of FP-related topics including Haskell and other languages. We’ve started running challenges where we all solve a given problem in a language of our choosing then discuss over drinks how they compare.
This month we went with an “easy” problem with a wrinkle – we would solve the ‘strain’ exercise from Exercism (Haskell, Python – your access to these is likely conditional on you being enrolled in that language track) with an extension:
The problem is trivial; the challenge is to solve it in 4 different ways using your language of choice.
The problem itself is given as
Implement the
keep
anddiscard
operation on collections. Given a collection and a predicate on the collection’s elements,keep
returns a new collection containing those elements where the predicate is true, whilediscard
returns a new collection containing those elements where the predicate is false.For example, given the collection of numbers:
1, 2, 3, 4, 5
And the predicate:
“is the number even?”
Then your keep operation should produce:
2, 4
While your discard operation should produce:
1, 3, 5
but with a restriction:
Keep your hands off that filter/reject/whatchamacallit functionality provided by your standard library! Solve this one yourself using other basic tools instead.
I figured it’s a good opportunity to write as I solve it, so here’s my R solutions.
I’ll define a test case so I can try out things as I go
test_vec <- c(1, 2, 3, 4, 5) test_vec ## [1] 1 2 3 4 5
and the predicates related to ‘even’ and ‘odd’ as functions which return TRUE
or FALSE
is_even <- function(x) { x %% 2 == 0 } is_even(7) ## [1] FALSE is_even(8) ## [1] TRUE is_odd <- function(x) { !is_even(x) } is_odd(7) ## [1] TRUE is_odd(8) ## [1] FALSE
Firstly, the restriction doesn’t seem to worry me because when I think of “filter” in R
I immediately think of dplyr::filter()
which works on data.frame
(or tibble
)
objects, and (given the examples) we’re aiming to work with vectors (the problem is stated
the same in several languages, so “collection” is a generalisation).
What about base::Filter()
? The help states
Filter
extracts the elements of a vector for which a predicate (logical) function gives true.
Filter(is_even, test_vec) ## [1] 2 4 Filter(is_odd, test_vec) ## [1] 1 3 5
Yep, that works exactly as I hoped, but is also a built-in “filter” so I can’t use it.
When I think of keep
and discard
I do think of the purrr
functions, and while these, too
do exactly what I want
purrr::keep(test_vec, is_even) ## [1] 2 4 purrr::discard(test_vec, is_even) ## [1] 1 3 5
They’re in a library, so I’m going to say they don’t count.
One of the things I like about the way R does subsetting (via the square-bracket [
which is by itself a function, but requires a matching ]
to satisfy the parser) is that you
can use a logical vector to subset another vector,
c(3, 5, 8, 12)[c(TRUE, FALSE, FALSE, TRUE)] ## [1] 3 12
which means that if I can produce such a logical vector, say, by applying a predicate function, I can do subsetting that way
test_vec[is_even(test_vec)] ## [1] 2 4 test_vec[is_odd(test_vec)] ## [1] 1 3 5
Instead of using is_odd()
I can just negate the logical vector to get the same effect
test_vec[!is_even(test_vec)] ## [1] 1 3 5
I can make those into functions that take a predicate and a vector
keep_1 <- function(f, x) { x[f(x)] } discard_1 <- function(f, x) { x[!f(x)] }
Testing these
keep_1(is_even, test_vec) ## [1] 2 4 discard_1(is_even, test_vec) ## [1] 1 3 5
One down!
One thing to note with this approach is that R is vectorised - I’ve discussed this
a few times on this blog (most recently) - which
means that these predicate functions will gladly take a vector, not just a single value. This
works for the is_even()
function because inside that, the modulo operator %%
is itself
vectorised, so
is_even(c(2, 4, 6, 9, 11, 13)) ## [1] TRUE TRUE TRUE FALSE FALSE FALSE
As I wrote in my previous post, thinking like this just becomes so natural in R that I have to force myself to remember that not every language does that.
It’s also worth mentioning that I’m passing a reference to the function is_even
to our
keep
and discard
functions - that is serving as our predicate because I need a way to
state “is the number even?” which references the number, so I need a function. That doesn’t
have to be a named function, though - it could be an “anonymous” function (a “lambda”) if
I wanted
keep_1(function(z) z %% 2 == 0, test_vec) ## [1] 2 4 discard_1(function(z) z %% 2 == 0, test_vec) ## [1] 1 3 5
I can subset a vector with a logical vector of the same length, specifying whether or not
to include that element, but I can also subset by position (keeping in mind that R is a
1-based language which means the first element is indexed by a 1
- why would any language
do anything different? 😜)
c(3, 5, 8, 12)[c(1, 4)] ## [1] 3 12
The function which()
takes a logical vector and returns which indices are TRUE
which(c(TRUE, FALSE, FALSE, TRUE)) ## [1] 1 4
so I can use this with our predicate to keep elements
keep_2 <- function(f, x) { x[which(f(x))] } keep_2(is_even, test_vec) ## [1] 2 4
However, discarding elements by index doesn’t use a logical negation, it uses a
negative sign (-
)
discard_2 <- function(f, x) { x[-which(f(x))] } discard_2(is_even, test_vec) ## [1] 1 3 5
If you look at the source of Filter()
, you’ll see that I wasn’t far off of exactly
that
Filter ## function (f, x) ## { ## ind <- as.logical(unlist(lapply(x, f))) ## x[which(ind)] ## } ## <bytecode: 0x55da1b321ad8> ## <environment: namespace:base>
but it still counts.
Another option would be to unpack the elements themselves and do some stepwise
comparisons in a loop. For each element el
in the vector x
, test if f(el)
is TRUE
,
and if it is, concatenate el
to the end of the accumulating result
vector
keep_3 <- function(f, x) { result <- c() for (el in x) { if (f(el)) { result <- c(result, el) } } result } keep_3(is_even, test_vec) ## [1] 2 4 discard_3 <- function(f, x) { result <- c() for (el in x) { if (!f(el)) { result <- c(result, el) } } result } discard_3(is_even, test_vec) ## [1] 1 3 5
Of course, this approach is a Bad Idea™ in general but I’m not optimising anything here. This approach does have the advantage that it isn’t relying on R’s vectorised capabilities, since each element is passed to the predicate function individually, so if I did have a non-vectorized predicate function, this would still work.
I really want a “weird” way to do this. R has plenty of weird to go around, but since I’ve been learning some Haskell, and the challenge originally referenced the Haskell solution, what if I code a Haskell-esque solution?
Haskell makes good use of recursive functions. Any loop can be written as a recursion
(and vice-versa) so the previous solution is a good starting point. First, I define
a base case; if I run out of numbers to process, return NULL
. A convenient feature
of R vectors is that NULL
s are dropped
c(1, 2, NULL, 3, 4, NULL, 5) ## [1] 1 2 3 4 5
Otherwise, I can take the first value in the vector and test it with the predicate. If it
returns TRUE
I can append it to what I’ve calculated so far, and recursively
call the function again with the rest of the vector. That could look like
keep_4 <- function(f, x) { if (!length(x)) return(NULL) if (f(x[1])) { return(c(x[1], Recall(f, x[-1]))) } else { return(Recall(f, x[-1])) } } keep_4(is_even, test_vec) ## [1] 2 4
Some interesting points about this: the Recall()
function is nice for defining a recursive
function. I could have used keep_4
there, but the advantage of this implementation is
that I can rename the function and it still works as expected
keep_4_also <- keep_4 rm("keep_4") keep_4_also(is_even, test_vec) ## [1] 2 4
If I had explicitly referenced keep_4
inside itself, that recursion would fail with this
renaming.
The negative subsetting works as described above; x[-1]
means “not including the first
element”. Lastly, testing if (!length(x))
works because 0
can be coerced to FALSE
and
any other value to TRUE
, so if the length
of x
is not 0
, this condition is met.
The discarding variant is similar, just with the two returns()
around the other way, or
discard_4 <- function(f, x) { if (!length(x)) return(NULL) if (!f(x[1])) { return(c(x[1], Recall(f, x[-1]))) } else { return(Recall(f, x[-1])) } } discard_4(is_even, test_vec) ## [1] 1 3 5
There we go; 4 hand-coded implementations of keep
and discard
in R.
Can you think of another that doesn’t use Filter()
or an external library? Let me
know in the comments below or on Mastodon. I’m
looking forward to seeing how people solved this in other languages.
devtools::session_info()
## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.1.2 (2021-11-01) ## os Pop!_OS 22.04 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Adelaide ## date 2023-08-30 ## pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## blogdown 1.17 2023-05-16 [1] CRAN (R 4.1.2) ## bookdown 0.29 2022-09-12 [1] CRAN (R 4.1.2) ## bslib 0.5.0 2023-06-09 [3] CRAN (R 4.3.1) ## cachem 1.0.8 2023-05-01 [3] CRAN (R 4.3.0) ## callr 3.7.3 2022-11-02 [3] CRAN (R 4.2.2) ## cli 3.6.1 2023-03-23 [3] CRAN (R 4.2.3) ## crayon 1.5.2 2022-09-29 [3] CRAN (R 4.2.1) ## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.1.2) ## digest 0.6.33 2023-07-07 [3] CRAN (R 4.3.1) ## ellipsis 0.3.2 2021-04-29 [3] CRAN (R 4.1.1) ## evaluate 0.21 2023-05-05 [3] CRAN (R 4.3.0) ## fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2) ## fs 1.6.3 2023-07-20 [3] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [3] CRAN (R 4.2.0) ## htmltools 0.5.6 2023-08-10 [3] CRAN (R 4.3.1) ## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.2) ## httpuv 1.6.6 2022-09-08 [1] CRAN (R 4.1.2) ## jquerylib 0.1.4 2021-04-26 [3] CRAN (R 4.1.2) ## jsonlite 1.8.7 2023-06-29 [3] CRAN (R 4.3.1) ## knitr 1.43 2023-05-25 [3] CRAN (R 4.3.0) ## later 1.3.0 2021-08-18 [1] CRAN (R 4.1.2) ## lifecycle 1.0.3 2022-10-07 [3] CRAN (R 4.2.1) ## magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.2.0) ## memoise 2.0.1 2021-11-26 [3] CRAN (R 4.2.0) ## mime 0.12 2021-09-28 [3] CRAN (R 4.2.0) ## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2) ## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.1.2) ## pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.1.2) ## prettyunits 1.1.1 2020-01-24 [3] CRAN (R 4.0.1) ## processx 3.8.2 2023-06-30 [3] CRAN (R 4.3.1) ## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.1.2) ## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2) ## ps 1.7.5 2023-04-18 [3] CRAN (R 4.3.0) ## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.1.2) ## R6 2.5.1 2021-08-19 [3] CRAN (R 4.2.0) ## Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.1.2) ## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.2) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.1.2) ## rmarkdown 2.23 2023-07-01 [3] CRAN (R 4.3.1) ## rstudioapi 0.15.0 2023-07-07 [3] CRAN (R 4.3.1) ## sass 0.4.7 2023-07-15 [3] CRAN (R 4.3.1) ## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2) ## shiny 1.7.2 2022-07-19 [1] CRAN (R 4.1.2) ## stringi 1.7.12 2023-01-11 [3] CRAN (R 4.2.2) ## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.1.2) ## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.1.2) ## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.1.2) ## vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.1.2) ## xfun 0.40 2023-08-09 [3] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.2) ## yaml 2.3.7 2023-01-23 [3] CRAN (R 4.2.2) ## ## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1 ## [2] /usr/local/lib/R/site-library ## [3] /usr/lib/R/site-library ## [4] /usr/lib/R/library ## ## ──────────────────────────────────────────────────────────────────────────────
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.