Higher Order Functions in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
Because R is, in part, a functional programming language, the ‘base’ package contains several higher order functions. By higher order functions, I mean functions that take another function as an argument and then do something with that function. If you want to know more about the usefulness of writing higher order functions in general, I’d recommend the classic Structure and Interpretation of Computer Programs and the more recent Higher Order Perl, both of which are freely available online.
The six primary higher order functions built into R are
- Reduce
- Filter
- Map
- Find
- Position
- Negate
I’ll go through these in order, giving some examples using basic number theoretic calculations in something approximating the style of SICP. Hopefully the use of mathematical examples isn’t as unnerving for stats-savvy readers as it can be for many readers of SICP.
Reduce
Reduce
loops over pairs of elements of a vector, performing the same binary operation on all of the pairs along the way. That’s so abstract that it’s probably unclear. For that reason, I think it’s best to work up from a very specific example to the general notion of Reduce
.
Let’s start by assuming that you’ve been writing some horrible piece of code like the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | my.sum <- function(x) { if(length(x) == 1) { return(x[1]) } if (length(x) == 2) { return(x[1] + x[2]) } if (length(x) == 3) { return(x[1] + x[2] + x[3]) } } |
This approach is clearly so specific that it’s virtually unusable. The most obvious fault is that our code is unable to deal with arbitrarily long vectors. That can be easily fixed, though, with a simple loop:
1 2 3 4 5 6 7 8 9 10 11 | my.sum <- function(x) { result <- 0 for (i in 1:length(x)) { result <- result + x[i] } return(result) } |
Reduce
takes another abstraction step beyond the one we just made. Instead of making specific functions that repeatedly perform some set of pre-specified binary operations, Reduce
provides one higher order generalization that takes an arbitrary binary operation as an argument. This lets us rewrite my.sum
in one line:
1 | my.sum <- function (x) {Reduce(`+`, x)} |
If you’re confused by that line, there are two potential sources of magic. First, the name of the addition operation in R is `+`
with backquotes. Second, Reduce
and its relatives assume that the first argument is a function that they will use on the elements of x
.
Once you realize that Reduce
makes this sort of code trivial to write, it’s easy to continue exploiting this approach:
1 2 3 | my.prod <- function (x) {Reduce(`*`, x)} my.factorial <- function (n) {Reduce(`*`, 1:n)} |
Better yet, if you know that Reduce
has other options like accumulate
, which allows you to keep track of the interim results of the Reduce
operations, then you can also make replacements for cumsum
and cumprod
easily:
1 2 3 | my.cumsum <- function (x) {Reduce(`+`, x, accumulate = TRUE)} my.cumprod <- function (x) {Reduce(`*`, x, accumulate = TRUE)} |
Filter
Filter
is even easier to understand than Reduce
: it simply selects the elements of a vector that meet a predicate you pass in as a function:
1 2 | small.even.numbers <- Filter(function (x) {x %% 2 == 0}, 1:10) small.odd.numbers <- Filter(function (x) {x %% 2 == 1}, 1:10) |
We can use Filter
along with some more interesting predicates to write much more complex functions quickly:
1 2 3 | is.divisor <- function(a, x) {x %% a == 0} is.prime <- function (x) {length(Filter(function (a) {is.divisor(a, x)}, 1:x)) == 2} |
And we can even combine Reduce
and Filter
to perform otherwise tedious tasks like enumerating the perfect numbers:
1 2 3 4 5 | proper.divisors <- function (x) {Filter(function (a) {is.divisor(a, x)}, 1:(x - 1))} is.perfect <- function(x) {x == Reduce(`+`, proper.divisors(x))} small.perfect.numbers <- Filter(is.perfect, 1:1000) |
Map
Map
is essentially the same abstraction as the apply
family of functions provides. Try the following to see why I say that:
1 | Map(proper.divisors, 1:5) |
Find
Find
is really a truncated form of Filter
: it locates the first item in a vector that satisfies a predicate. For example, we can find the first prime greater than 1000 like so:
1 | Find(is.prime, 1000:2000) |
Position
Position
is a variant of Find
that provides the index of the element that would be returned by Find
instead of its value:
1 | Position(is.prime, 1000:2000) |
Negate
Negate
simply flips the Boolean sense of an existing predicate. So we can do the following:
1 | is.composite <- Negate(is.prime) |
Using Reduce for Continued Fraction Computations
The following example uses higher order functions to compute the value of a finite continued fraction. The code is lifted straight from the help docs for Reduce
. If you’re not familiar with continued fractions, I’d recommend starting with this Wikipedia article and then reading The Higher Arithmetic (Incidentally, The Higher Arithmetic is the most enjoyable book on mathematics I have ever read.)
1 2 | cfrac <- function(x) Reduce(function(u, v) u + 1 / v, x, right = TRUE) cfrac(c(1, rep(2, 99))) == sqrt(2) |
This example also shows that you can have Reduce
work right to left through a vector when you might need to. In this example, you do need to work in that specific order, because the binary operation is not associative, so moving left to right would produce different results.
How Fast are These Higher Order Functions?
Given their flexibility, one might wonder if these higher order functions lose something in efficiency. Some basic profiling suggests they do, though I think their elegance makes up for small efficiency losses in many contexts.
For example, filtering and selecting elements of a vector are essentially identical, so we can compare these two approaches:
1 2 3 4 5 | natural.numbers <- 1:10000 iterations <- 10 mean(replicate(iterations, system.time(natural.numbers[natural.numbers %% 2 == 0]))) mean(replicate(iterations, system.time(Filter(function (n) {n %% 2 == 0}, natural.numbers)))) |
Similarly, we can compare Map
and lapply
:
1 2 | mean(replicate(iterations, system.time(lapply(1:10000, sqrt)))) mean(replicate(iterations, system.time(Map(sqrt, 1:10000)))) |
And finally we can compare our Reduce
implementation of sum
with the hardcoded one:
1 2 | mean(replicate(iterations, system.time(sum(1:10000)))) mean(replicate(iterations, system.time(Reduce(`+`, 1:10000)))) |
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.