Iteration and closures in R
I recently read an interesting thread on unexpected behavior in R when creating a list of functions in a loop or iteration. The issue is solved, but I am going to take the liberty of re-stating and slowing down the discussion of the problem (and the fix) for clarity.
The issue is: are references or values captured during iteration?
Many users expect values to be captured. Most programming language implementations capture variables or references (leading to strange aliasing issues). This is confusing (especially in R, which pushes so far in the direction of value-oriented semantics) and is best demonstrated with concrete examples.
Please read on for some of the history and future of this issue.
for loops
Consider the following code, run in R version 3.3.2 (2016-10-31):

functionsFor <- vector(2, mode='list')
for(x in 1:2) {
  functionsFor[[x]] <- function() return(x)
}
functionsFor[[1]]()
# [1] 2
In real applications the functions would take additional arguments and perform calculations involving both the “partially applied” x and these future arguments. Obviously if we just wanted values we would not use functions. However, this trivial example is much simpler (except for the feeling it is silly) than a substantial application, and the notation gets confusing even as it stands. Partial application (binding values into functions) is a common functional programming pattern (which happens to not always interact well with iteration); a small sketch of the pattern follows.
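To make the idea of partial application concrete, here is a minimal sketch on its own (the names makeAdder, base, and amount are ours, not from the original thread; the function is called directly, so the loop issues discussed below do not arise):

# bind `base` now; the returned function takes `amount` later
makeAdder <- function(base) {
  function(amount) base + amount
}
add5 <- makeAdder(5)  # partially apply base = 5
add5(2)
# [1] 7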
Notice the answer printed by the for loop example above is 2 (not 1). This is because all the functions created in the loop captured a closure or reference to the same variable x (which is 2 at the end of the loop). The functions did not capture the value x had when the functions were created. We can confirm this by moving x around by hand, as we show below.
x <- 4
functionsFor[[1]]()
# [1] 4
This is a well known language design issue.
- C# had this issue (and is fairly unique in having eventually fixed it!).
- F# has the issue.
- Go has the issue.
Trying to work around it
The more complicated examples referenced in the thread are variations of the standard work-around: build a function factory so each function has a different closure (the new closures being the execution environments of each factory invocation). That code looks like the following:
functionsFor2 <- vector(2, mode='list')
for(x in 1:2) {
  functionsFor2[[x]] <- (function(x) {
    return(function() return(x))
  })(x)
}
functionsFor2[[1]]()
# [1] 2
The outer function (which gets called) is called the factory and is trivial (we are only using it to get new environments). The inner function is our example, which in the real world would take additional arguments and perform calculations involving these arguments in addition to x.
Notice the “fix” did not work. There is more than one problem lurking, and this is why so many experienced functional programmers are surprised by the behavior (despite probably having experience in many of the other functional languages we have mentioned). R “functions” are different from those of many current languages in that they have semantics closer to what Lisp called an fexpr. In particular, arguments are subject to “lazy evaluation” (a feature R implements by a bookkeeping process called “promises”).
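A quick way to see lazy evaluation in action (a minimal sketch; the name showLaziness is ours, not from the thread):

showLaziness <- function(x) {
  cat("entered the function\n")
  x  # the promise for the argument is not evaluated until this point
}
showLaziness({ cat("evaluating the argument\n"); 1 })
# entered the function
# evaluating the argument
# [1] 1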
So in addition to the (probably expected) unwanted shared-closure issue, we have a lazy evaluation issue. The complete fix involves both introducing new closures (by using the function factory’s execution environments) and forcing evaluation in these new environments. We show the code below:
functionsFor3 <- vector(2, mode='list')
for(x in 1:2) {
  functionsFor3[[x]] <- (function(x) {
    force(x)
    return(function() return(x))
  })(x)
}
functionsFor3[[1]]()
# [1] 1
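As an aside (our own variation, not from the original thread), base::local() gives an equivalent idiom: it supplies a fresh environment each iteration, and the assignment forces evaluation at that moment:

functionsLocal <- vector(2, mode='list')
for(x in 1:2) {
  functionsLocal[[x]] <- local({
    xi <- x  # copying into the local environment forces evaluation now
    function() return(xi)
  })
}
functionsLocal[[1]]()
# [1] 1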
Lazy evaluation is a fairly rare language feature (most famously used in Haskell), so it is not always on everybody’s mind. R uses lazy evaluation in a number of places (function arguments and dplyr pipelines and data structures being some of the most prominent uses).
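For example, base R exposes promise creation directly through delayedAssign() (a small sketch of ours; the variable name v is arbitrary):

delayedAssign("v", { cat("evaluating now\n"); 10 })  # no evaluation yet
v  # first use forces the promise
# evaluating now
# [1] 10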
lapply and purrr::map
I’ve taught this issue for years in our advanced R programming workshops.
One thing I didn’t know is: R fixed this issue for base::lapply(). Consider the following code:
functionsL <- lapply(1:2, function(x) {
  function() return(x)
})
functionsL[[1]]()
# [1] 1
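We can confirm each function captured its own value by inspecting the closures’ environments (a quick check of ours, not from the thread):

environment(functionsL[[1]])$x
# [1] 1
environment(functionsL[[2]])$x
# [1] 2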
Apparently lapply used to have the problem and was fixed by the time we got to R 3.2.
Coming back to the original thread, the current CRAN release of purrr (0.2.2) also has the reference behavior, as we can see below:
functionsM <- purrr::map(1:2, function(x) {
  function() return(x)
})
functionsM[[1]]()
# [1] 2
Apparently this is scheduled for a fix. Note, though, that there is no way purrr::map() can behave the same as both for(){} and lapply(), as the two currently have different behavior.
Conclusion
Lazy evaluation can increase complexity, as it makes it less obvious to the programmer when something will be executed and increases the number of possible interactions the programmer can experience (since it is not determined when code will run, one cannot always know the state of the world it will run in).
My opinion is: lazy evaluation should be used sparingly in R, and only where it trades non-determinism for some benefit. I would also point out that lazy evaluation is not the only possible way to capture specifications of calculations for future interpretation, even in R. For example, formula-like interfaces also provide this capability, as sketched below.
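As a small illustration (a sketch of ours; the name spec is arbitrary), a one-sided formula captures an unevaluated expression, which can be interpreted later with bindings of our choosing:

spec <- ~ x + 1               # capture the expression x + 1 without evaluating it
eval(spec[[2]], list(x = 4))  # interpret it later, supplying x explicitly
# [1] 5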