How environments work in R and what is lazy evaluation
Knowing how R evaluates expressions is crucial if you want to avoid hours of staring at the screen or chasing unexpected, hard-to-diagnose bugs.
We’ll start with an example of an issue I came across a few months ago when using the purrr::map function. Simplified, the issue I had looked like this:
wat
Since I came across the issue, purrr::map has changed and this example no longer applies. To simulate it, let’s use a simplified implementation of the map function. You should be able to just copy-paste the code in this article and run it:
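A minimal sketch of such a simplified map might look like the following. The names simple_map and make_getter are hypothetical, chosen only for illustration; only the argument name index comes from the discussion in this article. The closures return their value instead of printing it, which makes the result easier to check.

```r
# simple_map is a hypothetical, loop-based stand-in for the old purrr::map behaviour.
simple_map <- function(xs, f) {
  result <- vector("list", length(xs))
  for (i in seq_along(xs)) {
    x <- xs[[i]]
    # f receives `x` as an unevaluated promise; it is only evaluated when used
    result[[i]] <- f(x)
  }
  result
}

# make_getter is also hypothetical: it returns a closure that reads index later.
make_getter <- function(index) {
  function() index
}

getters <- simple_map(1:3, make_getter)

getters[[1]]()  # 3, not 1
getters[[2]]()  # 3, not 2
getters[[3]]()  # 3
```

Every closure reports the last value of the loop, because index is first evaluated only after the loop has finished.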
How to fix that?
If you don’t already know how to fix that issue, you’ll quickly find out. This is quite a common problem and the solution is to use the force function as follows:
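As a sketch of the fix, here is a loop-based map together with a closure factory that calls force() on its argument. The names simple_map and make_getter are hypothetical and used only for illustration.

```r
# Hypothetical loop-based map, used only to reproduce the situation.
simple_map <- function(xs, f) {
  result <- vector("list", length(xs))
  for (i in seq_along(xs)) {
    x <- xs[[i]]
    result[[i]] <- f(x)
  }
  result
}

make_getter <- function(index) {
  # Evaluate the argument now, while the loop variable still holds this value
  force(index)
  function() index
}

getters <- simple_map(1:3, make_getter)

getters[[1]]()  # 1
getters[[2]]()  # 2
getters[[3]]()  # 3
```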
It works! But … why?
This could be a great moment to just carry on - the problem is solved. You’ve heard about lazy evaluation and know that force() is useful in fixing such issues. But then again, what does lazy evaluation mean in this context?
Let’s take a look at the magical force function. It consists of two lines:
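You can check this yourself at the console. The snippet below prints the definition of base R's force and then defines an equivalent function; my_force is a hypothetical name used only for comparison.

```r
# Printing force shows its definition (followed by bytecode and namespace lines)
force

# An equivalent hand-written version
my_force <- function(x) x

names(formals(force))  # "x"
deparse(body(force))   # "x"
```

The function takes one argument and simply returns it. Returning the argument is what evaluates it.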
Huh?
Wait, what’s going on here? Does this mean that I can simply call index instead of force(index) and it will still work?
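A quick experiment suggests that it does. The sketch below replaces force(index) with a bare reference to index; make_getter and the surrounding loop are hypothetical scaffolding for the test.

```r
make_getter <- function(index) {
  index  # a bare reference evaluates the argument; the value itself is discarded
  function() index
}

values <- 1:3
getters <- vector("list", length(values))
for (i in seq_along(values)) {
  getters[[i]] <- make_getter(values[[i]])
}

getters[[1]]()  # 1
getters[[3]]()  # 3
```

Writing force(index) does the same thing, but it tells the reader that the evaluation is intentional rather than a leftover line.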
Let’s get to the bottom of this
There are two factors that cause the issue we are facing. The first one is lazy evaluation. The second is the way environments work in R.
Lazy evaluation
R does not evaluate an expression passed as a function argument until that argument is actually used. Let’s take a look at an example that you can find in Hadley’s book (http://adv-r.had.co.nz/Functions.html):
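A sketch along the lines of the example in that chapter: the function never touches its argument, so the call to stop() that was passed in is never run.

```r
f <- function(x) {
  10  # x is never used, so the argument expression is never evaluated
}

f(stop("This is an error!"))  # 10, and no error is raised
```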
Here is another example that helps show that expressions are evaluated at the moment they are used:
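One way to see the timing is to record when each piece of code runs. In this sketch, events, log_event and g are hypothetical names; the argument expression has a side effect so that its evaluation leaves a trace.

```r
events <- character()

# Record a message and return it
log_event <- function(msg) {
  events <<- c(events, msg)
  msg
}

g <- function(x) {
  log_event("body started")
  x  # the argument expression is evaluated here, not at call time
  log_event("body finished")
  invisible(NULL)
}

g(log_event("argument evaluated"))

events
# "body started" "argument evaluated" "body finished"
```

The argument is logged after the body has started, which shows it was not evaluated when g was called.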
Please note that the promises mentioned here are different from the promises package, which is used to handle concurrent computations. These semantics are described in the R language definition:
The mechanism is implemented via promises. When a function is being evaluated the actual expression used as an argument is stored in the promise together with a pointer to the environment the function was called from. When (if) the argument is evaluated the stored expression is evaluated in the environment that the function was called from. Since only a pointer to the environment is used any changes made to that environment will be in effect during this evaluation. The resulting value is then also stored in a separate spot in the promise. Subsequent evaluations retrieve this stored value (a second evaluation is not carried out).
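Two consequences of that description can be sketched in a few lines: a change made to the calling environment before the promise is evaluated is visible, and once the promise has been evaluated its value is cached. The names delayed, y and get_y are hypothetical.

```r
delayed <- function(value) {
  function() value  # value stays an unevaluated promise until first use
}

y <- 1
get_y <- delayed(y)

y <- 2   # changed before the promise is evaluated
get_y()  # 2: the expression `y` is evaluated now, in the calling environment

y <- 3   # changed after the promise was evaluated
get_y()  # 2: the stored value is reused, no second evaluation
```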
How environments work
Every function object has an environment assigned when it is created. Let’s call it environment A. When the function is invoked, a new environment is created and used in the function call. This new environment inherits from environment A.
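This relationship can be inspected directly with environment() and parent.env(). In the sketch below, inspect_call is a hypothetical helper that reports the environment created for its own call.

```r
inspect_call <- function() {
  call_env <- environment()  # the environment created for this particular call
  list(call_env = call_env, parent = parent.env(call_env))
}

first <- inspect_call()
second <- inspect_call()

# The call environment inherits from the function's own environment ("A")
identical(first$parent, environment(inspect_call))  # TRUE

# Every call gets a fresh environment
identical(first$call_env, second$call_env)          # FALSE
```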
This is what the environment hierarchy looks like at this point:
How does our example work without force
Environment 0x3fa2db2 inherits from mpfEnv and points to the index variable, which is stored in 0x3fa2db0. The index variable is not going to be copied to environment 0x3fa2db2 until it is used there. In other words, the promise is not evaluated until something actually reads index.
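The same structure can be explored from the console. This is a sketch with hypothetical names (make_getter, values, getters); the environment addresses on your machine will differ from the ones quoted above.

```r
make_getter <- function(index) {
  function() index  # no force(): index stays an unevaluated promise
}

values <- 1:3
getters <- vector("list", length(values))
for (i in seq_along(values)) {
  getters[[i]] <- make_getter(values[[i]])
}

env1 <- environment(getters[[1]])
env2 <- environment(getters[[2]])

identical(env1, env2)  # FALSE: each call to make_getter created its own environment
ls(env1)               # "index" (listing names does not evaluate the promise)

# Each of those environments inherits from the environment of make_getter
identical(parent.env(env1), environment(make_getter))  # TRUE

getters[[1]]()  # 3: index is evaluated only now, when i is already 3
```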
How does our example work with force
You shouldn’t come across this issue while using most higher-order functions:
R 3.2.0 (2015) changelog:
- Higher-order functions such as the apply functions and Reduce() now force arguments to the functions they apply in order to eliminate undesirable interactions between lazy evaluation and variable capture in closures. This resolves PR#16093.
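Assuming R 3.2.0 or later, this is easy to confirm with lapply. The closure factory make_getter is a hypothetical name and deliberately does not call force().

```r
make_getter <- function(index) {
  function() index  # no force() here
}

# lapply forces the argument before handing it to make_getter
getters <- lapply(1:3, make_getter)

getters[[1]]()  # 1
getters[[2]]()  # 2
getters[[3]]()  # 3
```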
Purrr issue fixed in March 2017: https://github.com/tidyverse/purrr/issues/191
I hope this knowledge will save you some time if you stumble upon such issues in the future.
Until next time!