Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
One thing that has really caught my attention as I learn more programming languages is the idea of generators or infinite sequences of values. Yes, infinite. Coming from R, that seems unlikely, but in at least several other languages, it’s entirely possible thanks to iterators and lazy evaluation.
I saw this video which solves a codewars challenge using an infinite list, which references this one on the same topic.
First, a diversion into recursion
In Haskell, one of the first exercises after “Hello, World!” people discover is the Fibonacci sequence, where the \(n^{\rm th}\) value is given by the sum of the two previous values. As a function, this can be written as
λ> fib 0 = 0 λ> fib 1 = 1 λ> fib n = fib (n-1) + fib (n-2)
Essentially, fib(0)
returns 0
. fib(1)
returns 1
, and for any value n
it
returns the (recursively defined) sum of the two previous values. This isn’t infinite
at all…
\[ \begin{align*} {\rm fib}(4) &= {\rm fib}(3) + {\rm fib}(2)\\ &= ({\rm fib}(2) + {\rm fib}(1)) + ({\rm fib}(1) + {\rm fib}(0))\\ &= {\rm fib}(1) + {\rm fib}(0) + {\rm fib}(1) + {\rm fib}(1) + {\rm fib}(0)\\ &= 1 + 0 + 1 + 1 + 0\\ &= 3 \end{align*} \]
We could write that in R as
fib <- function(n) { if (n == 0) return(0) if (n == 1) return(1) fib(n - 1) + fib(n - 2) } fib(4) ## [1] 3 fib(8) ## [1] 21
This may come as a surprise to some – the function is defined in terms of itself and some base cases. This is entirely fine in R and Haskell as they’re lazily evaluated – nothing happens until a value is actually used. In R, this means that if an argument to a function isn’t used, it’s not evaluated at all
loud_func <- function() { cat("HELLLOOOOO!!!\n") } stays_quiet <- function(f, g = loud_func()) { f(c(1, 2, 3, 4)) } stays_quiet(f = mean) ## [1] 2.5
Notice that although the argument g
is the invocation of loud_func()
, it’s never
evaluated because we don’t use it. If we did use it…
noisy <- function(f, g = loud_func()) { g f(c(1, 2, 3, 4)) } noisy(sum) ## HELLLOOOOO!!! ## [1] 10
So, in these languages, we can define a function recursively. Given the base case(s), these will eventually return just a number, so the computation will complete.
Instead of just adding the values together, we can create a sequence of values by concatenating the iterations together. Starting with data and working down to a base case is called “recursion”, while starting from a base case and building up a data structure is “corecursion”.
If we want a sequence of values that represents the Fibonacci numbers, we can use
fibs <- function(n) { if (n == 0) return(0) if (n == 1) return(1) prev <- fibs(n - 1) c(prev, sum(tail(prev, 2))) } fibs(10) ## [1] 1 1 2 3 5 8 13 21 34 55
What’s blown my mind recently is the concept of infinite data structures. If I defined some function that, instead of working down to a base case, just kept expanding, say, by concatenating with a larger number (corecursion), such as
inf_series <- function(x) { c(x, inf_series(x + 1)) }
Then, if I tried to evaluate inf_series(5)
this would produce
c(5, inf_series(6))
which would expand to
c(5, 6, inf_series(7))
and so on… forever. Of course, R can’t keep going forever
inf_series(5) Error: C stack usage 7971732 is too close to the limit
This error comes about because each function in R is a “closure” which “encloses” its environment. The way R keeps track of that (and where it needs to return after returning from a function) is by adding “stack frames” each time it dives deeper into a function calling a function. We’ve asked it to add infinity of these, so at some point it says “too many”.
Okay, so, not possible, right?
In Haskell I can define
λ> fibs = 0 : scanl (+) 1 fibs
which is a concatenation (:
) of 0
with the result of scanl (+) 1 fibs
. Note carefully,
this isn’t a function – it’s a vector of values defined recursively 🤯
To explain the definition a little more: scanl
is similar to reduce
in that it takes a starting value, a vector, and a
binary operator, but rather than reducing the vector to a value, it creates a new
vector with the successively reduced values. In this example, the values 1..5
are
successively added (+
) to 0
, so the second entry is 0+1=1
, the next is 1+2=3
, the
next is 3+3=6
, then 6+4=10
, then 10+5=15
λ> scanl (+) 0 [1..5] [0,1,3,6,10,15]
In the Fibonacci case, the operator is still addition, but the starting value is 1
and the vector is … the entire vector we’re defining. Writing out some of the
terms makes this easier to understand
λ> scanl (+) 1 [0, 1, 1, 2, 3, 5, 8] [1,1,2,3,5,8,13,21]
The first terms in the sequence, after concatenating with 0
will be
λ> [0, 1, 1, 2, 3, 5, 8]
so, by defining fibs
in terms of fibs
, the sequence can go on forever. So,
what if you try to print this? In GHCI
, the output will just stream forever, which
isn’t particularly useful. Instead, we can ask for some number of values, say, ten
λ> take 10 fibs [0,1,1,2,3,5,8,13,21,34]
Due to the laziness of Haskell, nothing is computed until it’s needed, so asking for any number of values is fast, despite the list being “infinite”
λ> take 100 fibs [0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765,10946, 17711,28657,46368,75025,121393,196418,317811,514229,832040,1346269,2178309, 3524578,5702887,9227465,14930352,24157817,39088169,63245986,102334155, 165580141,267914296,433494437,701408733,1134903170,1836311903,2971215073, 4807526976,7778742049,12586269025,20365011074,32951280099,53316291173, 86267571272,139583862445,225851433717,365435296162,591286729879,956722026041, 1548008755920,2504730781961,4052739537881,6557470319842,10610209857723, 17167680177565,27777890035288,44945570212853,72723460248141,117669030460994, 190392490709135,308061521170129,498454011879264,806515533049393, 1304969544928657,2111485077978050,3416454622906707,5527939700884757, 8944394323791464,14472334024676221,23416728348467685,37889062373143906, 61305790721611591,99194853094755497,160500643816367088,259695496911122585, 420196140727489673,679891637638612258,1100087778366101931,1779979416004714189, 2880067194370816120,4660046610375530309,7540113804746346429,12200160415121876738, 19740274219868223167,31940434634990099905,51680708854858323072, 83621143489848422977,135301852344706746049,218922995834555169026]
The idea of taking some values from an iterator shows up in other languages.
In Rust, I can create an infinite iterator of the value 1
use std::iter; let ones = iter::repeat(1);
and I can take some number of these, say, five, collected into a vector
ones.take(5).collect::<Vec<_>>() [1, 1, 1, 1, 1]
Python also has a notion of infinite sequences, and they’re likely even more common.
When you work with a (regular, finite) list of values in a range, you get a range
object
numbers = range(1, 9) numbers ## range(1, 9) type(numbers) ## <class 'range'>
You can convert this to a regular list with
list(numbers) ## [1, 2, 3, 4, 5, 6, 7, 8] type(list(numbers)) ## <class 'list'>
If you filter
the numbers (which works on a range
or a list
) you get a filter
object
even_numbers = filter(lambda x: x % 2 == 0, numbers) even_numbers ## <filter object at 0x7f5ac61da470> type(even_numbers) ## <class 'filter'>
which you can also convert to a list to see the values
list(even_numbers) ## [2, 4, 6, 8]
But, you can use this as an iterable, so you can get the ‘next’ value as many times as you need (defined here again to restart the iterator)
even_numbers = filter(lambda x: x % 2 == 0, numbers) next(even_numbers) ## 2 next(even_numbers) ## 4 next(even_numbers) ## 6 next(even_numbers) ## 8
until there’s none left
next(even_numbers) ## Error in py_call_impl(callable, dots$args, dots$keywords): StopIteration
As before, you can convert these to a fixed-length list, if desired
list(filter(lambda x: x % 2 == 0, numbers)) ## [2, 4, 6, 8]
Still with me? Great. We can create an infinite list, if we want to, because it isn’t evaluated until we ask for elements
def infinitenumbers(): count = 0 while True: yield count count += 1 nums = infinitenumbers() nums ## <generator object infinitenumbers at 0x7f5ac615d310> type(nums) ## <class 'generator'>
This is a generator which means it’s capable of generating values. We can ask for as many as we want, now
next(nums) ## 0 next(nums) ## 1 next(nums) ## 2 next(nums) ## 3 next(nums) ## 4
If we want to convert some number of these to a list, we need a new function, roughly
the equivalent of Haskell’s take
, in order to extract these
from itertools import islice nums = infinitenumbers() list(islice(nums, 10)) ## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
and as before, if we ask for more now, we get the next batch
list(islice(nums, 10)) ## [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
So, can we do this at all in R? We can’t build an infinite data structure per se, but what if we use a recursive definition and just… stop when it’s recursed enough times?
I initially thought about hacking into functions with body()
to keep track of this,
but we’ve already discussed something we can use – the stack frames! R keeps track of
how deep it’s recursed using these, and we can access that information with sys.calls()
,
the help for which describes
.GlobalEnv is given number 0 in the list of frames. Each subsequent function evaluation increases the frame stack by 1.
So, we can tell from inside a recursive function how deeply nested we currently are.
I wrote the following helper, named for the Haskell inspiration
take <- function(f, n, x = 1) { current_depth <- length(sys.calls()) # Subtract 1 to exclude the current call, # but add 1 to start at 1 # uncomment this line to watch the magic happen! # cat("Current Stack Depth: ", current_depth, "\n") if (current_depth >= n) { return(f(x)) } else { return(c(f(x), take(f, n, x + 1))) } }
this checks length(sys.calls())
which starts at 1 the first time it’s called, and adds
1 every time we go deeper. So long as we haven’t reached the requested depth, it
combines the passed-in function evaluated at \(x + i\) with a new evaluation one level deeper.
When that reaches the requested depth, it returns the evaluated function, bubbling up the returned values so that we end up with a vector of \(n\) values
\[ [f(x), f(x+1), f(x+2), f(x+3), \dots, f(x+n-1)] \]
Neat idea, but does it work? We can’t pass in an actual infinite data structure, but we can pass a function that defines one
A (trivial) function that produces a number at each value of x
is
numbers <- function(x) { x }
If we pretend that’s a generator for every number, we can “take” some values from it
take(numbers, 5) [1] 1 2 3 4 5 take(numbers, 10) [1] 1 2 3 4 5 6 7 8 9 10
A more complicated recipe for an infinite list of numbers could be
g <- function(x) { x + 1 } take(g, 7) [1] 2 3 4 5 6 7 8 take(g, 10) [1] 2 3 4 5 6 7 8 9 10 11
Dealing with non-sequential numbers might be trickier… what if we want all the even numbers?
evens <- function(x) { if(x %% 2 == 0) x else NULL } take(evens, 10) [1] 2 4 6 8 10
No, that only checks the first 10 numbers. Instead,
evens <- function(x) { x * 2 } take(evens, 7) [1] 2 4 6 8 10 12 14 take(evens, 10) [1] 2 4 6 8 10 12 14 16 18 20
What about our original example?
fibs <- function(x) { if (x == 0) return(0) if (x == 1) return(1) fibs(x - 1) + fibs(x - 2) } # 10 values, starting at 0 take(fibs, 10, x = 0) [1] 0 1 1 2 3 5 8 13 21 34 # 10 values, starting at 1 take(fibs, 10, x = 1) [1] 1 1 2 3 5 8 13 21 34 55
Or, if we want 12 values, starting at the 10th
take(fibs, 12, x = 10) [1] 55 89 144 233 377 610 987 1597 2584 4181 6765 10946
I’d say that’s working quite nicely!!!
Some caveats to keep in mind, though…
Since we’re relying on a count of stack frames on top of .GlobalEnv
, this take()
implementation won’t work nicely inside another function. In fact, since
{knitr} is already a few functions deep, it also doesn’t work in an Rmd file (including
this blog which is Rmd via {blogdown}). Not for use in production, but a fun
exercise to figure it out at all.
Is there a better way to achieve this take()
functionality? Where do you use
infinite iterators/generators in R or another language? Spot an improvement that
I can make? I can be found on Mastodon or
use the comments below.
< details> < summary> devtools::session_info()
## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.1.2 (2021-11-01) ## os Pop!_OS 22.04 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_AU.UTF-8 ## ctype en_AU.UTF-8 ## tz Australia/Adelaide ## date 2023-08-18 ## pandoc 3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## blogdown 1.17 2023-05-16 [1] CRAN (R 4.1.2) ## bookdown 0.29 2022-09-12 [1] CRAN (R 4.1.2) ## bslib 0.5.0 2023-06-09 [3] CRAN (R 4.3.1) ## cachem 1.0.8 2023-05-01 [3] CRAN (R 4.3.0) ## callr 3.7.3 2022-11-02 [3] CRAN (R 4.2.2) ## cli 3.6.1 2023-03-23 [3] CRAN (R 4.2.3) ## crayon 1.5.2 2022-09-29 [3] CRAN (R 4.2.1) ## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.1.2) ## digest 0.6.33 2023-07-07 [3] CRAN (R 4.3.1) ## ellipsis 0.3.2 2021-04-29 [3] CRAN (R 4.1.1) ## evaluate 0.21 2023-05-05 [3] CRAN (R 4.3.0) ## fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2) ## fs 1.6.3 2023-07-20 [3] CRAN (R 4.3.1) ## glue 1.6.2 2022-02-24 [3] CRAN (R 4.2.0) ## here 1.0.1 2020-12-13 [1] CRAN (R 4.1.2) ## htmltools 0.5.6 2023-08-10 [3] CRAN (R 4.3.1) ## htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.1.2) ## httpuv 1.6.6 2022-09-08 [1] CRAN (R 4.1.2) ## jquerylib 0.1.4 2021-04-26 [3] CRAN (R 4.1.2) ## jsonlite 1.8.7 2023-06-29 [3] CRAN (R 4.3.1) ## knitr 1.43 2023-05-25 [3] CRAN (R 4.3.0) ## later 1.3.0 2021-08-18 [1] CRAN (R 4.1.2) ## lattice 0.21-8 2023-04-05 [4] CRAN (R 4.3.0) ## lifecycle 1.0.3 2022-10-07 [3] CRAN (R 4.2.1) ## magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.2.0) ## Matrix 1.6-0 2023-07-08 [4] CRAN (R 4.3.1) ## memoise 2.0.1 2021-11-26 [3] CRAN (R 4.2.0) ## mime 0.12 2021-09-28 [3] CRAN (R 4.2.0) ## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2) ## pkgbuild 1.4.0 2022-11-27 [1] CRAN (R 4.1.2) ## pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.1.2) ## png 0.1-7 2013-12-03 [1] CRAN (R 4.1.2) ## prettyunits 1.1.1 2020-01-24 [3] CRAN (R 4.0.1) ## processx 3.8.2 2023-06-30 [3] CRAN (R 4.3.1) ## profvis 0.3.7 2020-11-02 [1] CRAN (R 4.1.2) ## promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2) ## ps 1.7.5 2023-04-18 [3] CRAN (R 4.3.0) ## purrr 1.0.1 2023-01-10 [1] CRAN (R 4.1.2) ## R6 2.5.1 2021-08-19 [3] CRAN (R 4.2.0) ## Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.1.2) ## remotes 2.4.2 2021-11-30 [1] CRAN (R 4.1.2) ## reticulate 1.26 2022-08-31 [1] CRAN (R 4.1.2) ## rlang 1.1.1 2023-04-28 [1] CRAN (R 4.1.2) ## rmarkdown 2.23 2023-07-01 [3] CRAN (R 4.3.1) ## rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.1.2) ## rstudioapi 0.15.0 2023-07-07 [3] CRAN (R 4.3.1) ## sass 0.4.7 2023-07-15 [3] CRAN (R 4.3.1) ## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2) ## shiny 1.7.2 2022-07-19 [1] CRAN (R 4.1.2) ## stringi 1.7.12 2023-01-11 [3] CRAN (R 4.2.2) ## stringr 1.5.0 2022-12-02 [1] CRAN (R 4.1.2) ## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.1.2) ## usethis 2.1.6 2022-05-25 [1] CRAN (R 4.1.2) ## vctrs 0.6.3 2023-06-14 [1] CRAN (R 4.1.2) ## xfun 0.40 2023-08-09 [3] CRAN (R 4.3.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.1.2) ## yaml 2.3.7 2023-01-23 [3] CRAN (R 4.2.2) ## ## [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1 ## [2] /usr/local/lib/R/site-library ## [3] /usr/lib/R/site-library ## [4] /usr/lib/R/library ## ## ─ Python configuration ─────────────────────────────────────────────────────── ## python: /usr/bin/python3 ## libpython: /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so ## pythonhome: //usr://usr ## version: 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] ## numpy: /home/jono/.local/lib/python3.10/site-packages/numpy ## numpy_version: 1.24.1 ## ## NOTE: Python version was forced by RETICULATE_PYTHON_FALLBACK ## ## ──────────────────────────────────────────────────────────────────────────────
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.