Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I keep saying that the more programming languages you know, the more you will understand all the others you know – I’m now at the point where I want to solve every problem I see in a handful of different languages. They all offer different functionality, and some are certainly more suited to particular problems than others, but there’s a world of difference between two characters and importing from two libraries.
A newsletter I follow (and can’t find online copies of) that demonstrates neat
things in python (gotta learn it, despite not loving it) recently covered
accumulate
, showing that sum
and accumulate
were sort of related
>>> my_list = [42, 73, 0, 16, 10] >>> sum(my_list) 141 >>> from itertools import accumulate >>> list(accumulate(my_list)) [42, 115, 115, 131, 141]
sum
adds up all the elements of the list, while accumulate
does the same but
keeps each successive partial sum.
It rounds out the demo with an alternative function being used in accumulate
>>> from itertools import accumulate >>> from operator import mul # mul(2, 3) == 6 >>> initial_investment = 1000 >>> rates = [1.01, 1.01, 1.02, 1.025, 1.035, 1.035, 1.06] >>> list( ... accumulate(rates, mul, initial=initial_investment) ... ) [1000, 1010.0, 1020.1, 1040.502, 1066.515, 1103.843, 1142.477, 1211.026]
Now, firstly… from operator import mul
??? It looks like there’s no way to
pass *
as an argument to a function. I could define a function that performs
the same on known arguments, e.g. lambda x, y: x * y
>>> list(accumulate(rates, lambda x, y: x*y, initial=initial_investment)) [1000, 1010.0, 1020.1, 1040.502, 1066.5145499999999, 1103.8425592499998, 1142.4770488237498, 1211.0256717531747]
but… ew.
It’s possible that there’s a different way to approach this. A list comprehension comes to mind, e.g. something like
>>> [sum(my_list[0:i]) for i in range(1, len(my_list)+1)] [42, 115, 115, 131, 141]
but that requires performing a sum for each sub-interval, so performance would
not scale well (admittedly, that was not a consideration here at all). I also
don’t believe there’s a built-in prod
so one must import math
in order to do
similar
>>> import math >>> x = [initial_investment] + rates >>> [math.prod(x[0:i]) for i in range(1, len(x)+1)] [1000, 1010.0, 1020.1, 1040.502, 1066.5145499999999, 1103.8425592499998, 1142.4770488237498, 1211.0256717531747]
In R that could use the built-in cumprod
for the cumulative product
initial_investment <- 1000 rates = c(1.01, 1.01, 1.02, 1.025, 1.035, 1.035, 1.06) cumprod(c(initial_investment, rates)) ## [1] 1000.000 1010.000 1020.100 1040.502 1066.515 1103.843 1142.477 1211.026
but that has the ‘multiply’ operation hardcoded. cumsum
uses +
as the
function… hmmm. Maybe R doesn’t have a generalised accumulate
?
I’ve been playing around with Haskell lately, so recursive functions to the
rescue! One feature of recursive functions in R that I really like is Recall
which calls the function in which it is defined with a new set of arguments –
perfect for recursion!
accumulate_recall <- function(x, f, i=x[1]) { if (!length(x)) return(NULL) c(i, Recall(tail(x, -1), f, f(i, x[2]))) }
It’s also robust against renaming the function; the body doesn’t actually call
accumulate_recall
by name at all.
This might be inefficient, though – it’s not uncommon to blow out the stack, so
a new Tailcall
function (which doesn’t have the same elegance of being robust
against renaming) helps with flagging this as something that can be optimised
accumulate <- function(x, f, i=x[1]) { if (!length(x)) return(NULL) c(i, Tailcall(accumulate, tail(x, -1), f, f(i, x[2]))) }
With this, I can emulate the cumsum()
and cumprod()
functions
cumprod(1:6) ## [1] 1 2 6 24 120 720 accumulate(1:6, `*`) ## [1] 1 2 6 24 120 720 cumsum(2:6) ## [1] 2 5 9 14 20 accumulate(2:6, `+`) ## [1] 2 5 9 14 20
unless I try to calculate something too big…
cumprod(5:15) ## [1] 5 30 210 1680 15120 151200 ## [7] 1663200 19958400 259459200 3632428800 54486432000 accumulate(5:15, `*`) ## Warning in f(i, x[2]): NAs produced by integer overflow ## [1] 5 30 210 1680 15120 151200 1663200 ## [8] 19958400 259459200 NA NA
It appears that the built-in functions convert to numeric. That’s easily fixed on input
accumulate(as.numeric(5:15), `*`) ## [1] 5 30 210 1680 15120 151200 ## [7] 1663200 19958400 259459200 3632428800 54486432000
In any case, there’s a generalised accumulate
that takes the bare functions as
arguments.
But it can be so much cleaner than this!
In APL you won’t find any function named “sum” because it is just a reduction
(Reduce
in R) with the function +
sum←+/ sum ⍳6 ⍝ sum the values 1:6 21 sum 1↓⍳6 ⍝ sum the values 2:6 20
which in R is
sum(1:6) ## [1] 21 sum(2:6) ## [1] 20
Why would you write sum
if you can just use +/
? It’s fewer
characters to write out the implementation than the name!
For accumulate
the terminology in APL is scan
which uses a very similar
glyph because the operation itself is very similar; a reduce
(/
) is just the
last value of a scan
(\
) which keeps the progressive values. In both cases,
the operator (either slash) takes a binary function as the left argument and
produces a modified function – in these examples, effectively sum
and prod
–
which is then applied to values on the right. The scan
version does the same
+\⍳6 1 3 6 10 15 21 ×\⍳6 1 2 6 24 120 720 accumulate(1:6, `+`) ## [1] 1 3 6 10 15 21 accumulate(1:6, `*`) ## [1] 1 2 6 24 120 720
As for the rates example above, we concatenate the initial value with catenate
(,
) just like the R example, but otherwise this works fine
rates ← 1.01 1.01 1.02 1.025 1.035 1.035 1.06 inv ← 1000 ×/inv, rates 1211.025672 ×\inv, rates 1000 1010 1020.1 1040.502 1066.51455 1103.842559 1142.477049 1211.025672
So all of that recursive R code made to generalise the cumulative application of
a function provided as an argument is boiled down to just the single glyph \
.
Outstanding!
What’s more, there are lots of binary functions one would use this with, all of which have spelled-out names in other languages
+/ ⍝ sum (add) ×/ ⍝ prod (multiply) ∧/ ⍝ all (and) ∨/ ⍝ any (or) ⌈/ ⍝ maximum (max) ⌊/ ⍝ minimum (min)
In summary, it seems that looking across these languages, the available options
range from a single glyph for scan
along with the bare binary operator, e.g.
×/
; a cumprod()
function which isn’t well-generalised but works out of the
box; and then there’s whatever mess this is (once you’ve installed these)
>>> from itertools import accumulate >>> from operator import mul >>> list(accumulate(rates, mul, initial=initial_investment))
Where did we go so wrong?
For what it’s worth, Julia has a reduce
and an accumulate
that behave very
nicely; generalised for the binary function as an argument
julia> reduce(+, 1:6) 21 julia> reduce(*, 1:6) 720 julia> accumulate(+, 1:6) 6-element Vector{Int64}: 1 3 6 10 15 21 julia> accumulate(*, 1:6) 6-element Vector{Int64}: 1 2 6 24 120 720
This is extremely close to the APL approach, but with longer worded names for
the reduce
and scan
operators. It also defines the more convenient sum
,
prod
, cumsum
, and cumprod
; no shortage of ways to do this in Julia!
In Haskell, foldl
and scanl
are the (left-associative) version of reduce
and accumulate
, and passing an infix as an argument necessitates wrapping it
in parentheses
ghci> foldl (+) 0 [1..6] 21 ghci> scanl (+) 0 [1..6] [0,1,3,6,10,15,21] ghci> foldl (*) 1 [1..6] 720 ghci> scanl (*) 1 [1..6] [1,1,2,6,24,120,720]
This requires an explicit starting value, unless one uses the specialised versions which use the first value as an initial value
ghci> foldl1 (+) [1..6] 21 ghci> scanl1 (+) [1..6] [1,3,6,10,15,21] ghci> foldl1 (*) [1..6] 720 ghci> scanl1 (*) [1..6] [1,2,6,24,120,720]
I started this post hoping to demonstrate how nice the APL syntax was for this, but the detour through generalising the R function was a lot of unexpected fun as well.
Comments, improvements, or your own solutions are most welcome. I can be found on Mastodon or use the comments below.
Addendums
It should probably be noted that R does have a function scan
but it’s for
reading data into a vector – if you ever spot someone using it for that… run.
I have war stories about that function.
I’d love to hear how this is accomplished in some other languages, too – does it
have a built-in accumulate
that takes a binary function?
< details> < summary> devtools::session_info()
## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.4.1 (2024-06-14) ## os macOS Sonoma 14.6 ## system aarch64, darwin20 ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Australia/Adelaide ## date 2024-11-28 ## pandoc 3.5 @ /opt/homebrew/bin/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## blogdown 1.19 2024-02-01 [1] CRAN (R 4.4.0) ## bookdown 0.41 2024-10-16 [1] CRAN (R 4.4.1) ## bslib 0.8.0 2024-07-29 [1] CRAN (R 4.4.0) ## cachem 1.1.0 2024-05-16 [1] CRAN (R 4.4.0) ## cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0) ## devtools 2.4.5 2022-10-11 [1] CRAN (R 4.4.0) ## digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1) ## ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.4.0) ## evaluate 1.0.1 2024-10-10 [1] CRAN (R 4.4.1) ## fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0) ## fs 1.6.5 2024-10-30 [1] CRAN (R 4.4.1) ## glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1) ## htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0) ## htmlwidgets 1.6.4 2023-12-06 [1] CRAN (R 4.4.0) ## httpuv 1.6.15 2024-03-26 [1] CRAN (R 4.4.0) ## jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.4.0) ## jsonlite 1.8.9 2024-09-20 [1] CRAN (R 4.4.1) ## knitr 1.48 2024-07-07 [1] CRAN (R 4.4.0) ## later 1.3.2 2023-12-06 [1] CRAN (R 4.4.0) ## lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0) ## magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0) ## memoise 2.0.1 2021-11-26 [1] CRAN (R 4.4.0) ## mime 0.12 2021-09-28 [1] CRAN (R 4.4.0) ## miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 4.4.0) ## pkgbuild 1.4.5 2024-10-28 [1] CRAN (R 4.4.1) ## pkgload 1.4.0 2024-06-28 [1] CRAN (R 4.4.0) ## profvis 0.4.0 2024-09-20 [1] CRAN (R 4.4.1) ## promises 1.3.0 2024-04-05 [1] CRAN (R 4.4.0) ## purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0) ## R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0) ## Rcpp 1.0.13-1 2024-11-02 [1] CRAN (R 4.4.1) ## remotes 2.5.0.9000 2024-11-03 [1] Github (r-lib/remotes@5b7eb08) ## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0) ## rmarkdown 2.28 2024-08-17 [1] CRAN (R 4.4.0) ## rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.4.1) ## sass 0.4.9 2024-03-15 [1] CRAN (R 4.4.0) ## sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0) ## shiny 1.9.1 2024-08-01 [1] CRAN (R 4.4.0) ## urlchecker 1.0.1 2021-11-30 [1] CRAN (R 4.4.0) ## usethis 3.0.0 2024-07-29 [1] CRAN (R 4.4.0) ## vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0) ## xfun 0.49 2024-10-31 [1] CRAN (R 4.4.1) ## xtable 1.8-4 2019-04-21 [1] CRAN (R 4.4.0) ## yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0) ## ## [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library ## ## ──────────────────────────────────────────────────────────────────────────────
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.