In-Place Modifications

[This article was first published on rstats on Irregularly Scheduled Programming, and kindly contributed to R-bloggers.]

In this post I explore some differences between R, Python, Julia, and APL in terms of mutability, and try to make something that probably shouldn’t exist.

I watched this code_report video, which describes a LeetCode problem:

You are given an integer array nums, an integer k, and an integer multiplier.

You need to perform k operations on nums. In each operation:

  • Find the minimum value x in nums. If there are multiple occurrences of the minimum value, select the one that appears first.
  • Replace the selected minimum value x with x * multiplier.

Return an integer array denoting the final state of nums after performing all k operations.

Conor’s Python solution in the video was

def getFinalState(nums, k, m): 
  for _ in range(k): 
    i = nums.index(min(nums)) 
    nums[i] *= m
  return nums

x = [2, 1, 3, 5, 6]
k = 5
mult = 2

getFinalState(x, k, mult)
## [8, 4, 6, 5, 6]

and, as always, I wanted to see how I’d do that in R. I came up with this

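getFinalState = function(nums, k, mult) {
  for (i in seq_len(k)) {
    # index of the first minimum; which.min() already returns only the first
    m <- which.min(nums)
    # use the multiplier argument rather than a global variable
    nums[m] <- mult * nums[m]
  }
  nums
}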

x <- c(2, 1, 3, 5, 6)
k <- 5
mult <- 2

getFinalState(x, k, mult)
## [1] 8 4 6 5 6

It’s worth noting that I can’t use a map in this function because iterations are dependent; the minimum value at any iteration depends on the previous values.
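To make that dependence concrete, here is a small Python sketch (my own variant for illustration, not the code from the video) which records the state after every operation. The name `final_state_trace` is hypothetical, and unlike the solution above it works on a copy so the caller's list is left alone.

```python
def final_state_trace(nums, k, m):
    """Return the list of states after each of the k operations.

    Works on a copy, so the caller's list is left untouched.
    """
    state = list(nums)                # copy: do not mutate the argument
    trace = []
    for _ in range(k):
        i = state.index(min(state))   # first occurrence of the minimum
        state[i] *= m
        trace.append(list(state))     # snapshot of this step
    return trace


x = [2, 1, 3, 5, 6]
for step in final_state_trace(x, 5, 2):
    print(step)
# [2, 2, 3, 5, 6]
# [4, 2, 3, 5, 6]
# [4, 4, 3, 5, 6]
# [4, 4, 6, 5, 6]
# [8, 4, 6, 5, 6]
print(x)  # [2, 1, 3, 5, 6], the original list is unchanged
```

Each row is computed from the row before it, which is why the steps can't be done independently of each other.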

I also had a chance to discuss this solution with some APL’ers at a meetup and a J solution was presented, but I don’t think I wrote it down.

My solution is nearly word-for-word the same as the Python solution with a couple of notable exceptions arising from the difference between the two languages:

First, R has which.min() as a built-in, so there is no need to compute the minimum and then look up its index (which takes two references to nums in the Python version). Also, R has no compound assignment like x *= 2 which modifies in-place - the closest thing I can think of is the %<>% operator in {magrittr} (not re-exported in {dplyr} because this behaviour is considered bad practice in R, despite not really being “in-place”)

library(magrittr)

m <- data.frame(x = 1:6, y = letters[1:6])
m
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
m %<>% head(2)
m
##   x y
## 1 1 a
## 2 2 b

although I can certainly see the case for it - this operator avoids repeating the variable being used and assigned, because the alternative using the traditional pipe is

m <- data.frame(x = 1:6, y = letters[1:6])
m
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 d
## 5 5 e
## 6 6 f
m <- m %>% head(2)
m
##   x y
## 1 1 a
## 2 2 b

One could argue that writing out even a longer variable name twice still makes it clear that shadowing is taking place; the value is being overwritten with a new value, but it does feel a little frustrating to have to type it out twice

important_variable <- important_variable * 2
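For comparison, here is a rough Python sketch of what compound assignment does there. Whether anything is really modified in place depends on the type: a list is mutated, while a name holding an integer is simply rebound to a new value. All variable names are made up for the illustration.

```python
# Augmented assignment on a mutable list: the object itself changes,
# so every name bound to it sees the update.
values = [1, 2]
alias = values
values *= 2              # list repetition, performed in place
print(alias)             # [1, 2, 1, 2]
print(alias is values)   # True

# Augmented assignment on an immutable int: the name is rebound to a
# new object and other names keep the old value.
count = 1000
old_count = count
count *= 2
print(old_count)         # 1000
print(count)             # 2000

# Element-wise update, as in the Python solution above: the list is
# mutated in place even though each element is an immutable int.
nums = [2, 1, 3]
same_list = nums
nums[1] *= 2
print(same_list)         # [2, 2, 3]
```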

Back to my R solution, the indexing at a specific set of values got me thinking that it would be clean if we could pass a function to [ so that we could write

nums[which.min] <- value

(maybe not so much for this example where m is used twice, but it piqued my interest)

Let’s say I want to set all the even values of a vector to some other value. That’s easy enough to do

x[x %% 2 == 0] <- 0

but I don’t love that it requires two references to x, which may (should?) be a much longer name

important_variable[important_variable %% 2 == 0] <- 0

I want something like x[f] <- y to set the values of x where f(x) is TRUE to y. This seemed like it might be possible, maybe with a function method to [<-, but [<- dispatches on the class of x, not what’s inside [, so no dice. In theory (which will never happen) the built-in [<- could have some branch logic for dealing with a function passed as the indices to be modified, but I’m not about to go rebuilding R from source myself just to play with that idea.

Nonetheless, if I define some functions that do accomplish this

is_even <- function(z) z %% 2 == 0

set_if <- function(x, f, value) {
  x[f(x)] <- value
  x
}

then I can try this out on a vector

a <- 1:10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
set_if(a, is_even, 0)
##  [1] 1 0 3 0 5 0 7 0 9 0
a # unchanged
##  [1]  1  2  3  4  5  6  7  8  9 10

It works, but I’m back to having to write a <- do_stuff(a) because a isn’t actually modified by this function.

Ideally, my function would operate the same as this does

a <- 1:10
a[is_even(a)] <- 0
a
##  [1] 1 0 3 0 5 0 7 0 9 0

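which appears to modify a in-place; R is not entirely pure, and does occasionally allow what looks like direct mutation. Under the hood R uses copy-on-modify semantics: a vector can only be changed truly in place when nothing else refers to it, otherwise the assignment works on a copy - and in this session a new object is actually created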

# not using a range e.g. 1:n because that's internally 
# a "compact" representation
a <- c(2, 3, 4)
.Internal(inspect(a))
## @63a4d9b05be8 14 REALSXP g0c3 [REF(2)] (len=3, tl=0) 2,3,4
a[2] <- 9
.Internal(inspect(a))
## @63a4d9b0fbf8 14 REALSXP g0c3 [REF(1)] (len=3, tl=0) 2,9,4

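Note that the memory address has changed. The first inspect() reports REF(2), i.e. more than one reference to the vector, which is why the assignment copied it rather than modifying it where it was.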
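As a point of comparison, here is a rough Python sketch of the same experiment, using id() as a stand-in for the memory address (in CPython id() happens to be the object's address, which is an implementation detail). The tuple part is my own addition to show what an "update" looks like when the structure is immutable.

```python
a = [2.0, 3.0, 4.0]
address_before = id(a)
a[1] = 9.0                        # item assignment mutates the list
print(id(a) == address_before)    # True: still the same object
print(a)                          # [2.0, 9.0, 4.0]

# A tuple is immutable, so an "update" has to build a new object.
t = (2.0, 3.0, 4.0)
t_updated = t[:1] + (9.0,) + t[2:]
print(t_updated)                  # (2.0, 9.0, 4.0)
print(t_updated is t)             # False
print(t)                          # (2.0, 3.0, 4.0), the original is unchanged
```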

If I was working with a language which did support (enable?) modify-in-place then that might look like

def is_even(x):
    return x % 2 == 0

def set_if(x, f, value):
    for i in range(len(x)):
        if f(x[i]):
            x[i] = value

a = list(range(10))
a
## [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
set_if(a, is_even, 0)
a
## [0, 1, 0, 3, 0, 5, 0, 7, 0, 9]

Now, that’s not always a great thing. In such a language with mutable structures (e.g. lists) we can do maddening things like this

x = [3, 4, 5]
y = x
y is x
## True
y[1] = 9
x # still 'bound' to y
## [3, 9, 5]

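Here, is means “are these two names bound to the very same object in memory”, whereas == compares values. CPython happens to cache small integers, so is can return True for those literals, but that is an implementation detail rather than something to rely on; two separately created tuples with equal contents are distinct objects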

abc = (11, 99)
xyz = (11, 99)
abc is xyz
## False
abc == xyz
## True
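The usual way around the aliasing shown above is to copy explicitly. The following is a minimal sketch using only the standard library; the variable names are invented for the example.

```python
import copy

x = [3, 4, 5]
y = list(x)          # shallow copy; x.copy() or x[:] would also work
y[1] = 9
print(y is x)        # False
print(x)             # [3, 4, 5]

# A shallow copy still shares any nested mutable elements ...
rows = [[1, 2], [3, 4]]
shallow = list(rows)
shallow[0][0] = 99
print(rows)          # [[99, 2], [3, 4]]

# ... so nested data needs copy.deepcopy to be fully independent.
grid = [[1, 2], [3, 4]]
deep = copy.deepcopy(grid)
deep[0][0] = 99
print(grid)          # [[1, 2], [3, 4]]
```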

The big question is can I hack together some solution that does work in-place in R? Yeah, with some ill-advised calls

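set_if <- function(x, f, value) {
  # can't use <<- because the value passed in as the x argument isn't 
  # necessarily named 'x' in the parent scope
  .x <- x
  .x[f(.x)] <- value
  # parent.frame() is the caller's environment; parent.env(environment())
  # would be where set_if() was defined, which only coincides with the
  # caller when both are the global environment
  e <- parent.frame()
  # assumes x was passed as a bare variable name, not an expression
  assign(deparse(substitute(x)), .x, envir = e)
  invisible(.x)
}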


a <- 1:10
a
##  [1]  1  2  3  4  5  6  7  8  9 10
set_if(a, is_even, 0)
a
##  [1] 1 0 3 0 5 0 7 0 9 0

As I note in the comment there, I can’t use the super-assignment arrow <<- inside this function because I don’t know the name of the variable I’m updating; it needs to be deparsed from the incoming argument.

This means that it works regardless of the name of the variable being modified

b <- 10:20
b
##  [1] 10 11 12 13 14 15 16 17 18 19 20
set_if(b, is_even, 0)
b
##  [1]  0 11  0 13  0 15  0 17  0 19  0

I tried to think of some other languages which might support this sort of in-place set_if(x, f, value) modification and (Dyalog) APL was worth a thought.

    ⍝ create a vector from 1 to 10
    x←⍳10
    x
1 2 3 4 5 6 7 8 9 10

    ⍝ the function {0=2|⍵} calculates a boolean vector with 
    ⍝ 1 where the value is even
    {0=2|⍵} x
0 1 0 1 0 1 0 1 0 1

    ⍝ the `@` operator takes a value (or function) on the left and 
    ⍝ a function (or boolean values) on the right and applies it to the 
    ⍝ other argument on the right
    0@{0=2|⍵} x 
1 0 3 0 5 0 7 0 9 0

    ⍝ alternatively a point-free function defined as the negation (`~`) of a 
    ⍝ binding (`∘`) of the value 2 to modulo (`|`); the negation is needed
    ⍝ otherwise this returns the result of the modulo, not where it is 0
    0@(~2∘|)⍳10
1 0 3 0 5 0 7 0 9 0

    ⍝ x is, however, unchanged as APL is typically immutable
    x
1 2 3 4 5 6 7 8 9 10

So @ on its own doesn’t modify x in place - it returns a new array, and to keep the result I’d have to assign it back to x. It is nice, though, that 0@(~2∘|)x only refers to x once.
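For readers who don't speak APL, the behaviour of 0@{0=2|⍵} x can be approximated in Python as below. This is only a sketch: the function name `at` is hypothetical, and it covers just the "value on the left, predicate on the right" case, not everything the APL operator can do.

```python
def at(value, predicate, xs):
    """Return a new list with `value` wherever predicate(item) is true."""
    return [value if predicate(item) else item for item in xs]


def is_even(n):
    return n % 2 == 0


x = list(range(1, 11))
print(at(0, is_even, x))  # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]
print(x)                  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

As in the APL session, x is only mentioned once in the call and is left unchanged.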

Julia makes a nice distinction between functions which mutate arguments and those which don’t; (by convention) the former are named ending with an exclamation mark, e.g.

vec = collect(1:5)
## 5-element Vector{Int64}:
##  1
##  2
##  3
##  4
##  5
# non-mutating
reverse(vec)
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
vec
## 5-element Vector{Int64}:
##  1
##  2
##  3
##  4
##  5
# mutating
reverse!(vec)
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
vec
## 5-element Vector{Int64}:
##  5
##  4
##  3
##  2
##  1
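Python has the same split but no naming marker for it. As far as I know the convention for the built-in types is that the mutating method returns None, while the non-mutating function hands back a new object. A small sketch:

```python
vec = [1, 2, 3, 4, 5]

# Non-mutating: builds a new list, vec is untouched.
flipped = list(reversed(vec))
print(flipped)        # [5, 4, 3, 2, 1]
print(vec)            # [1, 2, 3, 4, 5]

# Mutating: reverses vec itself and returns None.
result = vec.reverse()
print(result)         # None
print(vec)            # [5, 4, 3, 2, 1]
```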

In Julia, the iseven() function is already built-in, but vectorisation is explicit via a broadcast operator . and the setting of even values to 0 looks like

x = collect(1:10);
x[iseven.(x)] .= 0;
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

which looks very much like the R version with some dots where scalar functions are vectorised. If I don’t use the last . to perform vectorised assignment, the error tells me that the failure involved the setindex! function which does sound like what I want, but this doesn’t work

setindex!(x, 0, iseven.(x))

because it’s trying to assign the value 0 multiple times and I only provided one of them. Instead,

x = collect(1:10);
setindex!(x, zeros(Int8, 5), iseven.(x));
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

does work, but I had to manually count how many 0 entries this requires, so the [ approach seems cleaner. Either way, I’ve had to explicitly calculate iseven.(x) and pass that result somewhere.

Since Julia allows users to extend methods, I could do that modification myself!

import Base.setindex! 
  
function setindex!(A::Vector{Int64}, v::Int64, f::Function) 
  A[f.(A)] .= v
end
## setindex! (generic function with 240 methods)
x = collect(1:10);
setindex!(x, 0, iseven);
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

which I could just as easily call set_if!

set_if! = setindex!;
x = collect(1:10);
set_if!(x, 0, iseven);
x
## 10-element Vector{Int64}:
##  1
##  0
##  3
##  0
##  5
##  0
##  7
##  0
##  9
##  0

Nice! I do wonder if I can “hack” (ahem, extend) Julia’s [ to get my prized x[f] = 0 solution but I doubt it’s worth it when the above does the right thing.
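For what it's worth, that x[f] = 0 spelling is reachable in Python, because item assignment is dispatched to a method of the container, which a subclass can override. The following is only a sketch under that assumption; `PredicateList` is a name I made up and nothing like it ships with Python.

```python
class PredicateList(list):
    """A list whose item assignment also accepts a predicate as the index."""

    def __setitem__(self, key, value):
        if callable(key):
            # Assign `value` at every position whose element satisfies key.
            for i, item in enumerate(self):
                if key(item):
                    super().__setitem__(i, value)
        else:
            # Fall back to ordinary integer / slice assignment.
            super().__setitem__(key, value)


def is_even(n):
    return n % 2 == 0


x = PredicateList(range(1, 11))
x[is_even] = 0
print(x)        # [1, 0, 3, 0, 5, 0, 7, 0, 9, 0]
x[0] = 42       # ordinary indexing still works
print(x[:3])    # [42, 0, 3]
```

This modifies x in place and only mentions it once, at the cost of needing a special container type rather than a plain list.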

I don’t imagine I’ll package up my set_if() anywhere, and I should probably even avoid using it myself, but it’s been an interesting journey thinking about this stuff. Maybe there’s a better way to do it? Maybe there’s a language which better supports something like that? If you know, or you have comments or suggestions, I can be found on Mastodon or use the comment section below.


devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.3 (2024-02-29)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2024-09-25
##  pandoc   3.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/x86_64/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.19    2024-02-01 [1] CRAN (R 4.3.3)
##  bookdown      0.36    2023-10-16 [1] CRAN (R 4.3.2)
##  bslib         0.8.0   2024-07-29 [1] CRAN (R 4.3.3)
##  cachem        1.1.0   2024-05-16 [1] CRAN (R 4.3.3)
##  callr         3.7.3   2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.3)
##  crayon        1.5.2   2022-09-29 [3] CRAN (R 4.2.1)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.3.2)
##  digest        0.6.37  2024-08-19 [1] CRAN (R 4.3.3)
##  ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.24.0  2024-06-10 [1] CRAN (R 4.3.3)
##  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.3.3)
##  fs            1.6.4   2024-04-25 [1] CRAN (R 4.3.3)
##  glue          1.7.0   2024-01-09 [1] CRAN (R 4.3.3)
##  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.3.3)
##  htmlwidgets   1.6.2   2023-03-17 [1] CRAN (R 4.3.2)
##  httpuv        1.6.12  2023-10-23 [1] CRAN (R 4.3.2)
##  icecream      0.2.1   2023-09-27 [1] CRAN (R 4.3.2)
##  jquerylib     0.1.4   2021-04-26 [1] CRAN (R 4.3.3)
##  jsonlite      1.8.8   2023-12-04 [1] CRAN (R 4.3.3)
##  JuliaCall     0.17.5  2022-09-08 [1] CRAN (R 4.3.3)
##  knitr         1.48    2024-07-07 [1] CRAN (R 4.3.3)
##  later         1.3.1   2023-05-02 [1] CRAN (R 4.3.2)
##  lattice       0.22-5  2023-10-24 [4] CRAN (R 4.3.1)
##  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.3.3)
##  magrittr    * 2.0.3   2022-03-30 [1] CRAN (R 4.3.3)
##  Matrix        1.6-5   2024-01-11 [4] CRAN (R 4.3.3)
##  memoise       2.0.1   2021-11-26 [1] CRAN (R 4.3.3)
##  mime          0.12    2021-09-28 [1] CRAN (R 4.3.3)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.3.2)
##  pkgbuild      1.4.2   2023-06-26 [1] CRAN (R 4.3.2)
##  pkgload       1.3.3   2023-09-22 [1] CRAN (R 4.3.2)
##  png           0.1-8   2022-11-29 [1] CRAN (R 4.3.2)
##  prettyunits   1.2.0   2023-09-24 [3] CRAN (R 4.3.1)
##  processx      3.8.3   2023-12-10 [3] CRAN (R 4.3.2)
##  profvis       0.3.8   2023-05-02 [1] CRAN (R 4.3.2)
##  promises      1.2.1   2023-08-10 [1] CRAN (R 4.3.2)
##  ps            1.7.6   2024-01-18 [3] CRAN (R 4.3.2)
##  purrr         1.0.2   2023-08-10 [3] CRAN (R 4.3.1)
##  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.3)
##  Rcpp          1.0.11  2023-07-06 [1] CRAN (R 4.3.2)
##  remotes       2.4.2.1 2023-07-18 [1] CRAN (R 4.3.2)
##  reticulate    1.34.0  2023-10-12 [1] CRAN (R 4.3.2)
##  rlang         1.1.4   2024-06-04 [1] CRAN (R 4.3.3)
##  rmarkdown     2.28    2024-08-17 [1] CRAN (R 4.3.3)
##  rstudioapi    0.15.0  2023-07-07 [3] CRAN (R 4.3.1)
##  sass          0.4.9   2024-03-15 [1] CRAN (R 4.3.3)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.2)
##  shiny         1.7.5.1 2023-10-14 [1] CRAN (R 4.3.2)
##  stringi       1.8.4   2024-05-06 [1] CRAN (R 4.3.3)
##  stringr       1.5.1   2023-11-14 [1] CRAN (R 4.3.3)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.3.2)
##  usethis       3.0.0   2024-07-29 [1] CRAN (R 4.3.3)
##  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.3.3)
##  xfun          0.47    2024-08-17 [1] CRAN (R 4.3.3)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.3.2)
##  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.3.3)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.3
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ─ Python configuration ───────────────────────────────────────────────────────
##  python:         /home/jono/.virtualenvs/r-reticulate/bin/python
##  libpython:      /usr/lib/python3.10/config-3.10-x86_64-linux-gnu/libpython3.10.so
##  pythonhome:     /home/jono/.virtualenvs/r-reticulate:/home/jono/.virtualenvs/r-reticulate
##  version:        3.10.12 (main, Jul 29 2024, 16:56:48) [GCC 11.4.0]
##  numpy:           [NOT FOUND]
##  
##  NOTE: Python version was forced by VIRTUAL_ENV
## 
## ──────────────────────────────────────────────────────────────────────────────

