Reflecting on Macros

[This article was first published on rstats on Irregularly Scheduled Programming, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve been following the drama of the RustConf Keynote Fiasco (RKNF, per @fasterthanlime) from a great distance – I’m not involved in that community beyond starting to learn the language. But the controversial topic itself Compile-Time Reflection seemed like something interesting I could learn something about.

A good start is usually a Wikipedia page, and I found one called “Reflective programming” under the “MetaProgramming” category, where it defines

reflection is the ability of a process to examine, introspect, and modify its own structure and behavior

That sounds somewhat familiar from what metaprogramming I’ve read about. One of the great features of R is the ability to inspect and rewrite functions, for example, the body of the sd() function (calculating the standard deviation of the input) looks like

sd
## function (x, na.rm = FALSE) 
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <bytecode: 0x5647347e3980>
## <environment: namespace:stats>

Trying to extract a “component” of that function results in the ever-classic error

sd[1]
## Error in sd[1]: object of type 'closure' is not subsettable

However, using body() we can get to the components of the function

body(sd)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
body(sd)[1]
## sqrt()

and we can even mess with it (meaninglessly, in this case)

vals <- c(1, 3, 5, 7)
sd(vals)
## [1] 2.581989
my_sd <- sd
body(my_sd)[1] <- call("log")
my_sd # note that the function now (wrongly) uses log() instead of sqrt()
## function (x, na.rm = FALSE) 
## log(var(if (is.vector(x) || is.factor(x)) x else as.double(x), 
##     na.rm = na.rm))
## <environment: namespace:stats>
my_sd(vals)
## [1] 1.89712

The Wikipedia page lists the following example of reflection in R

# Without reflection, assuming foo() returns an S3-type object that has method "hello"
obj <- foo()
hello(obj)

# With reflection
class_name <- "foo"
generic_having_foo_method <- "hello"
obj <- do.call(class_name, list())
do.call(generic_having_foo_method, alist(obj))

Using a more concrete data object and class, e.g. tibble::tibble and summary might be clearer

library(tibble) # do.call doesn't like pkg::fun as a string

# Without reflection
obj <- tibble(a = 1:2, b = 3:4)
summary(obj)
##        a              b       
##  Min.   :1.00   Min.   :3.00  
##  1st Qu.:1.25   1st Qu.:3.25  
##  Median :1.50   Median :3.50  
##  Mean   :1.50   Mean   :3.50  
##  3rd Qu.:1.75   3rd Qu.:3.75  
##  Max.   :2.00   Max.   :4.00
# With reflection
class_name <- "tibble"
generic_having_foo_method <- "summary"
obj <- do.call(class_name, list(a = 1:2, b = 3:4))
obj
## # A tibble: 2 × 2
##       a     b
##   <int> <int>
## 1     1     3
## 2     2     4
do.call(generic_having_foo_method, alist(obj))
##        a              b       
##  Min.   :1.00   Min.   :3.00  
##  1st Qu.:1.25   1st Qu.:3.25  
##  Median :1.50   Median :3.50  
##  Mean   :1.50   Mean   :3.50  
##  3rd Qu.:1.75   3rd Qu.:3.75  
##  Max.   :2.00   Max.   :4.00

So, maybe it’s more to do with being able to use a string containing the “name” of a function and go and find that function, or just the ability to generate functions on-demand based on non-function objects (?). Please, let me know if there’s a more enlightening explanation.

I still don’t think I understand that at all (more time required) but I did note in some additional research that “reflection” and “macros” are two very similar concepts. Now macros are something I’ve heard of at least, so I was off to do some more research.

Unfortunately, web searches for the terms “reflection” and “macro” turn up a lot of macro-lens photography results.

I’ve heard of macros in Julia where they’re used to “rewrite” an expression. This is a nice rundown of the process, as are the official docs. These are used in many places. One up-and-coming place is the new Tidier.jl which implements the tidyverse (at least the most common dplyr and purrr parts) using macros (denoted with a @ prefix)

using Tidier
using RDatasets

movies = dataset("ggplot2", "movies");

@chain movies begin
    @mutate(Budget = Budget / 1_000_000)
    @filter(Budget >= mean(skipmissing(Budget)))
    @select(Title, Budget)
    @slice(1:5)
end

Rust uses macros for printing (amongst other things); println!() is a macro, apparently at least in part because it needs to be able to take an arbitrary number of args, so one can write

>> println!("a = {}, b = {}, c = {}", 1, 2, 3)
a = 1, b = 2, c = 3

Rust has a shorthand macro for creating a new vector vec!()

>> let v = vec![2, 3, 4];

and also has the “debug macro” dbg!() which is super handy - it prints out the expression you wrap, plus the value, so you can inspect the current state with e.g.

>> dbg!(&v);
[src/lib.rs:109] &v = [
    2,
    3,
    4,
]

This last one would be great to have in R… as a side note, we could construct a simple version with {rlang}

dbg <- function(x) {
  ex <- rlang::f_text(rlang::enquos(x)[[1]])
  ret <- rlang::eval_bare(x)
  message(glue::glue("DEBUG: {ex} = {ret}"))
  ret
}

a <- 1
b <- 3
x <- dbg(a + b)
## DEBUG: a + b = 4
y <- dbg(2*x + 3)
## DEBUG: 2 * x + 3 = 11
z <- 10 + dbg(y*2)
## DEBUG: y * 2 = 22

In all of these examples of macros, the code that is run is different to the code you write because the macro makes some changes before executing.

In R there isn’t a “proper” way to do this but we do have ways to manipulate code and we do have ways to retrieve “unparsed” input, e.g. substitute(). A quick look for “macros in R” turned up a function in a package that is more than 20 years old (I was only starting University when this came out and knew approximately 0 programming) and comes with a journal article; gtools::defmacro() by Thomas Lumley has a construction for writing something that behaves like a macro.

That article is from 2001 when R 1.3.1 was being released. The example code made me do a double-take

library(gtools)

####
# macro for replacing a specified missing value indicator with NA
# within a dataframe
###
setNA <- defmacro(df, var, values,
  expr = {
    df$var[df$var %in% values] <- NA
  }
)

# create example data using 999 as a missing value indicator
d <- data.frame(
  Grp = c("Trt", "Ctl", "Ctl", "Trt", "Ctl", "Ctl", "Trt", "Ctl", "Trt", "Ctl"),
  V1 = c(1, 2, 3, 4, 5, 6, 999, 8, 9, 10),
  V2 = c(1, 1, 1, 1, 1, 2, 999, 2, 999, 999),
  stringsAsFactors = TRUE
)
d
##    Grp  V1  V2
## 1  Trt   1   1
## 2  Ctl   2   1
## 3  Ctl   3   1
## 4  Trt   4   1
## 5  Ctl   5   1
## 6  Ctl   6   2
## 7  Trt 999 999
## 8  Ctl   8   2
## 9  Trt   9 999
## 10 Ctl  10 999
# Try it out
setNA(d, V1, 999)
setNA(d, V2, 999)
d
##    Grp V1 V2
## 1  Trt  1  1
## 2  Ctl  2  1
## 3  Ctl  3  1
## 4  Trt  4  1
## 5  Ctl  5  1
## 6  Ctl  6  2
## 7  Trt NA NA
## 8  Ctl  8  2
## 9  Trt  9 NA
## 10 Ctl 10 NA

Wait - I thought… there’s no assignment in those last lines, but the data is being modified!?! Sure enough, the internals of defmacro make it clear that this is the case, but it seemed like magic. Essentially, this identifies what needs to happen, what it needs to happen to (via substitute()), and makes it happen in the parent.frame(). Neat! So, what else can we do with this?

I thought about it for a while and realised what could be a [te|ho]rrific one…

Just a couple of weeks ago, Danielle Navarro made a wish

not for the first time I find myself wishing that push() and pop() were S3 generics in #rstats

Now, if you’re not familiar with those, pop(x) removes the first element of a structure x (e.g. a vector) and returns that first value, leaving the original object x containing only the remaining elements, whereas push(x, y) inserts the value y as the first element of x, moving the remaining elements down the line. These show up more in object-oriented languages, but they don’t exist in R.

If we define a vector a containing some values

a <- c(3, 1, 4, 1, 5, 9)

and we wish to extract the first value, we can certainly do so with

a[1]
## [1] 3

but, due to the nature of R, the vector a is unchanged

a
## [1] 3 1 4 1 5 9

Instead, we could remove the first value of a with

a[-1]
## [1] 1 4 1 5 9

but again, a remains unchanged - in order to modify a we must redefine it as e.g.

a <- a[-1]
a
## [1] 1 4 1 5 9

If we wanted to build a pop() function, we could use substitute() to figure out what the passed input object was, perform the extraction of the first element, and so on…

But as we’ve just seen, there’s a better way to define that - a macro!

r_pop <- gtools::defmacro(x, expr = {
  ret <- x[1]
  x <- x[-1]
  ret
})

Now, if we use that on a vector

a <- c(3, 1, 4, 1, 5, 9)
r_pop(a)
## [1] 3
a
## [1] 1 4 1 5 9

It works!!!

Danielle wanted a Generic, though, so we can easily make pop() a Generic and add methods for some classes (which can be further extended).

To that end, I present a brand new package; {weasel}

pop() goes the {weasel}
pop() goes the {weasel}

This defines pop() and push() as Generics with methods defined for vectors, lists, and data.frames

a <- list(x = c(2, 3), y = c("foo", "bar"), z = c(3.1, 4.2, 6.9))
a
## $x
## [1] 2 3
## 
## $y
## [1] "foo" "bar"
## 
## $z
## [1] 3.1 4.2 6.9
x <- pop(a)
a
## $y
## [1] "foo" "bar"
## 
## $z
## [1] 3.1 4.2 6.9
x
## [1] 2 3
a <- data.frame(x = c(2, 3, 4), y = c("foo", "bar", "baz"), z = c(3.1, 4.2, 6.9))
a
##   x   y   z
## 1 2 foo 3.1
## 2 3 bar 4.2
## 3 4 baz 6.9
x <- pop(a)
a
##   x   y   z
## 2 3 bar 4.2
## 3 4 baz 6.9
x
##   x   y   z
## 1 2 foo 3.1
a <- c(1, 4, 1, 5, 9)
a
## [1] 1 4 1 5 9
push(a, 3)
a
## [1] 3 1 4 1 5 9
a <- data.frame(y = c("foo", "bar", "baz"), z = c(3.1, 4.2, 6.9))
a
##     y   z
## 1 foo 3.1
## 2 bar 4.2
## 3 baz 6.9
push(a, data.frame(y = 99, z = 77))
a
##     y    z
## 1  99 77.0
## 2 foo  3.1
## 3 bar  4.2
## 4 baz  6.9

I wrote this (simple) package as a bit of an exercise - I really don’t think you should actually use it for anything. The “looks like it modifies in-place but actually doesn’t” is really non-idiomatic for R. Nonetheless, I was really interested to see that defmacro can be used as a function definition that the dispatch machinery will respect. The only catch I’ve found so far is that I can’t use ellipses (...) in the function signature.

I noticed that Dirk Schumacher built a similar defmacro package more recently, but that appears to be more aimed at building macros to be expanded on package load (funnily enough, “compile-time macros” - we’ve come full circle). This seems like a great opportunity for “inlining” some functions. I’ll definitely be digging deeper into that one.

Let me know if you have a better explanation of any of the concepts I’ve (badly) described here; I’m absolutely just learning and following Julia Evans’ advice about blogging.


devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.1.2 (2021-11-01)
##  os       Pop!_OS 22.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  en_AU.UTF-8
##  ctype    en_AU.UTF-8
##  tz       Australia/Adelaide
##  date     2023-06-17
##  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version date (UTC) lib source
##  blogdown      1.17    2023-05-16 [1] CRAN (R 4.1.2)
##  bookdown      0.29    2022-09-12 [1] CRAN (R 4.1.2)
##  bslib         0.4.1   2022-11-02 [3] CRAN (R 4.2.2)
##  cachem        1.0.6   2021-08-19 [3] CRAN (R 4.2.0)
##  callr         3.7.3   2022-11-02 [3] CRAN (R 4.2.2)
##  cli           3.4.1   2022-09-23 [3] CRAN (R 4.2.1)
##  crayon        1.5.2   2022-09-29 [3] CRAN (R 4.2.1)
##  devtools      2.4.5   2022-10-11 [1] CRAN (R 4.1.2)
##  digest        0.6.30  2022-10-18 [3] CRAN (R 4.2.1)
##  ellipsis      0.3.2   2021-04-29 [3] CRAN (R 4.1.1)
##  evaluate      0.18    2022-11-07 [3] CRAN (R 4.2.2)
##  fansi         1.0.3   2022-03-24 [3] CRAN (R 4.2.0)
##  fastmap       1.1.0   2021-01-25 [3] CRAN (R 4.2.0)
##  fs            1.5.2   2021-12-08 [3] CRAN (R 4.1.2)
##  glue          1.6.2   2022-02-24 [3] CRAN (R 4.2.0)
##  gtools      * 3.9.4   2022-11-27 [1] CRAN (R 4.1.2)
##  htmltools     0.5.3   2022-07-18 [3] CRAN (R 4.2.1)
##  htmlwidgets   1.5.4   2021-09-08 [1] CRAN (R 4.1.2)
##  httpuv        1.6.6   2022-09-08 [1] CRAN (R 4.1.2)
##  jquerylib     0.1.4   2021-04-26 [3] CRAN (R 4.1.2)
##  jsonlite      1.8.3   2022-10-21 [3] CRAN (R 4.2.1)
##  knitr         1.40    2022-08-24 [3] CRAN (R 4.2.1)
##  later         1.3.0   2021-08-18 [1] CRAN (R 4.1.2)
##  lifecycle     1.0.3   2022-10-07 [3] CRAN (R 4.2.1)
##  magrittr      2.0.3   2022-03-30 [3] CRAN (R 4.2.0)
##  memoise       2.0.1   2021-11-26 [3] CRAN (R 4.2.0)
##  mime          0.12    2021-09-28 [3] CRAN (R 4.2.0)
##  miniUI        0.1.1.1 2018-05-18 [1] CRAN (R 4.1.2)
##  pillar        1.8.1   2022-08-19 [3] CRAN (R 4.2.1)
##  pkgbuild      1.4.0   2022-11-27 [1] CRAN (R 4.1.2)
##  pkgconfig     2.0.3   2019-09-22 [3] CRAN (R 4.0.1)
##  pkgload       1.3.0   2022-06-27 [1] CRAN (R 4.1.2)
##  prettyunits   1.1.1   2020-01-24 [3] CRAN (R 4.0.1)
##  processx      3.8.0   2022-10-26 [3] CRAN (R 4.2.1)
##  profvis       0.3.7   2020-11-02 [1] CRAN (R 4.1.2)
##  promises      1.2.0.1 2021-02-11 [1] CRAN (R 4.1.2)
##  ps            1.7.2   2022-10-26 [3] CRAN (R 4.2.2)
##  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.1.2)
##  R6            2.5.1   2021-08-19 [3] CRAN (R 4.2.0)
##  Rcpp          1.0.9   2022-07-08 [1] CRAN (R 4.1.2)
##  remotes       2.4.2   2021-11-30 [1] CRAN (R 4.1.2)
##  rlang         1.0.6   2022-09-24 [1] CRAN (R 4.1.2)
##  rmarkdown     2.18    2022-11-09 [3] CRAN (R 4.2.2)
##  rstudioapi    0.14    2022-08-22 [3] CRAN (R 4.2.1)
##  sass          0.4.2   2022-07-16 [3] CRAN (R 4.2.1)
##  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
##  shiny         1.7.2   2022-07-19 [1] CRAN (R 4.1.2)
##  stringi       1.7.8   2022-07-11 [3] CRAN (R 4.2.1)
##  stringr       1.5.0   2022-12-02 [1] CRAN (R 4.1.2)
##  tibble      * 3.1.8   2022-07-22 [3] CRAN (R 4.2.2)
##  urlchecker    1.0.1   2021-11-30 [1] CRAN (R 4.1.2)
##  usethis       2.1.6   2022-05-25 [1] CRAN (R 4.1.2)
##  utf8          1.2.2   2021-07-24 [3] CRAN (R 4.2.0)
##  vctrs         0.5.2   2023-01-23 [1] CRAN (R 4.1.2)
##  weasel      * 0.1.0   2023-06-09 [1] local
##  xfun          0.34    2022-10-18 [3] CRAN (R 4.2.1)
##  xtable        1.8-4   2019-04-21 [1] CRAN (R 4.1.2)
##  yaml          2.3.6   2022-10-18 [3] CRAN (R 4.2.1)
## 
##  [1] /home/jono/R/x86_64-pc-linux-gnu-library/4.1
##  [2] /usr/local/lib/R/site-library
##  [3] /usr/lib/R/site-library
##  [4] /usr/lib/R/library
## 
## ──────────────────────────────────────────────────────────────────────────────


To leave a comment for the author, please follow the link and comment on their blog: rstats on Irregularly Scheduled Programming.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)