
Building views with R

[This article was first published on R blog | Quantide - R training & consulting, and kindly contributed to R-bloggers.]

 

[Here you can see the Building views with R cheat sheet at a full resolution]

Queries

In database theory a query is a request for data or information from a database table or combination of tables.

With dplyr, R has something conceptually very close to a query:

library(dplyr)
library(pryr)

mtcars %>% 
  as_tibble() %>% 
  group_by(cyl) %>% 
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048

What I particularly appreciate about dplyr is that I can build a query as a sequence of R statements and test it progressively, one step at a time.

 

Views

Again in database theory, a view is the result set of a stored query on the data, which database users can query just as they would a table.

I would like to have something similar to a view in R.
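
To make the motivation concrete, here is a minimal sketch in base R of what an ordinary assignment does. The names cars, snapshot and current are introduced here purely for illustration, and the example works on a copy so that mtcars itself stays untouched. It uses tapply() instead of dplyr only to keep the snippet free of dependencies.

```r
# Work on a copy so the built-in mtcars dataset is left as it is
cars <- mtcars

# An ordinary assignment evaluates the expression once and stores the result
snapshot <- tapply(cars$mpg, cars$cyl, mean)

# Changing the underlying data does not touch the stored result
cars$mpg[cars$cyl == 4] <- 0
current <- tapply(cars$mpg, cars$cyl, mean)

snapshot["4"]  # still the mean computed before the change
current["4"]   # recomputed from the modified data
```

The stored object goes stale as soon as the data changes, which is exactly what a view is meant to avoid.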

As far as I know, I can achieve this goal in three ways:

  • Function makeActiveBinding()
  • Operator %<a-% from package pryr
  • My proposed %>>% operator

 

Function makeActiveBinding()

Function makeActiveBinding(sym, fun, env) installs a function in an environment env so that getting the value of sym calls fun with no arguments.

As a basic example, I can actively bind a function that simulates a die to an object named dice:

makeActiveBinding("dice", function() sample(1:6, 1), env = globalenv())

so that:

replicate(5, dice)

## [1] 5 1 6 2 3
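
The random output makes it a little hard to see that the function really runs on every access. As a small sketch of the same mechanism, the example below binds a counting function inside a separate environment. The names e, calls and counter are assumptions made for this illustration only.

```r
# A dedicated environment keeps the example out of the global workspace
e <- new.env()
calls <- 0

# The bound function records how many times it has been invoked
makeActiveBinding("counter", function() {
  calls <<- calls + 1
  calls
}, env = e)

first  <- e$counter  # reading the symbol runs the function
second <- e$counter  # reading it again runs the function again

# bindingIsActive() tells an active binding apart from a regular one
is_active <- bindingIsActive("counter", e)
```

Each read of counter triggers a fresh call, so first and second differ even though nothing was assigned in between.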

Similarly, I can wrap a dplyr expression into a function:

f <- function() {
  mtcars %>% 
    group_by(cyl) %>% 
    summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
}

and then actively bind it to a symbol:

makeActiveBinding("view", f, env = globalenv())

so that any time I refer to view, the result of function f() is computed again:

view

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048

As a result, if I change any value of mpg within mtcars, view is automatically updated:

mtcars$mpg[c(1,3,5)] <- 0
view

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 24.59091 9.231192
## 2     6 16.74286 7.504189
## 3     8 13.76429 4.601606
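
One property worth noting, sketched below under the assumption that the bound function takes no arguments (as f does): R handles an assignment to an active binding by calling the bound function with the new value, so a zero-argument function rejects the assignment and the binding behaves much like a read-only view. The names e, n_cars and outcome are hypothetical and used only for this illustration.

```r
e <- new.env()
makeActiveBinding("n_cars", function() nrow(mtcars), env = e)

# Assigning to the symbol calls the bound function with the new value;
# a function without arguments cannot accept it, so an error is raised
outcome <- tryCatch({
  assign("n_cars", 1, envir = e)
  "assigned"
}, error = function(err) "rejected")

# Removing the binding frees the name for ordinary use again
rm("n_cars", envir = e)
assign("n_cars", 1, envir = e)
```

To redefine such a view with a regular value, the active binding has to be removed with rm() first.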

I have to admit that all of this looks rather unfriendly, at least to me.

 

Operator %<a-%

A valid alternative that hides the complexity of makeActiveBinding() is provided by the operator %<a-% from package pryr:

view %<a-% {
  mtcars %>% 
    group_by(cyl) %>% 
    summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
}

Again, if I change any value of mpg within mtcars, the value of view gets automatically updated:

mtcars$mpg[c(1,3,5)] <- 50
view

## # A tibble: 3 × 3
##     cyl mean_mpg    sd_mpg
##   <dbl>    <dbl>     <dbl>
## 1     4 29.13636  8.159568
## 2     6 23.88571 11.593451
## 3     8 17.33571  9.688503

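Note that in this case I have to enclose the whole expression within curly brackets. Since %<a-% and %>% have the same precedence and are evaluated left to right, without the brackets only mtcars would be bound to view.

Moreover, the assignment operator %<a-% goes on the left-hand side, before my chain of dplyr statements, rather than at its end.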
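
To give an idea of what such an operator has to do, here is a rough base R sketch of a left-hand-side active assignment. It is not pryr's source code: the operator name %<v-% and the variables scores and total are made up for this illustration, and the real %<a-% may differ in its details.

```r
# Hypothetical operator: bind `x` to a function that re-evaluates `value`
`%<v-%` <- function(x, value) {
  x <- substitute(x)          # the symbol on the left-hand side
  value <- substitute(value)  # the unevaluated expression on the right
  stopifnot(is.name(x))
  env <- parent.frame()

  # Zero-argument function whose body is the captured expression,
  # evaluated in the caller's environment
  fun <- function() NULL
  body(fun) <- value
  environment(fun) <- env

  makeActiveBinding(deparse(x), fun, env)
  invisible(NULL)
}

scores <- c(1, 2, 3)
total %<v-% sum(scores)

before <- total       # sum of the original scores
scores <- c(10, 20)
after <- total        # recomputed from the new scores
```

The key step is substitute(), which captures the right-hand side without evaluating it, so that it can be evaluated again on every access.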

 

Operator %>>%

Finally, I would like to propose a third alternative, still based on makeActiveBinding(), that I named %>>%:

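`%>>%` <- function(expr, x) {
  # Capture the target symbol on the right-hand side
  x <- substitute(x)
  # Capture the unevaluated chain on the left-hand side
  call <- match.call()[-1]
  # Build a zero-argument function whose body is the captured chain
  fun <- function() {NULL}
  body(fun) <- call$expr
  # Evaluate the chain in the caller's environment, so that local names
  # such as x or call in this function cannot shadow the user's objects
  environment(fun) <- parent.frame()
  makeActiveBinding(sym = deparse(x), fun = fun, env = parent.frame())
  invisible(NULL)
}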

that can be used as:

mtcars %>% 
  group_by(cyl) %>% 
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg)) %>>% 
  view
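
The reason no brackets are needed here is that all %...% operators share the same precedence and associate from left to right, so the whole chain ends up as the first argument of %>>%. The short sketch below inspects the parsed expression with quote(), which does not evaluate anything and therefore needs neither dplyr nor the operator itself. The variable names chain, outer_op, left and right are introduced for illustration.

```r
# quote() only parses the code; nothing is evaluated
chain <- quote(mtcars %>% group_by(cyl) %>% summarise(n = n()) %>>% view)

outer_op <- as.character(chain[[1]])  # the outermost call is %>>%
left     <- deparse(chain[[2]])       # the whole dplyr chain
right    <- deparse(chain[[3]])       # the symbol to bind
```

Because the outermost call is %>>%, its left argument is the complete chain and its right argument is the name view.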

And again, if I change the values of mpg:

mtcars$mpg[c(1,3,5)] <- 100

the content of view changes accordingly:

view

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 33.68182 22.41624
## 2     6 31.02857 30.44321
## 3     8 20.90714 22.88454

I believe this operator offers two advantages:

  • It avoids the need for curly brackets around my dplyr expression
  • It lets me actively assign the result of my chain of dplyr statements in a more natural position, at the end of the chain

The post Building views with R appeared first on Quantide – R training & consulting.