[Here you can see the Building views with R cheat sheet at a full resolution]
Queries
In database theory a query is a request for data or information from a database table or combination of tables.
Since dplyr, we have something in R that closely resembles a query:
require(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.5
require(pryr)
mtcars %>% tbl_df() %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048
What I particularly appreciate about dplyr is the possibility of building my query as a step-by-step sequence of R statements, each of which I can test before adding the next.
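As a small illustration of this step-by-step style, the query above can be split into intermediate objects and each one inspected on its own. This is only a sketch: the names step1 and step2 are hypothetical and do not appear in the original query, and the data are read from datasets::mtcars so that the example does not depend on the state of the workspace.

```r
library(dplyr)

# Step 1: group the data; nothing is summarised yet, so I can check the groups.
step1 <- datasets::mtcars %>% group_by(cyl)
n_groups(step1)   # number of distinct cyl values

# Step 2: add the summary on top of the step already tested.
step2 <- step1 %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
names(step2)
```

Once every step behaves as expected, the intermediate names can be dropped and the steps chained back into a single expression.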
Views
Again in database theory, a view is the result set of a stored query on the data, which the database users can query just as they would in a table.
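To make the difference from an ordinary R object concrete, here is a minimal base R sketch showing that a plain assignment stores a snapshot rather than a view. The names cars_copy, snapshot and fresh are hypothetical, and aggregate() is used only to keep the example free of package dependencies.

```r
# Work on a copy so the built-in data set is left untouched.
cars_copy <- datasets::mtcars

# An ordinary assignment evaluates the query once and stores the result.
snapshot <- aggregate(mpg ~ cyl, data = cars_copy, FUN = mean)

# Change the underlying data (row 1 is a 6-cylinder car).
cars_copy$mpg[1] <- 0

# The stored result did not follow the change; only a re-run reflects it.
fresh <- aggregate(mpg ~ cyl, data = cars_copy, FUN = mean)
identical(snapshot, fresh)
```

A view, by contrast, would behave like fresh every time it is read, without re-running the query by hand.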
I would like to have something similar to a view in R.
As far as I know, I can achieve this goal in three ways:
- Function makeActiveBinding()
- Operator %<a-% from package pryr
- My proposed %>>% operator
Function makeActiveBinding()
Function makeActiveBinding(sym, fun, env) installs a function in an environment env so that getting the value of sym calls fun with no arguments.
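A minimal sketch of this mechanism, using only base R, is a binding that counts how many times it has been read. The names sandbox, counter and n_calls are hypothetical and serve only this illustration.

```r
# A separate environment keeps the example out of the global workspace.
sandbox <- new.env()
n_calls <- 0

# Every read of `counter` in `sandbox` runs this function with no arguments.
makeActiveBinding("counter", function() {
  n_calls <<- n_calls + 1
  n_calls
}, sandbox)

first  <- sandbox$counter   # first read runs the function
second <- sandbox$counter   # second read runs it again

# bindingIsActive() tells an active binding apart from a regular one.
bindingIsActive("counter", sandbox)
```

The two reads return different values even though nothing was assigned in between, which is exactly the behaviour a view needs.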
As a basic example, I can actively bind a function that simulates a die roll to an object named dice:
makeActiveBinding("dice", function() sample(1:6, 1), env = globalenv())
so that:
replicate(5 , dice)
## [1] 5 1 6 2 3
Similarly, I can wrap a dplyr expression into a function:
f <- function() {mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))}
and then actively bind it to a symbol:
makeActiveBinding('view', f , env = globalenv())
so that, any time I access view, the result of function f() is computed again:
view
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048
As a result, if I change any value of mpg within mtcars, view is automatically updated:
mtcars$mpg[c(1,3,5)] <- 0
view
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 24.59091 9.231192
## 2     6 16.74286 7.504189
## 3     8 13.76429 4.601606
Clearly, I have to admit that all of this looks quite unfriendly, at least to me.
Operator %<a-%
A valid alternative, which hides the complexity of function makeActiveBinding(), is provided by operator %<a-% from package pryr:
view %<a-% {mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))}
Again, if I change any value of mpg within mtcars, the value of view is automatically updated:
mtcars$mpg[c(1,3,5)] <- 50
view
## # A tibble: 3 × 3
##     cyl mean_mpg    sd_mpg
##   <dbl>    <dbl>     <dbl>
## 1     4 29.13636  8.159568
## 2     6 23.88571 11.593451
## 3     8 17.33571  9.688503
Note that in this case I have to enclose the whole expression within curly brackets.
Moreover, the assignment operator %<a-% goes on the left-hand side of my chain of dplyr statements, before the chain rather than after it.
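To show what an assignment-style operator of this kind has to do, here is a simplified sketch built directly on makeActiveBinding(). It is not pryr's implementation: the operator name %<v-% and the names cars_copy and mean_mpg are hypothetical, and error handling is left out.

```r
# Hypothetical operator, for illustration only (not pryr's code).
`%<v-%` <- function(x, value) {
  sym  <- deparse(substitute(x))   # name of the symbol on the left
  expr <- substitute(value)        # unevaluated expression on the right
  env  <- parent.frame()           # environment of the caller
  # makeActiveBinding() refuses to overwrite a regular binding, so drop it first.
  if (exists(sym, envir = env, inherits = FALSE)) rm(list = sym, envir = env)
  # Re-evaluate the stored expression in the caller's environment on every read.
  makeActiveBinding(sym, function() eval(expr, env), env)
  invisible(NULL)
}

cars_copy <- datasets::mtcars
mean_mpg %<v-% mean(cars_copy$mpg)

before <- mean_mpg
cars_copy$mpg <- cars_copy$mpg + 1   # shift every value by one
after <- mean_mpg                    # recomputed on access
```

Because the right-hand side is captured with substitute() instead of being evaluated, it is stored as code, which is why the name has to come first and the expression second.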
Operator %>>%
Finally, I would like to propose a third alternative, still based on makeActiveBinding(), that I named %>>%:
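`%>>%` <- function(expr, x) {
  # capture the target symbol on the right-hand side without evaluating it
  x <- substitute(x)
  # capture the unevaluated arguments; call$expr is the left-hand side expression
  call <- match.call()[-1]
  # build a zero-argument function whose body is that expression
  fun <- function() {NULL}
  body(fun) <- call$expr
  # bind the function to the symbol in the caller's environment
  makeActiveBinding(sym = deparse(x), fun = fun, env = parent.frame())
  invisible(NULL)
}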
that can be used as:
mtcars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg)) %>>% view
And again, if I change the values of mpg:
mtcars$mpg[c(1,3,5)] <- 100
The content of view changes accordingly:
view
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 33.68182 22.41624
## 2     6 31.02857 30.44321
## 3     8 20.90714 22.88454
I believe this operator offers two advantages:
- Avoids the usage of curly brackets around my dplyr expression
- Allows me to actively assign the result of my chain of dplyr statements, in a more natural way at the end of the chain
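Whichever of the three approaches creates it, the result is an ordinary active binding, so the usual base R tools apply to it. The sketch below checks that a binding is active, shows that a binding backed by a zero-argument function cannot be assigned to, and removes it. The names demo_env and view_demo are hypothetical.

```r
demo_env <- new.env()   # sandbox environment for the example

# A tiny "view": the row count of the data set, recomputed on each read.
makeActiveBinding("view_demo", function() nrow(datasets::mtcars), demo_env)

is_active <- bindingIsActive("view_demo", demo_env)

# Assigning calls the bound function with one argument; a zero-argument
# function rejects it, so the view is effectively read-only.
write_attempt <- try(assign("view_demo", 1, envir = demo_env), silent = TRUE)
is_read_only <- inherits(write_attempt, "try-error")

# rm() removes an active binding like any other object.
rm("view_demo", envir = demo_env)
still_there <- exists("view_demo", envir = demo_env, inherits = FALSE)
```

The read-only behaviour is close to what a database view offers, since the data can only be changed through the underlying table.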
The post Building views with R appeared first on Quantide – R training & consulting.