Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We are excited to announce the rquery R package.
rquery is Win-Vector LLC’s big data query tool for R, currently in development.
rquery supplies a set of operators inspired by Edgar F. Codd’s relational algebra (updated to reflect lessons learned from working with R, SQL, and dplyr at big data scale in production).
As an example, rquery operators allow us to write our earlier “treatment and control” example as follows:
library("rquery")
library("wrapr")

# d is an rquery description of the database table 'd'
dQ <- d %.>%
  extend_se(.,
            if_else_block(
              testexpr = "rand()>=0.5",
              thenexprs = qae(
                a_1 := 'treatment',
                a_2 := 'control'),
              elseexprs = qae(
                a_1 := 'control',
                a_2 := 'treatment'))) %.>%
  select_columns(., c("rowNum", "a_1", "a_2"))
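To make concrete what this pipeline computes, here is a minimal base-R sketch on a small in-memory data.frame. It is only an illustration of the intended semantics, not how rquery runs the pipeline (rquery executes it as SQL in the database). It assumes the input has a `rowNum` column, as the `select_columns()` step implies, and uses `runif()` as a stand-in for the database's `rand()`.

```r
# Base-R illustration of the treatment/control assignment.
# (rquery itself would run this logic as SQL inside the database.)
set.seed(2018)                    # make the coin flips reproducible
d <- data.frame(rowNum = 1:6)     # assumed input: one row per unit

# One uniform draw per row, standing in for SQL rand() >= 0.5
flip <- runif(nrow(d)) >= 0.5

# "then" branch when flip is TRUE, "else" branch otherwise
d$a_1 <- ifelse(flip, "treatment", "control")
d$a_2 <- ifelse(flip, "control", "treatment")

# Keep the same columns the rquery pipeline selects
dQ_local <- d[, c("rowNum", "a_1", "a_2")]
print(dQ_local)
```

Every row ends up with one arm labelled "treatment" and the other "control", with the orientation chosen by a per-row coin flip.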
rquery pipelines are first-class objects, so we can extend them, save them, and even print them:
cat(format(dQ))
## table('d') %.>%
##  extend(., ifebtest_1 := rand() >= 0.5) %.>%
##  extend(., a_1 := ifelse(ifebtest_1,"treatment",a_1), a_2 := ifelse(ifebtest_1,"control",a_2)) %.>%
##  extend(., a_1 := ifelse(!( ifebtest_1 ),"control",a_1), a_2 := ifelse(!( ifebtest_1 ),"treatment",a_2)) %.>%
##  select_columns(., rowNum, a_1, a_2)
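The printed form shows how `if_else_block` is expanded into plain `extend` steps. The test is evaluated once into a temporary column (`ifebtest_1`), and each assignment becomes `ifelse(test, new_value, old_value)`, so rows where the guard is false keep their previous value. The base-R sketch below mirrors those steps one by one and checks that they agree with a direct if/else. It is an illustration under assumptions, not rquery code: it initializes `a_1` and `a_2` to `NA` beforehand, because the guarded form reads the columns' prior values, and it uses `runif()` in place of `rand()`.

```r
# Base-R mirror of the printed expansion, step by step.
set.seed(2018)
# Assumption: a_1 and a_2 exist before the block (here initialized to NA)
d <- data.frame(rowNum = 1:6, a_1 = NA_character_, a_2 = NA_character_,
                stringsAsFactors = FALSE)

# Step 1: evaluate the test once and store it in a temporary column
d$ifebtest_1 <- runif(nrow(d)) >= 0.5

# Step 2: "then" assignments guarded by the test; other rows keep old values
d$a_1 <- ifelse(d$ifebtest_1, "treatment", d$a_1)
d$a_2 <- ifelse(d$ifebtest_1, "control", d$a_2)

# Step 3: "else" assignments guarded by the negated test
d$a_1 <- ifelse(!d$ifebtest_1, "control", d$a_1)
d$a_2 <- ifelse(!d$ifebtest_1, "treatment", d$a_2)

# Step 4: keep only the requested columns (drops the temporary test column)
expanded <- d[, c("rowNum", "a_1", "a_2")]

# A direct if/else on the same draws gives the same answer
direct_a_1 <- ifelse(d$ifebtest_1, "treatment", "control")
direct_a_2 <- ifelse(d$ifebtest_1, "control", "treatment")
stopifnot(identical(expanded$a_1, direct_a_1),
          identical(expanded$a_2, direct_a_2))
print(expanded)
```

Storing the test in a column before the guarded assignments means the test is evaluated only once, so both branches see the same draw even though `rand()` returns a new value on every call.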
rquery targets only databases; right now it primarily supports SparkSQL and PostgreSQL. Because rquery is at heart a SQL generator, it can avoid some of the trade-offs that directly supporting in-memory data.frames would require. We demonstrate converting the above rquery pipeline into SQL and executing it here.
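Because a pipeline is a description of operations rather than data, turning it into SQL amounts to walking the steps and wrapping each step's input query as a subquery. The toy renderer below is a hypothetical illustration of that idea for just the two kinds of operators used above (adding columns and selecting columns). The step representation and the function `render_sql()` are invented for this sketch; it is not rquery's implementation, and its output is not the SQL rquery emits. Expressions are passed through as raw SQL text, and the if/else logic is written directly as SQL `CASE` expressions.

```r
# Hypothetical toy: render a chain of steps as nested SQL subqueries.
# Each step is a list with op = "extend" (add/overwrite columns from
# SQL expressions) or op = "select" (keep only the named columns).
render_sql <- function(table, columns, steps) {
  sql <- paste0("SELECT ", paste(columns, collapse = ", "), " FROM ", table)
  for (i in seq_along(steps)) {
    step <- steps[[i]]
    alias <- paste0("s", i)                    # alias for the inner subquery
    if (step$op == "extend") {
      new <- step$exprs                        # named: column = SQL expression
      kept <- setdiff(columns, names(new))     # columns passed through as-is
      items <- c(kept, paste(new, "AS", names(new)))
      columns <- c(kept, names(new))
    } else if (step$op == "select") {
      stopifnot(all(step$cols %in% columns))   # can only keep known columns
      items <- step$cols
      columns <- step$cols
    } else {
      stop("unknown op: ", step$op)
    }
    sql <- paste0("SELECT ", paste(items, collapse = ", "),
                  " FROM (", sql, ") ", alias)
  }
  sql
}

steps <- list(
  list(op = "extend", exprs = c(ifebtest_1 = "rand() >= 0.5")),
  list(op = "extend", exprs = c(
    a_1 = "CASE WHEN ifebtest_1 THEN 'treatment' ELSE 'control' END",
    a_2 = "CASE WHEN ifebtest_1 THEN 'control' ELSE 'treatment' END")),
  list(op = "select", cols = c("rowNum", "a_1", "a_2"))
)
sql_text <- render_sql("d", "rowNum", steps)
cat(sql_text, "\n")
```

Tracking the visible column list, instead of emitting `SELECT *`, keeps an overwriting step from producing duplicate column names; a production generator would also have to handle identifier quoting and dialect differences (for example, PostgreSQL spells the random function `random()`).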
rquery itself is still in early development (and not yet ready for extensive use in production), but it is maturing fast, and we expect more rquery announcements going forward. Our current intent is to bring in sponsors, partners, and R community voices to help develop and steer rquery.