Operator Notation for Data Transforms

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

As of version 1.0.8, cdata implements an operator notation for data transforms.

The idea is simple, yet powerful.

First let’s start with some data.

d <- wrapr::build_frame(
  "id", "measure", "value" |
    1   , "AUC"    , 0.7     |
    1   , "R2"     , 0.4     |
    2   , "AUC"    , 0.8     |
    2   , "R2"     , 0.5     )

knitr::kable(d)

id  measure  value
 1  AUC        0.7
 1  R2         0.4
 2  AUC        0.8
 2  R2         0.5

In the above data we have two measurements each for two individuals (the individuals are identified by the "id" column). Using cdata's new_record_spec() function we can capture a description of this record structure.

library("cdata")

record_spec <- new_record_spec(
  wrapr::build_frame(
    "measure", "value" |
    "AUC"    , "AUC" |
    "R2"     , "R2"  ),
  recordKeys = "id")

print(record_spec)
## $controlTable
##   measure value
## 1     AUC   AUC
## 2      R2    R2
## 
## $recordKeys
## [1] "id"
## 
## $controlTableKeys
## [1] "measure"
## 
## attr(,"class")
## [1] "cdata_record_spec"

Once we have this specification we can transform the data using operator notation.

We can collect the record blocks into rows by a "division" (or aggregation/projection) step.

knitr::kable(d)

id  measure  value
 1  AUC        0.7
 1  R2         0.4
 2  AUC        0.8
 2  R2         0.5

d2 <- d %//% record_spec

knitr::kable(d2)

id  AUC   R2
 1  0.7  0.4
 2  0.8  0.5
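
To make concrete what the collection step computes, here is a minimal base-R sketch on the same example data. It is not how cdata implements %//%; it swaps in stats::reshape() as a stand-in, and the name d2_base is made up for this illustration.

```r
# Same example data as above, built with base R only (no wrapr or cdata needed).
d <- data.frame(
  id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE)

# Collect each id's block of rows into a single row,
# with one column per distinct value of "measure".
# d2_base is a hypothetical name used only in this sketch.
d2_base <- reshape(
  d,
  idvar = "id",
  timevar = "measure",
  v.names = "value",
  direction = "wide")

# reshape() names the new columns "value.AUC" and "value.R2";
# strip the prefix so the columns match the record layout shown above.
colnames(d2_base) <- sub("^value\\.", "", colnames(d2_base))
rownames(d2_base) <- NULL

print(d2_base)
```

With the operator notation the same intent is a single expression, and the record specification (rather than a set of reshape() arguments) carries the description of the record structure.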

We can expand record rows into blocks by a "multiplication" (or join) step.

knitr::kable(d2)

id  AUC   R2
 1  0.7  0.4
 2  0.8  0.5

d3 <- d2 %**% record_spec

knitr::kable(d3)

id  measure  value
 1  AUC        0.7
 1  R2         0.4
 2  AUC        0.8
 2  R2         0.5
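
The expansion step can likewise be sketched in base R, again only as an illustration of what is computed and not as cdata's implementation. The names measures, blocks, and d3_base are hypothetical, and the row ordering (by id, then measure) is an assumption chosen to match the table above.

```r
# Row-record form of the example data (one row per id).
d2 <- data.frame(
  id = c(1, 2),
  AUC = c(0.7, 0.8),
  R2 = c(0.4, 0.5))

# Columns that hold measurements; each one becomes a row within every block.
measures <- c("AUC", "R2")

# Build one piece per measurement column, then stack the pieces.
blocks <- lapply(measures, function(m) {
  data.frame(
    id = d2$id,
    measure = m,
    value = d2[[m]],
    stringsAsFactors = FALSE)
})
d3_base <- do.call(rbind, blocks)

# Order rows so each id's block is contiguous (an ordering chosen for display).
d3_base <- d3_base[order(d3_base$id, d3_base$measure), , drop = FALSE]
rownames(d3_base) <- NULL

print(d3_base)
```

Up to row order, this is the original block-form data, which is why the two operators read as inverses of each other for a fixed record specification.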

And that is truly fluid data manipulation.

This article can be found in a vignette here.
