Operator Notation for Data Transforms
[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As of cdata version 1.0.8, cdata implements an operator notation for data transforms.
The idea is simple, yet powerful.
First let’s start with some data.

    d <- wrapr::build_frame(
      "id", "measure", "value" |
      1   , "AUC"    , 0.7     |
      1   , "R2"     , 0.4     |
      2   , "AUC"    , 0.8     |
      2   , "R2"     , 0.5     )

    knitr::kable(d)

id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
In the above data we have two measurements each for two individuals (individuals identified by the "id" column). Using cdata's new_record_spec() method we can capture a description of this record structure.

    library("cdata")

    record_spec <- new_record_spec(
      wrapr::build_frame(
        "measure", "value" |
        "AUC"    , "AUC"   |
        "R2"     , "R2"    ),
      recordKeys = "id")

    print(record_spec)


    ## $controlTable
    ##   measure value
    ## 1     AUC   AUC
    ## 2      R2    R2
    ##
    ## $recordKeys
    ## [1] "id"
    ##
    ## $controlTableKeys
    ## [1] "measure"
    ##
    ## attr(,"class")
    ## [1] "cdata_record_spec"

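To make the printed specification a little more concrete, here is a minimal base-R sketch of how its fields can be read. The `spec_sketch` list is a plain stand-in that mirrors the printed output; it is not a real cdata object, and the two derived vectors are only an illustration of which columns each record shape carries under this reading.

```r
# Plain-list stand-in mirroring the fields printed above
# (an assumption for illustration, not a real cdata_record_spec).
spec_sketch <- list(
  controlTable = data.frame(
    measure = c("AUC", "R2"),
    value = c("AUC", "R2"),
    stringsAsFactors = FALSE),
  recordKeys = "id",
  controlTableKeys = "measure")

# Block (tall) form: the record keys plus the control table's own column names.
block_columns <- c(
  spec_sketch$recordKeys,
  colnames(spec_sketch$controlTable))

# Row (wide) form: the record keys plus the names stored in the
# control table's non-key cells.
value_cols <- setdiff(
  colnames(spec_sketch$controlTable),
  spec_sketch$controlTableKeys)
row_columns <- c(
  spec_sketch$recordKeys,
  unlist(spec_sketch$controlTable[, value_cols, drop = FALSE],
         use.names = FALSE))

print(block_columns)
print(row_columns)
```

Under this reading, the control table's column names describe the block form, while its cell entries name the columns of the row form.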
Once we have this specification we can transform the data using operator notation.
We can collect the record blocks into rows by a "division" (or aggregation/projection) step.

    knitr::kable(d)

id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |

    d2 <- d %//% record_spec

    knitr::kable(d2)

id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |
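For intuition about what the "division" step computes, the following is a rough base-R sketch that produces the same table shape for this example. The helper `blocks_to_rows_sketch()` is hypothetical (it is not part of cdata, and it is not how cdata implements the operator), and it hard-codes the `measure` and `value` column names used in this article.

```r
# Same example data as in the article, built with base R only.
d <- data.frame(
  id = c(1, 1, 2, 2),
  measure = c("AUC", "R2", "AUC", "R2"),
  value = c(0.7, 0.4, 0.8, 0.5),
  stringsAsFactors = FALSE)

# Control table copied from the printed record specification.
control_table <- data.frame(
  measure = c("AUC", "R2"),
  value = c("AUC", "R2"),
  stringsAsFactors = FALSE)

# Hypothetical helper (not part of cdata): one output row per record key,
# one output column per control-table row.
blocks_to_rows_sketch <- function(tall, control_table, record_key) {
  keys <- unique(tall[[record_key]])
  wide <- data.frame(keys, stringsAsFactors = FALSE)
  colnames(wide) <- record_key
  for (i in seq_len(nrow(control_table))) {
    # Rows of the tall table belonging to this measurement.
    block_rows <- tall[tall$measure == control_table$measure[[i]], , drop = FALSE]
    # Align this measurement's values to the record keys.
    wide[[control_table$value[[i]]]] <-
      block_rows$value[match(keys, block_rows[[record_key]])]
  }
  wide
}

d2_sketch <- blocks_to_rows_sketch(d, control_table, "id")
print(d2_sketch)
```

Each block of rows sharing an id collapses into a single row, which is why the article describes this step as an aggregation or projection.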
We can expand record rows into blocks by a "multiplication" (or join) step.

    knitr::kable(d2)

id | AUC | R2 |
---|---|---|
1 | 0.7 | 0.4 |
2 | 0.8 | 0.5 |

    d3 <- d2 %**% record_spec

    knitr::kable(d3)

id | measure | value |
---|---|---|
1 | AUC | 0.7 |
1 | R2 | 0.4 |
2 | AUC | 0.8 |
2 | R2 | 0.5 |
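The "multiplication" step can be pictured as pairing every row record with every row of the control table. Below is a rough base-R sketch of that idea for this example. The helper `rows_to_blocks_sketch()` is hypothetical (not part of cdata, and not a description of cdata's internals), and the final sort by id and measure is an assumption made only to match the row order shown above.

```r
# Row-record form of the example data, built with base R only.
d2 <- data.frame(
  id = c(1, 2),
  AUC = c(0.7, 0.8),
  R2 = c(0.4, 0.5))

# Control table copied from the printed record specification.
control_table <- data.frame(
  measure = c("AUC", "R2"),
  value = c("AUC", "R2"),
  stringsAsFactors = FALSE)

# Hypothetical helper (not part of cdata): emit one tall row per
# (row record, control-table row) pair.
rows_to_blocks_sketch <- function(wide, control_table, record_key) {
  blocks <- lapply(seq_len(nrow(control_table)), function(i) {
    data.frame(
      key = wide[[record_key]],
      measure = control_table$measure[[i]],
      # The control-table cell names the wide column to read from.
      value = wide[[control_table$value[[i]]]],
      stringsAsFactors = FALSE)
  })
  tall <- do.call(rbind, blocks)
  colnames(tall)[1] <- record_key
  # Sort so each record's rows sit together, as in the table above.
  tall <- tall[order(tall[[record_key]], tall$measure), , drop = FALSE]
  rownames(tall) <- NULL
  tall
}

d3_sketch <- rows_to_blocks_sketch(d2, control_table, "id")
print(d3_sketch)
```

Because every row record is matched against every control-table row, the output has (number of records) times (number of control-table rows) rows, which is the sense in which the step behaves like a join.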
And that is truly fluid data manipulation.
This article can be found in a vignette here.
To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.