cdata Control Table Keys

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

In our cdata R package and training materials we emphasize record-oriented thinking and how to design a transform control table. We now have an exciting new feature: control table keys.

The user can now control which columns of a cdata control table are the keys, including composite keys (that is, keys that are spread across more than one column). This is easiest to demonstrate with an example.

Consider the simple data frame (the first couple of rows of the famous iris data set).


d <- wrapr::build_frame(
  "Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species" |
  5.1           , 3.5          , 1.4           , 0.2          , "setosa"  |
  4.9           , 3            , 1.4           , 0.2          , "setosa"  )
d$id <- seq_len(nrow(d))

Sepal.Length Sepal.Width Petal.Length Petal.Width Species id
5.1 3.5 1.4 0.2 setosa 1
4.9 3.0 1.4 0.2 setosa 2

Suppose we wish to land the dimensions of different irises in rows keyed by two columns: Part and Measure. That is, we want the data to look like the following.

expect <- wrapr::build_frame(
  "id", "Species", "Part" , "Measure", "Value" |
  1L  , "setosa" , "Sepal", "Length" , 5.1     |
  1L  , "setosa" , "Sepal", "Width"  , 3.5     |
  1L  , "setosa" , "Petal", "Length" , 1.4     |
  1L  , "setosa" , "Petal", "Width"  , 0.2     |
  2L  , "setosa" , "Sepal", "Length" , 4.9     |
  2L  , "setosa" , "Sepal", "Width"  , 3       |
  2L  , "setosa" , "Petal", "Length" , 1.4     |
  2L  , "setosa" , "Petal", "Width"  , 0.2     )

id Species Part Measure Value
1 setosa Sepal Length 5.1
1 setosa Sepal Width 3.5
1 setosa Petal Length 1.4
1 setosa Petal Width 0.2
2 setosa Sepal Length 4.9
2 setosa Sepal Width 3.0
2 setosa Petal Length 1.4
2 setosa Petal Width 0.2

With multiple control table keys this is easy.

First define the control table specifying what a single row-record looks like after transformation to a multi-row block-record.

control_table <- wrapr::qchar_frame(
  Part,  Measure, Value          |
  Sepal, Length,  "Sepal.Length" |
  Sepal, Width,   "Sepal.Width"  |
  Petal, Length,  "Petal.Length" |
  Petal, Width,   "Petal.Width"  )

We have designed our desired transform above (as taught here). We are introducing a new convention of putting the non-key values in quotes (even though wrapr::qchar_frame() does not need this). Notice that the quoted fields are column names of the original row-oriented data; the quotes indicate that these are names standing in for values.
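For a set of columns to serve as control table keys, each row of the control table should be identified by its key values. The following is a minimal sketch of that check, assuming the control table from this article; it is built here with base R's data.frame() so the sketch runs without wrapr, and the names key_cols and composite_keys_unique are illustrative only, not part of cdata.

```r
# Control table from the article, rebuilt with base R (an assumption made
# for self-containment; the article itself uses wrapr::qchar_frame()).
control_table <- data.frame(
  Part    = c("Sepal", "Sepal", "Petal", "Petal"),
  Measure = c("Length", "Width", "Length", "Width"),
  Value   = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  stringsAsFactors = FALSE)

# Columns we intend to use as the composite key (illustrative name).
key_cols <- c("Part", "Measure")

# TRUE when no two control table rows share the same (Part, Measure) pair.
composite_keys_unique <-
  !any(duplicated(control_table[, key_cols, drop = FALSE]))

# Check each key column on its own for comparison.
part_alone_unique    <- !any(duplicated(control_table$Part))
measure_alone_unique <- !any(duplicated(control_table$Measure))

print(c(composite = composite_keys_unique,
        Part      = part_alone_unique,
        Measure   = measure_alone_unique))
```

In this example the pair (Part, Measure) identifies each row, while neither column does so alone, which is why a composite key is needed.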

We can now have cdata::rowrecs_to_blocks() perform the transform. Notice we are specifying that the columns Part and Measure are the control table row-keys.

res <- cdata::rowrecs_to_blocks(
  d,               # the row-record data to transform
  control_table,   # the control table defined above
  controlTableKeys = c("Part", "Measure"),
  columnsToCopy = c("id", "Species"))

id Species Part Measure Value
1 setosa Sepal Length 5.1
1 setosa Sepal Width 3.5
1 setosa Petal Length 1.4
1 setosa Petal Width 0.2
2 setosa Sepal Length 4.9
2 setosa Sepal Width 3.0
2 setosa Petal Length 1.4
2 setosa Petal Width 0.2
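To make the meaning of the control table concrete, here is a base-R sketch of the same row-record to block-record expansion. It is an illustration of the semantics only, not how cdata is implemented: for each input row, the key columns of the control table are copied, and each quoted name in the Value column is replaced by the value found in that column of the row. The names key_cols and res_sketch are hypothetical.

```r
# Example data and control table from the article, rebuilt with base R.
d <- data.frame(
  Sepal.Length = c(5.1, 4.9),
  Sepal.Width  = c(3.5, 3.0),
  Petal.Length = c(1.4, 1.4),
  Petal.Width  = c(0.2, 0.2),
  Species      = c("setosa", "setosa"),
  stringsAsFactors = FALSE)
d$id <- seq_len(nrow(d))

control_table <- data.frame(
  Part    = c("Sepal", "Sepal", "Petal", "Petal"),
  Measure = c("Length", "Width", "Length", "Width"),
  Value   = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"),
  stringsAsFactors = FALSE)

key_cols <- c("Part", "Measure")

# Expand each row-record into one block-record (one row per control table row).
blocks <- lapply(seq_len(nrow(d)), function(i) {
  block <- control_table[, key_cols, drop = FALSE]
  # Look up the value that each quoted column name stands for.
  block$Value <- vapply(
    control_table$Value,
    function(nm) d[[nm]][[i]],
    numeric(1),
    USE.NAMES = FALSE)
  # Copy the record-identifying columns onto every row of the block.
  data.frame(id = d$id[[i]], Species = d$Species[[i]], block,
             stringsAsFactors = FALSE)
})

res_sketch <- do.call(rbind, blocks)
rownames(res_sketch) <- NULL
print(res_sketch)
```

The sketch hard-codes the id and Species columns for brevity; cdata takes these through the columnsToCopy argument instead.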

And, as one comes to expect with coordinatized/fluid-data transforms, the process is easy to reverse (modulo column and row order).

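back <- cdata::blocks_to_rowrecs(
  res,                           # the block-record data produced above
  keyColumns = c("id", "Species"),
  controlTable = control_table,  # the same control table as before
  controlTableKeys = c("Part", "Measure"))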

id Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1 setosa 5.1 3.5 1.4 0.2
2 setosa 4.9 3.0 1.4 0.2
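Since the round trip is only exact up to column and row order, a comparison has to ignore both. Below is a small base-R sketch of such a check. The helper same_modulo_order() is hypothetical (it is not part of cdata or wrapr), and the back frame is typed in by hand from the output shown above, with its rows deliberately reversed, so the example runs on its own.

```r
# Original row-records from the article.
d <- data.frame(
  Sepal.Length = c(5.1, 4.9),
  Sepal.Width  = c(3.5, 3.0),
  Petal.Length = c(1.4, 1.4),
  Petal.Width  = c(0.2, 0.2),
  Species      = c("setosa", "setosa"),
  stringsAsFactors = FALSE)
d$id <- seq_len(nrow(d))

# Round-trip result, hand-typed with different column and row order.
back <- data.frame(
  id           = c(2L, 1L),
  Species      = c("setosa", "setosa"),
  Sepal.Length = c(4.9, 5.1),
  Sepal.Width  = c(3.0, 3.5),
  Petal.Length = c(1.4, 1.4),
  Petal.Width  = c(0.2, 0.2),
  stringsAsFactors = FALSE)

# Hypothetical helper: compare two data frames ignoring column and row order.
same_modulo_order <- function(a, b) {
  if (!setequal(colnames(a), colnames(b)) || nrow(a) != nrow(b)) {
    return(FALSE)
  }
  cols <- sort(colnames(a))
  normalize <- function(x) {
    x <- x[, cols, drop = FALSE]                      # canonical column order
    x <- x[do.call(order, unname(as.list(x))), , drop = FALSE]  # canonical row order
    rownames(x) <- NULL
    x
  }
  isTRUE(all.equal(normalize(a), normalize(b)))
}

round_trip_ok <- same_modulo_order(d, back)
print(round_trip_ok)
```

The helper sorts on every column, which is adequate for a small example like this one; for larger data it would be more natural to sort on the record keys (here id and Species) only.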

The cdata unit tests include variations of the above example.

We think cdata (and the accompanying fluid data methodology, plus extensions) is a very deep and powerful way of wrangling data. Once you take the time to learn the methodology (which is "draw what you want to happen to one record, type that in as your control table, and you are done!") it is very easy to use.

Note: the above capabilities are now part of the development version of cdata, and will likely make their way to the CRAN release in a couple of weeks. We have some substantial reasons for announcing this now. Users who would like to try the feature can either wait for the CRAN release, or install the development version of cdata with a procedure such as devtools::install_github("WinVector/cdata"). A thank you to Recle Etino Vibal for requesting this feature, and for moving general keys from my “nice to have” list to my “at least one user requested it” list.
