Controlling Data Layout With cdata

Posted on April 16, 2019 by John Mount in R bloggers | 0 Comments

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Here is an example how easy it is to use cdata to re-layout your data.

Tim Morris recently tweeted the following problem (corrected).

Please will you take pity on me #rstats folks?
I only want to reshape two variables x & y from wide to long!

Starting with:
    d xa xb ya yb
    1  1  3  6  8
    2  2  4  7  9

How can I get to:
    id t x y
    1  a 1 6
    1  b 3 8
    2  a 2 7
    2  b 4 9
    
In Stata it's:
 . reshape long x y, i(id) j(t) string
In R, it's:
 . an hour of cursing followed by a desperate tweet 👆

Thanks for any help!

PS – I can make reshape() or gather() work when I have just x or just y.

This is not to make fun of Tim Morris: the above should be easy. Using diagrams and slowing down the data transform into small steps makes the process very easy.

First: (and this is the important part) define our problem using an example. Tim Morris did this really well, but let’s repeat it here. We want to realize the following data layout transform.

Second: identify the record ID and record structure in both the before and after examples.

Third: attach the cdata package, and use build_frame() to type in the example “before” data.

library("cdata")

before <- build_frame(
  "id"  , "xa", "xb", "ya", "yb" |
    1   , 1   , 3   , 6   , 8    |
    2   , 2   , 4   , 7   , 9    )

knitr::kable(before)

id	xa	xb	ya	yb
1	1	3	6	8
2	2	4	7	9

Fourth: (this is the "hard" part) copy the column marked names from the before into the matching record positions in the after example.

Fifth: copy the annotated "after" record in as your layout transform control table.

ct <- qchar_frame(
  "t"  , "x" , "y" |
    "a", xa  , ya  |
    "b", xb  , yb  )

knitr::kable(ct)

t	x	y
a	xa	ya
b	xb	yb

In the above we are using a convention that concrete values are written in quotes, and symbols to be taken from the "before" data frame are written without quotes.

Now specify the many-record transform.

layout_spec <- rowrecs_to_blocks_spec(
  ct,
  recordKeys = "id")

The layout_spec completely encodes our intent. So we can look at it to double check what transform we have specified.

print(layout_spec)

## {
##  row_record <- wrapr::qchar_frame(
##    "id"  , "xa", "xb", "ya", "yb" |
##      .   , xa  , xb  , ya  , yb   )
##  row_keys <- c('id')
## 
##  # becomes
## 
##  block_record <- wrapr::qchar_frame(
##    "id"  , "t", "x", "y" |
##      .   , "a", xa , ya  |
##      .   , "b", xb , yb  )
##  block_keys <- c('id', 't')
## 
##  # args: c(checkNames = TRUE, checkKeys = TRUE, strict = FALSE)
## }

And we can now apply the layout transform to data.

after <- before %.>% layout_spec
# cdata 1.0.9 adds the non-piped function notation:
# layout_by(layout_spec, before)

knitr::kable(after)

id	t	x	y
1	a	1	6
1	b	3	8
2	a	2	7
2	b	4	9

A really fun extra: we can build an inverse layout specification to reverse the transform.

reverse_layout <- t(layout_spec) # invert the spec using t()

print(reverse_layout)

## {
##  block_record <- wrapr::qchar_frame(
##    "id"  , "t", "x", "y" |
##      .   , "a", xa , ya  |
##      .   , "b", xb , yb  )
##  block_keys <- c('id', 't')
## 
##  # becomes
## 
##  row_record <- wrapr::qchar_frame(
##    "id"  , "xa", "xb", "ya", "yb" |
##      .   , xa  , xb  , ya  , yb   )
##  row_keys <- c('id')
## 
##  # args: c(checkNames = TRUE, checkKeys = TRUE, strict = FALSE)
## }

after %.>% 
  reverse_layout %.>%
  knitr::kable(.)

id	xa	xb	ya	yb
1	1	3	6	8
2	2	4	7	9

And that is it, we have a re-usable layout_spec that can transform future data. We have many tutorials on the method here, and the source code for this note can be found here.

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Controlling Data Layout With cdata

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)