Abstract Data Types and the Uniform Referent Principle II: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()

Jocelyn Ireson-Paine

5 years ago

[This article was first published on R – Jocelyn Ireson-Paine's Blog, and kindly contributed to R-bloggers.]

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In “Abstract Data Types and the Uniform Referent Principle I: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()”, I explained why the notation for interfacing to a data structure should be independent of that structure’s representation. R programmers honour this principle in the same way that bricks hang in the sky. All published R code that operates on data frames uses column names. Sometimes these follow the $ operator; sometimes the data frame is implicit via attach() or similar. In the Tidyverse, the column names will often be part of a mutate(), the data frame being piped through a sequence of %>% operators. And this is dreadful software engineering.

Why? Look at the tables below. They represent four different ways of storing my income data.

Person	Income_Type	Income_Value
Alice	Wages	37000
Alice	Bonuses	0
Alice	Benefits	0
Bob	Wages	14000
Bob	Bonuses	1000
Bob	Benefits	6000

Person	Income_Wages	Income_Bonuses	Income_Benefits
Alice	37000	0	0
Bob	14000	1000	6000

Person	Income
Alice	Type	Value
	Wages	37000
	Bonuses	0
	Benefits	0
Bob	Type	Value
	Wages	14000
	Bonuses	1000
	Benefits	6000

Person	Income
Alice	Wages	Bonuses	Benefits
	37000	0	0
Bob	Wages	Bonuses	Benefits
	14000	1000	6000

Abstractly, the data is the same in each case, and if you’re familiar with nest(), unnest(), gather() and spread(), you will easily see how to transform one table into any of the others. But the tables are implemented in very different ways. If you access their elements with $ or an equivalent, and you then change the implementation, you have to rewrite all those accesses. Which is dreadful software engineering.
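One way to avoid that rewriting is to route every access through a small interface function. The sketch below is only an illustration under stated assumptions: get_income() and the class names income_long and income_wide are hypothetical names introduced here, not part of any package, and only the first two tables are covered. It uses base R's S3 dispatch so that the call reads the same whichever layout sits behind it.

```r
# Hypothetical example: get_income(), income_long and income_wide are
# illustrative names, not taken from any package.

# First layout: one row per (Person, Income_Type) pair.
income_long <- structure(
  data.frame(
    Person       = rep(c("Alice", "Bob"), each = 3),
    Income_Type  = rep(c("Wages", "Bonuses", "Benefits"), times = 2),
    Income_Value = c(37000, 0, 0, 14000, 1000, 6000),
    stringsAsFactors = FALSE
  ),
  class = c("income_long", "data.frame")
)

# Second layout: one row per person, one column per income type.
income_wide <- structure(
  data.frame(
    Person          = c("Alice", "Bob"),
    Income_Wages    = c(37000, 14000),
    Income_Bonuses  = c(0, 1000),
    Income_Benefits = c(0, 6000),
    stringsAsFactors = FALSE
  ),
  class = c("income_wide", "data.frame")
)

# The interface: callers name a person and an income type, never a column.
get_income <- function(data, person, type) UseMethod("get_income")

# Long layout: filter rows on both person and income type.
get_income.income_long <- function(data, person, type) {
  data$Income_Value[data$Person == person & data$Income_Type == type]
}

# Wide layout: build the column name, then filter rows on person.
get_income.income_wide <- function(data, person, type) {
  data[[paste0("Income_", type)]][data$Person == person]
}

get_income(income_long, "Bob", "Wages")  # [1] 14000
get_income(income_wide, "Bob", "Wages")  # [1] 14000
```

Code that calls get_income() does not change if the table is later reshaped; only the method for the new layout has to be written. The nested layouts would need methods of their own, which this sketch leaves out.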

[Cartoon: an experimenter peers into the innards of a complicated piece of machinery. His colleague holds a plug coming out of it labelled INTERFACE: 'GET' 'INCOME' 'WAGES' and says, 'Don't worry about how it works. It's the interface that's important.']
