Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In “Abstract Data Types and the Uniform Referent Principle I: why Douglas T. Ross would hate nest(), unnest(), gather() and spread()”, I explained why the notation for interfacing to a data structure should be independent of that structure’s representation.
R programmers honour this principle in the same way that bricks hang in the sky.
All published R code that operates on data frames uses column names. Sometimes these follow the $ operator; sometimes the data frame is implicit via attach() or similar. In the Tidyverse, the column names will often be part of a mutate(), the data frame being piped through a sequence of %>% operators.
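For concreteness, here is a rough sketch of those three styles. The data frame, its column names and its figures are invented for illustration, and with() stands in for attach() so that nothing is left on the search path. The Tidyverse variant runs only if dplyr happens to be installed.

```r
# Hypothetical data: column names and values are made up for illustration.
income <- data.frame(year   = c(2019, 2020),
                     salary = c(100, 110),
                     bonus  = c(10, 5))

# 1. Column names following the $ operator.
total_dollar <- income$salary + income$bonus

# 2. Data frame made implicit. with() plays the role of attach() here,
#    without leaving the columns on the search path afterwards.
total_with <- with(income, salary + bonus)

# 3. Tidyverse style: column names inside mutate(), data frame piped via %>%.
#    Skipped if dplyr is not available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  total_pipe <- income %>%
    mutate(total = salary + bonus) %>%
    pull(total)
}
```

In all three, the names salary and bonus are written directly into the calling code.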
And this is dreadful software engineering.
Why? Look at the tables below. They represent four different ways of storing my income data.
[Tables: the same income data stored in four different layouts]
Abstractly, the data is the same in each case, and if you’re familiar with nest(), unnest(), gather() and spread(), you will easily see how to transform one table into any of the others. But the tables are implemented in very different ways. If you access their elements with $ or an equivalent, and you then change the implementation, you have to rewrite all those accesses. Which is dreadful software engineering.
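A small sketch may make the cost visible. It uses two invented layouts (wide and long) rather than the tables from this post, and the accessor salary_of() is a hypothetical helper, shown only as one possible way of keeping the caller's notation independent of the layout.

```r
# Hypothetical income data in two layouts; names and figures are invented.
wide <- data.frame(year   = c(2019, 2020),
                   salary = c(100, 110),
                   bonus  = c(10, 5))

long <- data.frame(year   = c(2019, 2019, 2020, 2020),
                   kind   = c("salary", "bonus", "salary", "bonus"),
                   amount = c(100, 10, 110, 5))

# Direct access is tied to the layout.
wide$salary                          # works on the wide table
long$salary                          # NULL: the long table has no such column
long$amount[long$kind == "salary"]   # the same access, rewritten for the long table

# A hypothetical accessor that hides the layout from the caller.
salary_of <- function(income) {
  if ("salary" %in% names(income)) {
    income$salary                           # wide layout
  } else {
    income$amount[income$kind == "salary"]  # long layout
  }
}

# The caller writes the same expression whichever layout is in use.
salary_of(wide)
salary_of(long)
```

With the accessor, a change of layout means editing salary_of() once, not every place that reads a salary.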