The DRY Principle and Knowing When to Make a Package
Don’t Repeat Yourself (DRY)
Probably everyone who has done some kind of programming has heard of the “Don’t Repeat Yourself” (DRY) principle. In a nutshell, it’s about reducing code redundancy in order to reduce errors and improve readability.
Undoubtedly the most common manifestation of the DRY principle is the creation of a function for re-used logic. The “rule of 3” is a good shorthand for identifying when you might want to rethink how your code is organized: “You should consider writing a function whenever you’ve copied and pasted a block of code more than twice (i.e. you now have three copies of the same code)”, per the R for Data Science book.
The DRY principle can be applicable to other settings as well. For example, data scientist David Robinson once remarked (probably only half jokingly) that one should write a blog post after giving the same piece of advice three times. 1
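As a small illustration of the rule of 3, here is a sketch in base R. The data frame and the function name `rescale01` are made up for this example; the point is only that three pasted copies of the same expression collapse into one function.

```r
# Toy data, invented for illustration.
df <- data.frame(a = c(1, 5, 9), b = c(10, 20, 40), c = c(0.2, 0.4, 0.8))

# Before: the same logic pasted once per column, e.g.
#   df$a <- (df$a - min(df$a)) / (max(df$a) - min(df$a))
#   df$b <- (df$b - min(df$b)) / (max(df$b) - min(df$b))
#   df$c <- (df$c - min(df$c)) / (max(df$c) - min(df$c))
# A single mistyped column name in any copy is a silent bug.

# After: the logic lives in one place.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Apply the function to every column while keeping the data frame structure.
df[] <- lapply(df, rescale01)
df
```

Any later fix (for instance, handling a constant column) now has to be made only once.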
DRY and R Packages
I’d like to suggest applying the DRY principle to package creation: if you use a set of 3 functions at least 3 times, then you should put them in a package. In fact, a single function or a single use case can justify starting a package, provided that it is significant enough.
Well-known R community member Bob Rud, who is among the most active R developers, has made creating minimal, useful R packages commonplace. In his description of his Stack Overflow Driven Development, he suggests that such a practice is not only useful for abstracting functionality, but can also be a great way to enhance one’s skills while helping others.
If one is fearful of creating a package, perhaps due to the potential responsibility of having to maintain it if others use it, then I would suggest creating a “personal” package (one that you don’t really intend anyone else to use). In fact, I believe it is fairly common practice for active R users to have their own personal packages. 2 For example,
- Bob Rud’s {hrbrmisc} and {hrbrthemes}
- David Robinson’s {drlib}
- Julia Silge’s {silgelib}
- Strenge Jack’s {sjmisc}, {sjPlot}, {sjstats}, and {sjlabelled}
Following the examples of others, I have created several personal packages for atomic purposes. 3
- {tetext} for Tidy Text Mining principles.
- {teplot} for plotting functions.
- {teproj} for project-related functions.
Examples
To give an example of how such a package can be useful, I’ll describe some recent additions to my {teplot} package to assist with geo-spatial visualization of single U.S. states.
I was working on something related to high schools in Texas (look out for a blog post in the future) and was beginning to copy-paste some functions that I had used for another project for visualizing geo-spatial data in the state. The moment I thought about reusing the code was the moment that I realized that I should put it in a package. Now, visualizing geo-spatial data in Texas is as easy as follows.
```r
library("ggplot2")
library("teplot")
library("ggmap")
library("dplyr")

# School-level data, including county names and coordinates.
path <- "https://raw.githubusercontent.com/tonyelhabr/uil-v02/master/data/schools-nces-join.csv"
schools_geo <- readr::read_csv(path)

# Base map of Texas without county borders.
viz_map_base <-
  teplot::create_map_state(state = "texas", show_county = FALSE) +
  teplot::theme_te() +
  teplot::theme_map()

# Shade each county by its number of schools (log scale).
viz_map_bycnts <-
  viz_map_base +
  geom_polygon(
    data =
      schools_geo %>%
      count(county) %>%
      inner_join(
        teplot::get_map_data_county_tx(),
        by = c("county" = "subregion")
      ),
    aes(fill = n)
  ) +
  scale_fill_gradient(
    trans = "log",
    low = "white",
    high = "red"
  ) +
  theme(legend.position = "none")
viz_map_bycnts
```
I also added a Stamen map to the package data so that I can use it easily as a base layer.
```r
# Plot each school as a point on top of the stored Stamen base map.
viz_map_bycnts_stamen <-
  teplot::ggmap_stamen_tx +
  geom_point(
    data = schools_geo,
    aes(x = lon, y = lat),
    color = "red",
    size = 1
  )
viz_map_bycnts_stamen
```
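The internals of the {teplot} functions are not shown in this post. To give a rough idea of what a reusable state-map helper could look like, here is a minimal sketch built on `ggplot2::map_data()`. The function name `plot_state_counties` and its arguments are hypothetical and are not {teplot}'s actual implementation; the sketch also assumes that the {maps} package is installed, since `map_data()` relies on it.

```r
library(ggplot2)

# Hypothetical helper: draw the county outlines of a single U.S. state.
# `state` is a lower-case state name as understood by the {maps} package.
plot_state_counties <- function(state, fill = "grey95", border = "grey40") {
  # One row per polygon vertex, with columns long, lat, group, and subregion.
  county_data <- map_data("county", region = state)

  ggplot(county_data, aes(x = long, y = lat, group = group)) +
    geom_polygon(fill = fill, colour = border) +
    coord_quickmap() +
    theme_void()
}

viz_tx <- plot_state_counties("texas")
viz_tx
```

Once a helper like this lives in a package, every new project starts from a single function call instead of a copied block of plotting code.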
To give a separate example, I used my {tetext} package for nearly all of the code in the flexdashboard that I created for analyzing the Twitter accounts of NBA teams. There, I simply called the function tetext::visualize_time_facet() to generate a fairly illustrative visual.
```r
viz_time_facet_all <-
  data_facet_trim %>%
  tetext::visualize_time_facet(
    timebin = timestamp,
    color = id,
    facet = id,
    scale_manual_config = list(values = colors_filt),
    facet_config = list(scales = "fixed"),
    labs_config = list(title = NULL)
  )
viz_time_facet_all
```
Getting Started
If one has no experience with creating packages and does not know where to get started, there are plenty of awesome resources out there to learn more about it. To name a few:
- Hillary Parker’s “famous” blog post
- R Packages book by Hadley Wickham
- Karl Broman’s primer
- Jenny Bryan’s class tutorial
(There’s a good reason why these resources show up at the top of a Google search for “R packages”.)
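For a feel of how little is needed to get a package directory on disk, here is a minimal sketch that uses only `utils::package.skeleton()`, which ships with base R. The package name `temisc` and the helper `count_missing()` are hypothetical, and the package is written to a temporary directory so that the sketch is disposable. The resources above describe fuller workflows and what to do with the generated files.

```r
# Collect the functions destined for the package in their own environment.
pkg_env <- new.env()
pkg_env$count_missing <- function(x) {
  # Number of NA values in a vector.
  sum(is.na(x))
}

# "temisc" is a made-up name; tempdir() keeps the example out of your projects.
pkg_parent <- tempdir()
utils::package.skeleton(
  name = "temisc",
  list = "count_missing",
  environment = pkg_env,
  path = pkg_parent,
  force = TRUE
)

# Inspect what was generated (DESCRIPTION, NAMESPACE, R/, man/, ...).
pkg_path <- file.path(pkg_parent, "temisc")
list.files(pkg_path)
```

The generated DESCRIPTION and help files are only templates and still need to be filled in before the package can be built and installed.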
Although it may seem daunting at first, one should realize that the pay-off will be great. (Just think about how much time, effort, and debugging you save when writing a function. Now scale that feeling by the number of functions that you include in your package!) I created my first package to assist with using my company’s color scheme in plots. Up until that point, I had been needlessly copy-pasting the same hex values into each separate project where I wanted to use the color palette. (If this happens to be your use case, then check out Dr. Simon J’s blog post on exactly this topic!)
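A sketch of what the core of such a color-scheme package might contain is shown here. The color names and hex values are invented placeholders (not any real company's palette), and `company_pal()` is a hypothetical function name.

```r
# Placeholder brand colours; replace with your organisation's actual values.
company_colors <- c(
  navy  = "#1F3A5F",
  teal  = "#2A9D8F",
  sand  = "#E9C46A",
  coral = "#E76F51"
)

# Return hex codes by name, or the whole palette when called with no arguments.
company_pal <- function(...) {
  picked <- c(...)
  if (is.null(picked)) {
    return(unname(company_colors))
  }
  unknown <- setdiff(picked, names(company_colors))
  if (length(unknown) > 0) {
    stop("Unknown colour name(s): ", paste(unknown, collapse = ", "))
  }
  unname(company_colors[picked])
}

company_pal("teal", "coral")
```

With this in a package, the hex values are defined exactly once; in a {ggplot2} plot they could be supplied via, for example, `scale_fill_manual(values = company_pal())`.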
Conclusion
Even if you don’t program much (or at all), the DRY principle will undoubtedly be applicable to you at some point in time. If you’re working with R, then I suggest using packages as a solution to your DRY problems.
1. Hadley Wickham suggested that a book might be even better.
2. Although several of these have actually been published on CRAN (suggesting that they are really more than just personal packages), each started out as just a package for the individual.
3. It seems common to include one’s initials in the name of a personal package, so I have copied that format.