Site icon R-bloggers

Rmd first: When development starts with documentation

[This article was first published on (en) The R Task Force, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Documentation matters ! Think about future you and others. Whatever is the aim of your script and analyses, you should think about documentation. The way I see it, R package structure is made for that. Let me try to convince you.

At use’R 2019 in Toulouse, I did a presentation entitled: ‘The “Rmd first” method: when projects start with documentation’. I ended up saying: Think Package ! If you are really afraid about building a package, you may want to have a look at these slides before. If you are not so afraid, you can start directly with this blog post. In any case, Paco the package should make this more enjoyable ! I hope so…

This article is quite long. You can do a quick read if you skip the bullet points. Use the complete version while you are developing a real project.

Why documentation matters ?

Prevent code sickness

If not, you are lucky. For now…

In any case, think about future you and think about future users or developpers of your work. Never think this is a one shot. One day or another, this one shot script will again be useful. It will be more useful if it is correctly documented.
Multiple reasons would make documentation necessary:

Don’t provoke (to much) sickness to people who will have to understand and use your code… Give them the remedy, give them a documentation.

If you read this post, you are good enough to create a package

You are not here by chance. You are here because one day, you wrote a line of code in the R language. Thus, trust me, you know enough to create a package.

When we tell our students/trainees, after their two first days, that the next step is to build a package, the first reactions are:

I just started, I am not developer who can create a package.
I do not want to share my work on the CRAN with the World, not even on Github.

Our answers are:

First “copy-paste” of code requires a function, first function requires a package.
You can create a package in six minutes and I’ll show it to you now: Create a package in a few minutes (text in French but video does not require audio).
You do not know what you are capable of. Trust me, in the next two days of tutorial, you will be able to create your own package, at least for internal use.

A package forces standardized documentation

There are different places in a package where to write some code, some text, some information, … Remembering the complete procedure of package development may be painful for beginners, even for advanced developers. Hence, some think they need to build a hundred packages before being able to get all of it. That’s not true.

The ‘Rmd first’ method may reduce the fear of package building by developing as if you were not in a package

Moreover, some useful packages are here to help you in all package development steps, among them {devtools}, {usethis}, {attachment}, {golem}, {desc}, …

Develop as if you were not in a package

In my presentation at use’R 2019 in Toulouse, I started developing my code in a “classical” R project. If you looked at this presentation, you know we will end up with a package. Here, we will directly start by building a package skeleton. Just a few files to think about before developing. You will see that everything is about documentation.
In this blog post, we will start to create a small data analysis in a project named my.analysis.package.

These steps looks like package development…

Your project needs to be presented. What is the objective of this work? Who are you? Are others allowed to use your code? We also keep a trace of the history of the package construction steps.

Create new “R package using devtools” if you are using Rstudio

Create a file named dev_history.R, for instance at the root of the project

Fill in the project DESCRIPTION file

Define the license

Proprietary 
Do not distribute outside of ThinkR.

Add the following lines in your dev_history.R script

# Document functions and dependencies
attachment::att_to_description()
# Check the package
devtools:check()

Work with your data in a Rmd document almost like classical project development

The Rmd file is your sandbox before being the userguide of your work. Use it to try, explore, test your data and analyses. Once you are ready, follow the guide to clean up the content and store everything at the right place.

Create a small dataset

If you are building this package for a specific analysis for your work or for a client, you may think you want to you use your entire dataset. Directly manipulating a big dataset can be painful during code writing. You may face long computation time while debugging. Do not neglect the power of a small reproducible example ! It may not cover all future possible bugs, but this will accelerate development and help documenting the code.

“Rmd first”: Here starts the “Rmd first” method.

Not really really first, but almost first as we haven’t started code development yet. So, let us create a vignette. This is a classical RMarkdown file stored in a specific place and that will be shown in a specifically formatted HTML output.

It may now be necessary to install your package.

This will make your dataset available for the vignette.

The “Rmd first” method looks like classical analysis

The workflow of an analysis often starts with data reading, cleansing, filtering, exploring, … These steps may be presented in a Rmarkdown file, so that you can create a report of your work and make it reproducible to others.
The “Rmd first” assumes that any project can start like data analysis. Whatever project you are doing, you will need some data, some reproducible examples to build it. Below is a tiny example of a Rmd build.

Load necessary packages at the top of your vignette

Write a sentence about what you plan to do in the next chunk

Say what you’ll do, do what you said (Diane Beldame, ThinkR)

Do what you said

# Get path
datapath <- system.file("example-data/my-dataset.csv", package = "my.analysis.package")
# Read data with {readr} for instance
clients <- readr::read_csv(datapath)

Document the function while you are still in the Rmd

#' Filter by department
#' 
#' @param x dataframe with column named id_dpt
#' @param dpt department number
#' 
#' @return a filtered dataframe
#' @export
#' @examples
#' dataset_path <- system.file("example-data/my-dataset.csv", package = "my.analysis.package")
#' clients <- read_csv(dataset_path)
#' filter_by_dpt(clients, dpt = 11)
filter_by_dpt <- function(x, dpt) {
  filter(x, id_dpt == dpt)
}

Create a test while you are still in the Rmd

dataset_path <- system.file("example-data/my-dataset.csv", package = "my.analysis.package")
clients <- read_csv(dataset_path)
output <- filter_by_dpt(clients, dpt = 11)
testthat::expect_is(output, "data.frame")
testthat::expect_equal(nrow(output), 10)

“Rmd cleansing”: Make it a proper a package

Everything you need to make a proper package is already set. We will now clean the Rmd. This means we will move parts of the code of the Rmd to the appropriate directory of the package. In the vignette, we will only keep the explanations and the code to use the main functions.

Polish the package

Your package is set. It is now time to run functions to create documentation in the package format and check if your package correctly follow construction rules.

Publish your documentation

As you created a package, you can use {pkgdown} to present the entire documentation in the same HTML site. The ‘Rmd first’ forced you to create vignettes that can also be presented as a HTML (or PDF) book, delivered as shareable userguides or as your analysis report.

Everything is a package

A package is not only for archives stored on the CRAN that must be shared with the entire world. A package is a directory containing specific files and folders, organised so that anyone can find documentation, code and tests in a precise place. Hence, every project can be a package, so that tools built around packages will allow/force you to work in a reproducible way with re-usable project. Think package:

Ressources

Enjoy writing documentation!
Future you and people around you will thank you for that!

The post Rmd first: When development starts with documentation appeared first on (en) The R Task Force.

To leave a comment for the author, please follow the link and comment on their blog: (en) The R Task Force.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.