[This article was first published on Revolutions, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Writing an R script is one thing. Organizing your process (where to put the data, how to refer to files in scripts, how to run the scripts, and how to produce, collect, and report the results) is quite another. Every R user has their own workflow for doing data analysis with R, but the best workflows achieve the following goals:
- Transparency: A good workflow organizes the elements of the project logically and clearly, to make it easy for an observer (including yourself) to understand how the pieces come together.
- Maintainability: A good workflow makes it easy to modify and adapt the project. Standardized script names and good commenting practices (in the code, as well as things like README files) are key here.
- Modularity: A good workflow encapsulates discrete tasks into separate components (e.g. scripts), so that it's always clear where modifications need to be made (and only made in one place), and components are re-usable for other projects.
- Portability: A good workflow makes it easy to move the project to another system, or hand it over to another person to work on, in such a way that it can still easily be run elsewhere. (Using relative rather than absolute pathnames, and accessing shared data remotely, are two examples.)
- Reproducibility: A good workflow makes it easy for you, or others, to reproduce your results.
- Efficiency: Here I'm referring to the efficiency of you, the programmer, not computational efficiency. A good workflow saves you time, by making it easier to work on the project, and by automating as much of the process as possible.
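As a rough illustration of the modularity, portability, and efficiency goals above, here is a minimal sketch in base R. The directory layout, file names, and the `run_all()` helper are all hypothetical choices made for this example, not a standard. It builds a tiny project in a temporary directory, keeps each task in its own script, refers to files only by relative paths, and runs every step from one entry point.

```r
# Hypothetical layout (illustrative names, not from the article):
#   workflow_demo/
#     data/raw.csv        input data
#     R/01_load.R         step 1: read the data
#     R/02_analyze.R      step 2: summarize and write results
#     output/             generated results

# Build the demo project under tempdir() so the sketch is self-contained.
project_dir <- file.path(tempdir(), "workflow_demo")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_dir, "R"), showWarnings = FALSE)
dir.create(file.path(project_dir, "output"), showWarnings = FALSE)

# Some toy input data.
write.csv(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)),
          file.path(project_dir, "data", "raw.csv"), row.names = FALSE)

# Each script does one task and uses only paths relative to the project root.
writeLines('raw <- read.csv(file.path("data", "raw.csv"))',
           file.path(project_dir, "R", "01_load.R"))
writeLines(c('summary_stats <- data.frame(mean_y = mean(raw$y))',
             'write.csv(summary_stats, file.path("output", "summary.csv"), row.names = FALSE)'),
           file.path(project_dir, "R", "02_analyze.R"))

# Hypothetical helper: run every numbered script in order from the project root.
run_all <- function(project_dir) {
  old_wd <- setwd(project_dir)          # relative paths resolve from here
  on.exit(setwd(old_wd), add = TRUE)    # restore the caller's working directory
  env <- new.env()                      # shared workspace for the steps
  scripts <- sort(list.files("R", pattern = "\\.R$", full.names = TRUE))
  for (script in scripts) {
    source(script, local = env)
  }
  invisible(env)
}

result <- run_all(project_dir)
```

Because nothing in the scripts mentions an absolute path, the whole `workflow_demo` directory could be copied to another machine and rerun with a single call to `run_all()`. Numbering the script names is just one simple way to make the run order explicit.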
Other than the package system (which is great, but can be overkill for many projects), R doesn't have any formal standards for designing a workflow. But here are a couple of suggestions from the R community:
- For projects that create a complete report from R code, see the answers to the question "Workflow for statistical analysis and report writing" on StackOverflow.
- For more general development projects in R, John Myles White is developing the ProjectTemplate package to help standardize the structure of a project.
If you have other suggestions for organizing an R workflow, let us know in the comments.
To leave a comment for the author, please follow the link and comment on their blog: Revolutions.