R Hero saves Backup City with archivist and GitHub
Marcin Kosiński
This article was first published on http://r-addict.com and kindly contributed to R-bloggers.
Have you ever suffered because you could not reproduce graphs, tables or analysis results in R? Have you ever been frustrated at not being able to share R objects (e.g., plots or final analysis models) within your reports, posters or articles? Or maybe you simply have too many objects to store in a convenient and handy way? Now you can share partial results of an analysis, provide hooks to valuable R objects within articles, manage analysis results and restore an object's pedigree with the archivist package and its extension archivist.github, all automatically through GitHub and without closing RStudio. If you are tired of archiving results by yourself, read on to see how you can become an R Hero with the power of the archivist.github package.
R Hero archiving power
Recently I visited Backup City, a data analysis mecca in the middle of Reproducible Research RLand. That's where I overheard a feverish discussion between R Hero and commissar O'Rdon. You can read the story of their meeting in the opening comic.
archivist.github: archivist and GitHub integration
archivist.github is an extension of the archivist package that provides tools for archiving, managing and sharing R objects via GitHub. You can install the package from CRAN.
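A minimal installation sketch, assuming the package is still available from the CRAN mirror you use (as it was when this post was written):

```r
# Install the GitHub extension; archivist itself is expected to come
# along as a dependency.
install.packages("archivist.github")

# Attach the package so that its functions are available in the session.
library(archivist.github)
```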
I have prepared a workflow graph to visualize the functionality of archivist.github, and this post explains its core powers.
After you have created a GitHub developer application, you will be able to create repositories on GitHub automatically from the R console. The process is described in archivist.github: 2.1 OAuth open authorization; set the Homepage URL to http://github.com and the Authorization callback URL to http://localhost:1410.
The example below shows how to authorise with the GitHub API (using your application's Client ID and Client Secret), create a GitHub repository that holds an archivist-like Repository, and automatically archive an R object on GitHub.
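A hedged sketch of these three steps follows. The credential strings are placeholders, the choice of iris as the archived object is an assumption, and the argument names follow the archivist.github documentation as I recall it, so check ?authoriseGitHub, ?createGitHubRepo and ?archive before relying on them. The OAuth step opens a browser window, so it needs an interactive session.

```r
library(archivist.github)

# Placeholders: use the values from your own GitHub developer application.
client_id     <- "YOUR_CLIENT_ID"
client_secret <- "YOUR_CLIENT_SECRET"

# Step 1: OAuth authorisation; the returned token is kept for later calls.
github_token <- authoriseGitHub(client_id, client_secret)

# Global parameters shared by archivist and archivist.github functions.
aoptions("user", "YOUR_GITHUB_LOGIN")
aoptions("password", "YOUR_GITHUB_PASSWORD")

# Step 2: create the GitHub repository with an archivist-like Repository
# inside and make it the default one for this session.
createGitHubRepo("RHero", default = TRUE)

# Step 3: archive an object; alink = TRUE is assumed to return a
# ready-to-paste archivist::aread() hook like the one shown in this post.
archive(iris, alink = TRUE)
```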
archivist::aread('archivistR/RHero/ff575c261c949d073b2895b05d1097c3')
One can check that the artifact is really on GitHub and that the commit was performed (with the great help of the git2r package).
Each object (referred to as an artifact) is archived together with its metadata and md5hash, in case someone would like to restore or search for archived objects within the Repository.
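One way to run such a check from R is sketched below. It assumes that a local clone of the repository sits in a directory called RHero under the working directory; that location is an assumption, so adjust the path to wherever your clone lives.

```r
library(git2r)

# Assumption: the local clone of the GitHub repository is in ./RHero.
local_clone <- repository("RHero")

# List the most recent commits; a freshly archived artifact should show
# up as a new commit.
commits(local_clone, n = 3)

# Show whether any files are still waiting to be committed locally.
status(local_clone)
```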
Partial results archiving and objects’ pedigree restoration
We have prepared %a%, an extended version of the pipe operator %>%, so that every partial result of an analysis workflow can be archived. Below is an example of workflow archiving and pedigree restoration for RTCGA RNASeq data (gene expression); I wrote about RTCGA here, and a broader example can be found here.
The column with [[env]] is the object before the transformations. We are working on using the original object names instead, as tracked in this issue.
This operation does not archive the objects on GitHub automatically, because %a% comes from the base archivist package. The objects have to be uploaded in a separate step.
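The RTCGA example needs the RTCGA data packages, so here is a smaller sketch of the same idea on the built-in iris data, following the pattern used in the archivist documentation. The Repository location and the object names are hypothetical.

```r
library(archivist)
library(dplyr)

# Hypothetical local Repository, set as the default so that %a% knows
# where to store every partial result.
repo_dir <- file.path(tempdir(), "pedigree_repo")
createLocalRepo(repoDir = repo_dir, default = TRUE, force = TRUE)

# Each %a% step archives its result together with the call that made it.
iris %a%
  filter(Sepal.Length < 6) %a%
  lm(Petal.Length ~ Species, data = .) %a%
  summary() -> model_summary

# Restore the pedigree: the chain of calls and the md5hash of each step.
ahistory(model_summary)
```

Any intermediate object can then be loaded back by the md5hash listed in the pedigree.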
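The original snippet for the upload step is not reproduced here. One plausible route, offered as an assumption rather than the author's exact call, is pushGitHubRepo() from archivist.github; the argument names are quoted from memory of the package documentation and should be checked against ?pushGitHubRepo.

```r
library(archivist.github)

# Assumptions: the local Repository lives in ./RHero and is linked to a
# GitHub repository of the same name; the credentials are placeholders.
aoptions("user", "YOUR_GITHUB_LOGIN")
aoptions("password", "YOUR_GITHUB_PASSWORD")
aoptions("repo", "RHero")

# Commit the locally archived partial results and push them to GitHub.
pushGitHubRepo(
  repoDir       = "RHero",
  commitMessage = "Archive partial results of the %a% workflow"
)
```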
Overload print() to use archive()
Once the global parameters are specified (the aoptions() function sets the 'user', 'repo' and 'password' parameters globally for every archivist.github and archivist function), we no longer have to call archive() after each call in order to provide hooks in rmarkdown reports. We can overload the print() function for specific classes so that printing an object also passes it to archive().
Load: archivist::aread('archivistR/RHero/2b639023bc41e289aa21d790d5876736')
Call:
lm(formula = weight ~ group, data = pld)
Coefficients:
(Intercept) groupTrt
5.032 -0.371
Load: archivist::aread('archivistR/RHero/a33c804ff1d0b652210a39e2071d1e14')
Call:
lm(formula = weight ~ group - 1, data = pld)
Coefficients:
groupCtl groupTrt
5.032 4.661
This is the GitHub equivalent of local archiving with the addHooksToPrint function from archivist.
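For comparison, here is a sketch of that local counterpart. The data frame pld is rebuilt from the standard ?lm example, whose coefficients match the output above; how the original pld was constructed is an assumption, as is the Repository directory name.

```r
library(archivist)

# Plant weight data from the ?lm help page; `pld` mirrors the name used
# in the printed calls above (assumed construction).
ctl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
pld <- data.frame(
  weight = c(ctl, trt),
  group  = gl(2, 10, 20, labels = c("Ctl", "Trt"))
)

# Hypothetical local Repository in the working directory.
createLocalRepo(repoDir = "arepo_hooks", default = TRUE, force = TRUE)

# From now on, printing an lm object also saves it to the Repository
# and emits a "Load:" hook before the usual printout.
addHooksToPrint(class = "lm", repoDir = "arepo_hooks", repo = NULL)

fit_with_intercept <- lm(weight ~ group, data = pld)
fit_group_means    <- lm(weight ~ group - 1, data = pld)

print(fit_with_intercept)
print(fit_group_means)
```

With repo = NULL the hooks point at the local Repository; supplying repo and user makes them point at a GitHub repository instead.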