Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Background
R relies on package repositories for initial installation of a package via install.packages(). A crucial second step is update.packages(): For all currently installed packages, a list of available updates is constructed and offered for either one-by-one or bulk updates. This keeps the local packages in sync with upstream, and provides a very convenient way to obtain new features, bug fixes and other improvements. So by installing from a repository, we automatically have the ability to track the repository for updates.
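As a minimal sketch of that two-step cycle: the mirror URL and the package name below are assumptions picked for illustration, and the network calls are guarded so they only run in an interactive session.

```r
# Point R at a repository; this mirror URL is an assumption, any CRAN mirror works.
options(repos = c(CRAN = "https://cloud.r-project.org"))

# Network access is needed, so the calls only run in an interactive session.
if (interactive()) {
  install.packages("Rcpp")        # step one: initial installation (example package)
  print(old.packages())           # installed packages with a newer version upstream
  update.packages(ask = TRUE)     # step two: confirm updates one by one
  # update.packages(ask = FALSE)  # ... or take all updates in bulk
}
```

Both steps consult the same `repos` option, which is why anything installed from a repository can later be tracked for updates.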
Enter drat
Fairly recently, the drat package was added to the R ecosystem. It makes both aspects of package distribution easy: providing a package (if you are an author) as well as installing it (if you are a user). Now, because drat is at the same time source code (as it is also a package providing the functionality) and a repository (using the features drat provides), the "namespace" becomes a little cluttered.
But because a key feature of drat is the "one variable" unique identification via the GitHub account name, I opted to create a drat repository in the name of a new organisation: ghrr. This is a simple acronym for GitHub Hosted R Repository.
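To illustrate the "one variable" idea, here is a small sketch in base R. The helper drat_url() is hypothetical, and the URL layout it encodes is an assumption about how a GitHub account name maps to a drat repository; with drat installed, drat::addRepo("ghrr") is the intended shortcut.

```r
# Hypothetical helper: derive a repository URL from a GitHub account name.
# The URL layout is an assumption, not taken from the post.
drat_url <- function(account) {
  sprintf("https://%s.github.io/drat/", account)
}

# Add the ghrr repository next to whatever repositories are already configured.
repos <- getOption("repos")
repos["ghrr"] <- drat_url("ghrr")
options(repos = repos)

# From here on the standard tools see ghrr as well (needs network access).
if (interactive()) {
  install.packages("winsorize")
  update.packages(ask = TRUE)
}
```

The account name is the only piece of information a user has to supply; everything else follows from the convention.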
Use cases
We can outline several use cases for packages in ghrr:
- packages not published in a repo by their authors: I already use two like that:
- packages possibly unsuitable for mainline repos:
  - Rblpapi is a great package by Whit Armstrong and John Laing to which I have been contributing quite a bit of late. As it requires a free-to-use but not open source library and headers from Bloomberg, it will never make it to the mainline repository for R, but hosting it in ghrr is perfect as I can easily update several machines at work once I cut a new development release;
  - winsorize is a small package I needed a few weeks ago; it is spun out of robustHD but does not yet contain new code so Andreas and I are content to keep it in this drat for now;
- packages in pre-release mode:
  - RcppArmadillo, for which I announced both a release candidate before Armadillo 5.000 came out and the actual RcppArmadillo 0.500.0.0, which is not (yet) on the mainline repository as two affected packages need a small update first. Users, however, can already get RcppArmadillo from the sibling Rcpp drat repo.
  - RcppToml is a new package I am currently working on; it implements a TOML parser based on cpptoml. It works, but it is not quite ready for public announcements yet, and hence perfect for ghrr.
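For the pre-release case, a short sketch of why a drat-hosted version is picked up as an update: versions are compared component by component. The installed version and the repository URL below are assumptions for illustration; only 0.500.0.0 comes from the text above.

```r
drat_version      <- "0.500.0.0"    # RcppArmadillo version named in the text
installed_version <- "0.4.650.1.1"  # assumed, for illustration only

# compareVersion() returns 1 when its first argument is the later version.
newer_in_drat <- utils::compareVersion(drat_version, installed_version) == 1
print(newer_in_drat)

if (interactive() && newer_in_drat) {
  # Repository URL is an assumption: the RcppCore account with an assumed URL layout.
  install.packages("RcppArmadillo",
                   repos = "https://RcppCore.github.io/drat/")
}
```

Passing `repos` directly to install.packages() leaves the global options untouched, which may be preferable when trying out a single pre-release.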
Going forward
ghrr is meant to be open. While anybody can open a drat repository, particularly on GitHub, it may be beneficial to somehow group packages. This is however not something that can be planned ex ante: it may just happen if others who see similar benefits in this do in fact contribute. In that spirit, I strongly encourage pull requests.
Early on, I made my commit messages conform to a pattern of "package version sha1 repourl" to make code provenance of every commit very clear. Ideally, subsequent commits would conform to such a scheme, or replace it with a better one.
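One way to keep such messages consistent is a small helper. This is only a sketch: make_commit_message() is a hypothetical function, and every value passed to it below is made up.

```r
# Hypothetical helper: build a commit message of the form
# "package version sha1 repourl" from its four parts.
make_commit_message <- function(package, version, sha1, repourl) {
  parts <- c(package, version, sha1, repourl)
  stopifnot(is.character(parts), length(parts) == 4L, all(nzchar(parts)))
  paste(parts, collapse = " ")
}

# All four values below are made up for illustration.
msg <- make_commit_message(
  package = "somepkg",
  version = "0.1.0",
  sha1    = "0123abc",
  repourl = "https://github.com/someuser/somepkg"
)
print(msg)
```

The resulting string can be used as the message of the git commit that adds the package to the repository checkout.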
Some Resources
A few links to learn more about drat and ghrr:
- ghrr repo view (to see both content and commits) and ghrr web page
- drat repo view, drat code page, and drat web page
Comments and questions via email or issue tickets are more than welcome. We hope that others find ghrr to be a useful tool for easy repository management and use via GitHub.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.