Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Welcome to post number twenty in the randomly redundant R rant series of posts, or R4 for short. It has been a little quiet since the previous post last June as we’ve been busy with other things but a few posts (or ideas at least) are queued.
Dependencies. We wrote about this a good year ago in post #17 which was (in part) tickled by the experience of installing one package … and getting a boatload of others pulled in. The topic and question of dependencies has seen a few posts over the year, and I won’t be able to do them all justice. Josh and I have been added a few links to the tinyverse.org page. The (currently) last one by Russ Cox titled Our Software Dependency Problem is particularly trenchant.
And just this week the topic came up in two different, and unrelated posts. First, in What I don’t like in you repo, Oleg Kovalov lists a brief but decent number of items by which a repository can be evaluated. And one is about [b]loated dependencies where he nails it with a quick When I see dozens of deps in the lock file, the first question which comes to my mind is: so, am I ready to fix any failures inside any of them? This is pretty close to what we have been saying around the tidyverse.
Second, in Beware the data science pin factory, Eric Colson brings an equation. Quoting from footnote 2: […] the number of relationships (r) grows as a function number of members (n) per this equation: r = (n^2-n) / 2. Granted, this was about human coordination and ideal team size. But let’s just run with it: For n=10, we get r=9 which is not so bad. For n=20, it is r=38. And for n=30 we are at r=87. You get the idea. “Big-Oh-N-squared”.
More dependencies means more edges between more nodes. Which eventually means more breakage.
Which gets us to announcement embedded in this post. A few months ago, in what still seems like a genuinely extra-clever weekend hack in an initial 100 or so lines, Edwin de Jonge put together a remarkable repo on GitLab. It combines Docker / Rocker via hourly cron
jobs with deployment at netlify … giving us badges which visualize the direct as well as recursive dependencies of a package. All in about 100 lines, fully automated, autonomously running and deployed via CDN. Amazing work, for which we really need to praise him! So a big thanks to Edwin.
With these CRAN Dependency Badges being available, I have been adding them to my repos at GitHub over the last few months. As two quick examples you can see
- Rcpp
- RcppArmadillo
to get the idea. RcppArmadillo (or RcppEigen or many other packages) will always have one: Rcpp. But many widely-used packages such as data.table also get by with a count of zero. It is worth showing this – and the badge does just that! And I even sent a PR to the badger package: if you’re into this, you can have a badge made for your via badger::badge_depdencies(pkgname)
.
Otherwise, more details at Edwin’s repo and of course his actual tinyverse.netlify.com site hosting the badges. It’s easy as all other badges: reference the CRAN package, get a badge.
So if you buy into the idea that lightweight is the right weight then join us and show it via the dependency badges!
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.