
mlr + drake: Reproducible machine-learning workflow management

[This article was first published on r-bloggers on Machine Learning in R, and kindly contributed to R-bloggers].

You may have heard about the drake package. It has received a lot of attention in the R community recently because it simplifies reproducible workflow management. This comes in especially handy for large projects with hundreds of intermediate steps. Built-in high-performance computing (HPC) support and graph visualization are just two goodies that come on top of the basic functionality.

drake tracks changes to your intermediate targets. Once you change something early in your pipeline, drake detects every downstream object that might be affected and rebuilds exactly those on the next call to make(), leaving up-to-date targets untouched. The following tweet sums up the struggle of keeping track of dependencies in a research project in a simple picture:

Save me from myself and having to remember all this when files change pic.twitter.com/hVeSFQOimj

Tweet by Dr. Brianna McHorse (@fossilosophy), February 21, 2018
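To make the change-tracking idea concrete, here is a minimal sketch of a three-step drake pipeline. The helper functions `load_data()` and `fit_model()` are hypothetical and exist only for illustration; the sketch assumes drake 7.x is installed and uses its `drake_plan()`, `make()`, `outdated()` and `readd()` functions. Redefining `fit_model()` changes an early step, so drake flags the model and everything built from it, but not the raw data.

```r
library(drake)

# Hypothetical helper functions, used only for illustration
load_data <- function() iris
fit_model <- function(d) lm(Sepal.Length ~ Petal.Length, data = d)

# A three-step pipeline: each target depends on the one before it
plan <- drake_plan(
  iris_data = load_data(),
  model     = fit_model(iris_data),
  coefs     = coef(model)
)

make(plan)      # first run: builds all three targets
outdated(plan)  # nothing left to build

# Change an early step of the pipeline (use a different predictor)
fit_model <- function(d) lm(Sepal.Length ~ Petal.Width, data = d)

outdated(plan)  # model and coefs are now outdated; iris_data is not
make(plan)      # rebuilds only model and coefs
readd(coefs)    # load the refreshed result from drake's cache
```

Because drake stores every target in a cache (a `.drake` folder in the working directory by default), rerunning the script after a restart would skip all three targets unless something changed.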

The maintainer of drake, Will Landau (@wlandau), is extremely responsive and has also written one of the most extensive and detailed manuals in the R package jungle.

If you have installed drake, you can start right away with one of the built-in examples.

drake::drake_example("mlr-slurm")

At the time of writing, there are 17(!) examples to choose from. One of the newest shows how to use mlr with drake on an HPC cluster.

Machine-learning projects fit the drake idea especially well because you can easily create large comparison matrices of different algorithms and hyperparameter settings. At the same time, drake can send these settings to an HPC cluster in parallel for you, which simplifies your modeling tasks a lot.
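A comparison matrix of learners might look roughly like the sketch below. It is not the code of the built-in "mlr-slurm" example; it assumes drake 7.x (for the `map()`/`combine()` transformations inside `drake_plan()`) and the original mlr package (not mlr3), and the helper `run_cv()` is hypothetical. Each learner becomes its own target, so adding a learner id adds one column to the comparison without touching the others.

```r
library(drake)
library(mlr)

# Hypothetical helper: cross-validate one mlr learner on the iris task
# and return its mean misclassification error (mmce)
run_cv <- function(learner_id) {
  task  <- makeClassifTask(data = iris, target = "Species")
  lrn   <- makeLearner(learner_id)
  rdesc <- makeResampleDesc("CV", iters = 3)
  res   <- resample(lrn, task, rdesc, measures = mmce, show.info = FALSE)
  data.frame(learner = learner_id, mmce = unname(res$aggr["mmce.test.mean"]))
}

plan <- drake_plan(
  # One target per learner; add more ids to grow the comparison matrix
  cv = target(
    run_cv(learner),
    transform = map(learner = c("classif.rpart", "classif.lda"))
  ),
  # Gather all per-learner targets into one comparison table
  comparison = target(
    rbind(cv),
    transform = combine(cv)
  )
)

# Sequential here; the independent per-learner targets could instead go to
# parallel workers, e.g. make(plan, parallelism = "future", jobs = 2),
# assuming the future package (or clustermq for a cluster scheduler)
make(plan)
readd(comparison)
```

Since the per-learner targets do not depend on each other, drake is free to build them in any order, which is exactly what makes handing them to parallel or HPC workers straightforward.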

