Default Hyperparameter Configuration


Scope

The predictive performance of modern machine learning algorithms is highly dependent on the choice of their hyperparameter configuration. Options for setting hyperparameters are tuning, manual selection by the user, and relying on the algorithm's default configuration. Default configurations are chosen to work reasonably well on a wide range of data sets, but they usually do not achieve the best possible predictive performance. When tuning a learner in mlr3, we can therefore run the default configuration as a baseline. Seeing how well it performs tells us whether tuning pays off. If the optimized configurations perform worse than the default, we could expand the search space or try a different optimization algorithm. Of course, it could also be that tuning on the given data set is simply not worth it.

Probst, Boulesteix, and Bischl (2019) studied the tunability of machine learning algorithms and found that it varies widely. Algorithms like glmnet and XGBoost are highly tunable, while algorithms like random forests work well with their default configuration. Highly tunable algorithms should thus beat their baselines more easily with optimized hyperparameters. In this article, we tune the hyperparameters of a random forest and compare the performance of the default configuration with the optimized configurations.

Example

We tune the hyperparameters of the ranger learner on the spam data set. The search space is taken from Bischl et al. (2021).

library(mlr3verse)

learner = lrn("classif.ranger",
  mtry.ratio      = to_tune(0, 1),
  replace         = to_tune(),
  sample.fraction = to_tune(1e-1, 1),
  num.trees       = to_tune(1, 2000)
)
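
Before tuning, we can check which search space these to_tune() tokens define. The call below is a quick sanity check, assuming the paradox interface for querying a learner's parameter set:

# inspect the search space constructed from the to_tune() tokens above
learner$param_set$search_space()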

When creating the tuning instance, we set evaluate_default = TRUE to test the default hyperparameter configuration. The default configuration is evaluated in the first batch of the tuning run; all subsequent batches use the specified tuning method, in this example randomly drawn configurations.

instance = tune(
  method = tnr("random_search", batch_size = 5),
  task = tsk("spam"),
  learner = learner,
  resampling = rsmp("holdout"),
  measures = msr("classif.ce"),
  term_evals = 51,
  evaluate_default = TRUE
)

The default configuration is recorded in the first row of the archive. The other rows contain the results of the random search.

as.data.table(instance$archive)[, .(batch_nr, mtry.ratio, replace, sample.fraction, num.trees, classif.ce)]
    batch_nr mtry.ratio replace sample.fraction num.trees classif.ce
 1:        1 0.12280702    TRUE       1.0000000       500 0.04889179
 2:        2 0.81757154   FALSE       0.8117389      1528 0.06518905
 3:        2 0.90097848   FALSE       0.9188504       571 0.06975228
 4:        2 0.65584252    TRUE       0.3145144       681 0.06323338
 5:        2 0.40363652   FALSE       0.7508936      1807 0.05801825
---                                                                 
47:       11 0.71528316    TRUE       0.4398745      1394 0.06127771
48:       11 0.19136788   FALSE       0.8293552       249 0.04889179
49:       11 0.09430346   FALSE       0.6233559      1307 0.04889179
50:       11 0.52643368   FALSE       0.5993606      1403 0.05997392
51:       11 0.17115160    TRUE       0.3309041       114 0.05867014
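
Beyond eyeballing the archive, we can compare the default configuration with the best configuration found by the random search directly. The snippet below is a minimal sketch of such a comparison, based on the archive columns shown above:

archive = as.data.table(instance$archive)

# classification error of the default configuration (first batch)
default_ce = archive[batch_nr == 1, classif.ce]

# best classification error found by the random search (remaining batches)
best_random_ce = archive[batch_nr > 1, min(classif.ce)]

c(default = default_ce, random_search = best_random_ce)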

We plot the performances of the evaluated hyperparameter configurations. The blue line connects the best configuration of each batch. We see that the default configuration already performs well and the optimized configurations cannot beat it.

library(mlr3viz)

autoplot(instance, type = "performance")
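
If numbers are preferred over the plot, the best performance per batch can also be computed directly from the archive, for example with a simple data.table aggregation:

# best classification error per batch, mirroring the blue line in the plot
as.data.table(instance$archive)[, .(best_ce = min(classif.ce)), by = batch_nr]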

Conclusion

The time required to test the default configuration is negligible compared to the time required to run the hyperparameter optimization. It gives us a valuable indication of whether our tuning is properly configured. Running the default configuration as a baseline is a good practice that should be used in every tuning run.

References

Bischl, Bernd, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, et al. 2021. “Hyperparameter Optimization: Foundations, Algorithms, Best Practices and Open Challenges.” arXiv:2107.05847 [Cs, Stat], July. http://arxiv.org/abs/2107.05847.
Probst, Philipp, Anne-Laure Boulesteix, and Bernd Bischl. 2019. “Tunability: Importance of Hyperparameters of Machine Learning Algorithms.” Journal of Machine Learning Research 20 (53): 1–32. http://jmlr.org/papers/v20/18-444.html.