Re-Share: vtreat Data Preparation Documentation and Video
I would like to re-share the documentation for vtreat (R version, Python version), a data preparation system for machine learning tasks.
vtreat is a system for preparing messy real-world data for predictive modeling tasks (classification, regression, and so on). In particular, it is very good at re-coding high-cardinality string-valued (or categorical) variables for later use.
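To make "re-coding a high-cardinality categorical variable" concrete, here is a minimal sketch of impact coding in plain Python: each level is replaced by how far its (smoothed) mean outcome sits from the overall mean. This is only an illustration of the idea, not vtreat's implementation; the function names, the `smoothing` parameter, and the example data are all hypothetical, and the sketch leaves out the cross-validation vtreat uses to avoid over-fit.

```python
from collections import defaultdict


def fit_impact_code(levels, outcomes, smoothing=1.0):
    """Learn a numeric code per level: smoothed level mean minus grand mean.

    Hypothetical helper for illustration only; not part of vtreat.
    """
    grand_mean = sum(outcomes) / len(outcomes)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, y in zip(levels, outcomes):
        sums[level] += y
        counts[level] += 1
    # Shrink each level's mean toward the grand mean, so levels seen
    # only a few times end up with codes closer to zero.
    return {
        level: (sums[level] + smoothing * grand_mean) / (counts[level] + smoothing)
        - grand_mean
        for level in counts
    }


def apply_impact_code(code, levels):
    """Replace each level by its code; unseen levels get 0.0 (no evidence)."""
    return [code.get(level, 0.0) for level in levels]


# Made-up training data: a string-valued variable and a numeric outcome.
zip_codes = ["94110", "94110", "10001", "10001", "60614"]
y = [10.0, 12.0, 2.0, 4.0, 6.0]

code = fit_impact_code(zip_codes, y)
# "99999" never appeared in training, so it maps to 0.0.
encoded = apply_impact_code(code, ["94110", "10001", "99999"])
```

The result is a single numeric column that a downstream model can use directly, however many distinct levels the original variable had.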
A nice introductory video lecture on vtreat can be found here, and the latest copy of the lecture slides here. Or, you can check out chapter 8, “Advanced data preparation”, of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019, which covers the use of vtreat.
The vtreat documentation is organized by task (regression, classification, multinomial classification, and unsupervised), language (R or Python), and interface style (design/prepare or fit/prepare). In particular, the R code now supports both interface styles, so users can choose whichever works best with their coding style: design/prepare, which is very fluid when combined with wrapr::unpack notation, or fit/prepare, which uses mutable state to organize steps.
- Regression:
  - Python regression example.
  - R regression example, fit/prepare interface.
  - R regression example, design/prepare/experiment interface.
- Classification:
  - Python classification example.
  - R classification example, fit/prepare interface.
  - R classification example, design/prepare/experiment interface.
- Unsupervised tasks:
  - Python unsupervised example.
  - R unsupervised example, fit/prepare interface.
  - R unsupervised example, design/prepare/experiment interface.
- Multinomial classification:
  - Python multinomial classification example.
  - R multinomial classification example, fit/prepare interface.
  - R multinomial classification example, design/prepare/experiment interface.
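The difference between the two interface styles described above can be sketched with a toy example in plain Python. It assumes a deliberately trivial "treatment" (filling missing numeric values with the training mean); `design`, `prepare`, and `MeanFillTreatment` are hypothetical names chosen for this sketch and are not vtreat's API. The linked examples show the real calls.

```python
def design(values):
    """Design step: learn a plan from training data (here, the mean of the
    non-missing values). Hypothetical helper, not a vtreat function."""
    seen = [v for v in values if v is not None]
    return {"fill": sum(seen) / len(seen)}


def prepare(plan, values):
    """Prepare step: apply a previously designed plan to any data."""
    return [plan["fill"] if v is None else v for v in values]


class MeanFillTreatment:
    """Fit/prepare style: the same logic, but the plan is stored as
    mutable state on the object instead of being passed around."""

    def __init__(self):
        self.plan = None

    def fit(self, values):
        self.plan = design(values)
        return self

    def prepare(self, values):
        if self.plan is None:
            raise ValueError("call fit() before prepare()")
        return prepare(self.plan, values)


train = [1.0, None, 3.0]
test = [None, 5.0]

# Design/prepare style: the plan is an explicit value handed to prepare().
plan = design(train)
functional_result = prepare(plan, test)

# Fit/prepare style: the object remembers the plan between steps.
treatment = MeanFillTreatment().fit(train)
stateful_result = treatment.prepare(test)
```

Both styles produce the same prepared data; the choice is about whether the learned plan is carried explicitly or held inside an object.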