[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers].
There is a new version of the R package vtreat now up on CRAN.
vtreat is an essential data preparation system for predictive modeling. It helps defend your modeling work against real-world data issues, including:
- High cardinality categorical variables
- Rare levels (including new or novel levels during application) in categorical variables
- Missing data (random or systematic)
- Irrelevant variables/columns
- Nested model bias, and other over-fit issues.
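As a rough illustration of the first few items, here is a minimal sketch using vtreat's `designTreatmentsC()` and `prepare()`. The toy data frames and their column names (`x`, `z`, `y`) are made up for this example, and `pruneSig = NULL` is chosen here only to keep every derived column visible. The application data deliberately contains a novel level and missing values.

```r
library(vtreat)

# Hypothetical training data: one categorical input, one numeric input,
# and a binary outcome.
d_train <- data.frame(
  x = c("a", "a", "a", "b", "b", "b"),
  z = c(1, 2, 3, 4, 5, 6),
  y = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical application data: "c" is a level never seen in training,
# and both columns contain a missing value.
d_new <- data.frame(
  x = c("a", "b", "c", NA),
  z = c(10, 20, 30, NA),
  stringsAsFactors = FALSE
)

# Design a treatment plan for a binary classification outcome (target y == TRUE).
plan <- designTreatmentsC(d_train, c("x", "z"), "y", TRUE, verbose = FALSE)

# Apply the plan to the new data; pruneSig = NULL keeps all derived variables.
treated <- prepare(plan, d_new, pruneSig = NULL)

print(treated)
```

The treated frame should be entirely numeric and free of `NA`s, so it can be passed directly to a downstream modeling function.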
vtreat also includes excellent, citable documentation: vtreat: a data.frame Processor for Predictive Modeling.
For this release I want to thank everybody who generously donated their time to submit an issue or put together a git pull request. In particular:
- Vadim Khotilovich, who found and fixed a major performance problem in the y-stratified sampling.
- Lawrence Wu, who has been donating documentation fixes.
- Peter Hurford, who has been donating documentation fixes.