
vtreat 0.5.27 released on CRAN

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

Win-Vector LLC, Nina Zumel and I are pleased to announce that ‘vtreat’ version 0.5.27 has been released on CRAN.

vtreat is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner.

(from the package documentation)

Very roughly, vtreat accepts an arbitrary “from the wild” data frame (with different column types, NAs, NaNs, and so forth) and returns a transformation that reliably and repeatably converts similar data frames to numeric (matrix-like) frames (all independent variables numeric and free of NAs, NaNs, infinities, and so on) ready for predictive modeling. This is a systematic way to work with high-cardinality character and factor variables (which are incompatible with some machine learning implementations, such as random forest, and also bring a danger of statistical over-fitting). It also leaves the analyst more time to incorporate domain-specific data preparation, as vtreat tries to handle as much of the common work as practical. For more of an overall description please see here.

We suggest all users update (and you will want to re-run any “design” steps rather than mixing “design” and “prepare” results from two different versions of vtreat).
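The “design” then “prepare” pattern mentioned above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not taken from the release notes: the column names (`x_num`, `x_cat`, `y`) and the data itself are made up, and `pruneSig = NULL` is passed explicitly so that no variables are filtered out.

```r
library(vtreat)

set.seed(2016)
n <- 100

# Synthetic "from the wild" training data: a numeric column with missing
# values and a character column with missing values (all names hypothetical).
d_train <- data.frame(
  x_num = rnorm(n),
  x_cat = sample(c("a", "b", "c", NA), n, replace = TRUE),
  stringsAsFactors = FALSE
)
d_train$y <- d_train$x_num + ifelse(d_train$x_cat %in% "a", 1, 0) +
  rnorm(n, sd = 0.1)
d_train$x_num[c(3, 17)] <- NA

# "design" step: learn a treatment plan for a numeric outcome.
plan <- designTreatmentsN(
  d_train,
  varlist = c("x_num", "x_cat"),
  outcomename = "y",
  verbose = FALSE
)

# "prepare" step: apply the same plan to new data, including a missing
# numeric value and a category level never seen during design.
d_new <- data.frame(
  x_num = c(NA, 2.0),
  x_cat = c("a", "zzz"),
  stringsAsFactors = FALSE
)
treated_new <- prepare(plan, d_new, pruneSig = NULL)

print(treated_new)
```

Because the plan stores what was learned at design time, a plan built under one version of vtreat should be rebuilt, not reused, after upgrading.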

For what is new in version 0.5.27 please read on.

vtreat 0.5.27 is a maintenance release. User-visible improvements include.

The idea is that having data preparation as a re-usable library lets us research, document, optimize, and fine-tune many more details than would make sense on any one analysis project. The main design difference from other data preparation packages is that we emphasize “y-aware” (or outcome-aware) processing (using the training outcome to generate useful re-encodings of the data).
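As a rough illustration of what “y-aware” means in practice, the sketch below designs two plans for the same high-cardinality column: one without an outcome (`designTreatmentsZ`) and one with a numeric outcome (`designTreatmentsN`). The data and the column name `zip` are invented for the example. The expectation, which may vary by vtreat version, is that the outcome-aware plan proposes additional derived variables (such as an impact-coded column) and reports a significance estimate for each.

```r
library(vtreat)

set.seed(42)
n <- 200

# Hypothetical high-cardinality categorical variable with 40 levels,
# only a few of which actually carry signal about the outcome.
d <- data.frame(
  zip = sample(paste0("z", 1:40), n, replace = TRUE),
  stringsAsFactors = FALSE
)
d$y <- ifelse(d$zip %in% c("z1", "z2", "z3"), 5, 0) + rnorm(n)

# Outcome-free design: the re-encoding cannot use y.
plan_z <- designTreatmentsZ(d, varlist = "zip", verbose = FALSE)

# Outcome-aware design: the training outcome informs the re-encoding.
plan_n <- designTreatmentsN(d, varlist = "zip", outcomename = "y",
                            verbose = FALSE)

# Compare the derived variables each plan proposes.
print(plan_z$scoreFrame$varName)
print(plan_n$scoreFrame[, c("varName", "sig")])
```

The `sig` column of the score frame can then be used to decide which derived variables to keep, for example through the `pruneSig` argument of `prepare`.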

We have pre-rendered a lot of the package documentation, examples, and tutorials here.

