vtreat data cleaning and preparation article now available on arXiv
Nina Zumel and I are happy to announce that a formal article discussing data preparation and cleaning using the vtreat methodology is now available from arXiv.org as arXiv:1611.09477 [stat.AP].
vtreat is an R data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. It prepares variables so that data has fewer exceptional cases, making it easier to safely use models in production. Common problems vtreat defends against include: infinity, NA, too many categorical levels, rare categorical levels, and new categorical levels (levels seen during application, but not during training). vtreat::prepare should be your first choice for real world data preparation and cleaning.
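As a rough illustration of the workflow described above, here is a minimal sketch of designing a treatment plan on training data and then applying it with vtreat::prepare. The data frames, column names (x, z, y), and outcome model are made up for this example and do not come from the article; the exact names of the derived columns also vary between vtreat versions, so the sketch does not rely on them.

```r
library(vtreat)

# Hypothetical training data: one categorical and one numeric input,
# both with missing values, plus a noisy logical outcome.
set.seed(2016)
n <- 200
dTrain <- data.frame(
  x = sample(c("a", "b", "c", NA), n, replace = TRUE),
  z = rnorm(n),
  stringsAsFactors = FALSE
)
dTrain$z[sample.int(n, 20)] <- NA
dTrain$y <- runif(n) < ifelse(dTrain$x %in% "a", 0.7, 0.3)

# Design the treatment plan for a binary classification outcome.
treatments <- designTreatmentsC(
  dTrain,
  varlist = c("x", "z"),
  outcomename = "y",
  outcometarget = TRUE,
  verbose = FALSE
)

# Hypothetical application data containing the problem cases named above:
# a level never seen in training ("d"), NA values, and an infinite value.
dApp <- data.frame(
  x = c("a", "d", NA),
  z = c(0.5, NA, Inf),
  stringsAsFactors = FALSE
)

# Apply the plan; pruneSig = NULL keeps all derived variables.
dAppTreated <- prepare(treatments, dApp, pruneSig = NULL)
print(dAppTreated)
```

The treated frame should contain only numeric columns, which is what makes it convenient to hand to a downstream model. In real use you would typically choose a pruneSig threshold to filter out derived variables that look uninformative, rather than keeping everything.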
We hope this article will make getting started with vtreat much easier. We also hope this helps with citing the use of vtreat in scientific publications.
You can cite the current article as:
@misc{vtreatarticle,
title = {vtreat: a data.frame Processor for Predictive Modeling},
author = {Nina Zumel and John Mount},
year = {2016},
month = {November},
journal = {arXiv},
date = {2016-11-29},
howpublished = {arXiv:1611.09477 [stat.AP] \url{https://arxiv.org/abs/1611.09477}},
url = {https://arxiv.org/abs/1611.09477},
urldate = {2016-11-29},
eprinttype = {arxiv},
pages = {1--40},
eprint = {arXiv:1611.09477 [stat.AP]}
}
Zumel, N. and Mount, J. (2016). vtreat: a data.frame processor for predictive modeling. arXiv:1611.09477 [stat.AP] https://arxiv.org/abs/1611.09477.
And you can cite the vtreat package as:
@misc{vtreatpackage,
title = {vtreat: A Statistically Sound data.frame Processor/Conditioner},
author = {John Mount and Nina Zumel},
year = {2016},
note = {R package version 0.5.28},
howpublished = {\url{https://CRAN.R-project.org/package=vtreat}},
url = {https://CRAN.R-project.org/package=vtreat}
}
Mount, J. and Zumel, N. (2016). vtreat: A statistically sound data.frame processor/conditioner. https://CRAN.R-project.org/package=vtreat. R package version 0.5.28.
For more articles on vtreat please try here or here.