Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We here at Win-Vector LLC have some really big news we would please like the R
-community’s help sharing.
vtreat
version 1.2.0 is now available on CRAN, and this version of vtreat
can now implement its data cleaning and preparation steps on databases and big data systems such as Apache Spark
.
vtreat
is a very complete and rigorous tool for preparing messy real world data for supervised machine-learning tasks. It implements a technique we call “safe y-aware processing” using cross-validation or stacking techniques. It is very easy to use: you show it some data and it designs a data transform for you.
Thanks to the rquery
package, this data preparation transform can now be directly applied to databases, or big data systems such as PostgreSQL
, Amazon RedShift
, Apache Spark
, or Google BigQuery
. Or, thanks to the data.table
and rqdatatable
packages, even fast large in-memory transforms are possible.
We have some basic examples of the new vtreat
capabilities here and here.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.