Big News: Porting vtreat to Python
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We at Win-Vector LLC have some big news.
We are finally porting a streamlined version of our R vtreat variable preparation package to Python.
vtreat is a great system for preparing messy data for suprevised machine learning.
The new implementation is based on Pandas, and we are experimenting with pushing the sklearn.pipeline.Pipeline APIs to their limit. In particular we have found the .fit_transform()
pattern is a great way to express building up a cross-frame to avoid nested model bias (in this case .fit_transform() != .fit().transform()
). There is a bit of difference in how object oriented APIs compose versus how functional APIs compose. We are making an effort to research how to make this an advantage, and not a liability.
The new repository is here and the first example regression is here. Next up is classification (likely natively multinomial this time). After that a few validation suites to prove the two vtreat systems work similarly. And then we have some exciting new capabilities.
The first application is going to be a shortening and streamlining of our current 4 day data science in Python course (while allowing more concrete examples!).
This also means data scientists who use both R and Python will have a few more tools that present similarly in each language.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.