New Features
As promised, we kept working on our bounceR package. For one, we changed the interface: users no longer have to choose a number of tuning parameters that – thanks to my somewhat cryptic documentation – sound more complicated than they are. Inspired by H2o.ai's option of letting the user set the time he or she wants to wait instead of a bunch of cryptic tuning parameters, we added a similar feature.
Further, we changed the source code quite a bit. Henrik Bengtsson gave a very inspiring talk on parallelization with the genius future package at this year's eRum conference. A couple of days later, Davis Vaughan released furrr, an incredibly smart – kudos – wrapper on top of the no-less genius purrr package. Davis' package combines purrr's mapping functions with future's parallelization madness. As you can tell, I am a big fan of all three packages. Inspired by these developments, we wanted to make use of them in our own package, so the entire parallelization setup of bounceR now leverages furrr. This way, the parallelization is much smarter, faster, and works seamlessly across operating systems.
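In case you have never seen the furrr/future combo in action, here is a minimal sketch of the pattern – just an illustration of the idea, not bounceR's actual internals: you pick a backend with plan(), and future_map() then behaves like purrr::map(), only spread across the workers.
# minimal sketch of the furrr/future pattern – an illustration, not bounceR's internals
library(future)
library(furrr)

# choose a parallel backend; plan(sequential) switches back to serial execution
plan(multisession, workers = 4)

# toy example: fit many bootstrap models in parallel, just like purrr::map() would serially
fits <- future_map(1:100, function(i) {
  boot <- mtcars[sample(nrow(mtcars), replace = TRUE), ]
  lm(mpg ~ ., data = boot)
})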
Practical Example
So, let's see how you can use it now. Let's start by installing the package.
# you need devtools, because we are just about to get the package to CRAN, but we are not there yet
library(devtools)

# now you are good to go
devtools::install_github("STATWORX/bounceR")

# now you can load it like every normal package
library(bounceR)
To show how the feature selection works, we need some data, so let's simulate some with our sim_data() function.
# simulate some data
data <- sim_data(n = 100, modelvars = 10, noisevars = 300)
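A quick sanity check never hurts. Note that the exact column layout is my assumption here: one target column named "y" (matching the call below) plus the simulated signal and noise features.
# quick look at the simulated data; assuming one target column plus 10 + 300 feature columns
dim(data)
head(colnames(data))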
As you can all imagine, with 310 features on 100 observations, building models is a little challenging. To be able to model the target nonetheless, you need to reduce your feature space. There are numerous ways to do so; in my last blog post I described our solution. Let's see how to use our algorithm.
# run our algorithm
selected_features <- featureSelection(data = data,
                                      target = "y",
                                      max_time = "30 mins",
                                      bootstrap = "regular",
                                      early_stopping = TRUE,
                                      parallel = TRUE)
What can you expect to get out of it? The function returns a list containing, of course, the optimal formula calculated by our algorithm. You also get a stability matrix, which gives you a ranking of the features by importance. Additionally, we built in some convenient S4 methods, so you can easily access all the information you need.
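Just to illustrate what you might do with such a formula downstream – the formula below is a hand-written stand-in with made-up feature names, not the one returned by the package; in practice you would pull the real one out of the result object via the S4 methods – you can plug it into any modelling function you like.
# hand-written stand-in for the formula returned by featureSelection();
# the feature names here are made up for illustration only
best_formula <- y ~ feature_1 + feature_2 + feature_3

# refit an ordinary linear model on the selected features
fit <- lm(best_formula, data = data)
summary(fit)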
Outlook
I hope I could tease you a little into checking out the package and helping us improve it further. Currently, we are developing two new algorithms for feature selection, which we will implement in the next iteration as well. I am looking forward to your comments, issues and thoughts on the package.
Cheers Guys!
About the author
Lukas Strömsdörfer
The post bounceR 0.1.2: Automated Feature Selection first appeared on STATWORX.