Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Big thanks to the main architect: Zygmunt Zawadzki, zstat, and our reviewer: Krzysztof Słomczyński.
If something is missing or unclear, please chat with us on our Slack.
Get started: Motivation, Installation and Quick Workflow
Provided functionalities
Blog posts history with use cases
- Entropy Based Image Binarization with imager and FSelectorRcpp, Marcin Kosiński
- Venn Diagram Comparison of Boruta, FSelectorRcpp and GLMnet Algorithms, Marcin Kosiński
Quick Workflow
A simple entropy-based feature selection workflow. Information gain is a simple, linear-time algorithm that computes the entropy of the dependent variable and of each explanatory variable, together with the conditional entropy of the dependent variable with respect to each explanatory variable separately. This statistic measures how much knowing the distribution of a single explanatory variable reduces our uncertainty about the distribution of the dependent variable.
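For reference, the quantities mentioned above can be written out as follows. This is the standard textbook definition for discrete variables, with $Y$ the dependent variable and $A$ a single explanatory variable; the symbols are chosen here for illustration and are not taken from the package documentation.

```latex
H(Y) = -\sum_{y} p(y) \log p(y)

H(Y \mid A) = -\sum_{a} p(a) \sum_{y} p(y \mid a) \log p(y \mid a)

IG(Y, A) = H(Y) - H(Y \mid A) = H(Y) + H(A) - H(Y, A)
```

The score is zero when $A$ tells us nothing about $Y$, and it reaches $H(Y)$ when $A$ determines $Y$ completely.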
    # install.packages(c('magrittr', 'FSelectorRcpp'))
    library(magrittr)
    library(FSelectorRcpp)

    information_gain(               # Calculate the score for each attribute
        formula = Species ~ .,      # that is on the right side of the formula.
        data = iris,                # Attributes must exist in the passed data.
        type = "infogain",          # Choose the type of a score to be calculated.
        threads = 2                 # Set number of threads in a parallel backend.
      ) %>%
      cut_attrs(                    # Then take attributes with the highest rank.
        k = 2                       # For example: 2 attrs with the highest rank.
      ) %>%
      to_formula(                   # Create a new formula object with
        attrs = .,                  # the most influential attrs.
        class = "Species"
      ) %>%
      glm(
        formula = .,                # Use that formula in any classification algorithm.
        data = iris,
        family = "binomial"
      )
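To make the statistic itself less of a black box, here is a minimal base R sketch that computes information gain by hand. It is not the package's implementation: the helpers `entropy()` and `info_gain()` are hypothetical names introduced for this example, and the equal-width binning of the numeric columns is an assumption made only for illustration, so the scores need not match what `information_gain()` returns.

```r
# Shannon entropy (natural log) of a discrete vector.
entropy <- function(x) {
  p <- table(x) / length(x)   # empirical probabilities of each level
  p <- p[p > 0]               # drop empty levels to avoid log(0)
  -sum(p * log(p))
}

# Information gain of a discrete attribute `a` about the class `y`:
# H(y) + H(a) - H(y, a), which equals H(y) - H(y | a).
info_gain <- function(y, a) {
  joint <- paste(y, a, sep = "\r")  # one label per (class, attribute) pair
  entropy(y) + entropy(a) - entropy(joint)
}

# Illustration only: cut each numeric iris column into 3 equal-width bins.
binned <- lapply(iris[1:4], cut, breaks = 3)

# Score every binned attribute against Species and rank the results.
scores <- sapply(binned, info_gain, y = iris$Species)
sort(scores, decreasing = TRUE)
```

Each attribute is scored independently of the others, which is why the method scales linearly with the number of attributes.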
Acknowledgements
The cover photo of this blog post comes from https://newevolutiondesigns.com/20-fire-art-wallpapers