You can also watch this video which goes through the following example step-by-step: Quick Start Guide for the OneR package (Video)
After installing the OneR package from CRAN, load it
library(OneR)
Use the famous Iris dataset and determine optimal bins for numeric data
data <- optbin(iris)
Build model with best predictor
model <- OneR(data, verbose = TRUE)
##
##     Attribute    Accuracy
## 1 * Petal.Width  96%
## 2   Petal.Length 95.33%
## 3   Sepal.Length 74.67%
## 4   Sepal.Width  55.33%
## ---
## Chosen attribute due to accuracy
## and ties method (if applicable): '*'
Show learned rules and model diagnostics
summary(model)
##
## Call:
## OneR(data = data, verbose = TRUE)
##
## Rules:
## If Petal.Width = (0.0976,0.791] then Species = setosa
## If Petal.Width = (0.791,1.63]   then Species = versicolor
## If Petal.Width = (1.63,2.5]     then Species = virginica
##
## Accuracy:
## 144 of 150 instances classified correctly (96%)
##
## Contingency table:
##             Petal.Width
## Species      (0.0976,0.791] (0.791,1.63] (1.63,2.5] Sum
##   setosa               * 50            0          0  50
##   versicolor              0         * 48          2  50
##   virginica               0            4       * 46  50
##   Sum                    50           52         48 150
## ---
## Maximum in each column: '*'
##
## Pearson's Chi-squared test:
## X-squared = 266.35, df = 4, p-value < 2.2e-16
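To see where the rules and the contingency table come from, the table can be rebuilt in base R without the package. This is only a sketch: the break values are copied by hand from the rounded intervals printed by summary(model), and `cut()` stands in for whatever optbin does internally.

```r
# Base-R sketch: rebuild the contingency table from the printed cut points.
# The breaks are the rounded values shown in the summary() output above.
breaks <- c(0.0976, 0.791, 1.63, 2.5)
bins <- cut(iris$Petal.Width, breaks = breaks)

# Rows are the actual species, columns are the Petal.Width intervals.
tab <- table(Species = iris$Species, Petal.Width = bins)
print(tab)

# Each interval predicts its most frequent species;
# summing the column maxima counts the correctly classified instances.
correct <- sum(apply(tab, 2, max))
cat(correct, "of", sum(tab), "instances classified correctly\n")
```

The column maxima correspond to the entries marked with '*' in the package output, and their sum gives the 144 of 150 reported under "Accuracy".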
Plot model diagnostics
plot(model)
Use model to predict data
prediction <- predict(model, data)
Evaluate prediction statistics
eval_model(prediction, data)
##
## Confusion matrix (absolute):
##             Actual
## Prediction   setosa versicolor virginica Sum
##   setosa         50          0         0  50
##   versicolor      0         48         4  52
##   virginica       0          2        46  48
##   Sum            50         50        50 150
##
## Confusion matrix (relative):
##             Actual
## Prediction   setosa versicolor virginica  Sum
##   setosa       0.33       0.00      0.00 0.33
##   versicolor   0.00       0.32      0.03 0.35
##   virginica    0.00       0.01      0.31 0.32
##   Sum          0.33       0.33      0.33 1.00
##
## Accuracy:
## 0.96 (144/150)
##
## Error rate:
## 0.04 (6/150)
##
## Error rate reduction (vs. base rate):
## 0.94 (p-value < 2.2e-16)
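The headline figures can be recomputed by hand from the absolute confusion matrix. The sketch below uses base R only. It assumes that "base rate" means always predicting the most frequent actual class and that the reduction is the relative drop in error rate; that reading reproduces the printed 0.94, but it is an assumption about eval_model, not taken from its source code.

```r
# Absolute confusion matrix copied from the eval_model() output
# (rows: prediction, columns: actual).
classes <- c("setosa", "versicolor", "virginica")
conf <- matrix(c(50,  0,  0,
                  0, 48,  4,
                  0,  2, 46),
               nrow = 3, byrow = TRUE,
               dimnames = list(Prediction = classes, Actual = classes))

n <- sum(conf)
accuracy <- sum(diag(conf)) / n      # correct predictions sit on the diagonal
error_rate <- 1 - accuracy

# Assumption: the baseline always predicts the most frequent actual class.
base_error <- 1 - max(colSums(conf)) / n
reduction <- (base_error - error_rate) / base_error

print(round(c(accuracy = accuracy, error_rate = error_rate,
              reduction = reduction), 2))
```

With three equally frequent species the baseline error is 2/3, so an error rate of 0.04 corresponds to a relative reduction of 0.94.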
Note that the 96% accuracy is reached with default settings and no manual tuning, although it is measured on the same data that was used to build the model.
“Petal.Width” is identified as the attribute with the highest predictive value. The cut points of the intervals are found automatically (via the included optbin function). The results are three very simple, yet accurate, rules to predict the respective species.
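The attribute selection can be sketched in a few lines of base R. This is not the package's implementation: the helper `one_rule_accuracy` is a hypothetical name, and the numeric columns are binned into three equal-width intervals with `cut()` as a stand-in for optbin, so the accuracies need not match the ones printed by OneR.

```r
# Accuracy of a one-attribute rule set: each attribute level predicts
# its most frequent class (hypothetical helper, not part of OneR).
one_rule_accuracy <- function(x, y) {
  tab <- table(x, y)                    # rows: attribute levels, columns: classes
  sum(apply(tab, 1, max)) / length(y)   # share of majority-class hits
}

# Assumption: three equal-width bins per numeric column instead of optbin.
binned <- data.frame(lapply(iris[1:4], cut, breaks = 3),
                     Species = iris$Species)

# Score every attribute and list them from best to worst.
acc <- sapply(binned[1:4], one_rule_accuracy, y = binned$Species)
print(sort(acc, decreasing = TRUE))
```

The attribute at the top of the sorted vector is the one a one-rule learner would keep; the rules themselves are the majority class of each of its intervals.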
The nearly perfect separation of the areas in the diagnostic plot gives a good indication of the model’s ability to separate the different species.
The whole code of this post:
library(OneR)
data <- optbin(iris)
model <- OneR(data, verbose = TRUE)
summary(model)
plot(model)
prediction <- predict(model, data)
eval_model(prediction, data)
More sophisticated examples will follow in upcoming posts… so stay tuned!
Help
From within R:
help(package = OneR)
…or as a PDF here: OneR.pdf
The package vignette: OneR – Establishing a New Baseline for Machine Learning Classification Models
Issues can be posted here: https://github.com/vonjd/OneR/issues
Feedback
I would love to hear about your experiences with the OneR package. Please drop a line or two in the comments – Thank you!