[This article was first published on Shifting sands, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I have a classification task and was reading up on various approaches. In the specific case where all inputs are categorical, one can use “Bayesian Naïve Bayes” using the Dirichlet distribution. < o:p>
Poking through the freely available text by Barber, I found a rather detailed discussion in chapters 9 and 10, as well as example matlab code for the book, so took it upon myself to port it to R as a learning exercise. < o:p>
I was not immediately familiar with the Dirichlet distribution, but in this case it appeals to the intuitive counting approach to discrete event probabilities. < o:p>
In a nutshell we use the training data to learn the posterior distribution, which turns out to be counts of how often a given event occurs, grouped by class, feature and feature state.< o:p>
Prediction is a case of counting events in the test vector. The more this count differs from the per-class trained counts, the lower the probability the current candidate class is a match. < o:p>
Anyway, there are three files. The first is a straightforward port of Barber’s code, but this wasn’t very R-like, and in particular only seemed to handle input features with the same number of states.< o:p>
I developed my own version that expects everything to be represented as factors. It is all a bit rough and ready but appears to work and there is a test/example script up here. As a bigger test I ran it on a sample car evaluation data set from here, the confusion matrix is as follows:< o:p>
testY acc good unacc vgood< o:p>
acc 83 3 29 0< o:p>
good 16 5 0 0< o:p>
unacc 17 0 346 0< o:p>
vgood 13 0 0 6< o:p>
That’s it for now. Comments/feedback appreciated. You can find me on twitter here< o:p>
Links to files:
Barber Port< o:p>
Everything in one directory (with data) here
To leave a comment for the author, please follow the link and comment on their blog: Shifting sands.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.