[This article was first published on Analytics , Education , Campus and beyond, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is mostly for my students and myself for future reference.
Classification is a supervised task: we need data that has already been classified to train a model, and can then predict the class of new data.
Generally we hold out a percentage of the available data for testing; the part used to fit the model is the training data and the held-out part is the testing data. For example, only if we already know which emails are spam can we train a classifier to predict whether new emails are spam.
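The holdout idea can be sketched on row indices alone, before touching any real data. This is only a minimal illustration in base R: the 210 rows and the 70% share match the dataset used in this post, while the seed value and the variable names are my own assumptions.

```r
# Minimal sketch of a 70/30 holdout split on row indices (base R only).
# n = 210 and the 70% share come from the post; the seed value 42 and the
# variable names are illustrative assumptions.
set.seed(42)                                  # fix the RNG so the split is repeatable
n <- 210                                      # number of rows in the dataset
n_train <- round(0.7 * n)                     # 147 rows for training
train_idx <- sample(seq_len(n), n_train)      # random rows used for training
test_idx  <- setdiff(seq_len(n), train_idx)   # the remaining rows, used for testing

length(train_idx)
length(test_idx)
```

Computing the training size from n rather than typing 147 by hand means the same lines keep working if the dataset grows or shrinks.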
I used the seeds dataset from http://archive.ics.uci.edu/ml/datasets/seeds# . The data set has 7 real-valued attributes and 1 class attribute to predict. Much of this write-up is influenced by http://www.jeffheaton.com/2013/06/basic-classification-in-r-neural-networks-and-support-vector-machines/ ; I am probably just making the steps more explicit.
The package used is nnet, loaded with library(nnet). The commands are listed below for your reference.
1. Load the nnet package and read the dataset
library(nnet)
seeds <- read.csv('seeds.csv', header = TRUE)
2. Set the training set index: 210 is the dataset size, 147 is 70% of that
seedstrain <- sample(1:210, 147)
3. Set the test set index (all rows not in the training set)
seedstest <- setdiff(1:210, seedstrain)
4. Encode the value to be predicted as a matrix of 0/1 indicator columns, one column per class; pass the attribute of the dataset that you want to predict
ideal <- class.ind(seeds$Class)
5. Train the model. The -8 leaves out the class attribute: the dataset has a total of 8 attributes, with the last one being the one to predict
seedsANN <- nnet(seeds[seedstrain, -8], ideal[seedstrain, ], size = 10, softmax = TRUE)
6. Predict on the test set
predict(seedsANN, seeds[seedstest, -8], type = "class")
7. Tabulate the predictions against the actual classes (a confusion matrix); the counts on the diagonal are the correct classifications, from which the accuracy follows
table(predict(seedsANN, seeds[seedstest, -8], type = "class"), seeds[seedstest, ]$Class)
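Two of the steps above are easier to follow on toy data: what class.ind() actually produces in step 4, and how a single accuracy number can be read from the table in step 7. The sketch below uses made-up labels and predictions (not the seeds data, and not output from a trained network), and the variable names are my own.

```r
library(nnet)  # provides class.ind()

# Step 4 on toy labels: class.ind() turns a vector of class labels into a
# 0/1 indicator matrix with one column per class.
toy_class <- c(1, 2, 3, 1)            # made-up labels, not from seeds.csv
toy_ideal <- class.ind(toy_class)
print(toy_ideal)

# Step 7 on toy predictions: accuracy is the share of the confusion matrix
# that lies on its diagonal. Giving both factors the same levels keeps the
# table square even if some class is never predicted.
actual    <- factor(c(1, 1, 2, 3, 2, 2), levels = 1:3)   # made-up true classes
predicted <- factor(c(1, 1, 2, 3, 3, 2), levels = 1:3)   # made-up predictions
confusion <- table(predicted, actual)
accuracy  <- sum(diag(confusion)) / sum(confusion)
print(confusion)
print(accuracy)
```

With a real model, the same idea should apply if the predicted and actual classes are first converted to factors that share the same levels, so that the diagonal of the table always lines up with the correct classifications.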
Happy Coding !