Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features),
> MYOCARDE=read.table(
+   "http://freakonometrics.free.fr/saporta.csv",
+   header=TRUE,sep=";")
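Before growing any tree, it may help to have a quick look at the data; the two lines below are only a sketch, using standard base R functions.

str(MYOCARDE)           # seven continuous predictors, plus the prognosis PRONO
table(MYOCARDE$PRONO)   # class frequencies of the response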
The default classification tree is
> library(rpart)
> library(rpart.plot)
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> rpart.plot(arbre,type=4,extra=6)
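If a textual description of the splits is preferred, the standard rpart accessors can be used; this is only an optional check, not part of the original workflow.

print(arbre)     # splits, node sizes and class probabilities, as text
printcp(arbre)   # complexity parameter table, useful if one wants to prune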
We can change the options, such as the minimum number of observations a node must contain before a split is attempted
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+   control=rpart.control(minsplit=10))
> rpart.plot(arbre,type=4,extra=6)
or
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+   control=rpart.control(minsplit=5))
> rpart.plot(arbre,type=4,extra=6)
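minsplit is not the only knob: the complexity parameter cp and the maximal depth can also be changed through rpart.control. The sketch below is only an illustration (the object names arbre_cp and arbre_md are mine, not from the original post).

arbre_cp = rpart(factor(PRONO)~., data=MYOCARDE,
  control=rpart.control(cp=0.001))     # smaller cp, hence a deeper tree
arbre_md = rpart(factor(PRONO)~., data=MYOCARDE,
  control=rpart.control(maxdepth=2))   # at most two levels of splits
rpart.plot(arbre_cp, type=4, extra=6)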
To visualise that classification, use the following code (which projects the data onto the first two principal components)
> library(FactoMineR)   # PCA (on the continuous variables)
> X=MYOCARDE[,1:7]
> acp=PCA(X,ncp=ncol(X))
> M=acp$var$coord
> m=apply(X,2,mean)
> s=apply(X,2,sd)
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE)
> # map a point (d1,d2) of the factorial plane back to the original
> # variables, then return the predicted probability of the second class
> pred2=function(d1,d2,Mat,tree){
+   z=Mat %*% c(d1,d2,rep(0,ncol(X)-2))
+   newd=data.frame(t(z*s+m))
+   names(newd)=names(X)
+   predict(tree,newdata=newd,
+           type="prob")[2] }
> # Minv maps principal-component coordinates back to the standardized
> # variables (presumably defined in the previous post mentioned above)
> p=function(d1,d2) pred2(d1,d2,Minv,arbre)
> Outer <- function(x,y,fun) {
+   mat <- matrix(NA, length(x), length(y))
+   for (i in seq_along(x)) {
+     for (j in seq_along(y))
+       mat[i,j]=fun(x[i],y[j])}
+   return(mat)}
> xgrid=seq(-5,5,length=251)
> ygrid=seq(-5,5,length=251)
> zgrid=Outer(xgrid,ygrid,p)
> bluereds=c(
+   rgb(1,0,0,(10:0)/25),rgb(0,0,1,(0:10)/25))
> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgrid,add=TRUE,col=bluereds)
> contour(xgrid,ygrid,zgrid,add=TRUE,levels=.5)
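Since everything is projected on the first two principal components, it is worth checking how much of the variance they actually capture; FactoMineR stores this in the eigenvalue table (a quick check, the numbers themselves are not reproduced here).

acp$eig[1:2,]   # eigenvalue, percentage and cumulative percentage of variance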
It is also possible to consider the case where the tree is deeper, with a smaller value of minsplit,
> arbre = rpart(factor(PRONO)~.,data=MYOCARDE,
+   control=rpart.control(minsplit=5))
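The projection on the first two components can then be redrawn for this deeper tree, simply by re-running the visualisation code above with the new arbre (a sketch reusing pred2, Outer, acp2 and bluereds; the names p5 and zgrid5 are mine).

p5 = function(d1,d2) pred2(d1,d2,Minv,arbre)
zgrid5 = Outer(xgrid,ygrid,p5)
plot(acp2, habillage = 8, col.hab=c("red","blue"))
image(xgrid,ygrid,zgrid5,add=TRUE,col=bluereds)
contour(xgrid,ygrid,zgrid5,add=TRUE,levels=.5)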
Finally, one can also grow more trees, obtained by sampling. This is the idea of bagging: we bootstrap the observations, grow a tree on each sample, and then aggregate the predicted values. On the grid
> xgrid=seq(-5,5,length=201)
> ygrid=seq(-5,5,length=201)
the code is the following,
> Z = matrix(0,201,201)
> for(i in 1:200){
+   # bootstrap sample of the observations
+   indice = sample(1:nrow(MYOCARDE),
+     size=nrow(MYOCARDE),
+     replace=TRUE)
+   ECHANTILLON=MYOCARDE[indice,]
+   # grow a tree on the bootstrap sample
+   arbre_b = rpart(factor(PRONO)~.,
+     data=ECHANTILLON)
+   p2 = function(d1,d2) pred2(d1,d2,Minv,arbre_b)
+   zgrid2_b = Outer(xgrid,ygrid,p2)
+   Z = Z+zgrid2_b }
> # average of the predictions of the 200 trees
> Zgrid = Z/200
To visualize it, use
> plot(acp2, habillage = 8,
+   col.hab=c("red","blue"))
> image(xgrid,ygrid,Zgrid,add=TRUE,
+   col=bluereds)
> contour(xgrid,ygrid,Zgrid,add=TRUE,
+   levels=.5,lwd=3)
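As a side note, the hand-rolled loop above is essentially what the bagging() function of the ipred package does; a hedged sketch, if one prefers not to write the loop:

library(ipred)
bag = bagging(factor(PRONO)~., data=MYOCARDE, nbagg=200)
predict(bag, newdata=MYOCARDE[1,], type="prob")   # aggregated class probabilities for one observation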
Last, but not least, it is possible to use a random forest algorithm. The method combines Breiman's bagging idea (mentioned previously) with the random selection of features.
> library(randomForest)
> foret = randomForest(factor(PRONO)~.,
+   data=MYOCARDE)
> pF=function(d1,d2) pred2(d1,d2,Minv,foret)
> zgridF=Outer(xgrid,ygrid,pF)
> acp2=PCA(MYOCARDE,quali.sup=8,graph=TRUE)
> plot(acp2, habillage = 8,col.hab=c("red","blue"))
> image(xgrid,ygrid,zgridF,add=TRUE,
+   col=bluereds)
> contour(xgrid,ygrid,zgridF,
+   add=TRUE,levels=.5,lwd=3)
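The random selection of features mentioned above is controlled by the mtry argument (the number of predictors tried at each split), and the forest also returns variable importance measures; the short illustration below is only a sketch, with foret2 being my own object name.

foret2 = randomForest(factor(PRONO)~., data=MYOCARDE,
  ntree=500, mtry=3, importance=TRUE)   # mtry predictors sampled at each split
importance(foret2)    # importance measures (mean decrease in accuracy / Gini)
varImpPlot(foret2)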