Yesterday, I mentioned a popular graph discussed when studying the theoretical foundations of statistical learning. But there is usually another one, which looks like the following,
[Figure: risk as a function of model complexity, decreasing on the training sample and U-shaped on a validation sample]
Consider the empirical risk of a model $m$, $\widehat{\mathcal{R}}_n(m)=\frac{1}{n}\sum_{i=1}^n \ell(y_i,m(\boldsymbol{x}_i))$, for some loss function $\ell$.
From the law of large numbers, $\widehat{\mathcal{R}}_n(m)\rightarrow\mathcal{R}(m)=\mathbb{E}\big[\ell(Y,m(\boldsymbol{X}))\big]$ when the sample size $n$ goes to infinity.
It is difficult to say something about the limit for the empirical risk minimizer $\widehat{m}^{\star}=\underset{m\in\mathcal{M}}{\operatorname{argmin}}\ \widehat{\mathcal{R}}_n(m)$, since the model was chosen to minimize the empirical risk on that very sample, and the training risk is therefore optimistically biased.
But if we look at the empirical risk on a validation sample, independent of the training one, $\widehat{\mathcal{R}}_n^{\text{valid}}(\widehat{m}^{\star})=\frac{1}{n}\sum_{i=1}^{n}\ell(\tilde{y}_i,\widehat{m}^{\star}(\tilde{\boldsymbol{x}}_i))$, we get an unbiased estimate of the true risk $\mathcal{R}(\widehat{m}^{\star})$.
One can prove that, with probability $1-\delta$, uniformly over $m\in\mathcal{M}$,
$\mathcal{R}(m)\leq\widehat{\mathcal{R}}_n(m)+\sqrt{\dfrac{VC\big[\log(2n/VC)+1\big]-\log(\delta/4)}{n}}$
which depends on $VC$, the Vapnik–Chervonenkis dimension of the class of models $\mathcal{M}$.
I won’t spend hours on that dimension, but the idea is that this dimension is related to the model complexity. For instance, in dimension one (one covariate), if $m$ is the sign of a polynomial of degree $s$, the VC dimension of that class is $s+1$: the higher the degree, the more complex the class of models.
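To get a sense of how that bound penalizes complexity, here is a small numeric sketch (mine, not from the original post) evaluating the square-root term above for a few values of the VC dimension, with $n=200$ (the sample size used in the simulations below) and $\delta=5\%$:

# penalty term of the Vapnik bound above, as a function of the VC dimension
vc_penalty = function(VC, n = 200, delta = .05){
  sqrt((VC * (log(2 * n / VC) + 1) - log(delta / 4)) / n)
}
vc_penalty(c(2, 5, 10, 25))

The penalty increases with the VC dimension while the empirical risk decreases, which is precisely why the upper bound (and the validation risk) ends up U-shaped.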
Let us try to get a graph that looks like the one above, using the same idea as in our previous post.
MissClassU = rep(NA, 25)  # misclassification rate on the training sample
MissClassV = rep(NA, 25)  # misclassification rate on the validation sample
n = 200
# training sample: Y is Bernoulli with probability (X1+X2)/2
U = data.frame(X1 = runif(n), X2 = runif(n))
p = (U[, 1] + U[, 2]) / 2
U$Y = rbinom(n, size = 1, prob = p)
# independent validation sample, same data-generating process
V = data.frame(X1 = runif(n), X2 = runif(n))
p = (V[, 1] + V[, 2]) / 2
V$Y = rbinom(n, size = 1, prob = p)
for(s in 1:25){
  # logistic regression with polynomials of degree s in each covariate
  reg = glm(Y ~ poly(X1, s) + poly(X2, s), data = U, family = binomial)
  pd = function(x1, x2) predict(reg, newdata = data.frame(X1 = x1, X2 = x2), type = "response") > .5
  MissClassU[s] = mean(abs(pd(U$X1, U$X2) - U$Y))
  MissClassV[s] = mean(abs(pd(V$X1, V$X2) - V$Y))
}
If we plot the misclassification rate as a function of the polynomial degree, in purple on the validation sample and in black on the training sample, we get the graph below.
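The plotting code does not appear above; a minimal base R sketch that draws those two curves (colors as described) could be:

# validation error in purple, training error in black,
# as a function of the polynomial degree s
plot(1:25, MissClassV, type = "b", col = "purple",
     ylim = range(c(MissClassU, MissClassV)),
     xlab = "polynomial degree", ylab = "misclassification rate")
lines(1:25, MissClassU, type = "b", col = "black")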
Again, this is on one sample only. We can run the simulation on hundreds of samples, and see how the average misclassification risk changes with complexity.
library(splines)  # for bs()
missclassification = function(s){
  MCU = rep(NA, 500)  # training misclassification rates, one per simulation
  MCV = rep(NA, 500)  # validation misclassification rates, one per simulation
  for(i in 1:500){
    # training sample
    U = data.frame(X1 = runif(n), X2 = runif(n))
    p = (U[, 1] + U[, 2]) / 2
    U$Y = rbinom(n, size = 1, prob = p)
    # logistic regression with spline transformations (df = s) of the covariates
    reg = glm(Y ~ bs(X1, s) + bs(X2, s), data = U, family = binomial)
    pd = function(x1, x2) predict(reg, newdata = data.frame(X1 = x1, X2 = x2), type = "response") > .5
    MCU[i] = mean(abs(pd(U$X1, U$X2) - U$Y))
    # independent validation sample
    V = data.frame(X1 = runif(n), X2 = runif(n))
    p = (V[, 1] + V[, 2]) / 2
    V$Y = rbinom(n, size = 1, prob = p)
    MCV[i] = mean(abs(pd(V$X1, V$X2) - V$Y))
  }
  MissClassU = mean(MCU)  # average risk on the training samples
  MissClassV = mean(MCV)  # average risk on the validation samples
  return(c(MissClassU, MissClassV))
}
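The function still has to be called on a grid of complexity levels; a hypothetical driver (the range 3:12 is my choice, since bs() warns when the degrees of freedom are below 3) could be:

# sweep the complexity levels and plot the averaged risks
degrees = 3:12
res = sapply(degrees, missclassification)  # 2 x length(degrees) matrix
plot(degrees, res[2, ], type = "b", col = "purple", ylim = range(res),
     xlab = "degrees of freedom", ylab = "average misclassification rate")
lines(degrees, res[1, ], type = "b", col = "black")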
Here, we cannot see the optimal dimension, because the risk on the validation samples keeps increasing. Which makes sense, since our data are generated from a model that is linear in the covariates, so the optimal transformation is obtained with a linear one (and not with higher-degree polynomials).
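As a quick sanity check of that claim, one can fit the classifier that is linear in the covariates and compare its validation risk with the spline-based fits above (a sketch of mine, using the same data-generating process):

# a classifier linear in the covariates, matching the
# linear structure of the data-generating process
set.seed(1)
n = 200
U = data.frame(X1 = runif(n), X2 = runif(n))
U$Y = rbinom(n, size = 1, prob = (U$X1 + U$X2) / 2)
V = data.frame(X1 = runif(n), X2 = runif(n))
V$Y = rbinom(n, size = 1, prob = (V$X1 + V$X2) / 2)
reg0 = glm(Y ~ X1 + X2, data = U, family = binomial)
pd0 = predict(reg0, newdata = V, type = "response") > .5
mean(abs(pd0 - V$Y))  # validation misclassification of the linear fit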