More neurons in the hidden layer than predictive features in neural nets
This week, we discussed neural networks for the first time, and I mentioned that, in many illustrations of neural networks, the hidden layer has fewer neurons than there are predictive variables. But sometimes, it can make sense to have more neurons in the hidden layer than predictive variables.
To illustrate, consider a simple example with a single variable \(x\) and a binary outcome \(y\in\{0,1\}\)
set.seed(12345)
n = 100
x = c(runif(n),1+runif(n),2+runif(n))
y = rep(c(0,1,0),each=n)
We should ensure that the observations are in the \([0,1]\) interval,
minmax = function(z) (z-min(z))/(max(z)-min(z))
xm = minmax(x)
df = data.frame(x=xm,y=y)
as we can visualize below
plot(df$x,rep(0,3*n),col=1+df$y)
Here, the blue and the red dots (when \(y\) is either 0 or 1) are not linearly separable. The standard activation function in neural nets is the sigmoid
sigmoid = function(x) 1 / (1 + exp(-x))
Let us fit a neural network, with two neurons in the hidden layer
library(nnet)
set.seed(1234)
model_nnet = nnet(y~x,size=2,data=df)
We can then extract the weights (with the neuralweights function from the NeuralNetTools package), and visualize the two hidden neurons
library(NeuralNetTools)
w = neuralweights(model_nnet)
x1 = cbind(1,df$x)%*%w$wts$"hidden 1 1"
x2 = cbind(1,df$x)%*%w$wts$"hidden 1 2"
b = w$wts$`out 1`
plot(sigmoid(x1),sigmoid(x2),col=1+df$y)
Now, the blue and the red dots (when \(y\) is either 0 or 1) are actually linearly separable: the output neuron applies the sigmoid to \(b_1+b_2z_1+b_3z_2\), where \(z_1\) and \(z_2\) are the two hidden-neuron outputs, so the decision boundary (where the prediction equals \(1/2\)) is the line \(b_1+b_2z_1+b_3z_2=0\), which we can add to the plot
abline(a=-b[1]/b[3],b=-b[2]/b[3])
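As a quick sanity check, we can compare the in-sample predictions of the network, cut at \(1/2\), with the observed labels, for instance with something like
p = predict(model_nnet, df)   # fitted probabilities, between 0 and 1
table(predicted = as.numeric(p > .5), observed = df$y)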
If we do not specify the seed of the random number generator, we can get a different outcome each time we fit the model since, obviously, this model is not identifiable (the two hidden neurons can be permuted, and the optimization can end up in different local optima).
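For instance, refitting the model without fixing the seed typically gives a different hidden-layer representation of the data, something like
model_nnet_2 = nnet(y~x, size=2, data=df)   # no fixed seed: possibly another local optimum
w2 = neuralweights(model_nnet_2)
z1 = cbind(1,df$x)%*%w2$wts$"hidden 1 1"
z2 = cbind(1,df$x)%*%w2$wts$"hidden 1 2"
plot(sigmoid(z1), sigmoid(z2), col=1+df$y)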
If we now have
set.seed(12345)
n=100
x=c(runif(n),1+runif(n),2+runif(n),3+runif(n))
y=rep(c(0,1,0,1),each=n)
xm = minmax(x)
df = data.frame(x=xm,y=y)
plot(df$x,rep(0,4*n),col=1+df$y)
Now we need more neurons
set.seed(321)
model_nnet = nnet(y~x,size=3,data=df)
w = neuralweights(model_nnet)
x1 = cbind(1,df$x)%*%w$wts$"hidden 1 1"
x2 = cbind(1,df$x)%*%w$wts$"hidden 1 2"
x3 = cbind(1,df$x)%*%w$wts$"hidden 1 3"
b = w$wts$`out 1`
library(scatterplot3d)
s3d = scatterplot3d(x=sigmoid(x1),
y=sigmoid(x2), z=sigmoid(x3),color=1+df$y)
and, one more time, we have been able to separate (linearly, now with a plane) the blue and the red points
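Using the output weights \(b\) (the first component being the bias, as before), that separating plane \(b_1+b_2z_1+b_3z_2+b_4z_3=0\) could be added to the 3d scatterplot with something like
# plane b1 + b2*z1 + b3*z2 + b4*z3 = 0, i.e. z3 = -b1/b4 - (b2/b4)*z1 - (b3/b4)*z2
s3d$plane3d(-b[1]/b[4], -b[2]/b[4], -b[3]/b[4])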
Finally, consider
set.seed(123)
n=500
x1=runif(n)*3-1.5
x2=runif(n)*3-1.5
y = (x1^2+x2^2)<=1
x1m = minmax(x1)
x2m = minmax(x2)
df = data.frame(x1=x1m,x2=x2m,y=y)
plot(df$x1,df$x2,col=1+df$y)
and again, with three neurons (for two explanatory variables), we can linearly separate the blue and the red points
set.seed(1234)
model_nnet = nnet(y~x1+x2,size=3,data=df)
w = neuralweights(model_nnet)
x1 = cbind(1,df$x1,df$x2)%*%w$wts$"hidden 1 1"
x2 = cbind(1,df$x1,df$x2)%*%w$wts$"hidden 1 2"
x3 = cbind(1,df$x1,df$x2)%*%w$wts$"hidden 1 3"
b = w$wts$`out 1`
library(scatterplot3d)
s3d <- scatterplot3d(x=sigmoid(x1), y=sigmoid(x2), z=sigmoid(x3),
color=1+df$y)
Here, neural networks play the role of the kernel trick, as coined in Koutroumbas, K. & Theodoridis, S. (2008). Pattern Recognition. Academic Press.
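To see it in the original \((x_1,x_2)\) space, one could predict on a regular grid and draw the level curve at \(1/2\); we would expect a closed, non-linear boundary around the points inside the disc, something like
vx = seq(0, 1, length = 101)
vgrid = expand.grid(x1 = vx, x2 = vx)
pred = matrix(predict(model_nnet, vgrid), 101, 101)   # predictions on the grid
plot(df$x1, df$x2, col = 1 + df$y)
contour(vx, vx, pred, levels = .5, add = TRUE)   # decision boundary in the original space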