More neurons in the hidden layer than predictive features in neural nets


This week, we were talking about neural networks for the first time, and I mentioned that, in many illustrations of neural networks, the hidden layer has fewer neurons than there are predictive variables,

but sometimes, it can make sense to have more neurons in the hidden layer than predictive variables.

To illustrate, consider a simple example with a single variable \(x\) and a binary outcome \(y\in\{0,1\}\),

set.seed(12345)
n = 100
x = c(runif(n), 1+runif(n), 2+runif(n))
y = rep(c(0,1,0), each=n)

We should ensure that observations are in the \([0,1]\) interval,

minmax = function(z) (z-min(z))/(max(z)-min(z))
xm = minmax(x)
df = data.frame(x=xm, y=y)

as we can visualize below,

plot(df$x,rep(0,3*n),col=1+df$y)

Here, the blue and the red dots (when \(y\) is either 0 or 1) are not linearly separable. The standard activation function in neural nets is the sigmoid

sigmoid = function(x) 1 / (1 + exp(-x))

Let us fit a neural network, with two neurons in the hidden layer,

library(nnet)
set.seed(1234)
model_nnet = nnet(y~x, size=2, data=df)

We can then extract the weights (with the neuralweights function from the NeuralNetTools package), and visualize the two hidden neurons,

library(NeuralNetTools)
w = neuralweights(model_nnet)
x1 = cbind(1,df$x) %*% w$wts$"hidden 1 1"
x2 = cbind(1,df$x) %*% w$wts$"hidden 1 2"
b = w$wts$`out 1`
plot(sigmoid(x1), sigmoid(x2), col=1+df$y)


Now, the blue and the red dots (when \(y\) is either 0 or 1) are actually linearly separable.
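Indeed, the output neuron computes \(\sigma(b_1+b_2z_1+b_3z_2)\), where \(z_1\) and \(z_2\) are the outputs of the two hidden neurons and \((b_1,b_2,b_3)\) is the vector of output weights stored in b above; the predicted probability equals \(1/2\) on the line \(b_1+b_2z_1+b_3z_2=0\), i.e. \(z_2=-b_1/b_3-(b_2/b_3)z_1\), which is exactly the line drawn below,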

abline(a=-b[1]/b[3],b=-b[2]/b[3])

If we do not specify the seed of the random number generator, we can get a different outcome since, obviously, this model is not identifiable: different starting weights lead to different hidden representations, which separate the two classes equally well.
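For instance (a minimal sketch reusing the code above; 4321 is an arbitrary seed, and with some seeds the optimizer may get stuck in a local minimum), refitting the same model gives another pair of hidden features, in which the two classes are still linearly separable,

set.seed(4321)
model_nnet_b = nnet(y~x, size=2, data=df)  # same model, different random starting weights
w_b = neuralweights(model_nnet_b)
z1 = sigmoid(cbind(1,df$x) %*% w_b$wts$"hidden 1 1")
z2 = sigmoid(cbind(1,df$x) %*% w_b$wts$"hidden 1 2")
plot(z1, z2, col=1+df$y)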

If we now have a more complicated pattern, with four groups of points,

set.seed(12345)
n = 100
x = c(runif(n), 1+runif(n), 2+runif(n), 3+runif(n))
y = rep(c(0,1,0,1), each=n)
xm = minmax(x)
df = data.frame(x=xm, y=y)
plot(df$x, rep(0,4*n), col=1+df$y)

Now we need more neurons in the hidden layer, say three,

set.seed(321)
model_nnet = nnet(y~x, size=3, data=df)
w = neuralweights(model_nnet)
x1 = cbind(1,df$x) %*% w$wts$"hidden 1 1"
x2 = cbind(1,df$x) %*% w$wts$"hidden 1 2"
x3 = cbind(1,df$x) %*% w$wts$"hidden 1 3"
b = w$wts$`out 1`
library(scatterplot3d)
s3d = scatterplot3d(x=sigmoid(x1), y=sigmoid(x2), z=sigmoid(x3), color=1+df$y)
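By analogy with the abline call above, we can add the separating plane \(b_1+b_2z_1+b_3z_2+b_4z_3=0\) to the 3d scatterplot (this addition is not in the original code; plane3d is a component of the object returned by scatterplot3d),

s3d$plane3d(Intercept = -b[1]/b[4], x.coef = -b[2]/b[4], y.coef = -b[3]/b[4])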

but once again, we have been able to (linearly) separate the blue and the red points.
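As a quick sanity check (not in the original post), a logistic regression on the three hidden features should classify every observation correctly; note that glm may warn about perfectly separated data, which is precisely the point,

s1 = sigmoid(x1)
s2 = sigmoid(x2)
s3 = sigmoid(x3)
check = glm(df$y ~ s1 + s2 + s3, family=binomial)
mean((fitted(check) > .5) == df$y)  # should be (close to) 1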

Finally, consider

set.seed(123)
n = 500
x1 = runif(n)*3-1.5
x2 = runif(n)*3-1.5
y = (x1^2+x2^2) <= 1
x1m = minmax(x1)
x2m = minmax(x2)
df = data.frame(x1=x1m, x2=x2m, y=y)
plot(df$x1, df$x2, col=1+df$y)

and again, with three neurons (for two explanatory variables) we can linearly separate the blue and the red points,

set.seed(1234)
model_nnet = nnet(y~x1+x2, size=3, data=df)
w = neuralweights(model_nnet)
x1 = cbind(1,df$x1,df$x2) %*% w$wts$"hidden 1 1"
x2 = cbind(1,df$x1,df$x2) %*% w$wts$"hidden 1 2"
x3 = cbind(1,df$x1,df$x2) %*% w$wts$"hidden 1 3"
b = w$wts$`out 1`
s3d = scatterplot3d(x=sigmoid(x1), y=sigmoid(x2), z=sigmoid(x3), color=1+df$y)

Here, the hidden layer of the neural network plays the role of the kernel trick, as described in Koutroumbas, K. & Theodoridis, S. (2008). Pattern Recognition, Academic Press.
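To make the analogy concrete (a hypothetical check, not in the original post), the circular boundary corresponds to a single explicit quadratic feature: after the min-max rescaling, the circle is centred near \((0.5,0.5)\), so a logistic regression on the squared distance to that point should separate the two classes (approximately, since \((0.5,0.5)\) is only the approximate image of the centre after rescaling),

r2 = (df$x1-.5)^2 + (df$x2-.5)^2  # squared distance to the (approximate) centre
fit = glm(df$y ~ r2, family=binomial)  # glm may warn about perfect separation
mean((fitted(fit) > .5) == df$y)  # should be close to 1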
