Machine Learning Ex 5.1 – Regularized Linear Regression
The first part of Exercise 5.1 requires implementing a regularized version of linear regression.
Adding a regularization parameter can help prevent over-fitting when fitting a high-order polynomial.
Plot the data:
x <- read.table("ex5Linx.dat")
y <- read.table("ex5Liny.dat")
x <- x[,1]
y <- y[,1]

require(ggplot2)
d <- data.frame(x=x, y=y)
p <- ggplot(d, aes(x, y)) + geom_point(colour="red", size=3)
I will fit a 5th-order polynomial; the hypothesis is:
$$h_\theta(x) = \theta_0 x^0 + \theta_1 x^1 + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5$$
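For concreteness, here is a minimal sketch (my own illustration, not part of the exercise) of the feature vector this hypothesis assumes for a single input; the mapFeature helper defined below performs the same expansion for a whole vector:

x0 <- 0.5
c(1, x0, x0^2, x0^3, x0^4, x0^5)   ## the feature vector [x^0, x^1, ..., x^5]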
The idea of regularization is to impose Occam's razor on the solution by shrinking the values of θ, so that the higher-order features contribute only marginally to the fit.
To that end, the cost function is defined as:
$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
Gradient Descent with the regularization parameter:
First, I implemented a gradient descent algorithm to find θ.
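The update rule used in the code below follows from differentiating the cost function above; note that θ0 is excluded from the regularization term:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$
$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \lambda\theta_j\right], \quad j \ge 1$$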
## map x to polynomial features [x^0, x^1, ..., x^degree]
mapFeature <- function(x, degree=5) {
    return(sapply(0:degree, function(i) x^i))
}

## hypothesis function; theta is a 1 x (n+1) row matrix
h <- function(theta, x) {
    return(x %*% t(theta))
}

## regularized cost function; theta[1] (the intercept) is not penalized
J <- function(theta, x, y, lambda=1) {
    m <- length(y)
    r <- theta^2
    r[1] <- 0
    j <- 1/(2*m) * (sum((h(theta, x)-y)^2) + lambda*sum(r))
    return(j)
}

## batch gradient descent with regularization
gradDescent <- function(theta, x, y, alpha=0.1, niter=1000, lambda=1) {
    m <- length(y)
    for (i in 1:niter) {
        tt <- theta
        tt[1] <- 0      ## do not regularize the intercept term
        dj <- 1/m * (t(h(theta, x)-y) %*% x + lambda * tt)
        theta <- theta - alpha * dj
    }
    return(theta)
}
Fitting the data with the gradient descent algorithm:
x <- mapFeature(x)
theta <- matrix(rep(0, 6), nrow=1)
theta <- gradDescent(theta, x, y)

x.test <- seq(-1, 1, 0.001)
y.test <- mapFeature(x.test) %*% t(theta)

p + geom_line(data=data.frame(x=x.test, y=as.vector(y.test)),
              aes(x, y), colour="blue")
As shown above, the fitted model follows the data well.
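As a quick sanity check (my own addition, not part of the exercise), one can verify that the cost J decreases as the number of iterations grows:

## recompute theta with progressively more iterations and compare the cost
niters <- c(10, 100, 1000)
costs <- sapply(niters, function(n) {
    th <- gradDescent(matrix(rep(0, 6), nrow=1), x, y, niter=n)
    J(th, x, y)
})
round(costs, 4)   ## the cost should shrink, then level off, as niter grows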
Normal Equation with the regularization parameter:
The exercise also requires implementing the Normal Equation with the regularization term added, that is:
$$\theta = \left(X^T X + \lambda\begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix}\right)^{-1} X^T y$$
## Normal Equation with regularization
normEq <- function(x, y, lambda) {
    n <- ncol(x)
    ## extra regularization term; the intercept entry is not penalized
    r <- lambda * diag(n)
    r[1,1] <- 0
    theta <- solve(t(x) %*% x + r) %*% t(x) %*% y
    return(theta)
}
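As a quick cross-check (again my own addition), the closed-form solution with λ=1 should be reasonably close to the gradient-descent estimate obtained above:

theta.ne <- normEq(x, y, lambda=1)
## compare the two estimates side by side
round(cbind(gradient.descent=as.vector(theta),
            normal.equation=as.vector(theta.ne)), 3)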
I try three different λ values to see how they influence the fit.
require(reshape2)

lambda <- c(0, 1, 10)
theta <- sapply(lambda, normEq, x=x, y=y)

x.test <- seq(-1, 1, 0.001)
yy <- sapply(1:3, function(i) mapFeature(x.test) %*% theta[,i])
yy <- melt(yy)
yy[,1] <- rep(x.test, 3)
colnames(yy) <- c("X", "lambda", "Y")
yy$lambda <- factor(yy$lambda, labels=unique(lambda))
p + geom_line(data=yy, aes(X, Y, group=lambda, colour=lambda))
With λ=0, the fit passes very close to the original points (the red line), which is of course over-fitting.
As λ increases, the fit becomes smoother and more generalized, thereby preventing over-fitting.
The figure also suggests that when λ is too large, the model will under-fit.
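The shrinkage is also visible directly in the fitted coefficients (my own check, using the theta matrix computed above):

colnames(theta) <- paste0("lambda=", lambda)
rownames(theta) <- paste0("theta", 0:5)
round(theta, 4)   ## the higher-order coefficients shrink towards zero as lambda grows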
Reference:
Machine Learning Course
Exercise 5