Machine Learning Ex 5.1 – Regularized Linear Regression


The first part of Exercise 5.1 requires implementing a regularized version of linear regression.

Adding a regularization parameter can prevent over-fitting when fitting a high-order polynomial.

Plot the data:

x <- read.table("ex5Linx.dat")
y <- read.table("ex5Liny.dat")
 
x <- x[,1]
y <- y[,1]
 
require(ggplot2)
d <- data.frame(x=x,y=y)
p <- ggplot(d, aes(x,y)) + geom_point(colour="red", size=3)


I will fit a 5th-order polynomial; the hypothesis is:

$$h_\theta(x) = \theta_0 x^0 + \theta_1 x^1 + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5$$

The idea of regularization is to impose Occam's razor on the solution by scaling down the θ values, so that the higher-order features contribute only a small amount.

For that, the cost function is defined as:

$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$

Note that, by convention, θ0 is not included in the regularization term.

Gradient Descent with regularization parameters:

First, I implemented a gradient descent algorithm to find θ.
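Taking the gradient of J(θ) above gives the update rules implemented in the code below; note that θ0 is updated without the regularization term:

$$\theta_0 := \theta_0 - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right] \quad (j \ge 1)$$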

## map x to polynomial features x^0, x^1, ..., x^degree
mapFeature <- function(x, degree=5) {
    return(sapply(0:degree, function(i) x^i))
}

## hypothesis function: h(theta, x) = X %*% theta
h <- function(theta, x) {
    toReturn <- x %*% t(theta)
    return(toReturn)
}

## regularized cost function (theta[1], the intercept, is not penalized)
J <- function(theta, x, y, lambda=1) {
    m <- length(y)
    r <- theta^2
    r[1] <- 0
    j <- 1/(2*m) * (sum((h(theta, x)-y)^2) + lambda*sum(r))
    return(j)
}

## gradient descent with the regularization term (theta[1] excluded)
gradDescent <- function(theta, x, y, alpha=0.1, niter=1000, lambda=1) {
    m <- length(y)
    for (i in 1:niter) {
        tt <- theta
        tt[1] <- 0
        dj <- 1/m * (t(h(theta,x)-y) %*% x + lambda * tt)
        theta <- theta - alpha * dj
    }
    return(theta)
}
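As a quick sanity check (my own addition, not part of the exercise), one can verify that the cost decreases as the number of iterations grows. A minimal sketch, assuming the mapFeature, J and gradDescent functions above are loaded and x, y are the raw data vectors:

?View Code RSPLUS
## sanity check: the regularized cost should decrease with more iterations
x.poly <- mapFeature(x)
theta0 <- matrix(rep(0, 6), nrow=1)
for (n in c(10, 100, 1000)) {
    th <- gradDescent(theta0, x.poly, y, niter=n)
    cat("iterations:", n, " cost J:", J(th, x.poly, y), "\n")
}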

Fitting the data with the gradient descent algorithm:

x <- mapFeature(x)
theta <- matrix(rep(0,6), nrow=1)
theta <- gradDescent(theta, x, y)

x.test <- seq(-1, 1, 0.001)
y.test <- mapFeature(x.test) %*% t(theta)

## plot the fitted curve on top of the data points
p + geom_line(data=data.frame(x=x.test, y=as.vector(y.test)), aes(x, y), colour="blue")

As shown above, the fitted model follows the data well.

Normal Equation with regularization parameters:
The exercise requires implementing the normal equation with the regularization term added, that is:
$$\theta = \left(X^{T}X + \lambda\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}\right)^{-1}X^{T}y$$

## normal equation with regularization
normEq <- function(x, y, lambda) {
    n <- ncol(x)
    ## extra regularization term; the intercept is not penalized
    r <- lambda * diag(n)
    r[1,1] <- 0
    theta <- solve(t(x) %*% x + r) %*% t(x) %*% y
    return(theta)
}
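As a small cross-check (again my own addition), the closed-form solution for λ = 1 should be close to the θ found by gradient descent above, since gradient descent only approximates the minimum. A sketch, assuming the mapped feature matrix x and the gradient-descent theta are still in the workspace:

?View Code RSPLUS
## cross-check: closed-form solution vs. the gradient descent fit (lambda = 1)
theta.ne <- normEq(x, y, lambda=1)
print(cbind(normal.eq = as.vector(theta.ne),
            grad.desc = as.vector(theta)))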

I tried three different λ values to see how they influence the fit.

require(reshape2)

## fit with three different regularization strengths
lambda <- c(0, 1, 10)
theta <- sapply(lambda, normEq, x=x, y=y)

x.test <- seq(-1, 1, 0.001)
yy <- sapply(1:3, function(i) mapFeature(x.test) %*% theta[,i])
yy <- melt(yy)
yy[,1] <- rep(x.test, 3)
colnames(yy) <- c("X", "lambda", "Y")
yy$lambda <- factor(yy$lambda, labels=unique(lambda))
p + geom_line(data=yy, aes(X, Y, group=lambda, colour=lambda))


With λ = 0, the fit hugs the original points very tightly (the red line), and it is clearly over-fitting.

As λ increases, the fit becomes less tight and more general, which prevents over-fitting.

The figure also suggests that when λ is too large, the model will under-fit.
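To make the trade-off concrete, one can compare the unregularized training error of each fit; a minimal sketch, assuming the objects from the code above (theta with one column per λ, and the mapped feature matrix x) are available:

?View Code RSPLUS
## unregularized training error of each fit (lower = tighter fit to the data)
train.err <- sapply(1:3, function(i) mean((x %*% theta[,i] - y)^2) / 2)
names(train.err) <- lambda
print(train.err)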

Reference:
Machine Learning Course
Exercise 5
