Machine Learning Ex 5.1 – Regularized Linear Regression

[This article was first published on YGC » R, and kindly contributed to R-bloggers].

The first part of Exercise 5.1 requires implementing a regularized version of linear regression.

Adding a regularization term can prevent over-fitting when fitting a high-order polynomial.

Plot the data:

x <- read.table("ex5Linx.dat")
y <- read.table("ex5Liny.dat")
 
x <- x[,1]
y <- y[,1]
 
require(ggplot2)
d <- data.frame(x=x,y=y)
p <- ggplot(d, aes(x,y)) + geom_point(colour="red", size=3)


I will fit a 5th-order polynomial; the hypothesis is:
\( h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5 \)

The idea of regularization is to impose Occam's razor on the solution by shrinking the \( \theta \) values, so that the higher-order features contribute only slightly to the fit.

For that, the cost function is defined as:
\( J(\theta) = \frac{1}{2m} [\sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^n \theta_j^2] \)

Gradient Descent with regularization parameters:

First, I implemented a gradient descent algorithm to find \( \theta \).
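Taking the partial derivatives of the cost function above gives the update rules used at each iteration (note that \( \theta_0 \) is excluded from the penalty):

\( \theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_0^{(i)} \)

\( \theta_j := \theta_j - \alpha [\frac{1}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} + \frac{\lambda}{m} \theta_j] \quad (j \geq 1) \)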

## map x to polynomial features [1, x, x^2, ..., x^degree]
mapFeature <- function(x, degree=5) {
    return(sapply(0:degree, function(i) x^i))
}
 
## hypothesis function
h <- function(theta, x) {
    return(x %*% t(theta))
}
 
## regularized cost function; theta_0 (the intercept) is not penalized
J <- function(theta, x, y, lambda=1) {
    m <- length(y)
    r <- theta^2
    r[1] <- 0
    j <- 1/(2*m) * (sum((h(theta, x)-y)^2) + lambda*sum(r))
    return(j)
}
 
## batch gradient descent with regularization
gradDescent <- function(theta, x, y, alpha=0.1, niter=1000, lambda=1) {
    m <- length(y)
    for (i in 1:niter) {
        tt <- theta
        tt[1] <- 0  ## exclude theta_0 from the penalty term
        dj <- 1/m * (t(h(theta,x)-y) %*% x + lambda * tt)
        theta <- theta - alpha * dj
    }
    return(theta)
}
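As a quick sanity check, the cost returned by J should drop after running gradDescent. The snippet below is a self-contained sketch on synthetic data I made up for illustration (x.raw, y.sim, and the seed are assumptions, not part of the exercise):

```r
## Self-contained sanity check: repeat the function definitions from
## above, then verify that gradient descent lowers the regularized cost.
mapFeature <- function(x, degree=5) sapply(0:degree, function(i) x^i)
h <- function(theta, x) x %*% t(theta)

J <- function(theta, x, y, lambda=1) {
    m <- length(y)
    r <- theta^2
    r[1] <- 0  ## theta_0 is not penalized
    1/(2*m) * (sum((h(theta, x) - y)^2) + lambda * sum(r))
}

gradDescent <- function(theta, x, y, alpha=0.1, niter=1000, lambda=1) {
    m <- length(y)
    for (i in 1:niter) {
        tt <- theta
        tt[1] <- 0
        theta <- theta - alpha * (1/m * (t(h(theta, x) - y) %*% x + lambda * tt))
    }
    theta
}

set.seed(1)
x.raw <- seq(-1, 1, length.out=50)
y.sim <- 1 + 2 * x.raw + rnorm(50, sd=0.1)  ## hypothetical linear data
X <- mapFeature(x.raw)
theta0 <- matrix(rep(0, 6), nrow=1)
theta.fit <- gradDescent(theta0, X, y.sim, niter=500)

J(theta0, X, y.sim) > J(theta.fit, X, y.sim)  ## cost decreased
```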

Fitting the data with the gradient descent algorithm:

x <- mapFeature(x)
theta <- matrix(rep(0,6), nrow=1)
theta <- gradDescent(theta, x, y)
 
x.test <- seq(-1, 1, 0.001)
y.test <- mapFeature(x.test) %*% t(theta)
 
## put the fitted curve in a data frame so geom_line gets its own data
d.test <- data.frame(x=x.test, y=as.vector(y.test))
p + geom_line(data=d.test, aes(x, y), colour="blue")

As shown above, the fitted model follows the data well.

Normal Equation with regularization parameters:
The Exercise also requires implementing the Normal Equation with the regularization term added, that is:
\( \theta = (X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix})^{-1} X^T y \)

## normal equation with regularization
normEq <- function(x, y, lambda) {
    n <- ncol(x)
    ## extra regularization terms; the intercept is not penalized
    r <- lambda * diag(n)
    r[1,1] <- 0
    theta <- solve(t(x) %*% x + r) %*% t(x) %*% y
    return(theta)
}
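With lambda=0 the penalty vanishes and normEq reduces to ordinary least squares, so it should agree with R's built-in lm(). Below is a self-contained sketch checking that on toy data (x1, y.toy, and the seed are my own made-up inputs):

```r
## Check: with lambda = 0, the regularized normal equation should
## reproduce the ordinary least squares coefficients from lm().
normEq <- function(x, y, lambda) {
    n <- ncol(x)
    r <- lambda * diag(n)  ## regularization matrix
    r[1,1] <- 0            ## intercept is not penalized
    solve(t(x) %*% x + r) %*% t(x) %*% y
}

set.seed(123)
x1 <- runif(30, -1, 1)
X <- cbind(1, x1, x1^2)  ## design matrix with intercept column
y.toy <- 3 - x1 + 0.5 * x1^2 + rnorm(30, sd=0.05)

theta.ne <- normEq(X, y.toy, lambda=0)
theta.lm <- coef(lm(y.toy ~ x1 + I(x1^2)))

max(abs(as.vector(theta.ne) - theta.lm))  ## essentially zero
```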

I try three different lambda values to see how they influence the fit.

lambda <- c(0, 1, 10)
theta <- sapply(lambda, normEq, x=x, y=y)
x.test <- seq(-1, 1, 0.001)
yy <- sapply(1:3, function(i) mapFeature(x.test) %*% theta[,i])
 
require(reshape2)  ## for melt
yy <- melt(yy)
yy[,1] <- rep(x.test, 3)
colnames(yy) <- c("X", "lambda", "Y")
yy$lambda <- factor(yy$lambda, labels=unique(lambda))
p + geom_line(data=yy, aes(X, Y, group=lambda, colour=lambda))


With lambda=0 there is no regularization, so the fit (the red line) passes very close to the original points, and of course it is over-fitting.

As lambda increases, the model becomes smoother and more generalized, which prevents over-fitting.

The figure also supports the conclusion that when lambda is too large, the model will under-fit.

Reference:
Machine Learning Course
Exercise 5
