Machine Learning Ex4 – Logistic Regression


Exercise 4 required implementing Logistic Regression using Newton’s Method.

The dataset consists of 80 students and their scores on two exams; 40 of the students were admitted to college and the other 40 were not. We need to build a binary classification model that estimates college admission from the scores on these two exams.

Plot the data

x <- read.table("ex4x.dat",header=F, stringsAsFactors=F)
x <- cbind(rep(1, nrow(x)), x)
colnames(x) <- c("X0", "Exam1", "Exam2")
x <- as.matrix(x)
 
y <- read.table("ex4y.dat",header=F, stringsAsFactors=F)
y <- y[,1]
 
## plotting data
d <- data.frame(x,
                y = factor(y,
                levels=c(0,1),
                labels=c("Not admitted","Admitted" )
                )
                )
 
require(ggplot2)
p <- ggplot(d, aes(x=Exam1, y=Exam2)) +
    geom_point(aes(shape=y, colour=y)) +
    xlab("Exam 1 score") +
    ylab("Exam 2 score")


Logistic Regression


We first need to define our hypothesis function, which returns values in [0, 1] and is therefore suitable for binary classification.

h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}

The function g is the sigmoid function, and the function h returns the probability that y = 1:

h_\theta(x) = P(y = 1 \mid x; \theta)

What we need is to compute θ, which determines the appropriate hypothesis function.

As in linear regression, we define a cost function that measures how well the hypothesis function fits the sample data at a given θ.

The cost function was defined as:
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right) \right]

To determine the most suitable hypothesis function, we need to find the θ that minimizes J(θ). This can be achieved with Newton's method, by finding the root of the derivative of the cost function.

θ is then updated iteratively by:
\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_\theta J

The gradient and Hessian are defined as:
\nabla_\theta J = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
H = \frac{1}{m} \sum_{i=1}^{m} \left[ h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x^{(i)} (x^{(i)})^T \right]
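
For reference (this derivation is not spelled out in the original exercise), the gradient follows from the sigmoid identity g'(z) = g(z)(1 - g(z)):

\frac{\partial J}{\partial \theta_j} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \left(1 - h_\theta(x^{(i)})\right) - \left(1 - y^{(i)}\right) h_\theta(x^{(i)}) \right] x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

Differentiating once more with respect to \theta_k gives the Hessian entries \frac{1}{m} \sum_{i} h_\theta(x^{(i)}) \left(1 - h_\theta(x^{(i)})\right) x_j^{(i)} x_k^{(i)}.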

The above equations were implemented using R:

### Newton's Method
## sigmoid function
g <- function(z) {
    1/(1+exp(-z))
}
 
## hypothesis function
h <- function(theta, x) {
    g(x %*% theta)
}
 
## cost function
J <- function(theta, x, y) {
    m <- length(y)
    s <- sapply(1:m, function(i)
                y[i]*log(h(theta,x[i,])) + (1-y[i])*log(1-h(theta,x[i,]))
                )
    j <- -1/m * sum(s)
    return(j)
}
 
 
## gradient
grad <- function(theta, x, y) {
    m <- length(y)
    g <- 1/m * t(x) %*% (h(theta,x)-y)
    return(g)
}
 
## Hessian: (1/m) * t(X) %*% diag(h*(1-h)) %*% X
Hessian <- function(theta, x) {
    m <- nrow(x)
    hx <- as.vector(h(theta, x))
    H <- 1/m * t(x) %*% diag(hx * (1 - hx)) %*% x
    return(H)
}
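
As a sanity check (this step is not part of the original exercise), the same model can be fitted with R's built-in glm(); its coefficients should closely match the θ obtained from the Newton iterations below. This assumes the x and y objects created above.

## fit the same logistic regression with R's built-in routine
fit <- glm(y ~ Exam1 + Exam2,
           data = data.frame(x[, c("Exam1", "Exam2")], y = y),
           family = binomial)
coef(fit)  ## compare with theta from Newton's method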

The first question is to determine how many iterations are needed until convergence.

## initialise theta at zero and record the cost after each Newton update
theta <- rep(0, ncol(x))
j <- rep(0, 10)
for (i in 1:10) {
    theta <- theta - solve(Hessian(theta,x)) %*% grad(theta,x,y)
    j[i] <- J(theta,x,y)
}
 
ggplot()+
    aes(x=1:10,y=j)+
    geom_point(colour="red")+
    geom_path()+xlab("Iteration")+
    ylab("Cost J")


As illustrated in the figure above, Newton's method converges very fast; only 4-5 iterations were needed.
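
A possible variant (not in the original post) stops automatically once the cost no longer decreases meaningfully, instead of running a fixed 10 iterations; the tolerance value here is an arbitrary choice:

## iterate until the decrease in J falls below a tolerance
theta <- rep(0, ncol(x))
tol <- 1e-8
j_old <- J(theta, x, y)
repeat {
    theta <- theta - solve(Hessian(theta, x)) %*% grad(theta, x, y)
    j_new <- J(theta, x, y)
    if (abs(j_old - j_new) < tol) break
    j_old <- j_new
}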

The second question: what is the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted?

> (1 - g(c(1, 20, 80) %*% theta))* 100
         [,1]
[1,] 64.24722

Our model predicts that the probability that this student will not be admitted is about 64%.

Finally, we derive the decision boundary of our classifier, defined by \theta^T x = 0, and visualize the fit as shown below:
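
The boundary is the set of points where h_\theta(x) = 0.5, i.e. \theta^T x = 0; rearranging (spelled out here for clarity, it is implicit in the code below):

\theta_0 + \theta_1 \cdot \text{Exam1} + \theta_2 \cdot \text{Exam2} = 0 \quad\Longrightarrow\quad \text{Exam2} = -\frac{\theta_0 + \theta_1 \cdot \text{Exam1}}{\theta_2}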

## boundary end points over the observed Exam 1 range
x1 <- c(min(x[,2]), max(x[,2]))
## corresponding Exam 2 values on the line theta' x = 0
x2 <- -1/theta[3,] * (theta[2,]*x1 + theta[1,])
## slope and intercept of the decision boundary
a <- (x2[2]-x2[1])/(x1[2]-x1[1])
b <- x2[2] - a*x1[2]
p + geom_abline(slope=a, intercept=b)

Fantastic.

References:
Machine Learning Course
Exercise 4
