Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exercise 4 required implementing Logistic Regression using Newton’s Method.
The dataset in use is 80 students and their grades of 2 exams, 40 students were admitted to college and the other 40 students were not. We need to implement a binary classification model to estimates college admission based on the student’s scores on these two exams.
plot the data
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | x <- read.table("ex4x.dat",header=F, stringsAsFactors=F) x <- cbind(rep(1, nrow(x)), x) colnames(x) <- c("X0", "Exam1", "Exam2") x <- as.matrix(x) y <- read.table("ex4y.dat",header=F, stringsAsFactors=F) y <- y[,1] ## plotting data d <- data.frame(x, y = factor(y, levels=c(0,1), labels=c("Not admitted","Admitted" ) ) ) require(ggplot2) p <- ggplot(d, aes(x=Exam1, y=Exam2)) + geom_point(aes(shape=y, colour=y)) + xlab("Exam 1 score") + ylab("Exam 2 score") |
Logistic Regression
We first need to define our Hypothesis Function that return values between[0,1],suitable for binary classification.
\(h_\theta(x) = g(\theta^T x) = \frac{1}{ 1 + e ^{- \theta^T x} }\)
function g is the sigmoid function, and function h return the probability of y=1:
\( h_\theta(x) = P (y=1 | x; \theta) \)
What we need is to compute \( \theta \) ,to find out the proper Hypothesis Function.
Similar to the linear regression,We defined the cost function, which estimate the error of hypothesis function fitting the sample data, at a given \( \theta\) .
The cost function was defined as:
\(J(\theta) = \frac{1}{m} \sum_{i=1}^m ((-y)log(h_\theta(x)) – (1 – y) log(1- h_\theta(x)) )\)
To determine the most suitable hypothesis function, we need to find the \( \theta\) value which minimize the value of \(J(\theta)\) . This can be achieved by the Newton’s method, by finding the root of the derivative function of the cost function.
And the \( \theta\) can be updated by:
\(\theta^{(t+1)} = \theta^{(t)} – H^{-1} \nabla_{\theta}J\)
the gradient and Hessian are defined as:
\(\nabla_{\theta}J = \frac{1}{m} \sum_{i=1}^m (h_\theta(x) – y) x\)
\(H = \frac{1}{m} \sum_{i=1}^m [h_\theta(x) (1 – h_\theta(x)) x^T x]\)
The above equations were implemented using R:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | ### Newton's Method ## sigmoid function g <- function(z) { 1/(1+exp(-z)) } ## hypothesis function h <- function(theta, x) { g(x %*% theta) } ## cost function J <- function(theta, x, y) { m <- length(y) s <- sapply(1:m, function(i) y[i]*log(h(theta,x[i,])) + (1-y[i])*log(1-h(theta,x[i,])) ) j <- -1/m * sum(s) return(j) } ## gradient grad <- function(theta, x, y) { m <- length(y) g <- 1/m * t(x) %*% (h(theta,x)-y) return(g) } ## Hessian Hessian <- function(theta, x) { m <- nrow(x) H <- 1/m * t(x) %*% x * diag(h(theta,x)) * diag(1-h(theta,x)) return(H) } |
The first question need to determine how many iteration until convergence.
1 2 3 4 5 6 7 8 9 10 11 12 | theta <- rep(0, ncol(x)) j <- rep(0,10) for (i in 1:10) { theta <- theta - solve(Hessian(theta,x)) %*% grad(theta,x,y) j[i] <- J(theta,x,y) } ggplot()+ aes(x=1:10,y=j)+ geom_point(colour="red")+ geom_path()+xlab("Iteration")+ ylab("Cost J") |
As illustrated in the above figure, Newton’s method converge very fast, only 4-5 iterations was needed.
The second question:What is the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted?
> (1 - g(c(1, 20, 80) %*% theta))* 100 [,1] [1,] 64.24722
In our model, we predicted that the probability of the student will not admitted is 64%.
At last, we calculate our classification model based on \( \theta^T x = 0\) , and visualize the fit as below:
1 2 3 4 5 | x1 <- c(min(x[,2]),max(x[,2])) x2 <- -1/theta[3,] * (theta[2,]*x1+theta[1,]) a <- (x2[2]-x2[1])/(x1[2]-x1[1]) b <- x2[2]-a*x1[2] p+geom_abline(slope=a, intercept=b) |
Fantastic.
References:
Machine Learning Course
Exercise 4
Related Posts
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.