Welcome to the second part! In the previous part, we covered linear regression, the cost function, and gradient descent. In this part we will implement the whole process in R, step by step, using an example data set. I will use the data set provided in the machine learning class assignment: we will implement linear regression with one variable to predict the profit of a food truck.
Let us first discuss the linear regression problem (the description comes from the ML class assignment). Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities, and you have data on profits and populations from those cities. You would like to use this data to help you decide which city to expand to next.
The data set contains two columns. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss. Download the data set from here.
Let us start by loading the data set into R.
#Read data set
data <- read.csv("data.csv")
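To get a feel for what was loaded, we can quickly inspect the first few rows. This is an optional check of my own, assuming the CSV has header columns named population and profit, as used later in this post.

#Inspect the loaded data (optional check)
head(data)   # first six rows
str(data)    # column names and types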
Before starting any task, it is often useful to understand the data by visualizing it.
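For example, a simple scatter plot of profit against population can be drawn with base R graphics. This is a minimal sketch; the plotting parameters are my own choice, not from the original post.

#Scatter plot of profit vs. population
plot(data$population, data$profit,
     xlab = "Population of city",
     ylab = "Profit of food truck",
     pch = 19, col = "blue")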
We can see from the plot that cities with larger populations tend to have higher profits.
Here the dependent variable is profit and the independent variable is population. So let us set the dependent variable y and the independent variable x.
#Dependent variable
y <- data$profit

#Independent variable
x <- data$population
The objective of linear regression is to minimize the cost function

J(Θ) = (1/2m) Σ (hΘ(x(i)) − y(i))²   (sum over i = 1, ..., m)

where the hypothesis hΘ(x) is given by the linear model

hΘ(x) = Θ0 + Θ1x
To take into account the intercept term Θ0, we add an additional first column to x and set it to all ones. This allows us to treat Θ0 as simply another feature.
Let us first add a column of ones to x, initialize Θ0 and Θ1 to zero, and calculate the cost using the equation above.
#Add ones to x
x <- cbind(1, x)

#Initialize theta vector
theta <- c(0, 0)

#Number of observations
m <- nrow(x)

#Calculate cost
cost <- sum(((x %*% theta) - y)^2) / (2 * m)
For the initial values of theta, the cost is 32.07; our objective is to minimize the cost by updating the values of the thetas. One way to do this is the batch gradient descent algorithm, in which we repeatedly apply the following update (updating Θ0 and Θ1 simultaneously):

Θj := Θj − α (1/m) Σ (hΘ(x(i)) − y(i)) xj(i)   (sum over i = 1, ..., m)
With each step of gradient descent, the parameters come closer to the optimal values that achieve the lowest cost. Here we will set the learning rate alpha to 0.01 and the number of iterations to 1500.
#Set learning rate
alpha <- 0.01

#Number of iterations
iterations <- 1500

#Update thetas using the gradient update
for (i in 1:iterations) {
  error <- (x %*% theta) - y
  theta[1] <- theta[1] - alpha * (1/m) * sum(error)
  theta[2] <- theta[2] - alpha * (1/m) * sum(error * x[,2])
}
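As an aside, the two update lines can be collapsed into a single vectorized step. The sketch below is an equivalent alternative of my own, not part of the original post; reset theta before re-running it.

#Vectorized gradient descent (equivalent to the loop above)
theta <- c(0, 0)  # reset before re-running
for (i in 1:iterations) {
  error <- (x %*% theta) - y                        # residuals, m x 1
  theta <- theta - alpha * (1/m) * drop(t(x) %*% error)
}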
After 1500 iterations, we will have a lower cost than the initial one, along with new theta values. Now let us predict the profit for cities with populations of 35,000 and 70,000 using the new theta values. Note that the population column is expressed in units of 10,000, which is why we plug in 3.5 and 7 below.
#Predict profit for populations of 35,000 and 70,000
predict1 <- c(1, 3.5) %*% theta
predict2 <- c(1, 7) %*% theta
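As a sanity check, we can recompute the cost with the updated thetas (it should now be well below the initial 32.07) and overlay the fitted line on the scatter plot. A minimal sketch:

#Recompute cost with the updated thetas
cost_final <- sum(((x %*% theta) - y)^2) / (2 * m)
cost_final

#Overlay the fitted line on the data
plot(data$population, data$profit, pch = 19, col = "blue",
     xlab = "Population of city", ylab = "Profit of food truck")
abline(a = theta[1], b = theta[2], col = "red")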
So far we have covered the implementation of linear regression with one variable and prediction on new data. In the next post I will discuss another way of minimizing the cost, using the optim() function, and compare the results with the lm() function.