Lasso Regression (home made)

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers.]

To compute Lasso regression,
$$\min\left\{\frac{1}{2n}\|\mathbf{y}-\mathbf{X}\mathbf{\beta}\|_{\ell_2}^2+\lambda\|\mathbf{\beta}\|_{\ell_1}\right\}$$
define the soft-thresholding function
$$S(z,\gamma)=\text{sign}(z)\cdot(|z|-\gamma)_+=\begin{cases}z-\gamma&\text{ if }\gamma<|z|\text{ and }z>0\\z+\gamma&\text{ if }\gamma<|z|\text{ and }z<0\\0&\text{ if }\gamma\geq|z|\end{cases}$$

The R function would be

soft_thresholding = function(x,a){
  # S(x,a) = sign(x) * (|x|-a)_+ , applied componentwise
  sign(x) * pmax(abs(x)-a, 0)
}
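
As a quick sanity check (the input values below are arbitrary, chosen only for illustration), the function shrinks large entries towards zero by a and maps anything smaller than a in absolute value to exactly zero:

soft_thresholding(c(-2, -0.5, 0, 0.5, 2), 1)
# [1] -1  0  0  0  1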

To solve our optimization problem, set
$$\mathbf{r}_j=\mathbf{y}-\left(\beta_0\mathbf{1}+\sum_{k\neq j}\beta_k\mathbf{x}_k\right)=\mathbf{y}-\widehat{\mathbf{y}}^{(j)}$$
so that the optimization problem can be written, equivalently
$$\min\left\{\frac{1}{2n}\sum_{j=1}^p\left[\mathbf{r}_j-\beta_j\mathbf{x}_j\right]^2+\lambda|\beta_j|\right\}$$
hence
$$\min\left\{\frac{1}{2n}\sum_{j=1}^p\beta_j^2\|\mathbf{x}_j\|^2-2\beta_j\mathbf{r}_j^T\mathbf{x}_j+\lambda|\beta_j|\right\}$$
and, minimizing this convex function coordinate by coordinate (the subgradient in \(\beta_j\) must contain zero at the optimum), one gets
$$\beta_{j,\lambda}=\frac{1}{\|\mathbf{x}_j\|^2}S\left(\mathbf{r}_j^T\mathbf{x}_j,n\lambda\right)$$
or, if we develop
$$\beta_{j,\lambda}=\frac{1}{\sum_i x_{i,j}^2}S\left(\sum_i x_{i,j}\big[y_i-\widehat{y}_i^{(j)}\big],n\lambda\right)$$
Again, if there are weights \(\mathbf{\omega}=(\omega_i)\), the coordinate-wise update becomes
$$\beta_{j,\lambda,\mathbf{\omega}}=\frac{1}{\sum_i \omega_i x_{i,j}^2}S\left(\sum_i \omega_i x_{i,j}\big[y_i-\widehat{y}_i^{(j)}\big],n\lambda\right)$$
The code to compute this componentwise descent is

lasso_coord_desc = function(X, y, beta, lambda, tol=1e-6, maxiter=1000){
  beta = as.matrix(beta)
  X = as.matrix(X)
  # uniform weights omega_i = 1/n
  omega = rep(1/length(y), length(y))
  # storage for the objective and the successive estimates
  obj = numeric(maxiter+1)
  betalist = vector("list", maxiter+1)
  betalist[[1]] = beta
  beta0list = numeric(maxiter+1)
  beta0 = sum(y-X%*%beta)/(length(y))
  beta0list[1] = beta0
  for (j in 1:maxiter){
    # one full cycle over the coordinates
    for (k in 1:length(beta)){
      # partial residual, removing the contribution of coordinate k
      r = y - X[,-k]%*%beta[-k] - beta0*rep(1,length(y))
      # coordinate-wise update, via soft-thresholding
      beta[k] = (1/sum(omega*X[,k]^2))*
        soft_thresholding(t(omega*r)%*%X[,k], length(y)*lambda)
    }
    # update the intercept and store the current state
    beta0 = sum(y-X%*%beta)/(length(y))
    beta0list[j+1] = beta0
    betalist[[j+1]] = beta
    obj[j] = (1/2)*(1/length(y))*norm(omega*(y - X%*%beta -
           beta0*rep(1,length(y))),'F')^2 + lambda*sum(abs(beta))
    # stop as soon as the estimates no longer move
    if (norm(rbind(beta0list[j],betalist[[j]]) -
             rbind(beta0,beta),'F') < tol) { break }
  }
  return(list(obj=obj[1:j], beta=beta, intercept=beta0)) }

For instance, consider the following (simple) dataset, with three covariates

chicago = read.table("http://freakonometrics.free.fr/chicago.txt",header=TRUE,sep=";")

that we can “normalize”

X = model.matrix(lm(Fire~.,data=chicago))[,2:4]
for(j in 1:3) X[,j] = (X[,j]-mean(X[,j]))/sd(X[,j])
y = chicago$Fire
y = (y-mean(y))/sd(y)

To initialize the algorithm, use the OLS estimate

beta_init = lm(Fire~0+.,data=chicago)$coef

For instance

lasso_coord_desc(X,y,beta_init,lambda=.001)
$obj
[1] 0.001014426 0.001008009 0.001009558 0.001011094 0.001011119 0.001011119

$beta
          [,1]
X_1  0.0000000
X_2  0.3836087
X_3 -0.5026137

$intercept
[1] 2.060999e-16
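
As a side check (this comparison is my addition, not part of the output above), the same \(\lambda\) can be passed to glmnet, which minimizes the same \(\frac{1}{2n}\)-scaled objective, and the resulting coefficients compared with the ones returned by lasso_coord_desc:

library(glmnet)
fit = glmnet(X, y, alpha = 1, lambda = 0.001)
coef(fit)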

and we can get the standard Lasso path plot by looping over values of \(\lambda\), as sketched below.
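
A minimal sketch of that loop (the grid of \(\lambda\) values and the plotting choices here are mine, not taken from the original post):

lambda_grid = exp(seq(log(0.001), log(1), length = 50))
B = matrix(NA, length(lambda_grid), ncol(X))
for(i in 1:length(lambda_grid)){
  # store the coefficients obtained for each value of lambda
  B[i,] = lasso_coord_desc(X, y, beta_init, lambda = lambda_grid[i])$beta
}
matplot(lambda_grid, B, type = "l", lty = 1, log = "x",
        xlab = expression(lambda), ylab = "coefficients")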
