I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will give a brief overview of the method and some starter code. (For an in-depth description and elaboration of the method, please see our arXiv preprint. For more details on how to use the package, please see the package’s vignette.)
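Let’s say we are in the standard supervised learning setting, with design matrix $X \in \mathbb{R}^{n \times p}$ and response $y \in \mathbb{R}^n$. Let $X = UDV^T$ be the singular value decomposition (SVD) of $X$, and let $d_1 \geq d_2 \geq \dots \geq d_m \geq 0$ be the diagonal entries of $D$. As described in our arXiv preprint, principal components lasso solves the optimization problem

$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2} \| y - X\beta \|_2^2 + \lambda \|\beta\|_1 + \frac{\theta}{2} \beta^T V D_{d_1^2 - d_j^2} V^T \beta,$$

where $\lambda \geq 0$ and $\theta \geq 0$ are hyperparameters, and $D_{d_1^2 - d_j^2}$ is the diagonal matrix with entries $d_1^2 - d_1^2, d_1^2 - d_2^2, \dots, d_1^2 - d_m^2$.

This optimization problem seems a little complicated so let me try to motivate it. Notice that if we replace $D_{d_1^2 - d_j^2}$ with the identity matrix then, since $VV^T = I$ for orthogonal $V$, the optimization problem reduces to

$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2} \| y - X\beta \|_2^2 + \lambda \|\beta\|_1 + \frac{\theta}{2} \|\beta\|_2^2,$$

which we recognize as the optimization problem that elastic net solves. So we are doing something similar to elastic net.

To be more specific: we can think of $V^T \beta$ as the coordinates of $\beta$ in the principal component basis of $X$. Elastic net penalizes each of these coordinates equally, while pcLasso weights the $j$th coordinate by $d_1^2 - d_j^2$. The leading principal component direction is not penalized at all, and directions with smaller singular values are penalized more heavily.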
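To make the quadratic penalty a little more concrete, here is a small base-R sketch. It does not use the pcLasso package; the helper `pc_penalty` and the objects `X_demo`, `w_pc` and `w_ridge` are names made up for this illustration, and the penalty form $\beta^T V D_{d_1^2 - d_j^2} V^T \beta$ is taken from the preprint (the factor $\theta / 2$ is left out). It compares identity weights, which give the ridge part of elastic net, with the pcLasso-style weights.

```r
set.seed(1)
n <- 100; p <- 10
X_demo <- matrix(rnorm(n * p), nrow = n)

# SVD of the design matrix: X = U D V^T, singular values in decreasing order
sv <- svd(X_demo)
V <- sv$v
d <- sv$d

# Hypothetical helper (not part of pcLasso): beta^T V diag(w) V^T beta
pc_penalty <- function(beta, V, w) {
  z <- crossprod(V, beta)   # coordinates of beta in the principal component basis
  sum(w * z^2)
}

w_pc    <- d[1]^2 - d^2        # pcLasso-style weights: 0 for the top PC, larger for lower PCs
w_ridge <- rep(1, length(d))   # identity weights: the ridge part of elastic net

beta <- rnorm(p)
pen_ridge <- pc_penalty(beta, V, w_ridge)   # equals sum(beta^2), since V is orthogonal here
pen_top   <- pc_penalty(V[, 1], V, w_pc)    # beta along the first PC direction: no penalty
pen_last  <- pc_penalty(V[, p], V, w_pc)    # beta along the last PC direction: d_1^2 - d_p^2
```

In this sketch a coefficient vector pointing along the first principal component is free, while one pointing along the last principal component pays the largest penalty per unit length.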
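This method extends easily to groups (whether overlapping or non-overlapping). Assume that our features come in $K$ groups. For $k = 1, \dots, K$, let $X_k$ denote the columns of $X$ belonging to group $k$, let $X_k = U_k D_k V_k^T$ be its SVD with singular values $d_{k1} \geq d_{k2} \geq \dots$, and let $\beta_k$ be the sub-vector of $\beta$ for the features in group $k$. As described in our arXiv preprint, the quadratic penalty is then applied group by group:

$$\underset{\beta}{\text{minimize}} \quad \frac{1}{2} \| y - X\beta \|_2^2 + \lambda \|\beta\|_1 + \frac{\theta}{2} \sum_{k=1}^K \beta_k^T V_k D_{d_{k1}^2 - d_{kj}^2} V_k^T \beta_k.$$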
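As a rough illustration of the group-wise version, the base-R sketch below computes a sum of per-group quadratic penalties, assuming the form given in the preprint (each group's penalty is built from the SVD of that group's columns, and the factor $\theta / 2$ is left out). The helper `group_pc_penalty` is a hypothetical name for this example and is not a function from the pcLasso package.

```r
set.seed(1)
n <- 100; p <- 10
X_demo <- matrix(rnorm(n * p), nrow = n)
beta <- rnorm(p)

# Hypothetical helper (not part of pcLasso):
# sum over groups of beta_k^T V_k diag(d_k1^2 - d_kj^2) V_k^T beta_k
group_pc_penalty <- function(X, beta, groups) {
  total <- 0
  for (idx in groups) {
    sv <- svd(X[, idx, drop = FALSE])   # SVD of the columns in this group
    z <- crossprod(sv$v, beta[idx])     # group coefficients in the group's PC basis
    total <- total + sum((sv$d[1]^2 - sv$d^2) * z^2)
  }
  total
}

pen_one_group  <- group_pc_penalty(X_demo, beta, list(1:10))        # one big group
pen_two_groups <- group_pc_penalty(X_demo, beta, list(1:5, 6:10))   # two groups of five
pen_singletons <- group_pc_penalty(X_demo, beta, as.list(1:10))     # every feature its own group
```

One consequence visible in the sketch: when every feature is its own group, each group has a single singular value, so every weight $d_{k1}^2 - d_{k1}^2$ is zero and the quadratic penalty disappears, leaving only the lasso penalty.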
Now for some basic code. Let’s make some fake data:
    set.seed(1)
    n <- 100; p <- 10
    X <- matrix(rnorm(n * p), nrow = n)
    y <- rnorm(n)
Just like glmnet in the glmnet package, the pcLasso function fits the model for a sequence of $\lambda$ values. Here we fix $\theta$ through the theta argument:

    library(pcLasso)
    fit <- pcLasso(X, y, theta = 10)
We can use the generic predict function to obtain predictions this fit makes on new data. For example, the following code extracts the predictions that pcLasso makes at the 5th $\lambda$ value for the first 3 rows of our training data:

    predict(fit, X[1:3, ])[, 5]
    # [1]  0.002523773  0.004959471 -0.014095065
The code above assumes that all our features belong to one big group. If our features come in groups, we can take advantage of that by specifying the groups option. groups should be a list of length $K$, with groups[[k]] being a vector of column indices which belong to group $k$. For example, if features 1 to 5 belong to one group and features 6 to 10 belong to another:

    groups <- list(1:5, 6:10)
    groups
    # [[1]]
    # [1] 1 2 3 4 5
    #
    # [[2]]
    # [1]  6  7  8  9 10

    fit <- pcLasso(X, y, theta = 10, groups = groups)
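In practice the group membership often arrives as one label per feature rather than as a ready-made list. The base-R sketch below shows one way to turn such labels into the list-of-index-vectors format described above. The labels in `group_labels` are invented for this example, and the two sanity checks at the end are only a suggestion, not something the post says the package requires.

```r
# Hypothetical feature-to-group labels: one label per column of X
group_labels <- c("a", "a", "a", "a", "a", "b", "b", "b", "b", "b")

# split() collects the column indices for each label; unname() drops the
# label names so the result looks like list(1:5, 6:10)
groups_from_labels <- unname(split(seq_along(group_labels), group_labels))

# Optional sanity checks before passing the list on as the groups argument
p <- length(group_labels)
all_idx <- unlist(groups_from_labels)
covers_all  <- setequal(all_idx, seq_len(p))   # TRUE if every column is in some group
has_overlap <- anyDuplicated(all_idx) > 0      # TRUE if a column appears in more than one group
```

The resulting `groups_from_labels` has the same structure as the hand-written groups list, so it could be supplied in the same way.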
The function cv.pcLasso fits pcLasso and picks the optimal $\lambda$ value via cross-validation. The output of the cv.pcLasso function can also be used to predict on new data:

    fit <- cv.pcLasso(X, y, theta = 10)
    predict(fit, X[1:3, ], s = "lambda.min")
    # [1] -0.01031697 -0.01031697 -0.01031697
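For readers who have not seen cross-validated selection of a penalty parameter before, here is a generic base-R sketch of the idea. To keep it self-contained it uses ridge regression (which has a closed-form solution) as a stand-in model, so it is not what cv.pcLasso computes internally; `ridge_fit`, `lambdas`, `fold_id` and `lambda_min` are names chosen for this illustration only.

```r
set.seed(1)
n <- 100; p <- 10
X_demo <- matrix(rnorm(n * p), nrow = n)
y_demo <- rnorm(n)

# Stand-in model (NOT pcLasso): ridge regression with an unpenalized intercept
ridge_fit <- function(X, y, lambda) {
  x_mean <- colMeans(X)
  y_mean <- mean(y)
  Xc <- sweep(X, 2, x_mean)   # center the columns
  beta <- solve(crossprod(Xc) + lambda * diag(ncol(X)), crossprod(Xc, y - y_mean))
  list(beta = drop(beta), intercept = y_mean - sum(x_mean * beta))
}

lambdas <- 10^seq(3, -2, length.out = 20)                  # candidate penalty values
nfolds <- 10
fold_id <- sample(rep(seq_len(nfolds), length.out = n))    # random fold assignment

# Mean held-out squared error for each candidate lambda
cv_err <- sapply(lambdas, function(lam) {
  fold_mse <- sapply(seq_len(nfolds), function(k) {
    test <- fold_id == k
    fit_k <- ridge_fit(X_demo[!test, , drop = FALSE], y_demo[!test], lam)
    pred <- fit_k$intercept + X_demo[test, , drop = FALSE] %*% fit_k$beta
    mean((y_demo[test] - pred)^2)
  })
  mean(fold_mse)
})

lambda_min <- lambdas[which.min(cv_err)]   # plays the role of s = "lambda.min"
```

The value with the smallest cross-validated error plays the same role here as the "lambda.min" choice passed to predict above.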
The vignette contains significantly more detail on how to use this package. If you spot bugs, have questions, or have features that you would like to see implemented, get in touch with us!