A deep dive into glmnet: predict.glmnet


I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation.

In this post, instead of looking at one of the function options of glmnet, we’ll look at the predict method for a glmnet object. The object returned by glmnet (call it fit) has class "glmnet"; when we run predict(fit), it runs the predict method for class "glmnet" objects, i.e. predict.glmnet(fit).

For reference, here is the full signature of the predict.glmnet function/method (v3.0-2):

predict(object, newx, s = NULL, type = c("link",
  "response", "coefficients", "nonzero", "class"), exact = FALSE,
  newoffset, ...)

In the above, object is a fitted "glmnet" object (call it fit). Recall that every glmnet fit has a lambda sequence associated with it: this will be important in understanding what follows. (This sequence can be accessed via fit$lambda.)

For the rest of this post, we will use the following data example:

library(glmnet)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), nrow = n)
beta <- matrix(c(rep(1, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)

fit <- glmnet(x, y)

Function option: newx

newx is simply the new x matrix for which we want predictions. So, for example, if we want predictions for the training x matrix, we would do

predict(fit, x)

If no other arguments are passed, we get a matrix of predictions, with each column corresponding to one value of \lambda in fit$lambda. For our example, fit$lambda has length 68 and x consists of 100 rows/observations, so predict(fit, x) returns a 100 \times 68 matrix.

length(fit$lambda)
# [1] 68
dim(predict(fit, x))
# [1] 100  68

newx must be provided except when type="coefficients" or type="nonzero" (more on these types later).
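
To see this requirement in action, here is a small sketch (the exact error message may vary by glmnet version):

predict(fit, type = "link")
# error: newx must be supplied when type = "link"
predict(fit, type = "nonzero")
# works: returns the non-zero features for each lambda without needing newx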

Function option: newoffset

If the original glmnet call was fit with an offset, then an offset must be included in the predict call under the newoffset option. If not included, an error will be thrown.

set.seed(2)
offset <- rnorm(n)
fit2 <- glmnet(x, y, offset = offset)
predict(fit2, x)
# Error: No newoffset provided for prediction, yet offset used in fit of glmnet
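
For completeness, here is a sketch where the offset is supplied via newoffset, which lets the prediction go through:

dim(predict(fit2, x, newoffset = offset))
# a 100 x length(fit2$lambda) matrix, as before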

The reverse situation is different: if the original glmnet call was NOT fit with an offset, then predict will not allow you to include an offset in the prediction, EVEN if you pass it the newoffset option. It does not throw a warning or error; it simply ignores the newoffset option. You have been warned! This is demonstrated in the code snippet below.

pred_no_offset <- predict(fit, x)
pred_w_offset <- predict(fit, x, newoffset = offset)
max(abs(pred_no_offset - pred_w_offset))
# [1] 0

Function option: s and exact

s indicates the \lambda values at which we want predictions. If the user does not specify s, predict will give predictions for each of the \lambda values in fit$lambda.

(Why is this option named s and not the more intuitive lambda? On page 5 of this vignette, the authors say they made this choice “in case later we want to allow one to specify the model size in other ways”. lambda controls the model size in the sense that the larger it is, the more coefficients are forced to zero. There are other ways to specify model size: for example, one could imagine a function option where we specify the number of non-zero coefficients we want in the model, or where we specify the maximum \ell_1 norm the coefficient vector can have. None of these other options has been implemented at the moment.)

If the user-specified s values all belong to fit$lambda, then predict pulls out the coefficients corresponding to those values and returns predictions. In this case, the exact option has no effect.
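
A quick sketch to check this on our example: asking for predictions at fit$lambda[3] should give exactly the third column of the full prediction matrix.

pred_all <- predict(fit, x)
pred_s3 <- predict(fit, x, s = fit$lambda[3])
max(abs(pred_all[, 3] - pred_s3))
# expected to be 0: no interpolation is needed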

If the user-specified s value does NOT belong to fit$lambda, things get interesting. If exact=FALSE (the default), predict uses linear interpolation to make predictions. (More accurately, it does linear interpolation of the coefficients, which translates to linear interpolation of the predictions.) As stated in the documentation: “while this is often a good approximation, it can sometimes be a bit coarse”.

As a demonstration: In the snippet below, we look at the predictions at a value of \lambda that lies between the two largest values in fit$lambda. If the function does as the documentation says, the last line should give a value of 0 (to machine precision).

b1 <- as.numeric(predict(fit, x, s = fit$lambda[1]))
b2 <- as.numeric(predict(fit, x, s = fit$lambda[2]))
b3 <- as.numeric(predict(fit, x,
    s = 0.3*fit$lambda[1] + 0.7*fit$lambda[2]))
max(abs(b3 - (0.3*b1 + 0.7*b2)))
# [1] 3.885781e-16

What happens if we have values in s that are not within the range of fit$lambda? First, I would recommend using exact=TRUE because extrapolation beyond the range of fit$lambda is dangerous in general. In my little experiments, it looks like predict simply returns the predictions for the \lambda value in fit$lambda that is closest to s.
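
Here is a small check consistent with that observation (this behavior is undocumented, so it may differ across glmnet versions), using a \lambda value well below the smallest value in fit$lambda:

pred_lo <- predict(fit, x, s = min(fit$lambda) / 10)
pred_min <- predict(fit, x, s = min(fit$lambda))
max(abs(pred_lo - pred_min))
# expected to be (near) 0 if predict clamps s to the closest value in fit$lambda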

If exact=TRUE, predict merges s with fit$lambda to get a single (decreasing) \lambda sequence, refits the glmnet model, then returns predictions at the \lambda values in s. If your training data is very large, this refitting could take a long time.

One note when using exact=TRUE is that you have to pass in additional arguments in order for the refitting to happen. That’s because the fitted glmnet object does not contain all the ingredients needed to do refitting. For our example, to predict for fit we need to supply x and y as well. For more complicated glmnet calls, more options have to be provided.

predict(fit, x, s = fit$lambda[68] / 2, exact = TRUE)
# Error: used coef.glmnet() or predict.glmnet() with `exact=TRUE` 
# so must in addition supply original argument(s)  x and y  in order to 
# safely rerun glmnet
predict(fit, x, s = fit$lambda[68] / 2, exact = TRUE, x = x, y = y)
# glmnet correctly returns predictions...

Function option: type

The type option determines the type of prediction returned. type="coefficients" returns the model coefficients for the \lambda values in s as a sparse matrix. type="nonzero" returns a list, with each element being a vector of the features that have non-zero coefficients. For example, the code snippet below shows that for the second and third \lambda values in fit$lambda, the features with non-zero coefficients are feature 5, and features 3 and 5, respectively.

predict(fit, type = "nonzero", s = fit$lambda[2:3])
# $`1`
# [1] 5
# 
# $`2`
# [1] 3 5

For type="coefficients" and type="nonzero", the user does not have to provide a newx argument since the return value does not depend on where we want the predictions. For the rest of the possible values of type, newx is required.

For type="link" (the default) and type="response" it helps to know a little GLM theory. For a observation having values x \in \mathbb{R}^p, type="link" returns x^T \beta, where \beta is the coefficient vector corresponding to a \lambda value in s.

For type="response", x^T \beta is passed through the GLM’s inverse link function to return predictions on the y scale. For “gaussian” family it is still x^T \beta. For “binomial” and “poisson” families it is \exp(x^T \beta) / (1 + \exp(x^T \beta)) and \exp(x^T \beta) respectively. For “multinomial” it returns fitted probabilities and for “cox” it returns fitted relative risk.

The final possibility, type="class", applies only to “binomial” and “multinomial” families. For each observation, it simply returns the class with the highest predicted probability.
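
Continuing the binomial sketch from above, the class predictions should agree with thresholding the fitted probabilities at 0.5 (here the class labels are "0" and "1" since y was coded 0/1):

cls <- predict(fitb, x, type = "class")
all(cls == ifelse(resp > 0.5, "1", "0"))
# expected TRUE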

Bonus: The coef method

The coef method for glmnet is actually just a special case of the predict method. This can be seen from the source code:

coef.glmnet
# function (object, s = NULL, exact = FALSE, ...) 
#     predict(object, s = s, type = "coefficients", exact = exact, 
#             ...)
# <bytecode: 0x7ff3ae934f20>
# <environment: namespace:glmnet>
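
We can verify the equivalence on our example:

c1 <- coef(fit, s = fit$lambda[3])
c2 <- predict(fit, type = "coefficients", s = fit$lambda[3])
max(abs(c1 - c2))
# expected to be 0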

Bonus: predict.elnet, predict.lognet, …

If you inspect the class of the object returned by a glmnet call, you will realize that it has more than one class. In the code below, we see that “gaussian” family results in an “elnet” class object. (“binomial” family returns a “lognet” object, “poisson” family returns a “fishnet” object, etc.)

class(fit)
# [1] "elnet"  "glmnet"

These classes have their own predict methods as well, but they all build on the base predict.glmnet method. As an example, here is the code for predict.fishnet:

glmnet:::predict.fishnet
# function (object, newx, s = NULL, type = c("link", "response", 
#        "coefficients", "nonzero"), exact = FALSE, newoffset, ...) 
# {
#     type = match.arg(type)
#     nfit = NextMethod("predict")
#     switch(type, response = exp(nfit), nfit)
# }
# <bytecode: 0x7ff3ab622040>
# <environment: namespace:glmnet>

What happens here is that predict.glmnet is called first (via NextMethod). If type is not "response", we simply return whatever predict.glmnet returned. If type="response", the predictions from predict.glmnet are passed through the function x \mapsto \exp(x) before being returned.

This is how predict is able to give the correct return output across the different family and type options.
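
As a sketch, we can check this on a Poisson fit with made-up count responses (again, purely for illustration):

set.seed(3)
yp <- rpois(n, lambda = exp(x %*% beta / 4))
fitp <- glmnet(x, yp, family = "poisson")
linkp <- predict(fitp, x, type = "link")
respp <- predict(fitp, x, type = "response")
max(abs(respp - exp(linkp)))
# expected to be 0: type = "response" just exponentiates the link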
