A deep dive into glmnet: predict.glmnet
I’m writing a series of posts on various function options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation.

In this post, instead of looking at one of the function options of glmnet, we’ll look at the predict method for a glmnet object. The object returned by glmnet (call it fit) has class "glmnet"; when we run predict(fit), it runs the predict method for class "glmnet" objects, i.e. predict.glmnet(fit).
For reference, here is the full signature of the predict.glmnet function/method (v3.0-2):

predict(object, newx, s = NULL,
        type = c("link", "response", "coefficients", "nonzero", "class"),
        exact = FALSE, newoffset, ...)
In the above, object is a fitted "glmnet" object (call it fit). Recall that every glmnet fit has a lambda sequence associated with it: this will be important in understanding what follows. (This sequence can be accessed via fit$lambda.)
For the rest of this post, we will use the following data example:
library(glmnet)

set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), nrow = n)
beta <- matrix(c(rep(1, 5), rep(0, 15)), ncol = 1)
y <- x %*% beta + rnorm(n)
fit <- glmnet(x, y)
Function option: newx
newx is simply the new x matrix at which we want predictions. So, for example, if we want predictions for the training x matrix, we would do
predict(fit, x)
If no other arguments are passed, we get a matrix of predictions, with each column corresponding to the predictions for one value of λ in fit$lambda. For our example, fit$lambda has length 68 and x consists of 100 rows/observations, so predict(fit, x) returns a 100 × 68 matrix.
length(fit$lambda)
# [1] 68
dim(predict(fit, x))
# [1] 100  68
newx must be provided except when type="coefficients" or type="nonzero" (more on these types later).
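As a quick sketch of this (my own illustration, not from the documentation), the coefficient-related types work fine without newx, while the default type does not:

predict(fit, type = "coefficients", s = fit$lambda[2])  # works without newx
predict(fit, type = "nonzero", s = fit$lambda[2])       # also works without newx
# predict(fit, s = fit$lambda[2])  # default type = "link": newx is required here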
Function option: newoffset
If the original glmnet call was fit with an offset, then an offset must be included in the predict call via the newoffset option. If it is not included, an error will be thrown.
set.seed(2)
offset <- rnorm(n)
fit2 <- glmnet(x, y, offset = offset)
predict(fit2, x)
# Error: No newoffset provided for prediction, yet offset used in fit of glmnet
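For completeness, here is a sketch of the corrected call (using the fit2 and offset objects from above): passing the offset via newoffset lets the prediction go through.

pred2 <- predict(fit2, x, newoffset = offset)
dim(pred2)  # 100 rows, one column per value in fit2$lambda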
The reverse is true as well: if the original glmnet call was NOT fit with an offset, then predict will not allow you to include an offset in the prediction, EVEN if you pass it the newoffset option. It does not throw a warning or error, but simply ignores the newoffset option. You have been warned! This is demonstrated in the code snippet below.
pred_no_offset <- predict(fit, x)
pred_w_offset <- predict(fit, x, newoffset = offset)
max(abs(pred_no_offset - pred_w_offset))
# [1] 0
Function option: s and exact
s indicates the λ values at which we want predictions. If the user does not specify s, predict will give predictions at each of the λ values in fit$lambda.
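For example, here is a minimal sketch asking for predictions at just two of the values in fit$lambda:

dim(predict(fit, x, s = fit$lambda[c(10, 20)]))
# 100 rows, 2 columns (one column per value in s)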
(Why is this option named s and not the more intuitive lambda? On page 5 of this vignette, the authors say they made this choice “in case later we want to allow one to specify the model size in other ways”. lambda controls the model size in the sense that the larger it is, the more coefficients will be forced to zero. There are other ways to specify model size: for example, one could imagine a function option where we specify the number of non-zero coefficients we want in the model, or one where we specify the maximum norm the coefficient vector can have. None of these other options have been implemented at the moment.)
If the user-specified s values all belong to fit$lambda, then predict pulls out the coefficients corresponding to those values and returns predictions. In this case, the exact option has no effect.
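A quick sketch of this: when s is exactly one of the values in fit$lambda, predictions with and without exact=TRUE should agree.

p_default <- predict(fit, x, s = fit$lambda[10])
p_exact   <- predict(fit, x, s = fit$lambda[10], exact = TRUE)
max(abs(p_default - p_exact))  # expect (essentially) 0: no refitting is needed here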
If the user-specified s value does NOT belong to fit$lambda, things get interesting. If exact=FALSE (the default), predict uses linear interpolation to make predictions. (More accurately, it does linear interpolation of the coefficients, which translates to linear interpolation of the predictions.) As stated in the documentation: “while this is often a good approximation, it can sometimes be a bit coarse”.
As a demonstration, the snippet below looks at predictions at a value of λ that lies between the two largest values in fit$lambda. If the function does as the documentation says, the last line should give a value of 0 (up to machine precision).
b1 <- as.numeric(predict(fit, x, s = fit$lambda[1]))
b2 <- as.numeric(predict(fit, x, s = fit$lambda[2]))
b3 <- as.numeric(predict(fit, x, s = 0.3 * fit$lambda[1] + 0.7 * fit$lambda[2]))
max(abs(b3 - (0.3 * b1 + 0.7 * b2)))
# [1] 3.885781e-16
What happens if we have values in s that are not within the range of fit$lambda? First, I would recommend using exact=TRUE, because extrapolation beyond the range of fit$lambda is dangerous in general. In my little experiments, it looks like predict simply returns the predictions for the value in fit$lambda that is closest to s.
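Here is a sketch of the kind of experiment I mean, using a value of s well above the largest value in fit$lambda (this behaviour is not documented, so treat it as exploratory):

p_out <- predict(fit, x, s = 10 * fit$lambda[1])  # s outside the fitted range
p_in  <- predict(fit, x, s = fit$lambda[1])       # closest value in fit$lambda
max(abs(p_out - p_in))  # should be 0 if predict falls back to the closest lambda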
If exact=TRUE, predict merges s with fit$lambda to get a single (decreasing) sequence, refits the glmnet model, then returns predictions at the values in s. If your training data is very large, this refitting could take a long time.
One note when using exact=TRUE is that you have to pass in additional arguments in order for the refitting to happen. That’s because the fitted glmnet object does not contain all the ingredients needed to do the refitting. For our example, to predict for fit we need to supply x and y as well. For more complicated glmnet calls, more options have to be provided.
predict(fit, x, s = fit$lambda[68] / 2, exact = TRUE)
# Error: used coef.glmnet() or predict.glmnet() with `exact=TRUE` so must in
# addition supply original argument(s) x and y in order to safely rerun glmnet

predict(fit, x, s = fit$lambda[68] / 2, exact = TRUE, x = x, y = y)
# glmnet correctly returns predictions...
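Putting the two modes side by side, here is a sketch comparing the interpolated and the exact (refitted) predictions at a λ value that is not in fit$lambda; I would expect the two to be close but not necessarily identical.

s_new    <- 0.5 * (fit$lambda[10] + fit$lambda[11])                 # not in fit$lambda
p_interp <- predict(fit, x, s = s_new)                              # linear interpolation
p_exact  <- predict(fit, x, s = s_new, exact = TRUE, x = x, y = y)  # refit
max(abs(p_interp - p_exact))  # I would expect this to be small but not exactly zero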
Function option: type
The type option determines the type of prediction returned. type="coefficients" returns the model coefficients for the λ values in s as a sparse matrix. type="nonzero" returns a list, with each element being a vector of the features which have non-zero coefficients. For example, the code snippet below shows that for the second and third values in fit$lambda, the features with non-zero coefficients are feature 5, and features 3 and 5, respectively.
predict(fit, type = "nonzero", s = fit$lambda[2:3])
# $`1`
# [1] 5
#
# $`2`
# [1] 3 5
For type="coefficients" and type="nonzero", the user does not have to provide a newx argument, since the return value does not depend on where we want the predictions. For the rest of the possible values of type, newx is required.
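Here is a quick sketch of type="coefficients" for the same two λ values: the result is a sparse matrix with one row per coefficient (including the intercept) and one column per value in s.

cmat <- predict(fit, type = "coefficients", s = fit$lambda[2:3])
dim(cmat)    # 21 x 2: intercept plus 20 features, one column per value in s
class(cmat)  # a sparse matrix class ("dgCMatrix")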
For type="link" (the default) and type="response", it helps to know a little GLM theory. For an observation with feature values x, type="link" returns the linear predictor x^T β, where β is the coefficient vector corresponding to a λ value in s.
For type="response", the linear predictor x^T β is passed through the GLM’s inverse link function to return predictions on the y scale. For the “gaussian” family it is still x^T β. For the “binomial” and “poisson” families it is exp(x^T β) / (1 + exp(x^T β)) and exp(x^T β) respectively. For “multinomial” it returns fitted probabilities and for “cox” it returns fitted relative risk.
The final possibility, type="class", applies only to the “binomial” and “multinomial” families. For each observation, it simply returns the class with the highest predicted probability.
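To make the three types concrete, here is a small binomial sketch (the yb and fitb objects are new and purely for illustration): type="response" should be the inverse logit of type="link", and type="class" picks the more probable class.

set.seed(3)
yb   <- rbinom(n, 1, prob = 1 / (1 + exp(-x %*% beta)))
fitb <- glmnet(x, yb, family = "binomial")
lp   <- predict(fitb, x, s = fitb$lambda[5], type = "link")      # linear predictor
pr   <- predict(fitb, x, s = fitb$lambda[5], type = "response")  # fitted probabilities
max(abs(pr - 1 / (1 + exp(-lp))))                                # expect (essentially) 0
predict(fitb, x[1:3, ], s = fitb$lambda[5], type = "class")      # class labels ("0"/"1")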
Bonus: The coef method
The coef method for glmnet is actually just a special case of the predict method. This can be seen from the source code:
coef.glmnet
# function (object, s = NULL, exact = FALSE, ...)
# predict(object, s = s, type = "coefficients", exact = exact,
#     ...)
# <bytecode: 0x7ff3ae934f20>
# <environment: namespace:glmnet>
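A quick sketch confirming this: coef and predict with type="coefficients" should return identical results.

c1 <- coef(fit, s = fit$lambda[3])
c2 <- predict(fit, s = fit$lambda[3], type = "coefficients")
max(abs(c1 - c2))  # expect 0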
Bonus: predict.elnet, predict.lognet, …
If you inspect the class of the object returned by a glmnet call, you will realize that it has more than one class. In the code below, we see that the “gaussian” family results in an “elnet” class object. (The “binomial” family returns a “lognet” object, the “poisson” family returns a “fishnet” object, etc.)
class(fit)
# [1] "elnet"  "glmnet"
These classes have their own predict methods as well, but they draw on this base predict.glmnet call. As an example, here is the code for predict.fishnet:
glmnet:::predict.fishnet
# function (object, newx, s = NULL, type = c("link", "response",
#     "coefficients", "nonzero"), exact = FALSE, newoffset, ...)
# {
#     type = match.arg(type)
#     nfit = NextMethod("predict")
#     switch(type, response = exp(nfit), nfit)
# }
# <bytecode: 0x7ff3ab622040>
# <environment: namespace:glmnet>
What happens here is that predict.glmnet is first called. If type is not "response", then we simply return whatever predict.glmnet would have returned. However, if type="response", then (i) we call predict.glmnet, and (ii) the predictions are passed through the exponential function (the inverse link for the Poisson family) before being returned. This is how predict is able to give the correct return output across the different family and type options.
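As a final sketch (with new yp and fitp objects, again just for illustration), we can check this for the “poisson” family: type="response" should simply be the exponential of type="link".

set.seed(4)
yp   <- rpois(n, lambda = exp(x %*% beta / 4))
fitp <- glmnet(x, yp, family = "poisson")
lp   <- predict(fitp, x, s = fitp$lambda[5], type = "link")
mu   <- predict(fitp, x, s = fitp$lambda[5], type = "response")
max(abs(mu - exp(lp)))  # expect (essentially) 0: predict.fishnet exponentiates the link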