
Automatic Forecasting with `ahead::dynrmf` and Ridge regression

[This article was first published on T. Moudiki's Webpage - R, and kindly contributed to R-bloggers].

Last week I presented `ahead`, an R package for univariate and multivariate time series forecasting. In particular, I introduced the function `dynrmf` for univariate time series, with examples using Random Forest and Support Vector Machines as fitting functions (fitting and predicting are controlled through the `fit_func` and `predict_func` arguments of `dynrmf`). First things first, here’s how to install the R package `ahead`:

  • 2nd method: from GitHub

    In R console:

      devtools::install_github("Techtonique/ahead")
    

    Or

      remotes::install_github("Techtonique/ahead")
    
    In version 0.2.0 of `ahead`, Ridge regression is the default fitting function for `dynrmf`. Let’s see how it works:

    library(datasets)
    library(ahead)
    
    # We start with a demo of `ahead`'s Ridge regression implementation on random tabular data
    set.seed(123)
    n <- 100 ; p <- 10
    X <- matrix(rnorm(n * p), n, p) 
    y <- rnorm(n)
    
    # default behavior of ahead::ridge: a sequence of 100 regularization parameters (lambdas) is used
    fit_obj <- ahead::ridge(X, y)
    
    # plot
    par(mfrow=c(3, 2))
    # regression coefficients (10) as a function of log(lambda)
    matplot(log(fit_obj$lambda), t(fit_obj$coef), type = 'l',  main="coefficients \n f(lambda)")
    # Generalized Cross Validation (GCV) error as a function of log(lambda)
    plot(log(fit_obj$lambda), fit_obj$GCV, type='l', main="GCV error")
    # dynrmf with different values of the regularization parameter lambda
    # ahead::ridge is the default `fit_func`; to check, run print(head(ahead::dynrmf))
    plot(ahead::dynrmf(USAccDeaths, h=20, level=95, fit_params=list(lambda = 0.1)), main="lambda = 0.1")
    plot(ahead::dynrmf(USAccDeaths, h=20, level=95, fit_params=list(lambda = 10)), main="lambda = 10")
    plot(ahead::dynrmf(USAccDeaths, h=20, level=95, fit_params=list(lambda = 100)), main="lambda = 100")
    plot(ahead::dynrmf(USAccDeaths, h=20, level=95, fit_params=list(lambda = 1000)), main="lambda = 1000")
    

    As the previous code snippet shows, you can try different values of the regularization parameter \(\lambda\) and see how each choice affects the forecasts. If you do not choose \(\lambda\), the value that minimizes the Generalized Cross Validation (GCV) error on a grid of 100 values is picked internally. This selection is automatic, but it does not guarantee the best out-of-sample accuracy. In the `dynrmf` examples below, \(\lambda\) is chosen this way:

    par(mfrow=c(3, 2))
    # nothing else required, default is Ridge regression with minimal GCV lambda
    plot(ahead::dynrmf(USAccDeaths, h=20, level=95))
    plot(ahead::dynrmf(AirPassengers, h=20, level=95))
    plot(ahead::dynrmf(lynx, h=20, level=95))
    plot(ahead::dynrmf(diff(WWWusage), h=20, level=95))
    plot(ahead::dynrmf(Nile, h=20, level=95))
    plot(ahead::dynrmf(fdeaths, h=20, level=95))
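
To make the selection rule more concrete, here is a minimal base-R sketch of choosing \(\lambda\) by minimizing GCV on a grid of 100 values. It uses the textbook criterion, mean squared residual divided by \((1 - \mathrm{df}(\lambda)/n)^2\), computed through a singular value decomposition of the standardized inputs. The grid bounds, the standardization step and all variable names are assumptions made for illustration; `ahead::ridge` may differ in these details.

```r
# Illustrative sketch, not ahead's internal code.
set.seed(123)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# Standardize the inputs and center the response (assumed preprocessing)
Xs <- scale(X)
yc <- y - mean(y)

# SVD of the inputs: Xs = U D V'
sv  <- svd(Xs)
d2  <- sv$d^2
Uty <- crossprod(sv$u, yc)  # U' y, a p x 1 matrix

# Hypothetical grid of 100 regularization parameters
lambdas <- 10^seq(-5, 5, length.out = 100)

gcv <- sapply(lambdas, function(lam) {
  shrink <- d2 / (d2 + lam)           # shrinkage factor per singular direction
  fitted <- sv$u %*% (shrink * Uty)   # ridge fitted values
  df     <- sum(shrink)               # effective degrees of freedom
  mean((yc - fitted)^2) / (1 - df / n)^2
})

# lambda with the smallest GCV error
best_lambda <- lambdas[which.min(gcv)]
```

Because `y` is pure noise here, a large `best_lambda` (heavy shrinkage) is a plausible outcome; on real data the minimum can fall anywhere on the grid.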
    
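
Ridge regression is only the default. As mentioned in the introduction, `dynrmf` accepts other learners through `fit_func` and `predict_func`. The sketch below swaps in a Random Forest; it assumes that the `randomForest` package is installed and that `ntree = 50` is a reasonable illustrative setting, neither of which comes from this post.

```r
# Sketch only: runs when both `ahead` and `randomForest` are available.
fc <- NULL
if (requireNamespace("ahead", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  library(ahead)
  set.seed(123)
  # Random Forest as the fitting function, horizon of 20, 95% prediction interval
  fc <- ahead::dynrmf(fdeaths, h = 20, level = 95,
                      fit_func = randomForest::randomForest,
                      fit_params = list(ntree = 50),  # illustrative value
                      predict_func = predict)
  plot(fc, main = "Random Forest")
}
```

In this setup `fit_params` carries the learner's own hyperparameters, in the same way that `lambda` was passed to the default Ridge fit earlier.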
