A First Look at TimeGPT using nixtlar


This post is a first look at Nixtla’s TimeGPT generative, pre-trained transformer for time series forecasting using the nixtlar R package.

As described in Garza et al. (2023), TimeGPT is a Transformer-based time series model with self-attention mechanisms. The architecture comprises an encoder-decoder structure with multiple layers, each with residual connections and layer normalization. The encoder, a stack of multi-head self-attention layers followed by a feed-forward neural network, processes the input time series. The decoder, which is similar to the encoder, generates the forecast; it includes an additional multi-head attention layer that takes the encoder’s output as input. The model is trained with a teacher-forcing strategy, in which the decoder receives the ground-truth values during training. At inference time, the model’s own predictions are fed back as inputs to produce the forecast.

TimeGPT architecture

Nixtla’s website provides a considerable amount of explanatory material, documentation, and code examples in Python. The nixtlar package wraps the Python code to provide an R interface. The package documentation for version 0.6.2 doesn’t fully document the R functions, but the vignettes provide sufficient code examples to get started.

Before getting started with TimeGPT, you will have to register for an API key. The process is easy enough and is described in this vignette.
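Once you have a key, nixtlar has to be able to find it. Here is a minimal setup sketch (the key string below is a placeholder); nixtlar can also read the key from the NIXTLA_API_KEY environment variable, set for example in your .Renviron file.

Show the code
# Set the key for the current session (placeholder key shown)
nixtlar::nixtla_set_api_key(api_key = "YOUR_API_KEY")

# Or put NIXTLA_API_KEY=YOUR_API_KEY in .Renviron and restart R;
# nixtla_validate_api_key() should then confirm that the key works.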

Show the code
library(tidyverse)
library(forecast)
library(prophet)
library(nixtlar)

The Data

The electricity dataset included in the nixtlar package contains hourly observations of electricity consumption sourced from the PJM Interconnection LLC, a regional transmission organization that is part of the Eastern Interconnection grid in the United States. There are five different time series with data taken from 2012 to 2018.

Note that the electricity data is in “long” format and that ds, the time variable, is character data.

Show the code
df <- nixtlar::electricity
glimpse(df)
Rows: 8,400
Columns: 3
$ unique_id <chr> "BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE", "BE", …
$ ds        <chr> "2016-10-22 00:00:00", "2016-10-22 01:00:00", "2016-10-22 02…
$ y         <dbl> 70.00, 37.10, 37.10, 44.75, 37.10, 35.61, 34.55, 50.49, 61.5…

A look at the “wide” data frame shows that the various series do not cover the same time periods.

Show the code
df_wide <- df |>
  pivot_wider(names_from = unique_id, values_from = y)

head(df_wide)
# A tibble: 6 × 6
  ds                     BE    DE    FR    NP   PJM
  <chr>               <dbl> <dbl> <dbl> <dbl> <dbl>
1 2016-10-22 00:00:00  70      NA  54.7    NA    NA
2 2016-10-22 01:00:00  37.1    NA  51.2    NA    NA
3 2016-10-22 02:00:00  37.1    NA  48.9    NA    NA
4 2016-10-22 03:00:00  44.8    NA  45.9    NA    NA
5 2016-10-22 04:00:00  37.1    NA  41.2    NA    NA
6 2016-10-22 05:00:00  35.6    NA  41.4    NA    NA
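To pin down exactly which period each series covers, the time ranges can be summarised directly; a quick check along these lines (the start/end/n_obs column names are just my choices):

Show the code
df |>
  mutate(time = as.POSIXct(ds, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")) |>
  group_by(unique_id) |>
  summarise(start = min(time), end = max(time), n_obs = n())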

Plots indicate that all of the series show periods of considerable volatility. The BE, DE, and FR series appear to be stationary. NP trends upward to the right, and the PJM series appears to be nonlinear.

Show the code
df2 <- df |> mutate(time = as.POSIXct(ds, format = "%Y-%m-%d %H:%M:%S")) |>
  group_by(unique_id)

p <- df2 |> ggplot(aes(x = time, y = y, color = unique_id)) +
  geom_line() + facet_wrap( ~ unique_id, scales = "free")

p

TimeGPT Forecasts

I’ll begin by showing off the nixtlar forecasting function, which can handle multiple time series at once, by forecasting eight hours ahead using all of the data. The parameter h specifies the number of steps ahead to forecast, and level specifies the confidence levels for the prediction intervals.

The built-in nixtla_client_plot() function provides a quick look at the forecasts; the call is commented out here because it uses the nixtla_client_fcst object that is loaded below.

Show the code
#nixtla_client_plot(df, nixtla_client_fcst, max_insample_length = 200)

The plot below uses ggplot2 to focus in on the eight forecast points using all of the data. The level parameter indicates that both 80% and 95% prediction intervals should be computed.

Show the code
# nixtla_client_fcst <- nixtla_client_forecast(df, h = 8, level = c(80,95))
# saveRDS(nixtla_client_fcst, "nixtla_client_fcst.rds")

nixtla_client_fcst <- readRDS("nixtla_client_fcst.rds")

ncf_df <- nixtla_client_fcst |>
  mutate(time = as.POSIXct(ds, format = "%Y-%m-%d %H:%M:%S"))

# Positional renaming: with level = c(80, 95), the forecast columns arrive as
# point forecast, lower 95, lower 80, upper 80, upper 95; time was added last.
names(ncf_df) <- c("unique_id", "ds", "TimeGPT",
                   "lo95", "lo80", "hi80", "hi95", "time")

pf <- ncf_df |> ggplot(aes(x = time, y = TimeGPT, color = unique_id)) +
  geom_line() +
  geom_ribbon(aes(ymin = lo95, ymax = hi95),
              linetype = 2,
              alpha = 0.1) +
  facet_wrap( ~ unique_id, scales = "free")

pf

For the rest of this post, I’ll work only with the BE data and do some simple backtesting. I will split the data into a training set and a test set containing 24 hours’ worth of observations. Then, I’ll fit established time series forecasting models and compare how well they do against the actual data and against each other. Note that I will not tune any of these models, in order to keep it a fair, “out-of-the-box” comparison.

Show the code
NF <- 24

BE_df_wide <- df |> pivot_wider(names_from = unique_id, values_from = y) |>
  select(ds, BE) |> drop_na()

BE_train_df <- BE_df_wide |> filter(row_number() <= n() - NF)
BE_test_df <- tail(BE_df_wide, NF)
BE_train_df <- BE_train_df |> rename(y = BE) |> mutate(unique_id = "BE")
BE_test_df <- BE_test_df |> rename(y = BE)

The nixtla_client_forecast() function is the main nixtlar forecasting function. (I have already run this function and saved the results to an RDS file so as not to make an API call every time the code is run during the blog-building process.)

Show the code
#nixtla_fcst <- nixtla_client_forecast(BE_train_df, h = NF, level = 95)
#saveRDS(nixtla_fcst, "nixtla_fcst_24.rds")
nixtla_fcst <- readRDS("nixtla_fcst_24.rds")
names(nixtla_fcst) <- c("unique_id", "ds", "TimeGPT", "lo95", "up95")

Here, I create a data frame to hold the actual and forecast values.

Show the code
fcst_df <- tail(nixtla_fcst, NF) |> select(ds, TimeGPT) |>
  rename(time = ds, tgpt_fcst = TimeGPT) |>
  mutate(elec_actual = BE_test_df$y)

head(fcst_df)
                 time tgpt_fcst elec_actual
1 2016-12-30 00:00:00  38.82010       44.30
2 2016-12-30 01:00:00  36.29234       44.30
3 2016-12-30 02:00:00  34.97838       41.26
4 2016-12-30 03:00:00  32.99565       40.62
5 2016-12-30 04:00:00  31.58322       40.07
6 2016-12-30 05:00:00  33.27422       41.02

Some Comparative Forecasts

This next section of code puts the training data into a time series format suitable for the forecast::auto.arima() and forecast::ets() functions. Both of these functions require that the data be expressed as a ts() object. The original version of this post formatted the data as an xts() object: an error that substantially impacted the ARIMA and exponential smoothing forecasts. There is a problem, though, with using ts(): the electricity data has multiseasonal aspects that ts() was not designed to accommodate. The code below that creates the ts() object treats time as an hourly index with a period of 24 observations (one day) and ignores all of the other seasonality information in the data set. This is clearly not a perfect solution (see the msts() aside below), but it is a workaround good enough for my purpose of comparing TimeGPT with a couple of very simple automatic forecasts. You will see below that the ARIMA forecast is quite good.

Show the code
auto_train <- BE_train_df |>
  select(-unique_id) |>
  mutate(time = seq_along(ds)) |>
  select(-ds)
elec_ts <- ts(auto_train$y, frequency = 24)
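As an aside, the forecast package can encode multiple seasonal periods: msts() accepts several seasonal.periods, and tbats() will fit the resulting series. A sketch along those lines (the weekly period of 168 hours is my assumption, and tbats() can take a while on a series of this length):

Show the code
elec_msts <- msts(auto_train$y, seasonal.periods = c(24, 168))
tbats_fcst <- elec_msts |>
  tbats() |>
  forecast(h = NF, level = 95)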

ARIMA Forecast with auto.arima()

The auto.arima() function from the forecast package fits a fairly sophisticated ARIMA(2,1,1)(1,0,1)[24] model. The parameters in parentheses mean two autoregressive terms, one difference, and one moving-average term, plus one seasonal autoregressive term, no seasonal differencing, and one seasonal moving-average term. The [24] indicates that the seasonal pattern repeats every 24 observations. (Note that because the ARIMA fit takes several seconds to run, I am reading from an RDS file to expedite the blog-building process.)

Show the code
#Run the following three lines to actually generate the forecast on your own.
# arima_fcst <- elec_ts |>
# auto.arima() |>
# forecast(h = NF , level = 95)
#saveRDS(arima_fcst, "arima_fcst_24.rds")
arima_fcst <- readRDS("arima_fcst_24.rds")
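For the record, the selected model can also be refit directly with forecast::Arima(), skipping the search; a sketch assuming the ARIMA(2,1,1)(1,0,1)[24] specification reported above:

Show the code
# Manual specification of the model that auto.arima() selected
arima_fit <- Arima(elec_ts, order = c(2, 1, 1), seasonal = c(1, 0, 1))
# forecast(arima_fit, h = NF, level = 95) should give a comparable forecast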

Exponential Smoothing Forecast with ets()

Because I have provided no guidance, the ets() function from the forecast package fits an ETS(M,N,M) model. This is a multiplicative model without a trend component. The first M indicates that the error term is multiplicative: its effect on the forecast is proportional to the level of the series. The N indicates no trend, and the final M indicates multiplicative seasonality.

Show the code
ets_fcst <- elec_ts |>
  ets() |>
  # number of periods to forecast
  forecast(h = NF)
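If you want to see which specification ets() selected, along with its parameter estimates, the fit can be done in two steps; a small sketch:

Show the code
ets_fit <- ets(elec_ts)
ets_fit  # prints the selected model, e.g. ETS(M,N,M), with smoothing parameters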

Prophet Forecast

I also ask the prophet() function from the prophet package for an automatic fit using the default parameters. Among other things, this means a linear growth curve with additive seasonality and automatic estimation of daily seasonality. As with the TimeGPT forecast, the model is fit using the BE_train_df data frame, in which the time variable is character data. The make_future_dataframe() function extends the ds column by NF hourly periods (freq = 3600 seconds); with include_history = FALSE, it returns only the future timestamps.

Show the code
prophet_fit <- prophet(BE_train_df)

future <- make_future_dataframe(
  prophet_fit,
  periods = NF,
  freq = 3600,
  include_history = FALSE
)

prophet_fcst <- predict(prophet_fit, future)
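The data frame returned by predict() holds the forecast components in wide format; the point forecast is in the yhat column, flanked by the yhat_lower and yhat_upper uncertainty bounds. A quick peek:

Show the code
head(prophet_fcst[, c("ds", "yhat", "yhat_lower", "yhat_upper")])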

Results and Discussion

Before plotting, let’s have a look at the wide data frame that holds the forecasts.

Show the code
fcst_df2 <- fcst_df |>
  mutate(
    arima_fcst = as.vector(arima_fcst$mean),
    ets_fcst = as.vector(ets_fcst$mean),
    prophet_fcst = prophet_fcst$yhat
  )

head(fcst_df2)
                 time tgpt_fcst elec_actual arima_fcst ets_fcst prophet_fcst
1 2016-12-30 00:00:00  38.82010       44.30   46.19057 37.72396     32.05748
2 2016-12-30 01:00:00  36.29234       44.30   44.99377 34.97583     29.74375
3 2016-12-30 02:00:00  34.97838       41.26   44.06233 32.89185     22.66588
4 2016-12-30 03:00:00  32.99565       40.62   42.45347 29.87591     15.85864
5 2016-12-30 04:00:00  31.58322       40.07   43.05403 27.56602     15.62841
6 2016-12-30 05:00:00  33.27422       41.02   44.28001 29.15387     23.54824

Then, shape the data into long format and plot.

Show the code
fcst_df2_long <- fcst_df2 |>
  pivot_longer(!time, names_to = "method", values_to = "mean")

q <- fcst_df2_long |>
  ggplot(aes(
    x = time,
    y = mean,
    group = method,
    color = method
  )) +
  geom_line() +
  geom_point() +
  ggtitle("TimeGPT vs ARIMA vs ETS vs Prophet vs actual data - 24 Point Forecast")

q

The TimeGPT forecast looks quite good. I don’t think this is a big surprise, given that the Nixtla folks chose the electricity data set to show off their transformer. However, it is curious that, except for one point, the TimeGPT forecast is lower than the actual data. It is also interesting that some of the forecasted points farther out match the actual data better than the initial forecast points do.

The ARIMA forecast is very good. It tracks the first few points very closely, undershoots the peaks of the actual data, but recovers after both peaks and tracks the data well towards the end of the forecast period.

The exponential smoothing forecast mostly stays well below the actual data but also does pretty well overall.

The black-box prophet model overreacts to the downward trends at the beginning and end of the forecast period. My guess is that with a little tuning prophet could do much better.

It is also worth noting that choosing the best forecast depends on your objectives. For example, in some circumstances one might prefer a forecast that reproduces seasonal patterns or volatility over one that maximizes overall accuracy. For this exercise, we see that TimeGPT mimics the volatility of the actual data but that the ARIMA forecast has the lowest root mean squared error. Also note that the ARIMA and exponential smoothing forecasts are not quite black-box forecasts: in converting the data into a ts() object, I explicitly provided seasonality information. TimeGPT apparently inferred this information from character data.

Show the code
RMSE <- function(m, o) { sqrt(mean((m - o)^2)) }
rms_names <- c("tgpt", "arima", "ets", "prophet")
rms_fcst <- array(NA_real_, dim = 4, dimnames = list(rms_names))
rms_fcst[1] <- RMSE(fcst_df2$tgpt_fcst, fcst_df2$elec_actual)
rms_fcst[2] <- RMSE(fcst_df2$arima_fcst, fcst_df2$elec_actual)
rms_fcst[3] <- RMSE(fcst_df2$ets_fcst, fcst_df2$elec_actual)
rms_fcst[4] <- RMSE(fcst_df2$prophet_fcst, fcst_df2$elec_actual)
rms_fcst
     tgpt     arima       ets   prophet 
 6.601752  4.966623  9.355767 14.948709 
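For the forecast-package models, the accuracy() function is a one-line alternative that reports RMSE alongside ME, MAE, MAPE, and other measures, for both the training and test sets; for example, for the ARIMA forecast:

Show the code
accuracy(arima_fcst, fcst_df2$elec_actual)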

Some Final Thoughts

It is clear that the TimeGPT model has upped the game for black-box time series forecasting. It is sure to become a powerful tool for exploratory work with large time series and for comparing multiple time series. It may become the go-to baseline forecasting tool for a wide range of time series projects. Moreover, I expect that time series experts who can fine-tune prophet and more traditional time series models will be able to develop some intuition about what TimeGPT is doing by assessing its behavior in relation to these models.

I am aware that this little post may have raised more questions than it answered. If so, please try your hand at elaborating on some of the issues raised. We would be very happy to consider your time series posts for publication on R Works.

Finally, for a more sophisticated analysis of these series that deals with their multiseasonality aspects, see the Electricity Load Forecast Tutorial. And, for some ideas about how to harness “ordinary” LLMs for time series forecasting have a look at the second half of the talk that Bryan Lewis gave to nyhackr in April 2024.

Acknowledgment

Many thanks to Professor Rob Hyndman for flagging the ARIMA time series object error and for generously suggesting alternatives that led to the workaround described above. Any errors remaining in this post are all mine.
