Are you tired of creating lag variables one by one? Are you ready to level up your time series analysis game? Forget everything you know about creating lag variables. There’s a better way, and it’s been right in front of you all along.
This is a good one. We’ll use the little-known partial() function from purrr to build a handy wrapper around the lag() function. Let’s go straight to the point.
First, we create a list of functions called map_lag. It is essentially a mapped version of the lag() function from dplyr, where we pre-fill the n argument to create a different lag function for each lag length. Then we apply this list of functions to the desired variable with across().
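To make the pre-filling step concrete, here is a minimal sketch of what partial() does on its own. The vector x and the names lag2 and lag_fns are made up for illustration; only partial(), map() and dplyr::lag() come from the packages.

```r
library(purrr)

# Toy vector, made up for illustration
x <- c(10, 20, 30, 40)

# Pre-fill the n argument of dplyr::lag() to get a "lag by 2" function
lag2 <- partial(dplyr::lag, n = 2)
lag2(x)
# NA NA 10 20, the same as dplyr::lag(x, n = 2)

# Mapping over several lag lengths gives a list of such functions
lag_fns <- map(1:3, ~ partial(dplyr::lag, n = .x))
lag_fns[[3]](x)
# NA NA NA 10
```

Each element of lag_fns is an ordinary function of one vector, which is the shape across() expects in its .fns argument.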
And just like that, voilà! We have multiple lag variables without breaking a sweat. To make things even better, the .names argument of across() lets us rename the newly created lag variables on the fly to make them more meaningful.
library(dplyr)
library(purrr)

calculate_lags <- function(df, var, lags) {
  # One partially applied lag function per lag length
  map_lag <- lags %>% map(~ partial(lag, n = .x))

  df %>%
    mutate(across(.cols = {{ var }}, .fns = map_lag, .names = "{.col}_lag{lags}"))
}
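The naming in the function above relies on lags lining up with the generated columns. As a hedged variation (not the original author's code), the sketch below names each function in the list after its lag length and builds the column names from {.fn} instead. The function name calculate_lags_named and the toy tibble in the usage note are hypothetical.

```r
library(dplyr)
library(purrr)

# Hypothetical variant of calculate_lags(): the list of lag functions is
# named "1", "2", ... so across() can use {.fn} in the .names template.
calculate_lags_named <- function(df, var, lags) {
  lag_fns <- lags %>%
    set_names(as.character(lags)) %>%
    map(~ partial(dplyr::lag, n = .x))

  df %>%
    mutate(across({{ var }}, .fns = lag_fns, .names = "{.col}_lag{.fn}"))
}
```

Because the names come from the function list itself, this version should also cope with more than one column, for example calculate_lags_named(tibble(a = 1:4, b = 5:8), c(a, b), 1:2), which adds a_lag1, a_lag2, b_lag1 and b_lag2.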
Let’s see a quick example. We’ll be using the closing prices of the TSLA stock to showcase its use. We have a data frame like this:
tsla %>% head(4)
## # A tibble: 4 × 6
##   date        open  high   low close    volume
##   <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>
## 1 2022-01-03  383.  400.  379.  400. 104686047
## 2 2022-01-04  397.  403.  374.  383. 100248258
## 3 2022-01-05  382.  390.  360.  363.  80119797
## 4 2022-01-06  359   363.  340.  355.  90336474
We simply pass the column we want to lag, followed by the desired lags, to the function. Note that we are also using tidy evaluation to reference the column without quotes. This way we keep the tidyverse vibe intact.
tsla %>% calculate_lags(close, 1:3) %>% head()
## # A tibble: 6 × 9
##   date        open  high   low close    volume close_lag1 close_lag2 close_lag3
##   <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>      <dbl>      <dbl>      <dbl>
## 1 2022-01-03  383.  400.  379.  400. 104686047        NA         NA         NA
## 2 2022-01-04  397.  403.  374.  383. 100248258       400.        NA         NA
## 3 2022-01-05  382.  390.  360.  363.  80119797       383.       400.        NA
## 4 2022-01-06  359   363.  340.  355.  90336474       363.       383.       400.
## 5 2022-01-07  360.  360.  337.  342.  84164748       355.       363.       383.
## 6 2022-01-10  333.  353.  327.  353.  91814877       342.       355.       363.
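The unquoted close in the call above works because of the embrace operator {{ }}, which forwards the column the caller typed into across(). Here is a stripped-down sketch of just that mechanism, using a single fixed lag. The prices tibble and the lag_one helper are made up for illustration and are not part of the original post.

```r
library(dplyr)

# Toy data, made up for illustration (not the TSLA prices above)
prices <- tibble(day = 1:4, close = c(10, 12, 11, 15))

# Hypothetical helper: {{ var }} passes the unquoted column on to across()
lag_one <- function(df, var) {
  df %>%
    mutate(across({{ var }}, ~ dplyr::lag(.x, n = 1), .names = "{.col}_lag1"))
}

prices %>% lag_one(close)
# adds a close_lag1 column holding NA, 10, 12, 11
```

Without the embrace, mutate() would look for a column literally called var instead of the one supplied by the caller.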
It’s time to create your own lags like a pro. Embrace the power of purrr and partial() and take your time series analysis to the next level. You will impress your colleagues with your advanced R skills and will have more time to focus on the real analysis.
Short and sweet!