[This article was first published on cynkra, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The tsbox package provides a set of
tools that are agnostic towards existing time series classes. The tools also allow you to handle time series as plain data frames, thus making it easy to deal with time series in a dplyr or data.table workflow.
Version 0.3.1 is now on CRAN and provides several bugfixes and extensions (see here for the full change log).
A detailed overview of the package functionality
is given in the documentation page (or in an
older
blog-post).
New and extended functionality
ts_frequency(): changes the frequency of a time series. It is now possible to aggregate any time series to years, quarters, months, weeks, days, hours, minutes or seconds. For low- to high-frequency conversion, the tempdisagg package can now convert low frequency to high frequency and has support for ts-boxable objects.
E.g.:
library(tsbox)
x <- ts_tbl(EuStockMarkets)
x
#> # A tibble: 7,440 × 3
#> id time value
#> <chr> <dttm> <dbl>
#> 1 DAX 1991-07-01 03:18:27 1629.
#> 2 DAX 1991-07-02 13:01:32 1614.
#> 3 DAX 1991-07-03 22:44:38 1607.
#> 4 DAX 1991-07-05 08:27:43 1621.
#> 5 DAX 1991-07-06 18:10:48 1618.
#> # … with 7,435 more rows
ts_frequency(x, "week")
#> # A tibble: 1,492 × 3
#> id time value
#> <chr> <date> <dbl>
#> 1 DAX 1991-06-30 1618.
#> 2 DAX 1991-07-07 1633.
#> 3 DAX 1991-07-14 1632.
#> 4 DAX 1991-07-21 1620.
#> 5 DAX 1991-07-28 1616.
#> # … with 1,487 more rows
ts_index(): returns an indexed series, with a value of 1 at the base period.
This base period can now be specified more flexibly. E.g., the average of a year can defined as 1 (which is a common use case).
ts_na_interpolation(): A new function that wraps imputeTS::na_interpolation() from the imputeTS package and allows the imputation of missing values for any time series object.
ts_first_of_period(): A new function that replaces the date or time value by the first of the period. This is useful because tsbox usually relies on timestamps being the first of a period. The following monthly series has an offset of 14 days. ts_first_of_period() changes the timestamp to the first date of each month:
x <- ts_lag(ts_tbl(mdeaths), "14 days")
x
#> # A tibble: 72 × 2
#> time value
#> <date> <dbl>
#> 1 1974-01-15 2134
#> 2 1974-02-15 1863
#> 3 1974-03-15 1877
#> 4 1974-04-15 1877
#> 5 1974-05-15 1492
#> # … with 67 more rows
ts_first_of_period(x)
#> # A tibble: 72 × 2
#> time value
#> <date> <dbl>
#> 1 1974-01-01 2134
#> 2 1974-02-01 1863
#> 3 1974-03-01 1877
#> 4 1974-04-01 1877
#> 5 1974-05-01 1492
#> # … with 67 more rows
Because this works reliably, it is easy to define a toolkit that works for
all classes. So, whether we want to smooth, scale,
differentiate, chain, forecast, regularize, impute or
seasonally adjust a time series, we can use the same commands to
whatever time series class at hand:
ts_trend(x.ts) # estimate a trend line
ts_pc(x.xts) # calculate percentage change rates (period on period)
ts_pcy(x.df) # calculate percentage change rates (year on year)
ts_lag(x.dt) # lagged series
There are many more.
Because they all start with ts_, you can use auto-complete to see
what’s around. Most conveniently, there is a time series plot function
that works for all classes and frequencies:
ts_plot(
`Airline Passengers` = AirPassengers,
`Lynx trappings` = ts_tis(lynx),
`Deaths from Lung Diseases` = ts_xts(fdeaths),
title = "Airlines, trappings, and deaths",
subtitle = "Monthly passengers, annual trappings, monthly deaths"
)
To leave a comment for the author, please follow the link and comment on their blog: cynkra.