Practical Examples with healthyR.ts

Posted on June 19, 2024 by Steven P. Sanderson II, MPH in R bloggers

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Today I am going to walk through some quick, practical examples of how you can use the healthyR.ts package. This package is designed to help you analyze time series data more efficiently.

Let’s just jump right into it!

Load the libraries

library(healthyR.ts)
library(dplyr)
library(ggplot2)
library(tidyr)
library(plotly)
library(timetk)
library(modeltime)

Load the data

We are going to use the time series data set called BJsales.lead that ships with base R (in the datasets package). We will use it to showcase a couple of things, like turning a ts object into a tibble and plotting the data.

# Load the data, which has no time series information other than it is
# a time series object and 150 points in length, so we will go ahead and
# create a date column for it and name it date_col.
df <- BJsales.lead |>
  ts_to_tbl() |>
  mutate(date_col = seq.Date(from = as.Date("1991-01-01"), 
                              by = "month", 
                              length.out = 150)) |>
  select(date_col, everything())

# Print the first few rows of the data
head(df)

# A tibble: 6 × 2
  date_col   value
  <date>     <dbl>
1 1991-01-01 10.0 
2 1991-02-01 10.1 
3 1991-03-01 10.3 
4 1991-04-01  9.75
5 1991-05-01 10.3 
6 1991-06-01 10.1

So far, we have loaded the data and created a date column for it. Now, let’s plot the data. We are going to use the ts_vva_plot function to do this.

# Plot the data
plt_data <- ts_vva_plot(df, date_col, value)

head(plt_data[["data"]][["augmented_data_tbl"]])

# A tibble: 6 × 3
  date_col   name           value
  <date>     <fct>          <dbl>
1 1991-01-01 Value        10.0   
2 1991-01-01 Velocity     NA     
3 1991-01-01 Acceleration NA     
4 1991-02-01 Value        10.1   
5 1991-02-01 Velocity      0.0600
6 1991-02-01 Acceleration NA

plt_data[["plots"]][["interactive_plot"]]

We have now created the augmented data, which holds the first-order difference of the time series (the velocity) and the second-order difference (the acceleration). The function also creates a ggplot2 plot and a plotly plot of the data. Let’s move on to see the growth rate of this data.
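As a rough cross-check, the velocity and acceleration columns can be reproduced in base R with diff(). This is only a sketch: it assumes ts_vva_plot uses plain first and second differences padded with NA, which is consistent with the output above, and the variable names are mine.

```r
# Sketch: velocity and acceleration as first and second differences.
# Assumption: this mirrors what ts_vva_plot() computes; it is not taken
# from the package source.
x <- as.numeric(BJsales.lead)

# First difference, padded with one NA so it lines up with x
velocity <- c(NA, diff(x))

# Second difference, padded with two NAs
acceleration <- c(NA, NA, diff(x, differences = 2))

head(data.frame(value = x, velocity, acceleration))
```

Each round of differencing drops one observation, which is why the velocity starts with one NA and the acceleration with two.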

# Add the growth rate of the data as a new column
df_growth_augment_tbl <- ts_growth_rate_augment(
  df,
  value
)

head(df_growth_augment_tbl)

# A tibble: 6 × 3
  date_col   value growth_rate_value
  <date>     <dbl>             <dbl>
1 1991-01-01 10.0             NA    
2 1991-02-01 10.1              0.599
3 1991-03-01 10.3              2.48 
4 1991-04-01  9.75            -5.52 
5 1991-05-01 10.3              5.95 
6 1991-06-01 10.1             -1.94
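
The growth_rate_value column above is consistent with a period-over-period percent change. Here is a minimal base R sketch of that calculation, assuming that is what ts_growth_rate_augment does with its default arguments (the variable names are mine):

```r
# Sketch: period-over-period percent growth in base R.
# Assumption: ts_growth_rate_augment() with default arguments behaves
# like this; the package may offer other options.
x <- as.numeric(BJsales.lead)
n <- length(x)

# (current / previous - 1) * 100, with NA for the first observation
growth_rate <- c(NA, (x[-1] / x[-n] - 1) * 100)

head(round(growth_rate, 3))
```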

Let’s now view the data:

plt <- df_growth_augment_tbl |>
  pivot_longer(cols = -date_col) |>
  ggplot(aes(x = date_col, y = value, color = name)) +
  facet_wrap(~ name, ncol = 1, scales = "free") +
  geom_line() +
  theme_minimal() +
  labs(
    x = "Date",
    y = "Value",
    title = "Growth Rate of Time Series Data",
    color = "Variable"
  )

print(plt)

ggplotly(plt)

Stationary?

Is the data stationary? In other words, does the joint probability distribution of the series stay the same when it is shifted in time? Let’s find out.

ts_adf_test(df[["value"]])

$test_stat
[1] -1.723664

$p_value
[1] 0.6915227
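
For anyone curious about what sits behind a statistic like this, here is a minimal base R sketch of one common form of the augmented Dickey-Fuller regression (constant, linear trend, and lagged differences). The lag-order rule and the variable names are my assumptions, and I am not claiming this is how ts_adf_test is implemented, so the number it produces may differ from the one above.

```r
# Sketch: one common form of the augmented Dickey-Fuller regression.
# Assumptions: lag order k = trunc((n - 1)^(1/3)) and a constant plus a
# linear trend. This is not taken from the healthyR.ts source.
x <- as.numeric(BJsales.lead)
k <- trunc((length(x) - 1)^(1 / 3))

dx <- diff(x)
n <- length(dx)

# Columns of z: dx_t, dx_{t-1}, ..., dx_{t-k}
z <- embed(dx, k + 1)
y <- z[, 1]
dx_lags <- z[, -1, drop = FALSE]

# Lagged level x_{t-1} and a time trend, aligned with y
x_lag <- x[(k + 1):n]
tt <- (k + 1):n

fit <- lm(y ~ x_lag + tt + dx_lags)

# The test statistic is the t value on the lagged level
adf_stat <- coef(summary(fit))["x_lag", "t value"]
adf_stat
```

The statistic is compared against Dickey-Fuller critical values rather than the usual t distribution, which is why the sketch stops short of computing a p-value.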

The p-value from this test is 0.692. This means that we fail to reject the null hypothesis that the data is non-stationary. We can, however, make the data stationary by using a built-in function in this package.

auto_stationary_df <- auto_stationarize(df[["value"]])

The time series is not stationary. Attempting to make it stationary...

stationary_vec <- auto_stationary_df[["stationary_ts"]]
ndiffs <- auto_stationary_df[["ndiffs"]]
trans_type <- auto_stationary_df[["trans_type"]]
test_stat <- auto_stationary_df[["adf_stats"]][["test_stat"]]
p_value <- auto_stationary_df[["adf_stats"]][["p_value"]]

The data is now stationary after 1 round of differencing. The transformation type used was diff. The test statistic was -4.839 and the p-value was 0.01.

Let’s now add the stationary data to df_growth_augment_tbl and plot it. To do this, we first have to pad the stationary vector, since differencing makes it shorter than the original data. We will simply prepend one NA per difference taken and then attach it.

stationary_vec <- c(rep(NA, ndiffs), stationary_vec)
df_growth_augment_tbl <- df_growth_augment_tbl |>
  mutate(stationary = stationary_vec)

df_growth_augment_tbl |>
  pivot_longer(cols = -date_col) |>
  ggplot(aes(x = date_col, y = value, color = name)) +
  facet_wrap(~ name, ncol = 1, scales = "free") +
  geom_line() +
  theme_minimal() +
  labs(
    x = "Date",
    y = "Value",
    title = "Growth Rate/Value and Stationary Data of Time Series",
    color = "Variable"
  )

The stationary series looks close to the growth rate, since it is the first-order difference of the data.
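
The similarity is not a coincidence: the percent growth rate is just the first difference divided by the previous value, times 100. A small base R sketch of that relationship, using my own variable names and assuming the growth rate is a simple period-over-period percent change:

```r
# Sketch: the growth rate is a rescaled first difference.
# Assumption: growth rate = (current / previous - 1) * 100.
x <- as.numeric(BJsales.lead)
n <- length(x)

first_diff <- diff(x)
growth_rate <- (x[-1] / x[-n] - 1) * 100

# Same quantity written as 100 * difference / previous value
rescaled_diff <- 100 * first_diff / x[-n]
all.equal(growth_rate, rescaled_diff)

# How closely the two series move together
cor(first_diff, growth_rate)
```

Because the level of this series moves within a fairly narrow band, dividing by the previous value changes the scale much more than the shape.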

Now, let’s see whether the data is correlated with any of its own lags.

output <- ts_lag_correlation(df_growth_augment_tbl,
                .date_col = date_col,
                .value_col = value,
                .lags = c(1,2,3,4,6,12,24))

output[["plots"]][["plotly_lag_plot"]]
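
To get a feel for the kind of number a lag correlation represents, here is a base R sketch that correlates the series with shifted copies of itself at the same lags. It assumes a plain Pearson correlation between the value and its lagged version; ts_lag_correlation may compute or present things differently, and the names below are mine.

```r
# Sketch: correlation between a series and its own lags in base R.
# Assumption: plain Pearson correlation on the overlapping observations.
x <- as.numeric(BJsales.lead)
n <- length(x)
lags <- c(1, 2, 3, 4, 6, 12, 24)

lag_cor <- sapply(lags, function(k) {
  # Pair each value with the value k periods earlier
  cor(x[(k + 1):n], x[1:(n - k)])
})
names(lag_cor) <- paste0("lag_", lags)

round(lag_cor, 3)
```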
