Testing stationarity with the ts_adf_test() function in R

Steven P. Sanderson II, MPH

14 hours ago

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

Hey there, R enthusiasts! Today, we’re going to dive into the fascinating world of time series analysis using the ts_adf_test() function from the healthyR.ts R package. If you’re into data, statistics, and R coding, this is a must-know tool for your arsenal.

What’s the Deal with Augmented Dickey-Fuller?

Before we delve into the ts_adf_test() function, let’s understand the concept behind it. The Augmented Dickey-Fuller (ADF) test is a crucial tool in time series analysis. It’s like the Sherlock Holmes of time series data, helping us detect whether a series is stationary or not. Stationarity is a fundamental assumption in time series modeling because many models work best when applied to stationary data.

So, why “Augmented”? Well, it’s an extension of the original Dickey-Fuller test that adds lagged differences of the series to the test regression, so that leftover autocorrelation in the data does not distort the result.
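To make the "augmented" part concrete, here is a minimal base R sketch of the kind of regression an ADF test runs. The helper name adf_stat_sketch() is hypothetical, and the specification (a constant, a linear trend, and k lagged differences) is the classic textbook setup; it is an assumption, not a confirmed fact, that ts_adf_test() fits exactly this regression internally.

```r
# Hypothetical helper: ADF-style test statistic from a plain lm() fit.
# Assumed specification: diff(x)[t] ~ x[t-1] + constant + trend + k lagged diffs.
adf_stat_sketch <- function(x, k = 1) {
  x  <- as.numeric(x)
  dx <- diff(x)                          # first differences of the series
  n  <- length(dx)

  lagged <- embed(dx, k + 1)             # col 1: current diff, cols 2..(k+1): lagged diffs
  d_now  <- lagged[, 1]
  level  <- x[(k + 1):n]                 # lagged level of the series
  trend  <- (k + 1):n                    # linear time trend

  if (k > 0) {
    d_lags <- lagged[, -1, drop = FALSE] # the "augmentation" terms
    fit <- lm(d_now ~ level + trend + d_lags)
  } else {
    fit <- lm(d_now ~ level + trend)     # plain Dickey-Fuller regression
  }

  coefs <- summary(fit)$coefficients
  # The statistic is the t-ratio on the lagged level
  unname(coefs["level", "Estimate"] / coefs["level", "Std. Error"])
}

adf_stat_sketch(AirPassengers, k = 5)
```

With k = 0 the lagged differences drop out and the sketch collapses to the original Dickey-Fuller regression. Note that the statistic does not follow an ordinary t distribution, which is why ADF implementations look up p-values in special tables rather than using the ones lm() reports.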

The `ts_adf_test()` Function

Now, let’s get to the star of the show, the ts_adf_test() function. This function is part of the healthyR.ts package, and its primary job is to perform the ADF test on a given time series. In R, a time series can be represented as a numeric vector. Here’s the basic syntax:

ts_adf_test(.x, .k = NULL)

.x is your time series data, the numeric vector you want to analyze.
.k is an optional parameter that allows you to specify the lag order, meaning the number of lagged differences included in the test. If you leave it empty (like .k = NULL), don’t worry; the function will calculate it for you from the number of observations.
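For a feel of what such a default looks like, here is the rule that tseries::adf.test() uses for its own default lag order. Treat it as an assumption that ts_adf_test() applies the same rule when .k = NULL; the helper name default_lag() is made up for this illustration.

```r
# Hypothetical helper: cube-root rule for the default lag order,
# as used by tseries::adf.test(). Assumed, not confirmed, for ts_adf_test().
default_lag <- function(x) {
  trunc((length(x) - 1)^(1 / 3))
}

default_lag(AirPassengers)  # 144 observations -> 5
default_lag(BJsales)        # 150 observations -> 5
```

The lag order grows slowly with the length of the series, so even fairly long series end up with only a handful of lagged differences.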

Show Me the Stats!

So, what does ts_adf_test() return? It gives you a list object containing two vital pieces of information:

Test Statistic: This is the heart of the ADF test. It measures how strongly the data contradict the null hypothesis that the series has a unit root (that is, that it is non-stationary). A more negative value indicates stronger evidence for stationarity.
P-Value: This is another critical number. It represents the probability of observing a test statistic at least as extreme as the one you obtained if the series really did have a unit root. In simpler terms, a low p-value (commonly below 0.05) lets you reject the unit-root hypothesis in favor of stationarity, while a high p-value means the test could not rule out non-stationarity.
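Here is a small sketch of how you might turn that returned list into a decision. The function interpret_adf() and the 0.05 cutoff are illustrative choices, not part of healthyR.ts; the two lists simply mimic the test_stat and p_value elements the function returns, using the values from the examples in this post.

```r
# Hypothetical helper: turn an ADF result list into a plain-language verdict.
# `result` is assumed to have a numeric `p_value` element.
interpret_adf <- function(result, alpha = 0.05) {
  if (result$p_value <= alpha) {
    "reject the unit root: evidence of stationarity"
  } else {
    "cannot reject the unit root: no evidence of stationarity"
  }
}

# Lists shaped like the ts_adf_test() output shown in this post
air <- list(test_stat = -7.318571, p_value = 0.01)
bj  <- list(test_stat = -2.110919, p_value = 0.5301832)

interpret_adf(air)  # "reject the unit root: evidence of stationarity"
interpret_adf(bj)   # "cannot reject the unit root: no evidence of stationarity"
```

Keeping alpha as an argument makes it easy to apply a stricter or looser threshold without rewriting the logic.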

Let’s Get Practical

Enough theory! Let’s see some action with a couple of examples. Say we have the AirPassengers and BJsales datasets, and we want to check their stationarity:

library(healthyR.ts)

# ADF test for AirPassengers
result_air <- ts_adf_test(AirPassengers)
cat("AirPassengers ADF Test Result:\n")

AirPassengers ADF Test Result:

print(result_air)

$test_stat
[1] -7.318571

$p_value
[1] 0.01

# ADF test for BJsales
result_bj <- ts_adf_test(BJsales)
cat("\nBJsales ADF Test Result:\n")

BJsales ADF Test Result:

print(result_bj)

$test_stat
[1] -2.110919

$p_value
[1] 0.5301832

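In the AirPassengers example, we get a test statistic of -7.318571 and a p-value of 0.01. At the usual 5% level we reject the unit-root null hypothesis, which the test reads as evidence for stationarity. Keep in mind that AirPassengers has a visible upward trend and strong seasonality, so treat this as evidence against a unit root rather than proof that the raw series is ready for modeling as is.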

However, for BJsales, we get a test statistic of -2.110919 and a p-value of 0.5301832. With a p-value this high we cannot reject the unit-root null hypothesis, so the test gives no evidence that the series is stationary.
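A common follow-up when the test cannot reject a unit root is to difference the series and test again. The sketch below does that for BJsales using base R's diff(); it only calls ts_adf_test() if healthyR.ts is installed, and the output is left out here rather than guessed.

```r
# First difference of BJsales: each value minus the one before it
bj_diff <- diff(BJsales)

# One observation is lost to differencing
length(BJsales)
length(bj_diff)

# Re-run the ADF test on the differenced series (only if the package is available)
if (requireNamespace("healthyR.ts", quietly = TRUE)) {
  print(healthyR.ts::ts_adf_test(bj_diff))
}
```

If the differenced series passes the test, that points toward modeling the changes in sales rather than the raw levels.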

Now let’s see what happens when we set the lag order to 1 instead of relying on the default. As the output shows, the test statistics shift, but the conclusions stay the same.

ts_adf_test(AirPassengers, .k = 1)

$test_stat
[1] -7.652287

$p_value
[1] 0.01

ts_adf_test(BJsales, .k = 1)

$test_stat
[1] -1.316414

$p_value
[1] 0.8611925
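Since the result depends on the lag order, it can be worth scanning a few values side by side. Below is a small sketch of that idea; scan_lags() is a hypothetical helper that accepts any test function returning a list with test_stat and p_value elements, which is the shape ts_adf_test() returns above.

```r
# Hypothetical helper: run a test function over several lag orders and
# collect the results in one data frame.
# `test_fun` must accept (series, lag) and return list(test_stat, p_value).
scan_lags <- function(x, lags, test_fun) {
  rows <- lapply(lags, function(k) {
    res <- test_fun(x, k)
    data.frame(lag = k, test_stat = res$test_stat, p_value = res$p_value)
  })
  do.call(rbind, rows)
}

# Use the real test only if healthyR.ts is installed
if (requireNamespace("healthyR.ts", quietly = TRUE)) {
  print(scan_lags(BJsales, 1:5, healthyR.ts::ts_adf_test))
}
```

If the verdict flips as the lag order changes, that is a hint to look more closely at the series before trusting any single result.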

Conclusion

The ts_adf_test() function in the healthyR.ts package is a valuable tool for any data scientist or R coder working with time series data. It helps you test whether your data is stationary, a crucial step in building reliable time series models.

So, the next time you’re faced with a time series dataset, remember to call on your trusty companion, ts_adf_test(), to solve the mystery of stationarity. Happy coding, R enthusiasts!
