Time Series in 5-Minutes, Part 5: Anomaly Detection
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Have 5-minutes? Then let’s learn time series. In this short article series, I highlight how you can get up to speed quickly on important aspects of time series analysis. Today we are focusing on analyzing anomalies in time series data.
Updates
This article has been updated. View the updated Time Series in 5-Minutes article at Business Science.
Time Series in 5-Minutes
Articles in this Series
I just released timetk 2.0.0 (read the release announcement). A ton of new functionality has been added. We’ll discuss some of the key pieces in this article series:
- Part 1, Data Wrangling and Rolling Calculations
- Part 2, The Time Plot
- Part 3, Autocorrelation
- Part 4, Seasonality
- Part 5, Anomalies and Anomaly Detection
- Part 6, Dealing with Missing Time Series Data
Register for our blog to get new articles as we release them.
Have 5-Minutes?
Then let’s learn Time Series Anomaly Detection
Anomaly detection is an important part of time series analysis:
- Detected anomalies can signify special events
- Cleaning anomalies can reduce forecast error
In this short tutorial, we will cover the plot_anomaly_diagnostics() and tk_anomaly_diagnostics() functions for visualizing and automatically detecting anomalies at scale.
Advanced Time Series Course
Become the time series domain expert in your organization.
Make sure you’re notified when my new Advanced Time Series Forecasting in R course comes out. You’ll learn timetk and modeltime plus the most powerful time series forecasting techniques available.
Get notified here: Advanced Time Series Course.
You will learn:
- Time Series Preprocessing, Noise Reduction, & Anomaly Detection
- Feature engineering using lagged variables & external regressors
- Hyperparameter tuning
- Time series cross-validation
- Ensembling Multiple Machine Learning & Univariate Modeling Techniques (Competition Winner)
- NEW – Deep Learning with RNNs (Competition Winner)
- and more.
Sign up for the Time Series Course waitlist
Let’s Get Started
First, set up the libraries we’ll use:
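The original setup code is not shown here, so the following is a minimal sketch of a setup that should be enough for this tutorial. It assumes timetk (2.0.0 or later) and dplyr are installed; the `interactive` switch is an illustrative variable, not something timetk requires.

```r
# Assumed setup: dplyr supplies group_by(), filter() and the %>% pipe,
# timetk supplies the anomaly functions and the example dataset.
library(dplyr)
library(timetk)

# Illustrative switch (an assumption): TRUE requests interactive plotly charts,
# FALSE requests static ggplot2 charts when passed to .interactive.
interactive <- FALSE
```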
Data
This tutorial will use the walmart_sales_weekly dataset:
- Weekly
- Sales spikes at various events
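To get a feel for the data before detecting anything, a quick summary per series can help. This is a small sketch, assuming the `id`, `Date` and `Weekly_Sales` columns that ship with the dataset in timetk; `weekly_summary` is just an illustrative name.

```r
library(dplyr)
library(timetk)

# One row per time series: how many weekly observations it has,
# the date range it covers, and its largest weekly sales value.
weekly_summary <- walmart_sales_weekly %>%
  group_by(id) %>%
  summarise(
    n_weeks    = n(),
    first_week = min(Date),
    last_week  = max(Date),
    max_sales  = max(Weekly_Sales)
  )

weekly_summary
```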
Automatic Anomaly Detection
To get the data on the anomalies, we use tk_anomaly_diagnostics(), the preprocessing function.
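A call along these lines returns the anomaly data for every series at once. It is a sketch that groups by the `id` column (grouping by other columns that identify a series should work the same way) and leaves the detection settings at their defaults.

```r
library(dplyr)
library(timetk)

# Group-wise anomaly detection: one decomposition per time series.
anomaly_tbl <- walmart_sales_weekly %>%
  group_by(id) %>%
  tk_anomaly_diagnostics(Date, Weekly_Sales, .message = FALSE)

# Keep only the rows flagged as anomalies.
anomaly_tbl %>%
  filter(anomaly == "Yes")
```

The result keeps the date column and adds the decomposition columns (observed, season, trend, remainder) along with the anomaly flag and the recomposed_l1 / recomposed_l2 boundaries.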
The tk_anomaly_diagnostics() method for anomaly detection implements a 2-step process to detect outliers in time series.
Step 1: Detrend & Remove Seasonality using STL Decomposition
The decomposition separates the “season” and “trend” components from the “observed” values leaving the “remainder” for anomaly detection.
The user can control two parameters: .frequency and .trend.
- .frequency: Adjusts the “season” component that is removed from the “observed” values.
- .trend: Adjusts the trend window (the t.window parameter from stats::stl()) that is used.
The user may supply both .frequency and .trend as time-based durations (e.g. “6 weeks”), numeric values (e.g. 180), or “auto”, which predetermines the frequency and/or trend based on the scale of the time series using tk_time_scale_template().
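To make Step 1 concrete, here is a small base R sketch of an STL decomposition using stats::stl() directly. It is an illustration of the idea, not timetk's internal code: it uses the built-in monthly `nottem` series, and the `t.window = 25` value is an arbitrary choice for the example.

```r
# nottem is a built-in monthly series (frequency = 12), so the seasonal
# period is already known to stl().
fit <- stats::stl(nottem, s.window = "periodic", t.window = 25)

# The fitted components: seasonal, trend, and remainder.
components <- fit$time.series
head(components)

# The remainder is what is left after removing season and trend;
# this is the series that anomaly detection runs on in Step 2.
remainder <- components[, "remainder"]

# Sanity check: the three components add back up to the observed values.
recomposed <- rowSums(components)
all.equal(as.numeric(recomposed), as.numeric(nottem))
```

A wider trend window gives a smoother trend and pushes more of the movement into the remainder, which is why the trend setting affects what gets flagged.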
Step 2: Anomaly Detection
Once “trend” and “season” (seasonality) are removed, anomaly detection is performed on the “remainder”. Anomalies are identified, and boundaries (recomposed_l1 and recomposed_l2) are determined.
The anomaly detection method uses the interquartile range (IQR), the span between the 25th and 75th percentiles (+/-25 percentile points around the median), as its baseline.
IQR Adjustment, alpha parameter
With the default alpha = 0.05, the limits are established by expanding the 25/75 baseline by an IQR Factor of 3 (3X). The IQR Factor = 0.15 / alpha (hence 3X with alpha = 0.05):
- To increase the IQR Factor controlling the limits, decrease the alpha, which makes it more difficult to be an outlier.
- Increase alpha to make it easier to be an outlier.
- The IQR outlier detection method is used in forecast::tsoutliers().
- A similar outlier detection method is used by Twitter’s AnomalyDetection package.
- Both the Twitter and forecast::tsoutliers() methods have been implemented in Business Science’s anomalize package.
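The alpha arithmetic above can be sketched in a few lines of base R. The helper `iqr_limits()` is a hypothetical function written for this illustration, based only on the description in this section (25/75 baseline expanded by 0.15 / alpha times the IQR); the actual timetk implementation may differ in detail.

```r
# Hypothetical helper: lower/upper limits from the 25th/75th percentiles,
# expanded by the IQR Factor (0.15 / alpha) times the interquartile range.
iqr_limits <- function(x, alpha = 0.05) {
  q <- stats::quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr_factor <- 0.15 / alpha            # 3 when alpha = 0.05
  spread <- (q[2] - q[1]) * iqr_factor
  c(lower = q[1] - spread, upper = q[2] + spread)
}

# Simulated "remainder": 100 well-behaved values plus one large spike.
set.seed(123)
remainder <- c(rnorm(100), 15)

limits <- iqr_limits(remainder, alpha = 0.05)
is_anomaly <- remainder < limits["lower"] | remainder > limits["upper"]
which(is_anomaly)

# Halving alpha doubles the IQR Factor, so the limits get wider
# and fewer points qualify as outliers.
iqr_limits(remainder, alpha = 0.025)
```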
Anomaly Visualization
Using the plot_anomaly_diagnostics() function, we can interactively detect anomalies at scale.
The plot_anomaly_diagnostics() function is a visualization wrapper for tk_anomaly_diagnostics() group-wise anomaly detection, implementing the 2-step process from above.
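A plotting call might look like the following sketch. It groups by `id` and requests a static ggplot with `.interactive = FALSE` so it can be saved or printed anywhere; switching to `.interactive = TRUE` should give the interactive plotly version. The three-column facet layout is just a choice for this example.

```r
library(dplyr)
library(timetk)

# One facet per time series, with detected anomalies highlighted.
anomaly_plot <- walmart_sales_weekly %>%
  group_by(id) %>%
  plot_anomaly_diagnostics(
    Date, Weekly_Sales,
    .message     = FALSE,
    .facet_ncol  = 3,
    .interactive = FALSE
  )

anomaly_plot
```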
Have questions on using Timetk for time series?
Make a comment in the chat below.
And, if you plan on using timetk for your business, it’s a no-brainer: join my Time Series Course Waitlist (it’s coming, it’s really insane).