Time Series Clustering in R

Posted on July 3, 2024 by Zahier Nasrudin in R bloggers | 0 Comments

[This article was first published on ZAHIER NASRUDIN, and kindly contributed to R-bloggers].

Introduction

Purpose of the tutorial: To demonstrate a quick and straightforward implementation of time series clustering using the widyr package in R.
What is time series clustering?: Grouping time series into clusters so that series in the same cluster are more similar to each other than to series in other clusters. For example, given monthly sales data for many stores, time series clustering can help identify stores with similar sales patterns over time.
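
To make "similar" concrete, here is a minimal base R sketch on made-up numbers. It assumes Euclidean distance between whole series, which is the distance k-means works with; the store names and sales values are hypothetical and not taken from the tutorial's dataset.

```r
# Toy example: three hypothetical stores observed over four months.
# Store names and sales values are made up for illustration.
sales_wide <- rbind(
  store_a = c(100, 110, 120, 130),
  store_b = c(102, 108, 123, 128),
  store_c = c(300, 280, 260, 240)
)
colnames(sales_wide) <- c("2023-01", "2023-02", "2023-03", "2023-04")

# Euclidean distance between every pair of series (one series per row)
store_dist <- as.matrix(dist(sales_wide))
round(store_dist, 1)
```

Here store_a and store_b end up much closer to each other than either is to store_c, so a clustering algorithm would tend to group the first two together.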

Load library

Code

library(tidyverse)
library(widyr)

Import data

About the data:
- Fake dataset that can be downloaded from my GitHub.
- Contains 832 rows & 3 columns
  - Columns:
    - year (<date>): Date of each observation. Despite the column name, the data are monthly, with each date set to the first day of the month.
    - storecode (<chr>): Unique identifier for each store.
    - sales (<dbl>): Sales figures for each store.
Importing data: Using read_csv()

Code

store_list <- read_csv("https://raw.githubusercontent.com/zahiernasrudin/datasets/main/sample_store.csv")

Glimpse of the dataset:

year	storecode	sales
2022-12-01	A4P1Q1	22432
2023-01-01	A4P1Q1	22425
2023-02-01	A4P1Q1	20710
2023-03-01	A4P1Q1	23054
2023-04-01	A4P1Q1	23912
2023-05-01	A4P1Q1	22782
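
Before clustering, it can be worth checking that every store has the same number of monthly observations, since the clustering step needs one value per store and month, and gaps would have to be filled in somehow. The sketch below runs that check on a small made-up tibble with the same column names; toy_sales, obs_per_store and all_complete are hypothetical names, and the same pipeline could be applied to store_list.

```r
library(dplyr)

# Toy stand-in for store_list (hypothetical values, same column names)
toy_sales <- tibble(
  year = rep(as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")), times = 2),
  storecode = rep(c("S1", "S2"), each = 3),
  sales = c(100, 110, 120, 300, 280, 260)
)

# Number of monthly observations per store
obs_per_store <- toy_sales |>
  count(storecode, name = "n_months")

# TRUE when every store has the same number of observations
all_complete <- n_distinct(obs_per_store$n_months) == 1
all_complete
```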

Clustering with `widyr`

Using widely_kmeans for time series clustering:

Code

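# k-means starts from random centres, so fix the seed for reproducible clusters
set.seed(123)

# Perform k-means clustering using widely_kmeans
cluster_group <- store_list %>%
  widely_kmeans(item = storecode,
                feature = year,
                value = sales,
                k = 3)

# Join the clustering results back to the original data
# (cluster_group has one row per storecode with its assigned cluster)
store_list_with_cluster <- left_join(store_list, cluster_group, by = "storecode")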
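
One way to picture what the call above does: the item values become rows, the feature values become columns, the value column fills the cells, and k-means then runs on that wide matrix. The sketch below does a similar reshape-then-cluster by hand in base R on a toy dataset. It is an approximation for illustration, not widyr's actual source; the store codes and sales figures are made up, and toy_sales, sales_matrix and manual_clusters are hypothetical names.

```r
# Toy data: four hypothetical stores, three months each (values are made up)
toy_sales <- data.frame(
  year = rep(as.Date(c("2023-01-01", "2023-02-01", "2023-03-01")), times = 4),
  storecode = rep(c("S1", "S2", "S3", "S4"), each = 3),
  sales = c(100, 110, 120,
            105, 112, 118,
            300, 280, 260,
            310, 275, 255)
)

# One row per store (item), one column per month (feature),
# sales in the cells (value)
sales_matrix <- tapply(
  toy_sales$sales,
  list(toy_sales$storecode, as.character(toy_sales$year)),
  sum
)

set.seed(123)  # k-means starts from random centres
km <- kmeans(sales_matrix, centers = 2, nstart = 10)

# One row per store plus its assigned cluster
manual_clusters <- data.frame(
  storecode = rownames(sales_matrix),
  cluster = factor(km$cluster)
)
manual_clusters
```

Because each month is a separate column, two stores land in the same cluster when their sales are close month by month, which on raw values reflects both the level and the shape of the series.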

Define item:
- Description: Item to cluster. In our dataset, this is the storecode column.

Define feature:
- Description: Feature column (the dimensions used in clustering). In our case, the feature is the time component, which is represented by the year column.

Define value:
- Description: Value column. In our dataset, this is the sales column.

Define k:
- Description: Number of clusters. This should be chosen based on the specific requirements of your analysis or determined using evaluation metrics. For the sake of simplicity in this tutorial, we will use 3 clusters.

Joining results: The clustering results (one row per store with its assigned cluster) are joined back to the original dataset.
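
As one example of an evaluation metric for choosing k, the sketch below compares the total within-cluster sum of squares across several values of k (the "elbow" heuristic). It runs stats::kmeans() directly on a simulated wide matrix rather than on the tutorial's data; the simulated stores, the range of k and the names sales_matrix, k_values, wss and elbow are all assumptions for illustration.

```r
# Toy wide matrix: six hypothetical stores (rows) by four months (columns),
# simulated as two groups with clearly different sales levels
set.seed(42)
sales_matrix <- rbind(
  matrix(rnorm(12, mean = 100, sd = 5), nrow = 3),
  matrix(rnorm(12, mean = 300, sd = 5), nrow = 3)
)
rownames(sales_matrix) <- paste0("S", 1:6)

# Total within-cluster sum of squares for k = 1 to 4
k_values <- 1:4
wss <- sapply(k_values, function(k) {
  kmeans(sales_matrix, centers = k, nstart = 10)$tot.withinss
})

elbow <- data.frame(k = k_values, tot_withinss = wss)
elbow
```

The value of k after which tot_withinss stops dropping sharply is a common, if informal, candidate for the number of clusters.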

Evaluating Clustering Results

We can visualize the clustering results using ggplot2.

Code

library(ggthemes)

store_list_with_cluster |> 
  ggplot(aes(x = year, y = sales, group = storecode, colour = cluster)) +
  geom_line(show.legend = FALSE) +
  scale_y_continuous(labels = scales::comma) +
  facet_wrap(vars(cluster)) +
  scale_color_solarized()
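
Alongside the plot, a small numeric summary can help when reading the clusters. The sketch below counts stores and averages sales per cluster on a made-up tibble shaped like store_list_with_cluster; toy_clustered and cluster_summary are hypothetical names, and the values and cluster assignments are invented for illustration.

```r
library(dplyr)

# Toy stand-in for store_list_with_cluster (hypothetical values)
toy_clustered <- tibble(
  year = rep(as.Date(c("2023-01-01", "2023-02-01")), times = 3),
  storecode = rep(c("S1", "S2", "S3"), each = 2),
  sales = c(100, 110, 105, 115, 300, 280),
  cluster = factor(rep(c(1, 1, 2), each = 2))
)

# Number of stores and average sales level in each cluster
cluster_summary <- toy_clustered |>
  group_by(cluster) |>
  summarise(
    n_stores = n_distinct(storecode),
    mean_sales = mean(sales),
    .groups = "drop"
  )
cluster_summary
```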

There you have it, a simple way to implement time series clustering using the widyr package in R. Of course, there is much more you can explore and refine in your clustering analysis. For comprehensive documentation and further exploration of the widyr package, visit the widyr page itself: widyr Documentation.
