Site icon R-bloggers

Grouped Sequences in dplyr Part 2

[This article was first published on R on Alan Yeung, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I just wrote a post about grouped sequences in dplyr and following that, I’ve been made aware of another couple of solutions to this problem (credit John Mackintosh). The solution involves using the consecutive_id() function, available in dplyr since v1.1.0. In the help page for this function, it’s mentioned that it was inspired by rleid() function from the data.table package. These functions work similarly to the rle() function I used last time (in what I called ‘the complicated solution’) but provide neater outputs. I’ll try them both out here.

library(tidyverse)

sports_df <- tibble::tribble(
  ~team,    ~date,        ~home_or_away, ~num_matches_away,
  "Team A", "07/10/2022", "H",           0L,
  "Team A", "14/10/2022", "A",           1L,
  "Team A", "21/10/2022", "A",           2L,
  "Team A", "28/10/2022", "H",           0L,
  "Team A", "04/11/2022", "A",           1L,
  "Team A", "11/11/2022", "A",           2L,
  "Team A", "18/11/2022", "A",           3L,
  "Team A", "25/11/2022", "H",           0L,
  "Team A", "02/12/2022", "H",           0L,
  "Team A", "09/12/2022", "H",           0L,
  "Team B", "07/10/2022", "H",           0L,
  "Team B", "14/10/2022", "A",           1L,
  "Team B", "21/10/2022", "A",           2L,
  "Team B", "28/10/2022", "A",           3L,
  "Team B", "04/11/2022", "H",           0L,
  "Team B", "11/11/2022", "A",           1L,
  "Team B", "18/11/2022", "A",           2L,
  "Team B", "25/11/2022", "H",           0L,
  "Team B", "02/12/2022", "H",           0L,
  "Team B", "09/12/2022", "A",           1L
) %>%      
  mutate(date = as.Date(date, "%d/%m/%Y")) %>% 
  arrange(team, date)

dplyr::consecutive_id()

sports_df2 <- sports_df %>% 
  group_by(team) %>% 
  mutate(consec_id = consecutive_id(home_or_away)) %>% 
  group_by(team, consec_id) %>% 
  mutate(days_on_road = 1:n(),
         days_on_road = if_else(home_or_away == "H", 0L, days_on_road)) %>% 
  ungroup()

sports_df2
## # A tibble: 20 × 6
##    team   date       home_or_away num_matches_away consec_id days_on_road
##    <chr>  <date>     <chr>                   <int>     <int>        <int>
##  1 Team A 2022-10-07 H                           0         1            0
##  2 Team A 2022-10-14 A                           1         2            1
##  3 Team A 2022-10-21 A                           2         2            2
##  4 Team A 2022-10-28 H                           0         3            0
##  5 Team A 2022-11-04 A                           1         4            1
##  6 Team A 2022-11-11 A                           2         4            2
##  7 Team A 2022-11-18 A                           3         4            3
##  8 Team A 2022-11-25 H                           0         5            0
##  9 Team A 2022-12-02 H                           0         5            0
## 10 Team A 2022-12-09 H                           0         5            0
## 11 Team B 2022-10-07 H                           0         1            0
## 12 Team B 2022-10-14 A                           1         2            1
## 13 Team B 2022-10-21 A                           2         2            2
## 14 Team B 2022-10-28 A                           3         2            3
## 15 Team B 2022-11-04 H                           0         3            0
## 16 Team B 2022-11-11 A                           1         4            1
## 17 Team B 2022-11-18 A                           2         4            2
## 18 Team B 2022-11-25 H                           0         5            0
## 19 Team B 2022-12-02 H                           0         5            0
## 20 Team B 2022-12-09 A                           1         6            1
identical(sports_df2$days_on_road, sports_df$num_matches_away)
## [1] TRUE

data.table::rleid()

sports_df3 <- sports_df %>% 
  group_by(team) %>% 
  mutate(rleid_id = data.table::rleid(home_or_away)) %>% 
  group_by(team, rleid_id) %>% 
  mutate(days_on_road = 1:n(),
         days_on_road = if_else(home_or_away == "H", 0L, days_on_road)) %>% 
  ungroup()

sports_df3
## # A tibble: 20 × 6
##    team   date       home_or_away num_matches_away rleid_id days_on_road
##    <chr>  <date>     <chr>                   <int>    <int>        <int>
##  1 Team A 2022-10-07 H                           0        1            0
##  2 Team A 2022-10-14 A                           1        2            1
##  3 Team A 2022-10-21 A                           2        2            2
##  4 Team A 2022-10-28 H                           0        3            0
##  5 Team A 2022-11-04 A                           1        4            1
##  6 Team A 2022-11-11 A                           2        4            2
##  7 Team A 2022-11-18 A                           3        4            3
##  8 Team A 2022-11-25 H                           0        5            0
##  9 Team A 2022-12-02 H                           0        5            0
## 10 Team A 2022-12-09 H                           0        5            0
## 11 Team B 2022-10-07 H                           0        1            0
## 12 Team B 2022-10-14 A                           1        2            1
## 13 Team B 2022-10-21 A                           2        2            2
## 14 Team B 2022-10-28 A                           3        2            3
## 15 Team B 2022-11-04 H                           0        3            0
## 16 Team B 2022-11-11 A                           1        4            1
## 17 Team B 2022-11-18 A                           2        4            2
## 18 Team B 2022-11-25 H                           0        5            0
## 19 Team B 2022-12-02 H                           0        5            0
## 20 Team B 2022-12-09 A                           1        6            1
identical(sports_df3$days_on_road, sports_df$num_matches_away)
## [1] TRUE

We now have 4 solutions to this problem – it’s often the case that there are many ways to do the same thing in R! It can be extra effort to remember different solutions but they work in different ways and can be applied to different problems so I do think it’s worthwhile trying these out. Which solution is your favourite?

To leave a comment for the author, please follow the link and comment on their blog: R on Alan Yeung.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Exit mobile version