Handling Splits with SwimmeR
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Splits are generally reported in one of two formats: cumulative or lap. When working with data I find lap format to be more useful, but what’s most useful is to have all splits in the same format. This post discusses how to do just that with data from swimming and track. The first step is to make sure you have the most recent versions of SwimmeR and JumpeR installed. SwimmeR is available from CRAN; JumpeR is available from GitHub.
```r
install.packages("SwimmeR")                         # from CRAN
devtools::install_github("gpilgrim2670/JumpeR")     # from GitHub
```
This post demonstrates the tools SwimmeR provides for converting between split formats and shows how they apply to both swimming and track data.
```r
library(SwimmeR)
library(JumpeR)
library(dplyr)
library(flextable)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>%                 # bolds header
    bg(bg = "#D3D3D3", part = "header") %>%   # puts gray background behind the header row
    autofit()
}
```
Split Formats
Cumulative splits accumulate over the duration of an event. Say an athlete is clocked at 30.00 seconds for her first 50 yards. That 30.00 seconds is her 50 split. If the clock keeps running and she’s clocked at 1:05.00 at the 100 yard mark, that 1:05.00 is her cumulative 100 split; it contains the 30.00 50 split inside it. Her lap 100 split is 1:05.00 minus 30.00, which is 35.00. Lap splits are generally preferred because they’re more specific: rather than containing information about the entire race up to a given point, each one only contains information about one specific component (lap) of the race.
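The arithmetic above is easy to reproduce in R. The sketch below uses a hypothetical helper, `to_seconds()` (not a SwimmeR function), which assumes times are strings in either `ss.xx` or `m:ss.xx` form with no hours component. It converts both cumulative splits to seconds and subtracts them to get the lap split.

```r
# Hypothetical helper (not part of SwimmeR): convert "ss.xx" or "m:ss.xx"
# time strings to numeric seconds. Assumes there is no hours component.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) p[1] * 60 + p[2] else p[1]
  }, numeric(1))
}

split_50  <- to_seconds("30.00")    # cumulative 50 split: 30 seconds
split_100 <- to_seconds("1:05.00")  # cumulative 100 split: 65 seconds

lap_100 <- split_100 - split_50     # lap split for the second 50
lap_100
#> [1] 35
```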
Results with only cumulative splits do exist. Luckily, the SwimmeR package contains functions to convert between the two types.
Cumulative Splits to Lap Splits
```r
link <- "https://swimswam.com/wp-content/uploads/2019/03/D3.NCAA-2005.pdf"

df <- link %>%
  SwimmeR::read_results() %>%
  swim_parse(splits = TRUE,
             avoid = c("QUALIFYING", "NCAA"))

df_demo <- df %>%
  filter(Event == "WOMEN's 200 Yard BUTTERFLY") %>%
  filter(!is.na(Name)) %>%
  head(3) %>%
  select(Name, Finals_Time, contains("Split")) %>%
  select(where(~ !all(is.na(.x))))   # drop split columns that are entirely NA

df_demo %>%
  flextable_style() %>%
  set_caption("Raw Results, Cumulative Splits")
```
Name | Finals_Time | Split_50 | Split_100 | Split_150 | Split_200 |
KEPHART, SAMANTHA | 2:04.16 | 28.46 | 59.91 | 1:32.09 | 2:04.16 |
ROSS, MARIKA | 2:05.13 | 28.39 | 1:00.59 | 1:33.10 | 2:05.13 |
WILLIAMSON, VANESSA | 2:05.57 | 28.75 | 1:00.68 | 1:33.08 | 2:05.57 |
These splits, from the 2005 DIII NCAA championships, are cumulative, which is not ideal. Enter SwimmeR’s splits_to_lap function, which, uhhh, converts splits to lap format.
```r
df_demo %>%
  splits_to_lap() %>%
  flextable_style() %>%
  set_caption("Splits Converted to Lap Format")
```
Name | Finals_Time | Split_50 | Split_100 | Split_150 | Split_200 |
KEPHART, SAMANTHA | 2:04.16 | 28.46 | 31.45 | 32.18 | 32.07 |
ROSS, MARIKA | 2:05.13 | 28.39 | 32.20 | 32.51 | 32.03 |
WILLIAMSON, VANESSA | 2:05.57 | 28.75 | 31.93 | 32.40 | 32.49 |
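Conceptually, going from cumulative to lap format is just taking successive differences. Here’s a minimal base-R sketch of that idea (an illustration, not SwimmeR’s actual implementation), applied to Samantha Kephart’s splits from the tables above after converting them to seconds by hand.

```r
# Samantha Kephart's cumulative splits, in seconds
# (28.46, 59.91, 1:32.09, 2:04.16)
cumulative <- c(28.46, 59.91, 92.09, 124.16)

# Lap split = this cumulative split minus the previous one.
# Prepending 0 makes the first lap equal to the first cumulative split.
# Rounding to hundredths removes floating-point noise such as 31.449999...
lap <- round(diff(c(0, cumulative)), 2)
lap
#> [1] 28.46 31.45 32.18 32.07
```

These match the lap splits in the table above.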
Of course, if you’re some kind of sicko and like cumulative splits, SwimmeR is also (begrudgingly) here for you with the splits_to_cumulative function. Here’s splits_to_cumulative undoing all the good work of splits_to_lap.
```r
df_demo %>%
  splits_to_lap() %>%
  splits_to_cumulative() %>%
  flextable_style() %>%
  set_caption("Splits Converted back to Cumulative Format")
```
Name | Finals_Time | Split_50 | Split_100 | Split_150 | Split_200 |
KEPHART, SAMANTHA | 2:04.16 | 28.46 | 59.91 | 1:32.09 | 2:04.16 |
ROSS, MARIKA | 2:05.13 | 28.39 | 1:00.59 | 1:33.10 | 2:05.13 |
WILLIAMSON, VANESSA | 2:05.57 | 28.75 | 1:00.68 | 1:33.08 | 2:05.57 |
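Going the other way is a running sum. In this sketch, base R’s `cumsum()` rebuilds the cumulative splits from Kephart’s lap splits, and a hypothetical formatter, `to_mmss()` (again not a SwimmeR function), turns seconds back into the `m:ss.xx` strings shown in the table.

```r
# Hypothetical formatter (not part of SwimmeR): seconds -> "ss.xx" or "m:ss.xx"
to_mmss <- function(sec) {
  sec  <- round(sec, 2)                      # strip floating-point noise first
  mins <- as.integer(sec %/% 60)
  secs <- sec - mins * 60
  ifelse(mins > 0,
         sprintf("%d:%05.2f", mins, secs),   # e.g. 1:32.09
         sprintf("%.2f", secs))              # e.g. 59.91
}

lap <- c(28.46, 31.45, 32.18, 32.07)  # Kephart's lap splits, in seconds
cumulative <- cumsum(lap)             # running total = cumulative splits

to_mmss(cumulative)
# returns c("28.46", "59.91", "1:32.09", "2:04.16")
```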
Data Frames with Mixed Cumulative and Lap Splits
Here at Swimming + Data Science we often assemble data frames from multiple meets. That means some splits in a given data frame could be in cumulative format while others are in lap format. How can we deal with a mixed-format data frame? Well, here’s an example data frame with data from two swimmers, one with lap format splits and the other with cumulative.
```r
df_mixed <- data.frame(
  Place = 1,
  Name = c("Lenore Lap", "Casey Cumulative"),
  Team = rep("KVAC", 2),
  Event = rep("Womens 200 Freestyle", 2),
  Finals_Time = rep("1:58.00", 2),
  Split_50 = rep("28.00", 2),
  Split_100 = c("31.00", "59.00"),
  Split_150 = c("30.00", "1:29.00"),
  Split_200 = c("29.00", "1:58.00")
)

df_mixed %>%
  flextable_style() %>%
  set_caption("Mixed Lap and Cumulative Splits")
```
Place | Name | Team | Event | Finals_Time | Split_50 | Split_100 | Split_150 | Split_200 |
1 | Lenore Lap | KVAC | Womens 200 Freestyle | 1:58.00 | 28.00 | 31.00 | 30.00 | 29.00 |
1 | Casey Cumulative | KVAC | Womens 200 Freestyle | 1:58.00 | 28.00 | 59.00 | 1:29.00 | 1:58.00 |
To convert cumulative splits to lap format without interfering with splits already in lap format, it’s necessary to set a parameter called threshold in splits_to_lap. Setting threshold defines a maximum acceptable lap split value: all splits greater than threshold will be converted to lap format, and all splits less than threshold will be left unchanged. Looking at the table above, all of Lenore Lap’s splits are less than 31.01, and all of Casey Cumulative’s cumulative splits (after the 50 split, which is the same in both formats) are greater than 58.99, so any value between 31.01 and 58.99 will work for threshold. I’ll use threshold = 35 for this example.
```r
df_mixed %>%
  splits_to_lap(threshold = 35) %>%
  flextable_style() %>%
  set_caption("All Splits in Lap Format")
```
Place | Name | Team | Event | Finals_Time | Split_50 | Split_100 | Split_150 | Split_200 |
1 | Lenore Lap | KVAC | Womens 200 Freestyle | 1:58.00 | 28.00 | 31.00 | 30.00 | 29.00 |
1 | Casey Cumulative | KVAC | Womens 200 Freestyle | 1:58.00 | 28.00 | 31.00 | 30.00 | 29.00 |
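To make the threshold rule concrete, here’s a small base-R sketch of the behavior described above, working on one swimmer’s splits in seconds. The function `lap_with_threshold()` is hypothetical and is not SwimmeR’s source code; it assumes any split above `threshold` is cumulative and subtracts the previous split from it, while splits at or below `threshold` are left alone. It also computes the window of workable thresholds from the example data, and shows what goes wrong when the threshold is too low.

```r
# Hypothetical sketch, not SwimmeR's implementation: splits above `threshold`
# are treated as cumulative and converted to laps by subtracting the previous
# split (assumed to be cumulative too); splits at or below `threshold` are
# assumed to already be laps and are left unchanged.
lap_with_threshold <- function(splits, threshold) {
  out <- splits
  for (i in seq_along(splits)[-1]) {   # first split is identical in both formats
    if (splits[i] > threshold) {
      out[i] <- splits[i] - splits[i - 1]
    }
  }
  out
}

lenore <- c(28, 31, 30, 29)   # lap format, seconds
casey  <- c(28, 59, 89, 118)  # cumulative format, seconds

# A workable threshold must sit above every lap split and below every
# cumulative split after the first
c(lower = max(lenore), upper = min(casey[-1]))   # 31 and 59

lap_with_threshold(lenore, 35)  # 28 31 30 29 (left unchanged)
lap_with_threshold(casey, 35)   # 28 31 30 29 (converted)

# A threshold below the lap splits wrongly "converts" Lenore's laps
lap_with_threshold(lenore, 25)  # 28 3 -1 -1
```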
Similarly, splits_to_cumulative also has a threshold parameter, which serves the same purpose. In splits_to_cumulative the threshold parameter is basically a minimum split time. The fastest (i.e. minimum) split in df_mixed is 28.00, so any value less than 28.00 will work. I’ll use threshold = 27.99, and all splits will be converted to cumulative format.
```r
df_mixed %>%
  splits_to_cumulative(threshold = 27.99) %>%
  flextable_style() %>%
  set_caption("All Splits in Cumulative Format")
```
Place | Name | Team | Event | Finals_Time | Split_50 | Split_100 | Split_150 | Split_200 |
1 | Lenore Lap | KVAC | Womens 200 Freestyle | 1:58.00 | 28.00 | 59.00 | 1:29.00 | 1:58.00 |
1 | Casey Cumulative | KVAC | Womens 200 Freestyle | 1:58.00 | 28.00 | 59.00 | 1:29.00 | 1:58.00 |
Track and Field
Hadley Wickham has the tidyverse, his empire of interconnected packages. I’ve got my two, SwimmeR and JumpeR, which together make up the name-pending-verse. Send in your ideas.
A goal for the two packages going forward is to make utility functions like splits_to_lap and splits_to_cumulative work for data gathered with both packages. This goal is ongoing and not yet fully realized, but the split-handling functions are a step in the right direction.
Here’s an example of track data read in with JumpeR. The splits are in cumulative format.
```r
df_track <- "https://www.flashresults.com/2017_Meets/Outdoor/04-29_VirginiaGrandPrix/025-1-01.htm" %>%
  flash_parse_table(clean = TRUE, wide_format = TRUE) %>%
  select(Name, Event, contains("Split")) %>%
  head(3)

df_track %>%
  flextable_style() %>%
  set_caption("Track Results in Cumulative Format")
```
Name | Event | Split_300 | Split_700 | Split_1100 | Split_1500 |
Khalil RMIDI KININI | 1500m | 44.81 | 1:46.73 | 2:49.10 | 3:46.70 |
Colin SCHULTZ | 1500m | 44.40 | 1:46.11 | 2:48.28 | 3:47.14 |
Andrew GOLDMAN | 1500m | 44.70 | 1:46.53 | 2:48.36 | 3:47.33 |
Converting these track splits to lap format is done exactly the same way as with swimming results, via SwimmeR::splits_to_lap.
```r
df_track %>%
  splits_to_lap() %>%
  flextable_style() %>%
  set_caption("Track Results in Lap Format")
```
Name | Event | Split_300 | Split_700 | Split_1100 | Split_1500 |
Khalil RMIDI KININI | 1500m | 44.81 | 1:01.92 | 1:02.37 | 57.60 |
Colin SCHULTZ | 1500m | 44.40 | 1:01.71 | 1:02.17 | 58.86 |
Andrew GOLDMAN | 1500m | 44.70 | 1:01.83 | 1:01.83 | 58.97 |
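The same successive-difference idea carries over to track. Here’s a base-R sketch for Khalil Rmidi Kinini’s row, using two small hypothetical helpers, `to_seconds()` and `to_mmss()`, which are not SwimmeR or JumpeR functions. One thing worth keeping in mind with a 1500m: the split columns are named for cumulative distances, so in lap format the first segment covers 300m while the rest cover 400m each.

```r
# Hypothetical helpers (not part of SwimmeR or JumpeR)
to_seconds <- function(x) {                  # "ss.xx" or "m:ss.xx" -> seconds
  vapply(strsplit(x, ":", fixed = TRUE), function(p) {
    p <- as.numeric(p)
    if (length(p) == 2) p[1] * 60 + p[2] else p[1]
  }, numeric(1))
}
to_mmss <- function(sec) {                   # seconds -> "ss.xx" or "m:ss.xx"
  sec  <- round(sec, 2)
  mins <- as.integer(sec %/% 60)
  secs <- sec - mins * 60
  ifelse(mins > 0, sprintf("%d:%05.2f", mins, secs), sprintf("%.2f", secs))
}

# Khalil Rmidi Kinini's cumulative splits from the table above
splits <- c(Split_300 = "44.81", Split_700 = "1:46.73",
            Split_1100 = "2:49.10", Split_1500 = "3:46.70")

cumulative <- to_seconds(splits)
laps <- diff(c(0, cumulative))   # successive differences = lap splits
to_mmss(laps)                    # "44.81" "1:01.92" "1:02.37" "57.60"

# Distance covered by each segment, derived from the column names
segment_m <- diff(c(0, as.numeric(sub("Split_", "", names(splits)))))
segment_m                        # 300 400 400 400
```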
In Closing
SwimmeR now offers functions for regularizing split formats, and they’re also applicable to track results collected with JumpeR. Continued and expanded interoperability between the two packages is a development focus going forward. Thanks for joining us here at Swimming + Data Science.