Pledging My Time V: analysing race results in R

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers.]

It’s been a while since I posted a breakdown of half marathon times. The last time seems to have been 2018. I decided to give my old code a clean-up and quickly crunched the numbers from the 2022 Kenilworth Half Marathon.

First, the results:

Briefly, the code below reads in a CSV file of race results downloaded from the provider. A little wrangling is required to make the data plotting-friendly, and then six ggplots can be made to look at pace, time and speed, with a breakdown by category or by gender. Of course, everyone runs the same distance on the same course, so speed, pace and time are three ways of looking at the same thing: how fast everyone ran the race.

# library() stops with an error if a package is missing, whereas require() only warns
library(tidyverse)
library(ggbeeswarm)
file_name <- file.choose()
df1 <- read.csv(file_name, header = TRUE, stringsAsFactors = FALSE)
# aggregate M and F to a new category called Gender
df1$Gender <- ifelse(startsWith(df1$Category,"F"),"F","M")
# parse the finishing time to POSIXct (the date part defaults to today)
# use the Time column if it exists, otherwise fall back to Net.Time
time_col <- if ("Time" %in% names(df1)) "Time" else "Net.Time"
df1$Time <- as.POSIXct(strptime(df1[[time_col]], format = "%H:%M:%S"))
orig_var <- as.POSIXct("00:00:00", format = "%H:%M:%S")
p1 <- ggplot( data = df1, aes(x = Category,y = Time, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun = mean, geom = "point", size=2, colour = "black", alpha = 0.5) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA)) +
  theme(legend.position = "none")
# instead of finishing time, let's look at pace (min/km)
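# units are fixed to hours so the result does not depend on difftime's automatic choice
df1$Pace <- as.numeric(difftime(df1$Time, orig_var, units = "hours")) / 21.1 * 3600
# Pace is now in seconds per km; store it as an offset from orig_var for plotting
df1$Pace <- as.POSIXct(df1$Pace, origin = orig_var)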
p2 <- ggplot( data = df1, aes(x = Category,y = Pace, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun = mean, geom = "point", size=2, colour = "black", alpha = 0.5) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA)) +
  theme(legend.position = "none")
# calculate speeds rather than pace
df1$Speed <- 21.1 / as.numeric(difftime(df1$Time, orig_var, units = "hours"))
p3 <- ggplot( data = df1, aes(x = Category, y = Speed, color = Category)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun = mean, geom = "point", size=2, colour = "black", alpha = 0.5) +
  ylim(0,NA) + ylab("Speed (km/h)") +
  theme(legend.position = "none")
# now make the same plots but by Gender rather than Category
p4 <- ggplot( data = df1, aes(x = Gender,y = Time, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun = mean, geom = "point", size=2, colour = "black", alpha = 0.5) +
  scale_y_datetime(date_labels = "%H:%M:%S", limits = c(orig_var,NA)) +
  theme(legend.position = "none")
p5 <- ggplot( data = df1, aes(x = Gender,y = Pace, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun = mean, geom = "point", size=2, colour = "black", alpha = 0.5) +
  scale_y_datetime(date_labels = "%M:%S", limits = c(orig_var,NA)) +
  theme(legend.position = "none")
p6 <- ggplot( data = df1, aes(x = Gender, y = Speed, color = Gender)) + 
  geom_quasirandom(alpha = 0.5, stroke = 0) +
  stat_summary(fun = mean, geom = "point", size=2, colour = "black", alpha = 0.5) +
  ylim(0,NA) + ylab("Speed (km/h)") +
  theme(legend.position = "none")

At the end of this script we have six ggplot objects labelled p1 to p6.
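
Because the distance is fixed, pace and speed follow directly from the finishing time, and it is easy to sanity-check the conversions by hand. Here is a minimal base-R sketch that works in plain seconds rather than POSIXct. The helpers `hms_to_seconds` and `format_pace` are made up for this illustration and are not part of the script above, and the two finishing times are invented. It assumes every time is written as H:MM:SS.

```r
# hypothetical helpers for illustration; the script above uses POSIXct instead
half_km <- 21.1  # distance used in the script (km)

# convert "H:MM:SS" strings to seconds (assumes all three fields are present)
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# format seconds per km as "M:SS"
format_pace <- function(secs_per_km) {
  total <- round(secs_per_km)
  sprintf("%d:%02d", total %/% 60, total %% 60)
}

finish <- c("1:45:30", "2:06:36")   # invented finishing times
secs <- hms_to_seconds(finish)
pace <- format_pace(secs / half_km)  # min/km
speed <- half_km / (secs / 3600)     # km/h
data.frame(finish, pace, speed = round(speed, 2))
```

A 1:45:30 finish works out at 5:00 min/km or 12 km/h, which is a handy reference point when reading the pace and speed plots.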

What about me?

You can run this code on race results from an event you’ve participated in. But what you really want to know is how did one runner (yourself) compare to everyone else?!

Using this simple function we can relabel the plots we have made to highlight one runner:

# discreetly highlight person of interest by bib number
label_plot <- function(plot, raceno) {
  plot <- plot + geom_point(data = df1[df1$Race.No == raceno,], colour = "dark grey")
  return(plot)
}

# for example if we are interested in runner with bib number (raceno) 344
p1 <- label_plot(p1,344)
p2 <- label_plot(p2,344)
# and so on, save with ggsave

Now we have a discreet point marking the position of the runner of interest. All plots are shown above; one of them is shown as an example here:

Race time by category

Often race results will show your ranked position i) overall, ii) by gender and iii) by category. Your ranked position doesn’t mean much though if you don’t know the sizes of those fields. For example, “you came 10th overall” sounds great, but what if there were only 10 runners? What we really want to know is our percentile.

This function prints out your rank, the size of the field and your percentile.

lookup_runner <- function(df, raceno) {
  
  # row of the runner of interest
  orig_row <- which(df$Race.No == raceno)
  
  # overall: rank everyone by net time
  all_df <- df %>%
    mutate(rank = rank(Net.Time))
  
  rankno <- all_df$rank[all_df$Race.No == raceno]
  denom <- nrow(all_df)
  cat("Runner",raceno,"is",rankno,"out of",denom,"Runners",":",rankno/denom*100,"%ile\n")
  
  # by gender: rank within each gender group
  gender_df <- df %>%
    arrange(Gender, Net.Time) %>% 
    group_by(Gender) %>% 
    mutate(rank = rank(Net.Time))
  
  rankno <- gender_df$rank[gender_df$Race.No == raceno]
  key <- df$Gender[orig_row]
  denom <- length(which(gender_df$Gender == key))
  cat("Runner",raceno,"is",rankno,"out of",denom,key,":",rankno/denom*100,"%ile\n")
  
  # by category: rank within each category group
  cat_df <- df %>%
    arrange(Category, Net.Time) %>% 
    group_by(Category) %>% 
    mutate(rank = rank(Net.Time))
  
  rankno <- cat_df$rank[cat_df$Race.No == raceno]
  key <- df$Category[orig_row]
  # field size comes from cat_df (the original used gender_df here, a copy-paste slip)
  denom <- length(which(cat_df$Category == key))
  cat("Runner",raceno,"is",rankno,"out of",denom,key,":",rankno/denom*100,"%ile\n")
}

Here is a typical output:

> lookup_runner(df1,344)
Runner 344 is 111 out of 611 Runners : 18.16694 %ile
Runner 344 is 102 out of 414 M : 24.63768 %ile
Runner 344 is 20 out of 105 MV45 : 19.04762 %ile
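
To see where numbers like these come from, the same rank-over-field-size arithmetic can be reproduced on a toy table. This is only a sketch in base R with invented data. `percentile_in` is a hypothetical helper, not part of the code above. Unlike `lookup_runner`, it ranks numeric seconds rather than the Net.Time text, and it uses `ties.method = "min"` so that tied runners share the better rank.

```r
# toy results, invented for illustration
toy <- data.frame(
  Race.No  = c(1, 2, 3, 4, 5, 6),
  Category = c("MSEN", "FSEN", "MSEN", "MV45", "FSEN", "MSEN"),
  Secs     = c(5400, 5700, 6000, 6300, 6600, 6900),
  stringsAsFactors = FALSE
)
toy$Gender <- ifelse(startsWith(toy$Category, "F"), "F", "M")

# hypothetical helper: rank, field size and percentile within a subset of rows
percentile_in <- function(df, raceno, keep = rep(TRUE, nrow(df))) {
  field <- df[keep, ]
  rankno <- rank(field$Secs, ties.method = "min")[field$Race.No == raceno]
  c(rank = rankno, field = nrow(field), pct = rankno / nrow(field) * 100)
}

me <- 3
overall   <- percentile_in(toy, me)
by_gender <- percentile_in(toy, me, toy$Gender == toy$Gender[toy$Race.No == me])
by_cat    <- percentile_in(toy, me, toy$Category == toy$Category[toy$Race.No == me])
rbind(overall, by_gender, by_cat)
```

In this toy field, runner 3 is 3rd of 6 overall, 2nd of 4 men and 2nd of 3 in the MSEN category, so the same runner can look quite different depending on which field you compare against.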

Technical footnote 1: the csv that can be downloaded from racetecresults is not readable by R. I generated my own from the xls file that is also available.

Technical footnote 2: if you run this using results from other providers, you may need to edit some of the headers in the file, or column names in the code. The headers should be “Pos” “Race.No” “Name” “Net.Time” “Category” “Cat.Pos” “Gender” “Gen.Pos” “Club”, with Race.No, Category, and Net.Time being needed to run the code.
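
If another provider uses different headers, one option is to rename the columns once after reading the file, rather than editing the code. Here is a minimal sketch of that idea. The provider headers `Bib`, `Chip.Time` and `Cat` are invented for illustration, as is the `header_map` lookup, so swap in whatever your file actually contains.

```r
# stand-in for read.csv() output from another provider (headers are invented)
other <- data.frame(
  Bib       = c(12, 34),
  Chip.Time = c("1:32:10", "1:58:45"),
  Cat       = c("FSEN", "MV45"),
  stringsAsFactors = FALSE
)

# hypothetical mapping: provider header -> header expected by the code
header_map <- c(Bib = "Race.No", Chip.Time = "Net.Time", Cat = "Category")

# rename only the columns that appear in the mapping
hits <- names(other) %in% names(header_map)
names(other)[hits] <- unname(header_map[names(other)[hits]])

# check that the three required columns are now present
required <- c("Race.No", "Net.Time", "Category")
stopifnot(all(required %in% names(other)))
names(other)
```

The final check fails loudly if a required column is still missing, which is easier to debug than an error halfway through the plotting code.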

The post title is taken from “Pledging My Time” a track from Blonde on Blonde by Bob Dylan.
