Introducing JumpeR – For Track and Field Data
Ordinarily, posts on Swimming + Data Science focus on swimming, or sometimes diving. Today, though, we’re going to visit some of our more gravity-afflicted colleagues and do a bit of cross-training. That’s because, following what I’m going to call the SwimmeR package’s massive success, literally several people reached out to me about developing a similar package for track and field. That package, called JumpeR, is now available on CRAN.
You can get your very own copy of this cutting edge sports-data-science package, for free, today!
install.packages("JumpeR")

library(JumpeR)
library(flextable)
library(dplyr)
library(ggplot2)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bolds header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    autofit()
}
What does JumpeR do?
JumpeR is very similar to SwimmeR. Both mostly serve to convert results from human-readable documents into data frames that are readable by both machines and humans, within the R programming environment.
Supported Results Format
JumpeR currently supports single column Hy-Tek results, like these, and Flash Results .pdf files like these. JumpeR does not support multi-column Hy-Tek results or Flash .html files. Further details are available in the package readme file.
Examples
A Running Race
Here’s an example, reading in the 2019 Ivy League Championships and looking at the finals of the Women’s 200M Dash:
df <- tf_parse(
  read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm")
)

df %>%
  filter(Event == "Women 200 Meter Dash") %>%
  group_by(Name, Team) %>% # to remove prelims
  slice(2) %>% # to remove prelims
  arrange(Place) %>% # arrange by Place
  flextable_style()
Place | Name | Age | Team | Finals_Result | Tiebreaker | DQ | Event |
1 | Katina Martin | SO | Harvard | 24.05 | | 0 | Women 200 Meter Dash |
2 | Olivia Okoli | JR | Harvard | 24.44 | | 0 | Women 200 Meter Dash |
3 | Cecil Ene | SR | Penn | 24.52 | 24.511 | 0 | Women 200 Meter Dash |
4 | Elena Brown-Soler | SR | Penn | 24.52 | 24.520 | 0 | Women 200 Meter Dash |
5 | Katie DiFrancesco | JR | Princeton | 24.53 | | 0 | Women 200 Meter Dash |
6 | Libby McMahon | SO | Yale | 25.12 | | 0 | Women 200 Meter Dash |
7 | Isabella Hilditch | SO | Princeton | 40.06 | | 0 | Women 200 Meter Dash |
 | Kennedy Waite | FR | Brown | DNF | | 1 | Women 200 Meter Dash |
Discus, with Flights
But wait, there’s more! Field events, like jumping and throwing, allow athletes to try several times, with each try called a “flight”. Flights can be captured as well. Here’s the Men’s Discus from the 2019 Virginia Grand Prix:
df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-27_VirginiaGrandPrix/038-1.pdf"),
  flights = TRUE
)

df %>%
  flextable_style()
Place | Name | Age | Team | Finals_Result | DQ | Event | Flight_1 | Flight_2 | Flight_3 | Flight_4 | Flight_5 | Flight_6 |
1 | Nicholas EDWARDS | FR | HAMPTON | 49.86m | 0 | Men Discus | X | 47.11 | 45.99 | 47.28 | X | 49.86 |
2 | Michael ALBERT | JR | APP STATE | 48.30m | 0 | Men Discus | 48.30 | 47.16 | 44.96 | X | 45.85 | X |
3 | Joshua HUNTER | SO | HAMPTON | 47.43m | 0 | Men Discus | 31.94 | X | 46.54 | X | 47.43 | X |
4 | Peter KENN | SR | APP STATE | 46.14m | 0 | Men Discus | X | 42.83 | 46.14 | 44.26 | 43.80 | 44.66 |
5 | Asher PRINCE | FR | CHARLOTTE | 45.98m | 0 | Men Discus | X | 45.98 | 44.62 | X | X | X |
6 | Sasha DAJIA | SR | CHARLOTTE | 44.40m | 0 | Men Discus | X | 44.40 | 44.19 | X | 44.08 | 42.04 |
7 | Britton MANN | SR | HIGH POINT | 42.07m | 0 | Men Discus | X | 38.31 | X | 40.49 | X | 42.07 |
8 | Gabriel STAINBACK | SO | HIGH POINT | 39.37m | 0 | Men Discus | 38.53 | 36.94 | 39.37 | – | – | – |
FOUL | Kysheen MYRICK | SO | LIBERTY | FOUL | 1 | Men Discus | X | X | X | | | |
FOUL | Tyson JONES | FR | VIRGINIA TECH | FOUL | 1 | Men Discus | X | X | X | | | |
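As a quick sanity check on the table above, the Finals_Result for a throwing event should match the athlete's best legal flight. The sketch below uses base R only, with flight values copied by hand from the first row of the table. The helper name `best_flight` is hypothetical and is not part of JumpeR.

```r
# Hypothetical helper, not part of JumpeR: best legal mark from a vector of flights
best_flight <- function(flights) {
  # fouls ("X") and passes or blanks are not marks, so they become NA
  marks <- suppressWarnings(as.numeric(flights))
  if (all(is.na(marks))) {
    return(NA_real_) # no legal mark at all, e.g. three fouls
  }
  max(marks, na.rm = TRUE)
}

# Flight_1 through Flight_6 for Nicholas EDWARDS, copied from the table above
edwards <- c("X", "47.11", "45.99", "47.28", "X", "49.86")
best_flight(edwards)

# An athlete with only fouls has no best mark
best_flight(c("X", "X", "X"))
```

The first call should agree with the 49.86m shown in Finals_Result, and the second returns a missing value, which lines up with the FOUL rows.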
Pole Vault, with Flights and Attempts
JumpeR can even capture attempts for vertical jumping events, like in these Women’s Pole Vault results from the 2019 Texas A&M Invite. These results do get quite wide, so here they’re cut off at Flight 2.
df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
  flights = TRUE,
  flight_attempts = TRUE
)

df %>%
  select(Place:Flight_2_Attempts) %>%
  flextable_style()
Place | Name | Age | Team | Finals_Result | DQ | Event | Flight_1 | Flight_1_Attempts | Flight_2 | Flight_2_Attempts |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
2 | Myka STEINBEISSER | FR | ARIZONA STATE | 3.73m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
3 | Tommi HINTNAUS | SO | ARIZONA STATE | 3.73m | 0 | Women Pole Vault | 3.28 | — | 3.43 | — |
4 | Erika WILLIS | FR | AIR FORCE | 3.58m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
5 | Kylie SWIEKATOWSKI | JR | RICE | 3.58m | 0 | Women Pole Vault | 3.28 | — | 3.43 | XO |
6 | Cameron BOEDEKER | JR | SAM HOUSTON ST. | 3.58m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
6 | Kendahl SHUE | JR | TCU | 3.58m | 0 | Women Pole Vault | 3.28 | — | 3.43 | — |
8 | Corey FRIEDENBACH | FR | AIR FORCE | 3.58m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
9 | Tysen TOWNSEND | FR | TCU | 3.58m | 0 | Women Pole Vault | 3.28 | — | 3.43 | XXO |
10 | Lauren LABAY | JR | SAM HOUSTON ST. | 3.43m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
10 | Margaret LASSALLE | SR | SAM HOUSTON ST. | 3.43m | 0 | Women Pole Vault | 3.28 | — | 3.43 | O |
12 | Emily HARRISON | FR | RICE | 3.43m | 0 | Women Pole Vault | 3.28 | — | 3.43 | XXO |
12 | Frankie PORAMBO | FR | AIR FORCE | 3.43m | 0 | Women Pole Vault | 3.28 | O | 3.43 | XXO |
DNS | Alexandria GRAY | FR | UTSA | DNS | 0 | Women Pole Vault | 3.28 | | 3.43 | |
NH | Hannah SEARBY | SO | TEXAS A&M | NH | 1 | Women Pole Vault | 3.28 | — | 3.43 | XXX |
NH | Jerni SELF | SR | AIR FORCE | NH | 1 | Women Pole Vault | 3.28 | — | 3.43 | — |
NH | Kathryn TOMCZAK | SR | AIR FORCE | NH | 1 | Women Pole Vault | 3.28 | — | 3.43 | — |
Pole Vault Long Format
These results do get quite wide, but don’t worry. Switching to a longer format is easy with JumpeR::attempts_split_long.
df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
  flights = TRUE,
  flight_attempts = TRUE
)

df %>%
  attempts_split_long() %>%
  filter(Place == 1) %>% # only first place athlete
  select(Place, Name, Age, Team, Finals_Result, Event, Bar_Height, Attempt, Result) %>%
  flextable_style()
Place | Name | Age | Team | Finals_Result | Event | Bar_Height | Attempt | Result |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.28 | 1 | – |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.28 | 2 | – |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.28 | 3 | – |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.43 | 1 | O |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.58 | 1 | X |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.58 | 2 | O |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.73 | 1 | X |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.73 | 2 | O |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.88 | 1 | X |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 3.88 | 2 | O |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 4.03 | 1 | X |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 4.03 | 2 | X |
1 | Caroline BELLOWS | SR | UTSA | 3.88m | Women Pole Vault | 4.03 | 3 | X |
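In the long table above, an attempts string such as "XXO" ends up as one row per try at a given bar height. The base R sketch below illustrates that idea on a single string. It is not how attempts_split_long is implemented, and `split_attempts` is a hypothetical name used only here.

```r
# Hypothetical helper, not part of JumpeR: one row per try at a single bar height
split_attempts <- function(bar_height, attempts) {
  # break the string into single characters: "XXO" -> "X" "X" "O"
  results <- strsplit(attempts, split = "")[[1]]
  data.frame(
    Bar_Height = bar_height,
    Attempt = seq_along(results), # 1, 2, 3, ... in the order the tries happened
    Result = results,
    stringsAsFactors = FALSE
  )
}

# Two misses followed by a clearance at 3.43
split_attempts(3.43, "XXO")
```

Applying a helper like this to every height and stacking the pieces would give a table shaped like the one above, with the Bar_Height, Attempt, and Result columns.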
Relay Athletes
Going back to those Ivy League results, we can pull out the names of the relay athletes for each relay.
df <- tf_parse(
  read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm"),
  relay_athletes = TRUE
)

df %>%
  filter(Event == "Men 4x400 Meter Relay") %>%
  select(-Tiebreaker, -Name) %>%
  flextable_style()
Place | Age | Team | Finals_Result | DQ | Event | Relay_Athlete_1 | Relay_Athlete_2 | Relay_Athlete_3 | Relay_Athlete_4 |
1 | | Harvard | 3:13.85 | 0 | Men 4x400 Meter Relay | Aaron Shirley | Gregory Lapit | Charles Lego | Jovahn Williamson |
2 | | Penn | 3:15.55 | 0 | Men 4x400 Meter Relay | Robbie Ruppel | Anthony Okolo | Emerson Douds | Antaures Jackson |
3 | | Yale | 3:16.60 | 0 | Men 4x400 Meter Relay | Christopher Colbert | Juma Sei | Phil Zuccaro | Marcus Woods |
4 | | Cornell | 3:17.61 | 0 | Men 4x400 Meter Relay | Christian Martin | Myles Solan | Malick Diomande | Tien Henderson |
5 | | Dartmouth | 3:17.66 | 0 | Men 4x400 Meter Relay | Mathieu Farber | Charlie Wade | Julian Martelly | Max Frye |
6 | | Columbia | 3:19.42 | 0 | Men 4x400 Meter Relay | Chris Balthazar | Jahi Hernandez | Brodie Holmes | Vasilis Kopanas |
7 | | Princeton | 3:20.61 | 0 | Men 4x400 Meter Relay | Gregory Sholars | Klaudio Gjetja | Anderson Dimon | Michael Phillippy |
8 | | Brown | 3:25.72 | 0 | Men 4x400 Meter Relay | Sergey Gorban | Austin Reynolds | Kevin Boyce | Tim McDonough |
Formatting Results
Track and field results come in two forms: times, as “MM:SS.HH”, and lengths/heights, often as “X.XXm”. JumpeR has math_format for converting these result strings into numerics, which is useful when doing comparisons and plotting. Here’s the men’s pole vault at the USA T&F 2019 Championships.
df <- tf_parse(
  read_results("https://www.flashresults.com/2019_Meets/Outdoor/07-25_USATF_CIS/026-1.pdf")
)

df %>%
  mutate(Finals_Math = math_format(Finals_Result)) %>% # results to numerics
  mutate(Name = factor(Name, unique(Name))) %>% # order names by order of finish
  ggplot(aes(x = Name, y = Finals_Math)) +
  geom_col() +
  theme_bw() +
  theme(axis.text.x = element_text(
    angle = 90,
    vjust = 0.5,
    hjust = 1
  )) +
  labs(y = "Height Cleared (m)", title = "USA Pole Vault Championships")
One can use math_format on mixed format lists too. Times will be converted to seconds, meters will remain in meters, and standard units (feet, inches) will be converted to inches. Units, however, are not included in the output, so be aware.
demo_list <- c(
  "1.23m", # a height/length in meters, output in meters
  "5-06.45", # a height/length in standard, output in inches
  "10:34.34", # a time with minutes, output in seconds
  "9.45" # a time without minutes, output in seconds
)

math_format(demo_list)
## [1] 1.23 66.45 634.34 9.45
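To make the unit handling concrete, here is a base R sketch of the conversions described above. It is written only from the behavior shown in this post and is not JumpeR's actual implementation. `to_number_sketch` is a hypothetical name, and edge cases such as hours, DNF, or feet without inches are ignored.

```r
# Hypothetical re-implementation for illustration only; use math_format() in practice
to_number_sketch <- function(x) {
  vapply(x, function(s) {
    if (grepl("m$", s)) {
      # metric mark, e.g. "1.23m": drop the unit, stay in meters
      as.numeric(sub("m$", "", s))
    } else if (grepl("-", s, fixed = TRUE)) {
      # feet-inches, e.g. "5-06.45": feet * 12 + inches
      parts <- as.numeric(strsplit(s, "-", fixed = TRUE)[[1]])
      parts[1] * 12 + parts[2]
    } else if (grepl(":", s, fixed = TRUE)) {
      # minutes:seconds, e.g. "10:34.34": minutes * 60 + seconds
      parts <- as.numeric(strsplit(s, ":", fixed = TRUE)[[1]])
      parts[1] * 60 + parts[2]
    } else {
      # plain seconds, e.g. "9.45"
      as.numeric(s)
    }
  }, numeric(1), USE.NAMES = FALSE)
}

to_number_sketch(c("1.23m", "5-06.45", "10:34.34", "9.45"))
```

On the demo list this should give the same four values as math_format, which also shows why the missing units matter: 66.45 is inches while 1.23 is meters, and nothing in the output says so.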
JumpeR Going Forward
I plan to maintain JumpeR, fix bugs, and respond to feature requests as I’m able. Another useful improvement would be increasing the number/types of supported results. More contributors are certainly welcome. If you’d like to be involved, get in touch, or visit the project repo on GitHub.