Introducing JumpeR – For Track and Field Data

[This article was first published on Welcome to Swimming + Data Science on Swimming + Data Science, and kindly contributed to R-bloggers.]
    Ordinarily, posts on Swimming + Data Science have focused on swimming, or sometimes diving. Today, though, we’re going to visit some of our more gravity-afflicted colleagues and do a bit of cross-training. That’s because, following what I’m going to call the SwimmeR package’s massive success, literally several people reached out to me about developing a similar package for track and field. That package, called JumpeR, is now available on CRAN.

    You can get your very own copy of this cutting edge sports-data-science package, for free, today!

    install.packages("JumpeR")
    library(JumpeR)
    library(flextable)
    library(dplyr)
    library(ggplot2)
    
    flextable_style <- function(x) {
      x %>%
        flextable() %>%
        bold(part = "header") %>% # bolds header
        bg(bg = "#D3D3D3", part = "header") %>%  # puts gray background behind the header row
        autofit()
    }

    What does JumpeR do?

    JumpeR is very similar to SwimmeR. Both mainly serve to convert results from human-readable documents into data frames, readable by both machines and humans, within the R programming environment.

    Supported Results Format

    JumpeR currently supports single-column Hy-Tek results, like these, and Flash Results .pdf files, like these. JumpeR does not support multi-column Hy-Tek results or Flash .html files. Further details are available in the package README file.

    Examples

    A Running Race

    Here’s an example, reading in the Ivy League Indoor Championships (the URL points to the 2020 meet) and looking at the finals of the Women’s 200M Dash.

    df <- tf_parse(
      read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm")
      )
    
    df %>% 
      filter(Event == "Women 200 Meter Dash") %>% 
      group_by(Name, Team) %>% # one group per athlete
      slice(2) %>% # keep only each athlete's second row, which removes prelims
      arrange(Place) %>% # arrange by Place
      flextable_style()

    Place | Name              | Age | Team      | Finals_Result | Tiebreaker | DQ | Event
    ------+-------------------+-----+-----------+---------------+------------+----+---------------------
    1     | Katina Martin     | SO  | Harvard   | 24.05         |            | 0  | Women 200 Meter Dash
    2     | Olivia Okoli      | JR  | Harvard   | 24.44         |            | 0  | Women 200 Meter Dash
    3     | Cecil Ene         | SR  | Penn      | 24.52         | 24.511     | 0  | Women 200 Meter Dash
    4     | Elena Brown-Soler | SR  | Penn      | 24.52         | 24.520     | 0  | Women 200 Meter Dash
    5     | Katie DiFrancesco | JR  | Princeton | 24.53         |            | 0  | Women 200 Meter Dash
    6     | Libby McMahon     | SO  | Yale      | 25.12         |            | 0  | Women 200 Meter Dash
    7     | Isabella Hilditch | SO  | Princeton | 40.06         |            | 0  | Women 200 Meter Dash
          | Kennedy Waite     | FR  | Brown     | DNF           |            | 1  | Women 200 Meter Dash
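
Places 3 and 4 share a Finals_Result of 24.52, and the Tiebreaker column appears to hold the time to the thousandth that separates them. As a minimal sketch in base R, using a hand-typed copy of those two rows (and assuming both columns come back as character strings, which is not confirmed here), sorting on both columns reproduces the order of finish:

```r
# Hand-typed copy of the two tied rows from the results above,
# deliberately entered in the "wrong" order.
tied <- data.frame(
  Name = c("Elena Brown-Soler", "Cecil Ene"),
  Finals_Result = c("24.52", "24.52"),
  Tiebreaker = c("24.520", "24.511"),
  stringsAsFactors = FALSE
)

# Sort on the hundredths first, then on the thousandths to split the tie.
ranked <- tied[order(as.numeric(tied$Finals_Result),
                     as.numeric(tied$Tiebreaker)), ]

ranked$Name
```

Cecil Ene sorts ahead of Elena Brown-Soler, matching places 3 and 4.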

    Discus, with Flights

    But wait, there’s more! Field events, like jumping and throwing, allow athletes to try several times, with each try called a “flight”. Flights can be captured as well. Here’s the Men’s Discus from the 2019 Virginia Grand Prix.

    df <- tf_parse(
      read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-27_VirginiaGrandPrix/038-1.pdf"),
      flights = TRUE
      )
    
    df %>% 
      flextable_style()

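    Place | Name              | Age | Team          | Finals_Result | DQ | Event      | Flight_1 | Flight_2 | Flight_3 | Flight_4 | Flight_5 | Flight_6
    ------+-------------------+-----+---------------+---------------+----+------------+----------+----------+----------+----------+----------+---------
    1     | Nicholas EDWARDS  | FR  | HAMPTON       | 49.86m        | 0  | Men Discus | X        | 47.11    | 45.99    | 47.28    | X        | 49.86
    2     | Michael ALBERT    | JR  | APP STATE     | 48.30m        | 0  | Men Discus | 48.30    | 47.16    | 44.96    | X        | 45.85    | X
    3     | Joshua HUNTER     | SO  | HAMPTON       | 47.43m        | 0  | Men Discus | 31.94    | X        | 46.54    | X        | 47.43    | X
    4     | Peter KENN        | SR  | APP STATE     | 46.14m        | 0  | Men Discus | X        | 42.83    | 46.14    | 44.26    | 43.80    | 44.66
    5     | Asher PRINCE      | FR  | CHARLOTTE     | 45.98m        | 0  | Men Discus | X        | 45.98    | 44.62    | X        | X        | X
    6     | Sasha DAJIA       | SR  | CHARLOTTE     | 44.40m        | 0  | Men Discus | X        | 44.40    | 44.19    | X        | 44.08    | 42.04
    7     | Britton MANN      | SR  | HIGH POINT    | 42.07m        | 0  | Men Discus | X        | 38.31    | X        | 40.49    | X        | 42.07
    8     | Gabriel STAINBACK | SO  | HIGH POINT    | 39.37m        | 0  | Men Discus | 38.53    | 36.94    | 39.37    |          |          |
    FOUL  | Kysheen MYRICK    | SO  | LIBERTY       | FOUL          | 1  | Men Discus | X        | X        | X        |          |          |
    FOUL  | Tyson JONES       | FR  | VIRGINIA TECH | FOUL          | 1  | Men Discus | X        | X        | X        |          |          |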
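
In these results each Finals_Result matches the athlete's best legal mark across the flights, with "X" marking a foul. A minimal base R sketch of that relationship, using the winner's six flights typed in by hand (the variable names are illustrative, not part of JumpeR):

```r
# Flight marks for the first-place athlete, copied from the results above.
flights <- c("X", "47.11", "45.99", "47.28", "X", "49.86")

# as.numeric() turns "X" into NA (with a warning, silenced here).
marks <- suppressWarnings(as.numeric(flights))

best_mark <- max(marks, na.rm = TRUE)  # best legal throw, in meters
n_fouls <- sum(flights == "X")         # number of fouled attempts

best_mark
n_fouls
```

The best mark comes out as 49.86, the same value reported in Finals_Result as "49.86m", with two fouls along the way.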

    Pole Vault, with Flights and Attempts

    JumpeR can even capture attempts for vertical jumping events, like in these Women’s Pole Vault results from the 2019 Texas A&M Invite. These results do get quite wide, so here they’re cut off at Flight 2.

    df <- tf_parse(
      read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
      flights = TRUE,
      flight_attempts = TRUE
      )
    
    df %>% 
      select(Place:Flight_2_Attempts) %>% 
      flextable_style()

    Place | Name               | Age | Team            | Finals_Result | DQ | Event            | Flight_1 | Flight_1_Attempts | Flight_2 | Flight_2_Attempts
    ------+--------------------+-----+-----------------+---------------+----+------------------+----------+-------------------+----------+------------------
    1     | Caroline BELLOWS   | SR  | UTSA            | 3.88m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    2     | Myka STEINBEISSER  | FR  | ARIZONA STATE   | 3.73m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    3     | Tommi HINTNAUS     | SO  | ARIZONA STATE   | 3.73m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     |
    4     | Erika WILLIS       | FR  | AIR FORCE       | 3.58m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    5     | Kylie SWIEKATOWSKI | JR  | RICE            | 3.58m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | XO
    6     | Cameron BOEDEKER   | JR  | SAM HOUSTON ST. | 3.58m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    6     | Kendahl SHUE       | JR  | TCU             | 3.58m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     |
    8     | Corey FRIEDENBACH  | FR  | AIR FORCE       | 3.58m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    9     | Tysen TOWNSEND     | FR  | TCU             | 3.58m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | XXO
    10    | Lauren LABAY       | JR  | SAM HOUSTON ST. | 3.43m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    10    | Margaret LASSALLE  | SR  | SAM HOUSTON ST. | 3.43m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | O
    12    | Emily HARRISON     | FR  | RICE            | 3.43m         | 0  | Women Pole Vault | 3.28     |                   | 3.43     | XXO
    12    | Frankie PORAMBO    | FR  | AIR FORCE       | 3.43m         | 0  | Women Pole Vault | 3.28     | O                 | 3.43     | XXO
    DNS   | Alexandria GRAY    | FR  | UTSA            | DNS           | 0  | Women Pole Vault | 3.28     |                   | 3.43     |
    NH    | Hannah SEARBY      | SO  | TEXAS A&M       | NH            | 1  | Women Pole Vault | 3.28     |                   | 3.43     | XXX
    NH    | Jerni SELF         | SR  | AIR FORCE       | NH            | 1  | Women Pole Vault | 3.28     |                   | 3.43     |
    NH    | Kathryn TOMCZAK    | SR  | AIR FORCE       | NH            | 1  | Women Pole Vault | 3.28     |                   | 3.43     |
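
The attempt strings use the usual vertical-jump notation: each "X" is a miss and an "O" is a clearance, so "XXO" means cleared on the third try. A small base R sketch, not part of JumpeR, that summarises strings like the ones in Flight_2_Attempts:

```r
# Attempt strings as they appear in the Flight_2_Attempts column above.
attempts <- c("O", "XO", "XXO", "XXX")

# Count misses by stripping every character that is not an "X".
misses <- nchar(gsub("[^X]", "", attempts))

# A height counts as cleared when the string contains an "O".
cleared <- grepl("O", attempts, fixed = TRUE)

data.frame(attempts, misses, cleared)
```

This gives 0, 1, 2, and 3 misses respectively, with only "XXX" not cleared.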

    Pole Vault Long Format

    These results do get quite wide, but don’t worry. Switching to long format is easy with JumpeR::attempts_split_long.

    df <- tf_parse(
      read_results("https://www.flashresults.com/2019_Meets/Outdoor/04-12_TamuInvite/014-1.pdf"),
      flights = TRUE,
      flight_attempts = TRUE
      )
    
    
    df %>% 
      attempts_split_long() %>% 
      filter(Place == 1) %>% # only first place athlete
      select(Place, Name, Age, Team, Finals_Result, Event, Bar_Height, Attempt, Result) %>% 
      flextable_style()

    Place | Name             | Age | Team | Finals_Result | Event            | Bar_Height | Attempt | Result
    ------+------------------+-----+------+---------------+------------------+------------+---------+-------
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.28       | 1       |
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.28       | 2       |
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.28       | 3       |
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.43       | 1       | O
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.58       | 1       | X
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.58       | 2       | O
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.73       | 1       | X
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.73       | 2       | O
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.88       | 1       | X
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 3.88       | 2       | O
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 4.03       | 1       | X
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 4.03       | 2       | X
    1     | Caroline BELLOWS | SR  | UTSA | 3.88m         | Women Pole Vault | 4.03       | 3       | X
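
Long format makes per-attempt questions straightforward. As a sketch in base R, here are the Bar_Height, Attempt, and Result columns above typed in by hand. Storing Bar_Height as numeric and the blank results as NA are both assumptions, since the actual column types are not shown here:

```r
# Hand-typed copy of the long-format rows for the first-place athlete.
long <- data.frame(
  Bar_Height = c(3.28, 3.28, 3.28, 3.43, 3.58, 3.58, 3.73,
                 3.73, 3.88, 3.88, 4.03, 4.03, 4.03),
  Attempt = c(1, 2, 3, 1, 1, 2, 1, 2, 1, 2, 1, 2, 3),
  Result = c(NA, NA, NA, "O", "X", "O", "X", "O", "X", "O", "X", "X", "X"),
  stringsAsFactors = FALSE
)

# %in% treats NA as "no match", so blank results are skipped safely.
highest_cleared <- max(long$Bar_Height[long$Result %in% "O"])
total_misses <- sum(long$Result %in% "X")

highest_cleared
total_misses
```

The highest cleared bar is 3.88, agreeing with the Finals_Result of "3.88m", and the total comes to six misses.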

    Relay Athletes

    Going back to those Ivy League results, we can pull out the names of the relay athletes for each relay.

    df <- tf_parse(
      read_results("http://www.leonetiming.com/2020/Indoor/IvyLeague/Results.htm"),
      relay_athletes = TRUE
      )
    
    df %>% 
      filter(Event == "Men 4x400 Meter Relay") %>% 
      select(-Tiebreaker, -Name) %>% 
      flextable_style()

    Place | Age | Team      | Finals_Result | DQ | Event                 | Relay_Athlete_1     | Relay_Athlete_2 | Relay_Athlete_3 | Relay_Athlete_4
    ------+-----+-----------+---------------+----+-----------------------+---------------------+-----------------+-----------------+------------------
    1     |     | Harvard   | 3:13.85       | 0  | Men 4x400 Meter Relay | Aaron Shirley       | Gregory Lapit   | Charles Lego    | Jovahn Williamson
    2     |     | Penn      | 3:15.55       | 0  | Men 4x400 Meter Relay | Robbie Ruppel       | Anthony Okolo   | Emerson Douds   | Antaures Jackson
    3     |     | Yale      | 3:16.60       | 0  | Men 4x400 Meter Relay | Christopher Colbert | Juma Sei        | Phil Zuccaro    | Marcus Woods
    4     |     | Cornell   | 3:17.61       | 0  | Men 4x400 Meter Relay | Christian Martin    | Myles Solan     | Malick Diomande | Tien Henderson
    5     |     | Dartmouth | 3:17.66       | 0  | Men 4x400 Meter Relay | Mathieu Farber      | Charlie Wade    | Julian Martelly | Max Frye
    6     |     | Columbia  | 3:19.42       | 0  | Men 4x400 Meter Relay | Chris Balthazar     | Jahi Hernandez  | Brodie Holmes   | Vasilis Kopanas
    7     |     | Princeton | 3:20.61       | 0  | Men 4x400 Meter Relay | Gregory Sholars     | Klaudio Gjetja  | Anderson Dimon  | Michael Phillippy
    8     |     | Brown     | 3:25.72       | 0  | Men 4x400 Meter Relay | Sergey Gorban       | Austin Reynolds | Kevin Boyce     | Tim McDonough
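
With one column per relay athlete, per-athlete work is often easier after reshaping to one row per athlete. A rough base R sketch using the first two teams above, typed in by hand. The `Slot` column is a name made up for this example, and whether column order reflects running order is not something the results confirm:

```r
# Hand-typed copy of the relay athlete columns for the top two teams.
relay <- data.frame(
  Team = c("Harvard", "Penn"),
  Relay_Athlete_1 = c("Aaron Shirley", "Robbie Ruppel"),
  Relay_Athlete_2 = c("Gregory Lapit", "Anthony Okolo"),
  Relay_Athlete_3 = c("Charles Lego", "Emerson Douds"),
  Relay_Athlete_4 = c("Jovahn Williamson", "Antaures Jackson"),
  stringsAsFactors = FALSE
)

athlete_cols <- grep("^Relay_Athlete_", names(relay), value = TRUE)

# unlist() walks the columns one after another, so Team repeats as a
# whole block per column and Slot repeats once per team.
relay_long <- data.frame(
  Team = rep(relay$Team, times = length(athlete_cols)),
  Slot = rep(seq_along(athlete_cols), each = nrow(relay)),
  Athlete = unlist(relay[athlete_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)
relay_long <- relay_long[order(relay_long$Team, relay_long$Slot), ]

relay_long
```

The result has eight rows, four per team, ready for grouping or joining by athlete name.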

    Formatting Results

    Track and field results come in two forms: times, as “MM:SS.HH”, and lengths/heights, often as “X.XXm”. JumpeR has math_format for converting these result strings into numerics, which is useful for comparisons and plotting. Here’s the men’s pole vault at the 2019 USA T&F Championships.

    df <- tf_parse(
      read_results("https://www.flashresults.com/2019_Meets/Outdoor/07-25_USATF_CIS/026-1.pdf"))
    
    
    df %>%
      mutate(Finals_Math = math_format(Finals_Result)) %>% # results to numerics
      mutate(Name = factor(Name, unique(Name))) %>% # order names by order of finish
      ggplot(aes(x = Name, y = Finals_Math)) +
      geom_col() +
      theme_bw() +
      theme(axis.text.x = element_text(
        angle = 90,
        vjust = 0.5,
        hjust = 1
      )) +
      labs(y = "Height Cleared (m)",
           title = "USA Pole Vault Championships")
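
The `factor(Name, unique(Name))` step matters because ggplot2 orders a discrete axis by factor levels, and a plain `factor()` sorts levels alphabetically. A short base R sketch of the difference, borrowing three names from the discus results earlier in this post:

```r
# Names in order of finish (first, second, third).
finish_order <- c("Nicholas EDWARDS", "Michael ALBERT", "Joshua HUNTER")

# Default levels are sorted alphabetically, losing the order of finish.
alphabetical <- factor(finish_order)

# Passing unique() as the levels keeps order of first appearance.
by_finish <- factor(finish_order, levels = unique(finish_order))

levels(alphabetical)
levels(by_finish)
```

The first call puts Joshua HUNTER first; the second keeps Nicholas EDWARDS first, as in the results.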

    One can use math_format on mixed-format lists too. Times will be converted to seconds, meters will remain in meters, and standard units (feet-inches, as in “5-06.45”) will be converted to inches. The output carries no units, however, so keep track of which unit each value is in.

    demo_list <- c(
      "1.23m", # a height/length in meters, output in meters
      "5-06.45", # a height/length in standard, output in inches
      "10:34.34", # a time with minutes, output in seconds
      "9.45" # a time without minutes, output in seconds
    )
    
    math_format(demo_list)
    ## [1]   1.23  66.45 634.34   9.45
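
To make the conversion rules concrete, here is a simplified base R sketch that reproduces the output above for these four strings. `to_numeric_sketch` is a hypothetical helper written for this illustration; it is not how math_format is implemented and it ignores any other formats the real function may handle.

```r
# Hypothetical helper, NOT JumpeR's math_format(): converts one string.
to_numeric_sketch <- function(x) {
  if (grepl("m$", x)) {
    # "1.23m" -> 1.23 (meters stay meters)
    as.numeric(sub("m$", "", x))
  } else if (grepl("-", x, fixed = TRUE)) {
    # "5-06.45" -> 5 feet * 12 + 6.45 inches
    parts <- as.numeric(strsplit(x, "-", fixed = TRUE)[[1]])
    parts[1] * 12 + parts[2]
  } else if (grepl(":", x, fixed = TRUE)) {
    # "10:34.34" -> 10 minutes * 60 + 34.34 seconds
    parts <- as.numeric(strsplit(x, ":", fixed = TRUE)[[1]])
    parts[1] * 60 + parts[2]
  } else {
    # "9.45" -> already in seconds
    as.numeric(x)
  }
}

demo_list <- c("1.23m", "5-06.45", "10:34.34", "9.45")
converted <- vapply(demo_list, to_numeric_sketch, numeric(1), USE.NAMES = FALSE)

converted
```

Note that the three branches produce meters, inches, and seconds in a single numeric vector, which is exactly why the missing units deserve attention.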

    JumpeR Going Forward

    I plan to maintain JumpeR, fix bugs, and respond to feature requests as I’m able. Another useful improvement would be increasing the number and types of supported results formats. More contributors are certainly welcome. If you’d like to be involved, get in touch, or visit the project repo on GitHub.
