
New Version of SwimmeR and the Next Round of the State-Off Tournament

[This article was first published on Swimming + Data Science, and kindly contributed to R-bloggers.]
    The first round of the 2020 High School Swimming State-Off Tournament is in the books and saw California (1), Texas (2), Florida (3), and Pennsylvania (5) advance.

    Before beginning the next round there are a few administrative details I’d like to cover.

    1. First and foremost: SwimmeR version 0.4.1 is now available on CRAN! The State-Off has been the first major outing for my SwimmeR package. We’ve used it extensively to read in and parse swimming results from a variety of sources, including “normal” HTML web pages, Hy-Tek real-time results pages, and .pdf files. It has performed admirably, but some bugs have revealed themselves behind the scenes. Version 0.4.1 contains bug fixes plus a host of new features:
    • A version of results_score, the function we developed during the State-Off. It handles timed finals style meets (like the State-Off) but also scores prelims-finals style meets, a more common and also more complex format.
    library(stringr)
    library(dplyr)
    library(purrr)
    library(SwimmeR)
    library(flextable)
    
    base <- "http://sidearmstats.com/auburn/swim/200218F0"
    event_numbers <-
      1:42 # sequence of numbers, total of 42 events across men and women
    event_numbers <-
      str_pad(event_numbers,
              width = 2,
              side = "left",
              pad = "0") # add leading zeros to single digit numbers
    SEC_Links <-
      paste0(base, event_numbers, ".htm") # paste together base urls and sequence of numbers (with leading zeroes as needed)
    
    SEC_Results <-
      map(SEC_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
      map(
        swim_parse,
        # names which were cut off, and missing the "last, first" structure
        typo = c(
          "A&M",
          "FLOR",
          "Celaya-Hernande",
          "Hernandez-Tome",
          "Garcia Varela,",
          "Von Biberstein,"
        ),
        # replacement names that artificially impose "last, first" structure; names can be fixed after parsing
        replacement = c(
          "AM",
          "Florida",
          "Celaya, Hernande",
          "Hernandez, Tome",
          "Garcia, Varela",
          "Von, Biberstein"
        )
      ) %>%
      bind_rows()
    
    
    # some diving finals results don't list places 9-24, which do score.  we can get those divers from the prelim results
    SEC_Diving_Prelims_Links <-
      c(
        "http://sidearmstats.com/auburn/swim/200218P015.htm", # M 1m prelims
        "http://sidearmstats.com/auburn/swim/200218P001.htm", # W 1m prelims
        "http://sidearmstats.com/auburn/swim/200218P022.htm", # W 3m prelims
        "http://sidearmstats.com/auburn/swim/200218P029.htm", # M platform prelims
        "http://sidearmstats.com/auburn/swim/200218P040.htm"  # W platform prelims
      )
    
    SEC_Diving_Prelims <-
      map(SEC_Diving_Prelims_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
      map(
        swim_parse,
        typo = c("A&M", "FLOR", "Celaya-Hernande", "Garcia Varela,"),
        replacement = c("AM", "Florida", "Celaya, Hernande", "Garcia, Varela")
      ) %>%
      bind_rows()
    
    SEC_Diving_Prelims <- SEC_Diving_Prelims %>%
      anti_join(SEC_Results, by = c("Name", "School", "Event")) # make sure divers aren't counted twice for a given event
    
    SEC_Results <- bind_rows(SEC_Results, SEC_Diving_Prelims)
    
    SEC_Results <-
      SEC_Results %>% # actual use of new results_score function
      results_score(
        events = unique(SEC_Results$Event),
        meet_type = "prelims_finals",
        lanes = 8,
        scoring_heats = 3,
        point_values = c(
          32, 28, 27, 26, 25, 24, 23, 22, # places 1-8
          20, 17, 16, 15, 14, 13, 12, 11, # places 9-16
          9, 7, 6, 5, 4, 3, 2, 1          # places 17-24
        )
      )
    
    SEC_Results_Gender <- SEC_Results %>%
      mutate(Gender = case_when(str_detect(Event, "Men") ~ "M",
                                str_detect(Event, "Women") ~ "F")) %>%
      group_by(School, Gender) %>%
      summarise(Score = sum(Points, na.rm = TRUE)) %>%
      arrange(Gender, desc(Score)) %>% # women first, then men, each by descending score
      ungroup() %>%
      group_split(Gender)


    The scored results match the official results for women:

    SEC_Results_Gender[[1]] %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()

    School           Gender    Score
    Tennessee        F        1108.0
    Florida          F        1079.5
    Kentucky         F         987.5
    Georgia          F         986.0
    Auburn           F         866.0
    Texas AM         F         851.0
    Alabama          F         748.0
    Missouri         F         500.0
    South Carolina   F         427.0
    Arkansas         F         422.0
    LSU              F         417.0
    Vanderbilt       F         150.0



    Scores also match for men:

    SEC_Results_Gender[[2]] %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()

    School           Gender    Score
    Florida          M        1194.0
    Texas AM         M         975.5
    Georgia          M         953.5
    Alabama          M         935.5
    Missouri         M         846.5
    Tennessee        M         817.0
    Kentucky         M         724.0
    Auburn           M         697.0
    LSU              M         517.0
    South Carolina   M         504.0
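
The half points in the team totals above are consistent with tied athletes splitting points. As a rough illustration of place-based scoring, here is a minimal base R sketch. It is not how results_score is implemented; the helper `score_places`, the toy data, and the rule that tied swimmers share the points for the places they occupy are all assumptions made for the example. Only the point values are taken from the call above.

```r
# Toy results for a single event; names and schools are made up for illustration.
toy <- data.frame(
  Name   = c("A", "B", "C", "D", "E"),
  School = c("Red", "Blue", "Red", "Blue", "Red"),
  Place  = c(1, 2, 3, 3, 5), # C and D tie for third
  stringsAsFactors = FALSE
)

# Same point values passed to results_score above (places 1 through 24).
point_values <- c(32, 28, 27, 26, 25, 24, 23, 22,
                  20, 17, 16, 15, 14, 13, 12, 11,
                  9, 7, 6, 5, 4, 3, 2, 1)

# Hypothetical helper: award points by place, with tied athletes
# sharing the points for the places they jointly occupy.
score_places <- function(place, point_values) {
  pts <- numeric(length(place)) # athletes without a place keep 0 points
  for (p in unique(place[!is.na(place)])) {
    tied <- which(place == p)
    occupied <- seq(p, length.out = length(tied)) # e.g. a tie for 3rd occupies 3 and 4
    vals <- point_values[occupied]
    vals[is.na(vals)] <- 0 # places beyond the scoring range earn nothing
    pts[tied] <- mean(vals)
  }
  pts
}

toy$Points <- score_places(toy$Place, point_values)
team_totals <- aggregate(Points ~ School, data = toy, FUN = sum)
team_totals
```

In this toy event the two athletes tied for third each receive (27 + 26) / 2 = 26.5 points, which is how a .5 can end up in a team total.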


    • The ability to read in .hy3 files. Hy-Tek .hy3 files are another results format, intended to be read into Team Manager. As of version 0.4.1 SwimmeR can read them too. This feature is not complete and will evolve in future releases; bug reports are welcome at the SwimmeR GitHub page. Here we use it to read in results from the USA Swimming 2019 December Sectional Meet for CA and NV.
    temp <- tempfile()
    temp2 <- tempfile()
    url <-
      "http://www.pacswim.org/userfiles/meets/documents/1691/meet-results-speedo-sectionals-2019-ca-nv-december-2019-13dec2019-003.zip"
    
    download.file(url, temp)
    unzip(zipfile = temp, exdir = temp2)
    raw_results <-
      read_results(
        file.path(
          temp2,
          "Meet Results-Speedo Sectionals 2019 CA-NV December 2019-13Dec2019-003.hy3"
        )
      )
    unlink(c(temp, temp2), recursive = TRUE) # temp2 is a directory, so recursive = TRUE is needed to delete it
    
    results <- swim_parse(raw_results) %>%
      mutate(Event = str_replace(Event, "NA", "Yard"))
    
    results %>%
      filter(Event == "100 Yard Butterfly",
             Gender == "M") %>%
      select(Name, Team = School, Prelims_Time, Finals_Time) %>%
      arrange(Finals_Time) %>%
      head(5) %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()

    Name               Team                       Prelims_Time   Finals_Time
    Fischer, Brandon   C1LAC                      49.20          48.07
    Antoniuk, Konrad   Paseo Aquatics Swim Team   50.48          50.03
    Toland, Brandon    Golden West Swim Club      50.30          50.06
    Kim, William       Monterey Park Manta Rays   50.93          50.16
    Bowman, Andrew     San Clemente Aquatics      50.95          50.30
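
One caveat about arrange(Finals_Time) in the code above: assuming Finals_Time is stored as a character string, the sort is alphabetical, which only matches swimming order when every time has the same format (here all are under one minute). The sketch below shows the difference using a hypothetical base R helper, `to_seconds`, written for this example; SwimmeR's own sec_format is, as far as I know, meant for the same job, but the stand-in keeps the snippet runnable without the package.

```r
# Hypothetical helper: convert "mm:ss.tt" or "ss.tt" strings to numeric seconds.
to_seconds <- function(x) {
  vapply(strsplit(x, ":", fixed = TRUE), function(parts) {
    parts <- as.numeric(parts)
    if (length(parts) == 2) parts[1] * 60 + parts[2] else parts[1]
  }, numeric(1))
}

times <- c("1:02.34", "48.07", "59.99") # made-up times for illustration

sort(times)                      # text sort: "1:02.34" comes first
times[order(to_seconds(times))]  # numeric sort: "48.07" comes first
```

Sorting on the numeric value is the safer choice for events where some swims cross the one-minute mark.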


    • Recording of DQ and Exhibition swims in the output of swim_parse, as the columns DQ and Exhibition respectively. This ended up being important for results_score, since Exhibition and DQ swimmers can’t score.


    Ithaca_Union <-
      swim_parse(
        read_results(
          "https://athletics.ithaca.edu/services/download_file.ashx?file_location=https://s3.amazonaws.com/sidearm.sites/bombers.ithaca.edu/documents/2020/2/1/ithaca_vs_union_2020.pdf"
        )
      )
    
    Ithaca_Union %>%
      filter(Event == "Men 400 Yard Freestyle Relay") %>%
      select(Place, School, Finals_Time, Exhibition, DQ) %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()

    Place   School                        Finals_Time   Exhibition   DQ
    1       Ithaca College-NI             3:21.86       0            0
    2       Ithaca College-NI             3:26.28       0            0
    3       Ithaca College-NI             3:34.10       1            0
    NA      Union College (New York)-MR                 0            1

    We can see that in the Men's 400 Yard Freestyle Relay the third-place relay was an exhibition swim (Exhibition == 1) and that another relay was disqualified (DQ == 1).
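
To show why these two columns matter for scoring, here is a small base R sketch that keeps only the scoring-eligible relays. The data frame is typed in by hand from the relay table above (with the missing time entered as NA); the filtering step is an illustration of the idea, not the code results_score actually runs.

```r
# Relay results copied from the table above; the DQ'd relay has no time.
relay <- data.frame(
  Place = c(1, 2, 3, NA),
  School = c("Ithaca College-NI", "Ithaca College-NI",
             "Ithaca College-NI", "Union College (New York)-MR"),
  Finals_Time = c("3:21.86", "3:26.28", "3:34.10", NA),
  Exhibition = c(0, 0, 1, 0),
  DQ = c(0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Exhibition and disqualified swims cannot score, so drop them first.
eligible <- relay[relay$Exhibition == 0 & relay$DQ == 0, ]
eligible
```

Only the first- and second-place Ithaca relays remain, so only they would be passed on to receive points.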


    • Bug fixes include an issue where tied athletes, with “*” in front of their places, would not be imported, and an issue where times or scores with a “J” in front of them (a Hy-Tek marker meaning a time/score was judged) would not be imported.
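
For a sense of what handling those markers involves, here is a minimal base R sketch that strips a leading “*” from places and a leading “J” from times. The input vectors are made up, and this is only an illustration of the kind of cleaning required, not SwimmeR's internal code.

```r
# Made-up raw values as they might appear in a results file.
raw_places <- c("1", "*2", "*2", "4")        # "*" marks tied athletes
raw_times  <- c("48.07", "J50.03", "1:02.34") # "J" marks a judged time

# Remove the marker only when it is the first character.
clean_places <- as.numeric(sub("^\\*", "", raw_places))
clean_times  <- sub("^J", "", raw_times)

clean_places
clean_times
```

Anchoring the pattern with `^` keeps the substitution from touching anything other than a prefix.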
    2. Since we’ve already read in results for each state, I’m not going to re-read them in each State-Off post going forward. Instead I’m hosting the results on GitHub and will just pull them from there. Don’t worry, there will still be plenty of work for SwimmeR to do.

    3. Continuing from point 2, the focus of the first round was mostly on demonstrating how to read in swimming data with SwimmeR. This next round will focus more on exactly what that data is and how to use it.

    Thanks for joining us, and don’t forget to update your version of SwimmeR in preparation for the next round of the High School Swimming State-Off Tournament!

