The first round of the 2020 High School Swimming State-Off Tournament is in the books and saw California (1), Texas (2), Florida, and Pennsylvania (5) advance.
Before beginning the next round there are a few administrative details I’d like to cover.
- First and foremost: `SwimmeR` version 0.4.1 is now available on CRAN! The State-Off has been the first major outing for my `SwimmeR` package. We’ve used it extensively to read in and parse swimming results from a variety of sources, including “normal” HTML web pages, Hy-Tek real-time results pages, and .pdf files. It’s performed admirably, but some bugs have revealed themselves behind the scenes. Version 0.4.1 contains bug fixes plus a host of new features:
- A version of `results_score`, the function we developed during the State-Off. It handles timed-finals style meets (like the State-Off) but also scores prelims-finals style meets, a more common and also more complex format.
    library(stringr)
    library(dplyr)
    library(purrr)
    library(SwimmeR)
    library(flextable)

    base <- "http://sidearmstats.com/auburn/swim/200218F0"
    event_numbers <- 1:42 # sequence of numbers, total of 42 events across men and women
    event_numbers <- str_pad(event_numbers, width = 2, side = "left", pad = "0") # add leading zeros to single digit numbers
    SEC_Links <- paste0(base, event_numbers, ".htm") # paste together base urls and sequence of numbers (with leading zeroes as needed)

    SEC_Results <- map(SEC_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
      map(
        swim_parse,
        typo = c(
          "A&M",
          "FLOR",
          "Celaya-Hernande", # names which were cut off, and missing the last, first structure
          "Hernandez-Tome",
          "Garcia Varela,",
          "Von Biberstein,"
        ),
        replacement = c(
          "AM",
          "Florida",
          "Celaya, Hernande", # replacement names that artificially impose last, first structure. Names can be fixed after parsing
          "Hernandez, Tome",
          "Garcia, Varela",
          "Von, Biberstein"
        )
      ) %>%
      bind_rows()

    # some diving finals results don't list places 9-24, which do score.
    # we can get those divers from the prelim results
    SEC_Diving_Prelims_Links <- c(
      "http://sidearmstats.com/auburn/swim/200218P015.htm", # M 1m prelims
      "http://sidearmstats.com/auburn/swim/200218P001.htm", # W 1m prelims
      "http://sidearmstats.com/auburn/swim/200218P022.htm", # W 3m prelims
      "http://sidearmstats.com/auburn/swim/200218P029.htm", # M platform prelims
      "http://sidearmstats.com/auburn/swim/200218P040.htm"  # W platform prelims
    )

    SEC_Diving_Prelims <- map(SEC_Diving_Prelims_Links, read_results, node = "pre") %>% # map SwimmeR::read_results over the list of links
      map(
        swim_parse,
        typo = c("A&M", "FLOR", "Celaya-Hernande", "Garcia Varela,"),
        replacement = c("AM", "Florida", "Celaya, Hernande", "Garcia, Varela")
      ) %>%
      bind_rows()

    SEC_Diving_Prelims <- SEC_Diving_Prelims %>%
      anti_join(SEC_Results, by = c("Name", "School", "Event")) # make sure divers aren't counted twice for a given event

    SEC_Results <- bind_rows(SEC_Results, SEC_Diving_Prelims)

    SEC_Results <- SEC_Results %>% # actual use of new results_score function
      results_score(
        events = unique(SEC_Results$Event),
        meet_type = "prelims_finals",
        lanes = 8,
        scoring_heats = 3,
        point_values = c(
          32, 28, 27, 26, 25, 24, 23, 22,
          20, 17, 16, 15, 14, 13, 12, 11,
          9, 7, 6, 5, 4, 3, 2, 1
        )
      )

    SEC_Results_Gender <- SEC_Results %>%
      mutate(Gender = case_when(
        str_detect(Event, "Men") ~ "M",
        str_detect(Event, "Women") ~ "F"
      )) %>%
      group_by(School, Gender) %>%
      summarise(Score = sum(Points, na.rm = TRUE)) %>%
      arrange(desc(Score)) %>%
      arrange(Gender) %>%
      ungroup() %>%
      group_split(Gender)
The scored results match the official results for women:
    SEC_Results_Gender[[1]] %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()
School | Gender | Score |
Tennessee | F | 1108.0 |
Florida | F | 1079.5 |
Kentucky | F | 987.5 |
Georgia | F | 986.0 |
Auburn | F | 866.0 |
Texas AM | F | 851.0 |
Alabama | F | 748.0 |
Missouri | F | 500.0 |
South Carolina | F | 427.0 |
Arkansas | F | 422.0 |
LSU | F | 417.0 |
Vanderbilt | F | 150.0 |
Scores also match for men:
    SEC_Results_Gender[[2]] %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()
School | Gender | Score |
Florida | M | 1194.0 |
Texas AM | M | 975.5 |
Georgia | M | 953.5 |
Alabama | M | 935.5 |
Missouri | M | 846.5 |
Tennessee | M | 817.0 |
Kentucky | M | 724.0 |
Auburn | M | 697.0 |
LSU | M | 517.0 |
South Carolina | M | 504.0 |
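For readers curious about what the `point_values` argument is doing, the base R sketch below maps places to points. It is not SwimmeR's implementation: `score_places` is a hypothetical helper written for illustration, and it assumes that tied swimmers split the points for the places they jointly occupy, which would be consistent with the half-point team totals in the tables above. Note that `lanes = 8` and `scoring_heats = 3` give 24 scoring places, matching the 24 point values.

```r
# Point values copied from the results_score() call above (24 scoring places)
point_values <- c(
  32, 28, 27, 26, 25, 24, 23, 22,
  20, 17, 16, 15, 14, 13, 12, 11,
  9, 7, 6, 5, 4, 3, 2, 1
)

# Hypothetical helper, NOT part of SwimmeR: convert places to points.
# Assumption: n athletes tied at place p occupy places p through p + n - 1
# and each receives the mean of the points for those places.
score_places <- function(place, point_values) {
  # pad with zeros so places past the last scoring place earn 0 points
  padded <- c(point_values, rep(0, max(place) + length(place)))
  # number of athletes sharing each place
  n_tied <- ave(place, place, FUN = length)
  mapply(function(p, n) mean(padded[p:(p + n - 1)]), place, n_tied)
}

# Toy example: two athletes tied for 2nd split the 2nd and 3rd place points
places <- c(1, 2, 2, 4)
score_places(places, point_values)
```

In this toy example the tied athletes would each receive the mean of 28 and 27. The helper does not handle missing places, so DQ'd swims would need to be dropped first.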
- The ability to read in .hy3 files. Hy-Tek .hy3 files are another form of results, intended to be read into Team Manager. As of version 0.4.1, `SwimmeR` can also read them. This feature is not complete and will evolve in future releases. Bug reports are welcome at the SwimmeR GitHub page. Here, though, we can use it to read in results from the USA Swimming 2019 December Sectional Meet for CA and NV.
    temp <- tempfile()  # will hold the downloaded .zip
    temp2 <- tempfile() # will become the directory the .zip is extracted into
    url <- "http://www.pacswim.org/userfiles/meets/documents/1691/meet-results-speedo-sectionals-2019-ca-nv-december-2019-13dec2019-003.zip"
    download.file(url, temp)
    unzip(zipfile = temp, exdir = temp2)

    raw_results <- read_results(
      file.path(
        temp2,
        "Meet Results-Speedo Sectionals 2019 CA-NV December 2019-13Dec2019-003.hy3"
      )
    )

    unlink(c(temp, temp2), recursive = TRUE) # recursive = TRUE is required to delete the extracted directory

    results <- swim_parse(raw_results) %>%
      mutate(Event = str_replace(Event, "NA", "Yard"))

    results %>%
      filter(Event == "100 Yard Butterfly", Gender == "M") %>%
      select(Name, Team = School, Prelims_Time, Finals_Time) %>%
      arrange(Finals_Time) %>%
      head(5) %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()
Name | Team | Prelims_Time | Finals_Time |
Fischer, Brandon | C1LAC | 49.20 | 48.07 |
Antoniuk, Konrad | Paseo Aquatics Swim Team | 50.48 | 50.03 |
Toland, Brandon | Golden West Swim Club | 50.30 | 50.06 |
Kim, William | Monterey Park Manta Rays | 50.93 | 50.16 |
Bowman, Andrew | San Clemente Aquatics | 50.95 | 50.30 |
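One caveat when sorting by a time column: if times are stored as character strings (an assumption here, based on values like 3:21.86 appearing elsewhere in this post), sorting them compares text rather than durations. That works for the 100 butterfly above because every time is under a minute and in the same format, but it can misorder events where some swims cross the one-minute mark. The sketch below uses a hypothetical base R helper, `to_seconds`, to convert times to numeric seconds before ordering. The times are made up for illustration.

```r
# Hypothetical helper, NOT part of SwimmeR: convert "m:ss.hh" or "ss.hh"
# time strings to numeric seconds
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    p <- as.numeric(p)
    # two pieces means minutes and seconds; one piece means seconds only
    if (length(p) == 2) p[1] * 60 + p[2] else p[1]
  }, numeric(1))
}

# Made-up times, one of them under a minute
times <- c("1:02.31", "59.87", "3:21.86")

sort(times)                      # text sort puts "59.87" last
times[order(to_seconds(times))]  # numeric sort puts "59.87" first
```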
- Recording of DQ and Exhibition swims in the output of `swim_parse`, as the columns `DQ` and `Exhibition` respectively. This ended up being important for `results_score`, since Exhibition and DQ swimmers can’t score.
    Ithaca_Union <- swim_parse(
      read_results(
        "https://athletics.ithaca.edu/services/download_file.ashx?file_location=https://s3.amazonaws.com/sidearm.sites/bombers.ithaca.edu/documents/2020/2/1/ithaca_vs_union_2020.pdf"
      )
    )

    Ithaca_Union %>%
      filter(Event == "Men 400 Yard Freestyle Relay") %>%
      select(Place, School, Finals_Time, Exhibition, DQ) %>%
      flextable() %>%
      bold(part = "header") %>%
      bg(bg = "#D3D3D3", part = "header") %>%
      autofit()
Place | School | Finals_Time | Exhibition | DQ |
1 | Ithaca College-NI | 3:21.86 | 0 | 0 |
2 | Ithaca College-NI | 3:26.28 | 0 | 0 |
3 | Ithaca College-NI | 3:34.10 | 1 | 0 |
NA | Union College (New York)-MR | | 0 | 1 |
We can see that in the Men’s 400 Yard Freestyle Relay the third-place relay was exhibition (`Exhibition == 1`) and that another relay was disqualified (`DQ == 1`).
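To make the scoring consequence concrete, here is a minimal base R sketch that rebuilds the relay table above by hand (times omitted) and keeps only the swims eligible to score. This illustrates the idea only; how `results_score` applies the `DQ` and `Exhibition` columns internally is not shown here.

```r
# Relay results re-typed from the table above (Finals_Time omitted)
relay <- data.frame(
  Place = c(1, 2, 3, NA),
  School = c(
    "Ithaca College-NI",
    "Ithaca College-NI",
    "Ithaca College-NI",
    "Union College (New York)-MR"
  ),
  Exhibition = c(0, 0, 1, 0),
  DQ = c(0, 0, 0, 1)
)

# Only swims that are neither exhibition nor disqualified can score
eligible <- relay[relay$Exhibition == 0 & relay$DQ == 0, ]
eligible
```

Only the first- and second-place Ithaca relays remain after filtering.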
- Bug fixes, including one for an issue where tied athletes, marked with “*” in front of their places, would not be imported, and another where times or scores with a “J” in front of them (a Hy-Tek marker meaning a time/score was judged) would not be imported.
- Since we’ve already read in results for each state, I’m not going to re-read them in each State-Off post going forward. Instead I’m hosting the results on GitHub and will just pull them from there. Don’t worry, there will still be plenty of work for `SwimmeR` to do.
- Continuing from the previous point, the focus of the first round was mostly on demonstrating how to read in swimming data with `SwimmeR`. This next round will focus more on exactly what that data is and how to use it.
Thanks for joining us, and don’t forget to update your version of `SwimmeR` in preparation for the next round of the High School Swimming State-Off Tournament!