SwimmeR version 0.7.2 is now available from CRAN. This new version contains some new features, plus a few changes to make it more user-friendly. Let me show you what I’ve been working on.
library(SwimmeR)
library(dplyr)
library(stringr)
library(flextable)
library(rbenchmark)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bold header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    align_nottext_col(align = "center", header = TRUE, footer = TRUE) %>% # center alignment
    autofit()
}
New Features
- SwimmeR can now parse S.A.M.M.S. style results. S.A.M.M.S., which stands for Swimclub And Meet Management System, was, ahem, a swim club and meet management system that predated Hy-Tek’s Meet and Team Manager. It seems to have been most popular in California, where it’s still used by USA Swimming clubs and high schools to this day.
S.A.M.M.S. meets look like this:
Parsing them is a simple matter for you SwimmeR users: it’s exactly the same as parsing Hy-Tek style results. Same read_results, same swim_parse. The only differences come in with respect to relay_swimmers and splits. S.A.M.M.S. results that I’ve seen don’t include relay swimmers, so of course SwimmeR doesn’t collect them. Splits are also rarely seen in S.A.M.M.S. results and at this moment are also not collected by SwimmeR, although they may be in a future release.
df <- swim_parse(
  read_results(
    "http://www.pacswim.org/userfiles/meets/documents/1629/1119bac.htm"
  )
)

df %>%
  head(5) %>%
  flextable_style()
Place | Name | Age | Team | Finals_Time | DQ | Event |
1 | LADOMIRAK, ALEGRIA | 8 | PC PALO ALTO STANFORD | 16.11 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
2 | DIEHN, EVA | 8 | PC BULL DOG SWIM CLUB | 16.36 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
3 | HILL, NAOMI | 8 | PC PALO ALTO STANFORD | 16.50 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
4 | HOUTZER, AMELIA | 8 | PC PALO ALTO STANFORD | 16.88 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
5 | CHANG, KAYLA | 8 | PC BURLINGAME AQUATIC | 17.55 | 0 | EVENT 73 FEMALE 8&UN 25 FREE |
On a personal level, working with these S.A.M.M.S. results was very encouraging, because they have all kinds of weird bugs and cut corners that make me feel better about SwimmeR. For example, some S.A.M.M.S. results list a place order for finals swims, as “F1”, “F2”, etc. But S.A.M.M.S. can’t handle more than two characters in that field, so if someone comes in 10th they just get “F”.
S.A.M.M.S. also doesn’t know what to make of diving, and records diving results like swimming results, so “347.56” is written as “3:47.56” (swim_parse corrects this). S.A.M.M.S. also orders diving results backwards, with the lowest (i.e. fastest) score/time listed first.
S.A.M.M.S. was a commercial product. SwimmeR might have its issues sometimes, but at least it’s free!
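To make the diving quirk concrete, here is a minimal base R sketch of the kind of correction involved. The helper name fix_diving_score is my own invention for illustration and is not a SwimmeR function; swim_parse already handles this for you, and its internals may well differ.

```r
# Hypothetical helper, NOT part of SwimmeR: undo the swim-time style
# formatting that S.A.M.M.S. applies to diving scores.
fix_diving_score <- function(x) {
  # Drop the colon that was inserted as if the score were minutes:seconds,
  # then convert the remaining text to a number.
  as.numeric(gsub(":", "", x, fixed = TRUE))
}

# Made-up example scores; the last one has no colon and passes through unchanged
scores <- c("3:47.56", "2:98.10", "185.30")
fix_diving_score(scores)
```

Note that this only makes sense for rows known to be diving events; applied to real swim times it would mangle them.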
- Under the hood changes to speed up swim_parse. We can test this with benchmark from the rbenchmark package because I’ve left the old swim_parse function in SwimmeR, renamed swim_parse_old. It’s not exported though, so to actually access it you’ll need to call it as SwimmeR:::swim_parse_old.
benchmark(
  "new" = {
    swim_parse(
      read_results(
        "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
      )
    )
  },
  "old" = {
    SwimmeR:::swim_parse_old(
      read_results(
        "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
      )
    )
  },
  replications = 5
) %>%
  flextable_style()
test | replications | elapsed | relative | user.self | sys.self |
new | 5 | 33.62 | 1.000 | 30.67 | 0.06 |
old | 5 | 76.72 | 2.282 | 74.16 | 0.09 |
As the relative column above shows, the new version of swim_parse is a little over twice as fast as the old version (on my computer at least). You’re all very welcome.
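If you want to check where the numbers in the relative column come from, they are each test's elapsed time divided by the fastest elapsed time. A quick sketch using the timings from my run above (yours will differ):

```r
# Elapsed times (seconds) copied from the benchmark table above
elapsed <- c(new = 33.62, old = 76.72)

# Each time relative to the fastest one, rounded to three decimal places
relative <- round(elapsed / min(elapsed), 3)
relative
```

The value for "old" comes out at 2.282, matching the table, which is where "a little over twice as fast" comes from.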
- Kinder and gentler all around. There have been several changes to make swim_parse more user friendly. First is decreased reliance on the typo and replacement arguments. They’re still present, and still work, but they’re hopefully now much less necessary.
By way of example in this meet there’s a young man named “DU Fayet DE LA Tour, Vin”, as seen here:
swim_parse_old struggles with this, and gets his name wrong.
df_old <- SwimmeR:::swim_parse_old(
  read_results(
    "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
  )
)

df_old %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour") == TRUE) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Event |
13 | DU Fayet DE LA Tour | 14 | NBA-PC | 1:02.09 | 1:00.08 | Boys 13-14 100 Yard Freestyle |
12 | DU Fayet DE LA Tour | 14 | NBA-PC | 1:20.75 | 1:09.03 | Boys 13-14 100 Yard Backstroke |
14 | DU Fayet DE LA Tour | 14 | NBA-PC | 1:16.16 | 1:10.01 | Boys 13-14 100 Yard Butterfly |
16 | DU Fayet DE LA Tour | 14 | NBA-PC | 2:50.00 | 2:35.25 | Boys 13-14 200 Yard IM |
We can fix the problem in a hacky and non-intuitive kind of way using typo and replacement, plus some changes after the parse. It works, but it’s not terribly easy.
df_old_tr <- SwimmeR:::swim_parse_old(
  read_results(
    "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
  ),
  typo = ", Vin ",
  replacement = " Vin "
) %>%
  mutate(Name = str_replace(Name, " Vin", ", Vin"))

df_old_tr %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour") == TRUE) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Event |
13 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:02.09 | 1:00.08 | Boys 13-14 100 Yard Freestyle |
12 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:20.75 | 1:09.03 | Boys 13-14 100 Yard Backstroke |
14 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:16.16 | 1:10.01 | Boys 13-14 100 Yard Butterfly |
16 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 2:50.00 | 2:35.25 | Boys 13-14 200 Yard IM |
Compare that to the much simpler approach available in SwimmeR version 0.7.2: no need for typo and replacement, and no need for after-parse fixes to Vin’s name.
df_new <- swim_parse(
  read_results(
    "http://www.pacswim.org/userfiles/meets/documents/1547/nvst-results.htm"
  )
)

df_new %>%
  filter(str_detect(Name, "DU Fayet DE LA Tour") == TRUE) %>%
  select(-Points, -DQ, -Exhibition) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Event |
13 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:02.09 | 1:00.08 | Boys 13-14 100 Yard Freestyle |
12 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:20.75 | 1:09.03 | Boys 13-14 100 Yard Backstroke |
14 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 1:16.16 | 1:10.01 | Boys 13-14 100 Yard Butterfly |
16 | DU Fayet DE LA Tour, Vin | 14 | NBA-PC | 2:50.00 | 2:35.25 | Boys 13-14 200 Yard IM |
There is still a place for typo and replacement. Sometimes there really are typos that need replacing. Things should be easier now though.
Second, event names were also an issue in older versions of SwimmeR. If swim_parse didn’t find any event names it liked, it would throw an error and return nothing. Now, in SwimmeR version 0.7.2, the event name definitions are much broader, and failing to find any event names will not result in an error.
These results, from the 2019 Australian Nationals, won’t read in with previous versions of SwimmeR because the events are named with “Metre” rather than “Meter”. Now though, with SwimmeR version 0.7.2, we can see the Campbell sisters doing their thing.
df_aus <- swim_parse(
  read_results(
    "https://www.swimming.org.au/sites/default/files/assets/documents/full%20results_0.pdf"
  )
)

df_aus %>%
  head(2) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Points | DQ | Exhibition | Event |
1 | CAMPBELL, CATE | 27 | KNOX PYMBLE | 24.33 | 24.05 | 953 | 0 | 0 | Women 50 LC Metre Freestyle |
2 | CAMPBELL, BRONTE | 25 | KNOX PYMBLE | 24.60 | 24.17 | 939 | 0 | 0 | Women 50 LC Metre Freestyle |
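For a feel of what “broader event name definitions” can mean in practice, here is a small base R sketch of a pattern that accepts both spellings. The pattern and the sample lines are my own illustration, an assumption about the general idea rather than the actual definitions used inside swim_parse.

```r
# Hypothetical pattern for illustration only; SwimmeR's real event-name
# definitions are not shown in this post and may look quite different.
# It matches a distance, an optional course (LC/SC), then Meter, Metre or Yard.
event_pattern <- "\\d+\\s+(LC|SC)?\\s*(Met(er|re)|Yard)"

sample_lines <- c(
  "Women 50 LC Metre Freestyle",
  "Women 50 LC Meter Freestyle",
  "Boys 13-14 100 Yard Freestyle",
  "1 CAMPBELL, CATE 27 KNOX PYMBLE"
)

# TRUE for the three event lines, FALSE for the swimmer result line
is_event <- grepl(event_pattern, sample_lines, ignore.case = TRUE, perl = TRUE)
is_event
```

A narrower pattern that only knew about “Meter” would miss the first line, which is the situation the older versions were in.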
- Modifications to swim_parse to begin to handle older style Hy-Tek results, like these from 2002. Issues with inconsistent treatment of splits within the results themselves remain, so let the user beware. These older results are still an active area of development.
df_2002 <- swim_parse(
  read_results(
    "https://cdn.swimswam.com/wp-content/uploads/2018/08/2002-Division-I-NCAA-Championships-Men-results1.pdf"
  )
)

df_2002 %>%
  filter(str_detect(Event, "100 Yard BUTTERFLY")) %>%
  head(3) %>%
  flextable_style()
Place | Name | Age | Team | Prelims_Time | Finals_Time | Points | DQ | Exhibition | Event |
1 | CROCKER, IAN | SO | TEXAS | 45.70 | 45.44 | NA | 0 | 0 | Event 9 MEN’s 100 Yard BUTTERFLY |
2 | MARSHALL, PETER | SO | STANFORD | 46.39 | 46.48 | NA | 0 | 0 | Event 9 MEN’s 100 Yard BUTTERFLY |
3 | SCHOEMAN, ROLAND | SR | ARIZONA | 46.57 | 46.50 | NA | 0 | 0 | Event 9 MEN’s 100 Yard BUTTERFLY |
- Bug fixes, always bug fixes.
In Closing
Please do download the newest version of SwimmeR from wherever you get your packages. You’re also welcome to submit bug reports or feature requests on the SwimmeR project GitHub page.