@racently is a side project that I have been nursing along for a couple of years. It addresses a problem that I have as a runner: my race results are distributed across a variety of web sites. This makes it difficult to create a single view on my running performance (or lack thereof) over time. I suspect that I am not alone in this. Anyway, @racently was built to scratch my personal itch: my running results are now all aggregated in one place.
A few months ago @DanielCunnama suggested that I add the ability to create running groups in @racently. This sounded like a good idea. It also sounded like a bit of work and TBH I just did not have the time. So I made a counter-suggestion: how about an API so that he could effectively aggregate the data in any way he wanted? He seemed happy with the idea, so it immediately went onto my backlog. And there it stayed. But @DanielCunnama is a persistent guy (perhaps this is why he’s a class runner!) and he pinged me relentlessly about this… until Sunday when I relented and created the API.
And now I’m happy that I did, because it gives me an opportunity to write up a quick post about how these data can be accessed from R.
Profiles on @racently
I’m going to use Gerda Steyn as an example. I hope she doesn’t mind. This is what Gerda’s profile looks like on @racently.
Now there are a couple of things I should point out:
- This profile is far from complete. Gerda has run a lot more races than that. These are just the ones that we currently have in our database. We’re adding more races all the time, but it’s a long and arduous process.
- The 2019 Comrades Marathon result is from the year she won the race!
A view like this can be created for any runner on the system. Most runners in South Africa should have a profile (unless they have explicitly requested that we remove it!).
Pulling Data with the API
Suppose that you want to do some analytics on these data. First you’d need to pull them into R or Python. You could scrape the site, but the API makes access a lot easier.
Load up some helpful packages.
library(glue)
library(dplyr)
library(purrr)
library(httr)
Set up the URL for the API endpoint and the key for Gerda’s profile.
URL <- "https://www.racently.com/api/athlete/{key}/"
key <- "7ef6fbc8-4169-4a98-934e-ff5fa79ba103"
Send a GET request and extract the results from the response object, parsing the JSON into an R list.
response <- glue(URL) %>% GET() %>% content()
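The one-liner above assumes that the request succeeds. A slightly more defensive variant might look like the sketch below. Note that `athlete_url()` and `get_athlete()` are hypothetical helper names (they are not part of @racently or httr), and the check on the content type assumes that the endpoint always responds with JSON.

```r
library(glue)
library(httr)

# Hypothetical helper: build the endpoint URL for an athlete key.
athlete_url <- function(key) {
  glue("https://www.racently.com/api/athlete/{key}/")
}

# Hypothetical helper: fetch one athlete profile as a nested R list.
get_athlete <- function(key) {
  response <- GET(athlete_url(key))
  # Turn HTTP errors (4xx/5xx) into R errors rather than parsing an error page.
  stop_for_status(response)
  # Assumption: the API responds with JSON, so fail early if it does not.
  if (http_type(response) != "application/json") {
    stop("Expected a JSON response from the API")
  }
  # Parse the JSON body into a nested list.
  content(response, as = "parsed", type = "application/json")
}
```

Calling `get_athlete(key)` with the key defined earlier should then give the same kind of list as the pipeline above, but with a clear error if something goes wrong along the way.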
Extract some basic information from the response.
response$url
## [1] "http://www.racently.com/api/athlete/7ef6fbc8-4169-4a98-934e-ff5fa79ba103/"
response$name
## [1] "Gerda Steyn"
response$gender
## [1] "F"
Now get the race results. This requires a little more work because of the way that the JSON is structured: an array of licenses, each of which has a nested array of race result objects.
response$license %>%
  map_dfr(function(license) {
    license$result %>%
      map_dfr(as_tibble) %>%
      mutate(
        club = license$club,
        number = license$number,
        date = as.Date(date)
      )
  }) %>%
  arrange(desc(date))
##         date          race distance     time    club number
## 1 2019-06-09      Comrades  86.8 km 05:58:53 Nedbank     NA
## 2 2018-06-10      Comrades  90.2 km 06:15:34 Nedbank   8300
## 3 2018-05-20           RAC  10.0 km 00:35:38 Nedbank   8300
## 4 2018-05-01 Wally Hayward  10.0 km 00:35:35 Nedbank   8300
## 5 2017-06-04      Comrades  86.7 km 06:45:45 Nedbank     NA
## 6 2016-05-29      Comrades  89.2 km 07:08:23 Nedbank     NA
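To experiment with the flattening step without hitting the API, it can help to build a small stand-in for the parsed response by hand. The sketch below does that. The values are made up, the shape (licenses with nested results) is assumed from the description above, and `flatten_results()` is a hypothetical helper name. It also assumes that a missing license number arrives as `NULL` and replaces it with `NA` explicitly.

```r
library(dplyr)
library(purrr)

# Hand-built stand-in for the parsed response (made-up values, assumed shape).
mock <- list(
  name = "Example Runner",
  license = list(
    list(
      club = "Example AC", number = 1234,
      result = list(
        list(date = "2018-06-10", race = "Long Race", distance = "90.2 km", time = "06:15:34"),
        list(date = "2018-05-20", race = "Short Race", distance = "10.0 km", time = "00:35:38")
      )
    ),
    # A license without a number (assumed to be NULL after JSON parsing).
    list(
      club = "Example AC", number = NULL,
      result = list(
        list(date = "2019-06-09", race = "Long Race", distance = "86.8 km", time = "05:58:53")
      )
    )
  )
)

# Hypothetical helper: one row per result, with the license fields attached.
flatten_results <- function(athlete) {
  athlete$license %>%
    map_dfr(function(license) {
      license$result %>%
        map_dfr(as_tibble) %>%
        mutate(
          club = license$club,
          # Use NA for a missing license number so the column always exists.
          number = license$number %||% NA,
          date = as.Date(date)
        )
    }) %>%
    arrange(desc(date))
}

results <- flatten_results(mock)
```

Wrapping the pipeline in a function like this also makes it easy to reuse for other athletes.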
For good measure, let’s throw in the results for @DanielCunnama.
##          date               race distance     time              club number
##  1 2019-09-29          Grape Run  21.1 km 01:27:49 Harfield Harriers   4900
##  2 2019-06-09           Comrades  86.8 km 07:16:21 Harfield Harriers   4900
##  3 2019-02-17     Cape Peninsula  42.2 km 03:08:47 Harfield Harriers   4900
##  4 2019-01-26  Red Hill Marathon  36.0 km 02:52:55 Harfield Harriers   4900
##  5 2019-01-13         Bay to Bay  30.0 km 02:15:55 Harfield Harriers   7935
##  6 2018-11-10          Winelands  42.2 km 02:58:56 Harfield Harriers   7935
##  7 2018-10-14        The Gun Run  21.1 km 01:22:30 Harfield Harriers   7935
##  8 2018-10-07          Grape Run  21.1 km 01:36:46 Harfield Harriers   8358
##  9 2018-09-23 Cape Town Marathon  42.2 km 03:11:52 Harfield Harriers   7935
## 10 2018-09-09         Ommiedraai  10.0 km 00:37:46 Harfield Harriers  11167
## 11 2018-06-10           Comrades  90.2 km 07:19:25 Harfield Harriers   7935
## 12 2018-02-18     Cape Peninsula  42.2 km 03:08:27 Harfield Harriers   7935
## 13 2018-01-14         Bay to Bay  30.0 km 02:11:50 Harfield Harriers   7935
## 14 2017-10-01          Grape Run  21.1 km 01:27:18 Harfield Harriers   7088
## 15 2017-09-17 Cape Town Marathon  42.2 km 02:57:55 Harfield Harriers   7088
## 16 2017-06-04           Comrades  86.7 km 07:46:18 Harfield Harriers   7088
## 17 2016-10-16        The Gun Run  21.1 km 01:19:09 Harfield Harriers     NA
## 18 2016-09-10   Mont-Aux-Sources  50.0 km 05:42:23 Harfield Harriers     NA
## 19 2016-05-29           Comrades  89.2 km 07:22:53 Harfield Harriers     NA
## 20 2016-02-21     Cape Peninsula  42.2 km 03:17:12 Harfield Harriers     NA
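With results in this shape, a natural first bit of analytics is pace. The sketch below uses three rows copied from the table above. It assumes that `distance` always ends in " km" and that `time` is always formatted as `HH:MM:SS`, and `to_seconds()` is a hypothetical helper name.

```r
library(dplyr)

# Three rows copied from the results table above.
sample_results <- tibble(
  race = c("Grape Run", "Comrades", "Cape Peninsula"),
  distance = c("21.1 km", "86.8 km", "42.2 km"),
  time = c("01:27:49", "07:16:21", "03:08:47")
)

# Hypothetical helper: convert "HH:MM:SS" strings to seconds.
to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

paces <- sample_results %>%
  mutate(
    # Strip the unit suffix to get a numeric distance in kilometres.
    km = as.numeric(sub(" km$", "", distance)),
    seconds = to_seconds(time),
    # Average pace in minutes per kilometre.
    pace_min_per_km = seconds / km / 60
  )
```

The same `mutate()` step should work unchanged on a full results table pulled from the API, provided those formatting assumptions hold.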
Wrapping Up
Let’s digress for a moment to look at a bubble plot showing the number of races on @racently broken down by runner. There are some really prolific runners.
We’ve currently got just under one million individual race results across over a thousand races. If you have the time and inclination then there’s definitely some interesting science to be done using these results. I’d be very interested in collaborating, so just shout if you are interested.
Feel free to grab some data via the API. At the moment you’ll need to search for an athlete on the main website in order to find their API key. I’ll implement some search functionality in the API when I get a chance.
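Until then, one way to approximate the running groups idea is to loop over a handful of keys that you have looked up by hand. This is only a sketch: `fetch_athlete()` and `collect_group()` are hypothetical names, and the fields used (`name`, `license`, `result`) are assumed from the response shown earlier in this post.

```r
library(glue)
library(httr)
library(purrr)
library(dplyr)

# Hypothetical helper: download and parse one athlete profile.
fetch_athlete <- function(key) {
  response <- GET(glue("https://www.racently.com/api/athlete/{key}/"))
  # Stop with an R error if the request failed.
  stop_for_status(response)
  content(response)
}

# Hypothetical helper: summarise several athletes, one row each.
# `fetch` is an argument so that the download step can be swapped out.
collect_group <- function(keys, fetch = fetch_athlete) {
  map_dfr(keys, function(key) {
    athlete <- fetch(key)
    tibble(
      key = key,
      name = athlete$name,
      # Count results across all of the athlete's licenses.
      races = sum(map_int(athlete$license, function(license) length(license$result)))
    )
  })
}
```

For example, `collect_group("7ef6fbc8-4169-4a98-934e-ff5fa79ba103")` would fetch just Gerda's profile using the key from earlier. Add further keys to the vector to build up a group.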
Finally, here’s a talk I gave about @racently at the Bulgaria Web Summit (2017) in Sofia, Bulgaria. A great conference, incidentally. Well worth making the trip to Bulgaria.