A Chart of Recent Comrades Marathon Winners

[This article was first published on Exegetic Analytics » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Continuing on my quest to document the Comrades Marathon results, today I have put together a chart showing the winners of both the men and ladies races since 1980. Click on the image below to see a larger version.

winners-scatterchart

The analysis started off with the same data set that I was working with before, from which I extracted only the records for the winners.

> winners = subset(results, gender.position == 1, select = c(year, name, gender, race.time))
> head(winners)
     year               name gender race.time
1    1980          Alan Robb   Male  05:38:25
428  1980 Isavel Roche-Kelly Female  07:18:00
3981 1981      Bruce Fordyce   Male  05:37:28
4055 1981 Isavel Roche-Kelly Female  06:44:35
7643 1982      Bruce Fordyce   Male  05:34:22
7873 1982        Cheryl Winn Female  07:04:59

I then added in a field which gives a count of the number of times each person won the race.

> library(plyr)
> winners = ddply(winners, .(name), function(df) {
+     df = df[order(df$year),]
+     df$count = 1:nrow(df)
+     return(df)
+ })
> subset(winners, name == "Bruce Fordyce")
   year          name gender race.time count
7  1981 Bruce Fordyce   Male  05:37:28     1
8  1982 Bruce Fordyce   Male  05:34:22     2
9  1983 Bruce Fordyce   Male  05:30:12     3
10 1984 Bruce Fordyce   Male  05:27:18     4
11 1985 Bruce Fordyce   Male  05:37:01     5
12 1986 Bruce Fordyce   Male  05:24:07     6
13 1987 Bruce Fordyce   Male  05:37:01     7
14 1988 Bruce Fordyce   Male  05:27:42     8
15 1990 Bruce Fordyce   Male  05:40:25     9

The chart was generated as a scatter plot using ggplot2. The size of the points relates to the number of times each person won the race. The colour scale is as you might imagine: pink for the ladies and blue for the men.

> library(ggplot2)
> ggplot(winners, aes(x = year, y = name, color = gender)) +
+     geom_point(aes(size = count), shape = 19, alpha = 0.75) +
+     scale_size_continuous(range = c(5, 15)) +
+     ylab("") + xlab("") +
+     scale_x_discrete(expand = c(0, 1)) +
+     theme(
+         axis.text.x = element_text(angle = 45, hjust = 1, colour = "black"),
+         axis.text.y = element_text(colour = "black"),
+         legend.position = "none",
+         panel.background = element_blank(),
+         panel.grid.major = element_line(linetype = "dotted", colour = "grey"),
+         panel.grid.major.x = element_blank()
+         )

Two of the key aspects of getting this to look just right were:

  • the call to scale_size_continuous() which ensured that a reasonable range of point sizes was used and
  • the call to scale_x_discrete() which expanded the plot very slightly so that the points near the borders were not cropped.

To leave a comment for the author, please follow the link and comment on their blog: Exegetic Analytics » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)