Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
2012 Olympics Swimming – 100m Butterfly Men Finals prediction
Author: Matt Malin
Inspired by mages’ blog with predictions for 100m running times, I’ve decided to perform some basic modelling (loess and linear modelling) on previous Olympic results for the 100m Butterfly Men’s medal winning results.
Code setup
library(XML) library(ggplot2) swimming_path <- "http://www.databasesports.com/olympics/sport/sportevent.htm?sp=SWI&enum=200" swimming_data <- readHTMLTable( readLines(swimming_path), which = 3, stringsAsFactors = FALSE) # due to some potential errors in passing header = TRUE: names(swimming_data) <- swimming_data[1, ] swimming_data <- swimming_data[-1, ] swimming_data[["Result"]] <- as.numeric(swimming_data[["Result"]]) swimming_data[["Year"]] <- as.numeric(swimming_data[["Year"]]) swimming_data <- na.omit(swimming_data) loess_prediction <- function( medal_type = "GOLD", prediction_year = 2012) { medal_type <- toupper(medal_type) swimming_loess <- loess( Result ~ Year, subset(swimming_data, Medal == medal_type), control = loess.control(surface = "direct")) swimming_prediction <- predict( swimming_loess, data.frame(Year = prediction_year), se = FALSE) return(swimming_prediction) } log_lm_prediction <- function( medal_type = "GOLD", prediction_year = 2012) { medal_type <- toupper(medal_type) swimming_log_lm <- lm( log(Result) ~ Year, subset(swimming_data, Medal == medal_type)) swimming_prediction <- exp(predict( swimming_log_lm, data.frame(Year = prediction_year), se = FALSE)) return(swimming_prediction) } swimming_data <- rbind( data.frame( swimming_data[c("Year", "Medal", "Result")], type = "actual"), data.frame( Year = rep(2012, 3), Medal = c("GOLD", "SILVER", "BRONZE"), Result = c( loess_prediction("gold"), loess_prediction("silver"), loess_prediction("bronze")), type = rep("loess_prediction", 3))) medal_colours <- c( GOLD = rgb(201, 137, 16, maxColorValue = 255), SILVER = rgb(168, 168, 168, maxColorValue = 255), BRONZE = rgb(150, 90, 56, maxColorValue = 255)) swimming_plot <- ggplot( swimming_data, aes( x = Year, y = Result, colour = Medal, group = Medal)) + scale_x_continuous(limits = c(1968, 2012)) + geom_point() + stat_smooth( aes(fill = Medal), alpha = 0.25, data = subset(swimming_data, type = "actual"), fullrange = FALSE, method = loess) swimming_plot <- swimming_plot + scale_fill_manual(values = medal_colours) + scale_colour_manual(values = medal_colours) + theme_bw()
Predictions
I now use the functions loess_prediction and log_lm_prediction to estimate the times for the medal winning times.
Loess predictions
The gold prediction for 2012 is 49.7
seconds, for silver is 49.5
seconds, and for bronze is 50.2
seconds.
Linear modelling (of log results)
I’ve shown the code here for the calls to the linear modelling approach:
swimming_log_lm_gold <- log_lm_prediction("gold") swimming_log_lm_silver <- log_lm_prediction("silver") swimming_log_lm_bronze <- log_lm_prediction("bronze")
This gives the following times as predictions:
swimming_log_lm_gold ## 1 ## 50.23 swimming_log_lm_silver ## 1 ## 50.09 swimming_log_lm_bronze ## 1 ## 50.46
Loess prediction plot
The following is a plot of actual and predicted times, along with loess error setting as defaults from geom_smooth:
Notes
Note that because of the small difference between the silver and gold medal results at the 2008 olympics, the trend of improvement in silver exceeds that in the gold, so the prediction is that the silver time will be faster than the gold!
Also note that this takes into account no information about performance of athletes involved or changes in rules, such as being unable to use the swimsuits that were present in the last Olympics and largely attributed to improving performance, purely modelling from a few data points as an interesting exercise!
Final Summary
To summarise, the final predicted results using these methods are:
library(pander) predictions <- data.frame( Medal = c("Gold", "Silver", "Bronze"), Loess_prediction = c( loess_prediction("gold"), loess_prediction("silver"), loess_prediction("bronze")), Log_Linear_prediction = c( log_lm_prediction("gold"), log_lm_prediction("silver"), log_lm_prediction("bronze"))) pandoc.table(predictions)
Medal | Loess_prediction | Log_Linear_prediction |
---|---|---|
Gold | 49.69 | 50.23 |
Silver | 49.52 | 50.09 |
Bronze | 50.20 | 50.46 |
Obviously the predictions here are very crudely performed, especially given that it produces a faster time for a silver medal than for gold, but it’ll still be interesting to see what actually happens, and if it’ll be Michael Phelps yet again!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.