Sports Data and R – Scope for a Thematic (Rather than Task) View? (Living Post)
Via my feeds, I noticed a package announcement today for cricketr!, a new package for analysing cricket performance data.
This got me wondering (again!) about what other sports-related packages there might be out there: either functional thematic packages (to do with sport in general, or one sport in particular), or data packages that either bundle up sports-related data sets or provide an API (that is, a wrapper for an official API, or a wrapper for a scraper that extracts data from one or more websites in a slightly scruffier way!).
This is just a first quick attempt, an unstructured listing that may also include data sets that are more generic than R-specific (e.g. CSV data files, or SQL database exports). I’ll try to keep this post updated as I find/hear about more packages, and also work on structuring it a little better. I really should post this as a wiki somewhere – or perhaps curate something on Github?
- generic:
  - SportsAnalytics [CRAN]: “infrastructure for sports analysis. Anyway, currently it is a selection of data sets, functions to fetch sports data, examples, and demos”.
- athletics:
  - olympic {ade4} [Inside-R packages]: “performances of 33 men’s decathlon at the Olympic Games (1988)”.
- baseball:
  - Lahman [R-Forge, CRAN]: Sean Lahman’s databases, “contains pitching, hitting, and fielding statistics for Major League Baseball from 1871 through 2013.” See also: Lahman: A New R Package for Baseball Stats.
  - pitchRx [CRAN; Github: about, code]: “tools for collecting Major League Baseball (MLB) Gameday data”. pitchRx also “provides an easy and robust way to generate strike-zone plots using the ggplot2 package”. See also: Taming PITCHf/x Data with XML2R and pitchRx [R-Journal].
  - retrosheets parser: “baseball runs created stats and play by play reader”.
- basketball:
  - Going Under the Hood of the NCAA Tournament Visualization: “a guide [to] collect[ing], analyz[ing], and present[ing] the data”.
- cricket:
  - cricketr! [Github: about, code]: “statistics info available in ESPN Cricinfo Statsguru”. About: Introducing cricketr!: An R package to analyze performances of cricketers.
- football (soccer):
  - engsoccerdata [Github]: “a repository for complete soccer datasets, along with some built-in functions for analyzing parts of the data. Currently I include three English ones (League data, FA Cup data, Playoff data – described below) and some European leagues (Spain, Germany, Italy, Holland). Updates in the near future will include those for various other European leagues as well as MLS.” Citation: James P. Curley (2015). engsoccerdata: English Soccer Data 1871-2015. R package version 0.1.4.
  - UKSoccer {vcd} [Inside-R packages]: data “on the goals scored by Home and Away teams in the Premier Football League, 1995/6 season.”
  - Soccer {PASWR} [Inside-R packages]: “how many goals were scored in the regulation 90 minute periods of World Cup soccer matches from 1990 to 2002”.
- golf:
- gymnastics:
- motor sport:
  - NASCAR Winston Cup Race Results for 1975-2003 [Journal of Statistics Education dataset]: data at the race and driver/race levels.
  - ergastR (under construction) [code fragments]: my own fumblings at an R wrapper for the ergast motor racing database online API. (See also: ergast data download.)
  - Formula E: see ergast motor racing database.
- snooker:
- swimming:
- tennis:
  - tennis_MatchChartingProject: “The goal of the Match Charting Project (MCP) is to amass detailed records of professional matches.”
It would perhaps make more sense to try to collect rather more structured (meta)data for each package. For example: homepage; sport/discipline; type (analysis, data (package or API), or analysis and data); if data: year range, source, and data coverage (e.g. table column headings); if analysis: a brief synopsis of the tools available (e.g. chart generators).
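As a rough sketch of what such a structured listing might look like, here is a small base R data frame covering a few of the packages above. The column names are purely my own illustrative choice (nothing standard), the values are read off the descriptions in the list (the engsoccerdata year range comes from its citation title), and the type labels are my own reading of those descriptions; anything the descriptions don’t state is left as NA rather than guessed.

```r
# Hypothetical schema: column names are illustrative assumptions, not a standard.
# Values are taken from the package descriptions listed in this post;
# unknown items are left as NA.
sports_pkgs <- data.frame(
  package   = c("Lahman", "pitchRx", "cricketr", "engsoccerdata"),
  sport     = c("baseball", "baseball", "cricket", "football (soccer)"),
  type      = c("data", "analysis and data", "analysis", "analysis and data"),
  year_from = c(1871L, NA, NA, 1871L),
  year_to   = c(2013L, NA, NA, 2015L),
  source    = c("Sean Lahman's databases", "MLB Gameday",
                "ESPN Cricinfo Statsguru", NA),
  stringsAsFactors = FALSE
)

# Thematic view: package names grouped by sport
by_sport <- split(sports_pkgs$package, sports_pkgs$sport)

# Packages that bundle or fetch data (on their own or alongside analysis tools)
data_pkgs <- sports_pkgs$package[grepl("data", sports_pkgs$type)]

print(by_sport)
print(data_pkgs)
```

A flat table like this would be easy to keep in a CSV file on Github and to filter by sport or by type, which is more or less the “thematic view” I have in mind.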
If you know of any others, please let me know via the comments and I’ll try to keep this page updated with a reasonably current list.
As well as packages, here are some links to blog posts that look at sports data analysis using R:
- Analyzing Baseball Data with R [book]; [supporting data/code]. See also Jim Albert [homepage].
- Exploring Baseball Data with R [blog]
- Wrangling F1 Data With R [Leanpub book] Disclaimer: I wrote this
- Scraping and Analyzing Baseball Data with R [blogpost]
- OUseful.info – F1 datajunkie [blog topic feed]. Disclaimer: my blog feed
- Revolutions blog – sports tag
Again, if you can recommend further posts, please let me know via the comments.