Site icon R-bloggers

The cricketdata package

[This article was first published on R on Rob J Hyndman, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Four functions

The cricketdata package has been around for a few years on github, and it has been on CRAN since February 2022. There are only four functions:

Jacquie Tran wrote the first version of the fetch_cricsheet() function, and the vignette which demonstrates it.

Here are some examples demonstrating the Cricinfo functions.

library(cricketdata)
library(tidyverse)

Women’s T20 bowling data

The fetch_cricinfo() function downloads data for international T20, ODI or Test matches, for men or women, and for batting, bowling or fielding. By default, it downloads career-level statistics for individual players. Here is an example for women T20 bowlers.

# Fetch all Women's T20 data
wt20 <- fetch_cricinfo("T20", "Women", "Bowling")

wt20 %>%
  select(Player, Country, Matches, Runs, Wickets, Economy, StrikeRate)
#> # A tibble: 1,697 × 7
#>    Player       Country      Matches  Runs Wickets Economy StrikeRate
#>    <chr>        <chr>          <int> <int>   <int>   <dbl>      <dbl>
#>  1 A Mohammed   West Indies      117  2206     125    5.58       19.0
#>  2 EA Perry     Australia        126  2237     115    5.87       19.9
#>  3 S Ismail     South Africa      99  2015     111    5.79       18.8
#>  4 Nida Dar     Pakistan         111  1909     105    5.34       20.4
#>  5 A Shrubsole  England           79  1587     102    5.96       15.7
#>  6 KH Brunt     England           96  1888      98    5.54       20.9
#>  7 SFM Devine   New Zealand      102  1733      98    6.35       16.7
#>  8 Poonam Yadav India             72  1495      98    5.75       15.9
#>  9 SR Taylor    West Indies      111  1639      98    5.66       17.7
#> 10 M Schutt     Australia         75  1510      96    6.08       15.5
#> # … with 1,687 more rows

We can plot a bowler’s strike rate (balls per wicket) vs economy rate (runs per wicket). Each observation represents one player, who has taken at least 50 international wickets.

wt20 %>%
  filter(Wickets >= 50) %>%
  ggplot(aes(y = StrikeRate, x = Average)) +
  geom_point(alpha = 0.3, col = "blue") +
  ggtitle("Women International T20 Bowlers") +
  ylab("Balls per wicket") + xlab("Runs per wicket")

< !-- -->

The extraordinary result on the bottom left is due to the Thai all-rounder, Nattaya Boochatham, who has taken 59 wickets, with a strike rate of 13.475, an average of 8.78, and an economy rate of 3.909.

Australian men’s ODI data by innings

The next example shows Australian men’s ODI batting results by innings.

# Fetch all Australian Men's ODI data by innings
menODI <- fetch_cricinfo("ODI", "Men", "Batting", type = "innings", country = "Australia")

menODI %>%
  select(Date, Player, Runs, StrikeRate, NotOut)
#> # A tibble: 10,587 × 5
#>    Date       Player        Runs StrikeRate NotOut
#>    <date>     <chr>        <int>      <dbl> <lgl>
#>  1 2011-04-11 SR Watson      185       193. TRUE
#>  2 2007-02-20 ML Hayden      181       109. TRUE
#>  3 2017-01-26 DA Warner      179       140. FALSE
#>  4 2015-03-04 DA Warner      178       134. FALSE
#>  5 2001-02-09 ME Waugh       173       117. FALSE
#>  6 2016-10-12 DA Warner      173       127. FALSE
#>  7 2004-01-16 AC Gilchrist   172       137. FALSE
#>  8 2019-06-20 DA Warner      166       113. FALSE
#>  9 2006-03-12 RT Ponting     164       156. FALSE
#> 10 2016-12-04 SPD Smith      164       104. FALSE
#> # … with 10,577 more rows

menODI %>%
  ggplot(aes(y = Runs, x = Date)) +
  geom_point(alpha = 0.2, col = "#D55E00") +
  geom_smooth() +
  ggtitle("Australia Men ODI: Runs per Innings")

< !-- -->

The average number of runs per innings slowly increased until about 2000, after which it has remained largely constant at about 35.2. This is a little higher than the smooth line shown on the plot, which has not taken account of not-out results.

Indian test fielding data

Next, we demonstrate some of the fielding data available, using Test match fielding from Indian men’s players.

Indfielding <- fetch_cricinfo("Test", "Men", "Fielding", country = "India")

Indfielding
#> # A tibble: 303 × 11
#>    Player       Start   End Matches Innings Dismissals Caught CaughtFielder
#>    <chr>        <int> <int>   <int>   <int>      <int>  <int>         <int>
#>  1 MS Dhoni      2005  2014      90     166        294    256             0
#>  2 R Dravid      1996  2012     163     299        209    209           209
#>  3 SMH Kirmani   1976  1986      88     151        198    160             0
#>  4 VVS Laxman    1996  2012     134     248        135    135           135
#>  5 KS More       1986  1993      49      90        130    110             0
#>  6 RR Pant       2018  2022      30      59        118    107             0
#>  7 SR Tendulkar  1989  2013     200     366        115    115           115
#>  8 SM Gavaskar   1971  1987     125     216        108    108           108
#>  9 NR Mongia     1994  2001      44      77        107     99             0
#> 10 M Azharuddin  1984  2000      99     177        105    105           105
#> # … with 293 more rows, and 3 more variables: CaughtBehind <int>,
#> #   Stumped <int>, MaxDismissalsInnings <dbl>

We can plot the number of dismissals by number of matches for all male test players. Because wicket keepers typically have a lot more dismissals than other players, they are shown in a different colour.

Indfielding %>%
  mutate(wktkeeper = (CaughtBehind > 0) | (Stumped > 0)) %>%
  ggplot(aes(x = Matches, y = Dismissals, col = wktkeeper)) +
  geom_point() +
  ggtitle("Indian Men Test Fielding")

< !-- -->

The high number of dismissals, close to 300, is of course due to MS Dhoni. Another interesting one here is the non-wicketkeeper with over 200 dismissals, which is Rahul Dravid who took 209 catches during his career.

Meg Lanning’s ODI batting

Finally, let’s look at individual player data. The fetch_player_data() requires the Cricinfo player ID, which you can either look up on their website, or use the find_player_id() function. We will look at the ODI results of Australia’s captain, Meg Lanning.

meg_lanning_id <- find_player_id("Meg Lanning")$ID
MegLanning <- fetch_player_data(meg_lanning_id, "ODI") %>%
  mutate(NotOut = (Dismissal == "not out"))

MegLanning
#> # A tibble: 100 × 15
#>    Date       Innings Opposition Ground  Runs  Mins    BF   X4s   X6s    SR
#>    <date>       <int> <chr>      <chr>  <dbl> <dbl> <int> <int> <int> <dbl>
#>  1 2011-01-05       1 ENG Women  Perth     20    60    38     2     0  52.6
#>  2 2011-01-07       2 ENG Women  Perth    104   148   118     8     1  88.1
#>  3 2011-06-14       2 NZ Women   Brisb…    11    15    14     2     0  78.6
#>  4 2011-06-16       1 NZ Women   Brisb…     5     8     8     1     0  62.5
#>  5 2011-06-30       1 NZ Women   Chest…    17    24    20     3     0  85
#>  6 2011-07-02       2 India Wom… Chest…    23    40    32     3     0  71.9
#>  7 2011-07-05       2 ENG Women  Lord's    43    40    33     9     0 130.
#>  8 2011-07-07       2 ENG Women  Worms…     0     2     3     0     0   0
#>  9 2012-03-12       1 India Wom… Ahmed…    45    61    44     7     0 102.
#> 10 2012-03-14       1 India Wom… Wankh…   128   125   104    19     1 123.
#> # … with 90 more rows, and 5 more variables: Pos <int>, Dismissal <chr>,
#> #   Inns <int>, Start_Date <chr>, NotOut <lgl>

We can plot her runs per innings on the vertical axis over time on the horizontal axis.

# Compute batting average
MLave <- MegLanning %>%
  filter(!is.na(Runs)) %>%
  summarise(Average = sum(Runs) / (n() - sum(NotOut))) %>%
  pull(Average)
names(MLave) <- paste("Average =", round(MLave, 2))
# Plot ODI scores
ggplot(MegLanning) +
  geom_hline(aes(yintercept = MLave), col="gray") +
  geom_point(aes(x = Date, y = Runs, col = NotOut)) +
  ggtitle("Meg Lanning ODI Scores") +
  scale_y_continuous(sec.axis = sec_axis(~., breaks = MLave))

< !-- -->

She has shown amazing consistency over her career, with centuries scored in every year of her career except for 2021, when her highest score from 6 matches was 53.

To leave a comment for the author, please follow the link and comment on their blog: R on Rob J Hyndman.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.