Cricket is a sport that generates a large volume of performance data and corresponding debate about the relative qualities of various players over their careers and in relation to their contemporaries. The cricinfo website has an extensive database of statistics for professional cricketers that can be searched to access the information in various formats.
As an initial example we will consider the English legend Sir Ian Botham, who played 102 Test matches for England between his debut in 1977 and his final game in 1992.
The first obvious breakdown is to consider how Botham performed against each of the six countries that he played against during his Test career. A summary of his statistics is shown here:
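| Opposition | Matches | Bat Inns | Runs | NO | Bowl Inns | Wicket | Catch |
|---|---|---|---|---|---|---|---|
| Australia | 36 | 49 | 1673 | 2 | 66 | 148 | 57 |
| India | 14 | 16 | 1201 | 0 | 23 | 59 | 14 |
| New Zealand | 15 | 22 | 846 | 2 | 28 | 64 | 14 |
| Pakistan | 14 | 20 | 647 | 1 | 18 | 40 | 14 |
| Sri Lanka | 3 | 3 | 41 | 0 | 6 | 11 | 2 |
| West Indies | 20 | 37 | 792 | 1 | 27 | 61 | 19 |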
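As a minimal sketch, the table above could be typed in by hand as an R data frame. The name itb.opp and the columns Opposition, Runs and Wicket match the plotting code used in this post; the remaining column names (Matches, BatInns, NO, BowlInns, Catch) are assumptions chosen here for illustration.

```r
# Hand-entered copy of the summary table. Column names other than
# Opposition, Runs and Wicket are assumptions made for this sketch.
itb.opp <- data.frame(
  Opposition = c("Australia", "India", "New Zealand",
                 "Pakistan", "Sri Lanka", "West Indies"),
  Matches  = c(36, 14, 15, 14, 3, 20),
  BatInns  = c(49, 16, 22, 20, 3, 37),
  Runs     = c(1673, 1201, 846, 647, 41, 792),
  NO       = c(2, 0, 2, 1, 0, 1),
  BowlInns = c(66, 23, 28, 18, 6, 27),
  Wicket   = c(148, 59, 64, 40, 11, 61),
  Catch    = c(57, 14, 14, 14, 2, 19)
)

# Column totals as a quick sanity check on the typing
colSums(itb.opp[, c("Matches", "Runs", "Wicket", "Catch")])
```

The Matches and Catch totals should come to the 102 Tests and 120 catches quoted elsewhere in this post, which is a cheap way to catch a mistyped cell.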
Botham played only three matches against Sri Lanka, so it is difficult to assess his performance against them properly. If the above table is stored in a data frame itb.opp, then we can create a bar chart of the total runs (or wickets) by opposition country:
library(ggplot2)
# geom_col() uses the Runs values directly as the bar heights
ggplot(itb.opp, aes(Opposition, Runs)) + geom_col() + xlab("Country") + ylab("Total Runs")
This code produces the following graph:
The total wickets graph is produced by the following code:
ggplot(itb.opp, aes(Opposition, Wicket)) + geom_col() + xlab("Country") + ylab("Total Wickets")
We may now want to delve deeper into his performance against different nations by taking into account the number of games or innings in which Botham batted or bowled. The traditional way to assess performance is to calculate batting and bowling averages, and doing this by opposition gives the following data frame:
> itb.opp.sum
  Opposition Discipline  Average
   Australia    Batting 29.35088
       India    Batting 70.64706
 New Zealand    Batting 42.30000
    Pakistan    Batting 32.35000
   Sri Lanka    Batting 13.66667
 West Indies    Batting 21.40541
   Australia    Bowling 27.65541
       India    Bowling 26.40678
 New Zealand    Bowling 23.43750
    Pakistan    Bowling 31.77500
   Sri Lanka    Bowling 28.18182
 West Indies    Bowling 35.18033
This can be converted into a dot plot so we can see whether Botham had a higher batting average than bowling average, which is often taken to be one of the signs of an all-rounder.
ggplot(itb.opp.sum, aes(Average, Opposition, colour = Discipline)) + geom_point() + xlab("Average") + ylab("")
The graph is shown here:
We can see the differences in performance based on the opposition. Botham’s performance against the West Indies, by far the strongest team during most of his international career, was worse than against the other countries. However, his averages were far from embarrassing when compared to other players at the time. The graph also shows that Botham enjoyed batting and bowling against India.
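The all-rounder comparison can also be checked numerically rather than read off the graph. The following is a minimal sketch that rebuilds itb.opp.sum from the averages printed above; the helper vectors opposition, batting and bowling are names introduced for this example only.

```r
# Averages copied from the printed itb.opp.sum output
opposition <- c("Australia", "India", "New Zealand",
                "Pakistan", "Sri Lanka", "West Indies")
batting <- c(29.35088, 70.64706, 42.30000, 32.35000, 13.66667, 21.40541)
bowling <- c(27.65541, 26.40678, 23.43750, 31.77500, 28.18182, 35.18033)

# Long format: one row per opposition and discipline, as used by the dot plot
itb.opp.sum <- data.frame(
  Opposition = rep(opposition, times = 2),
  Discipline = rep(c("Batting", "Bowling"), each = 6),
  Average    = c(batting, bowling)
)

# Opponents against whom the batting average exceeded the bowling average
opposition[batting > bowling]
# [1] "Australia"   "India"       "New Zealand" "Pakistan"
```

On these figures the batting average exceeds the bowling average against four of the six opponents, with Sri Lanka and the West Indies the exceptions.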
We can divide this data further based on whether the matches were played in England or outside of England and this data is shown here:
> itb.opp.ha.sum
  Opposition Venue Discipline  Average
   Australia  Away    Batting 30.22581
       India  Away    Batting 61.55556
 New Zealand  Away    Batting 50.44444
    Pakistan  Away    Batting 16.00000
   Sri Lanka  Away    Batting 13.00000
 West Indies  Away    Batting 14.17647
   Australia  Home    Batting 28.30769
       India  Home    Batting 80.87500
 New Zealand  Home    Batting 35.63636
    Pakistan  Home    Batting 34.16667
   Sri Lanka  Home    Batting 14.00000
 West Indies  Home    Batting 27.55000
   Australia  Away    Bowling 28.44928
       India  Away    Bowling 25.53333
 New Zealand  Away    Bowling 27.44444
    Pakistan  Away    Bowling 45.00000
   Sri Lanka  Away    Bowling 21.66667
 West Indies  Away    Bowling 39.50000
   Australia  Home    Bowling 26.96203
       India  Home    Bowling 27.31034
 New Zealand  Home    Bowling 20.51351
    Pakistan  Home    Bowling 31.07895
   Sri Lanka  Home    Bowling 30.62500
 West Indies  Home    Bowling 31.97143
A dot plot is created from this data with a separate panel for each of the six opposition countries and the averages divided into batting and bowling performances. The coloured dots in the graph indicate whether the average is for matches at home or away.
ggplot(itb.opp.ha.sum, aes(Average, Discipline, colour = Venue)) + geom_point() + facet_wrap( ~ Opposition) + xlab("Average") + ylab("")
This graph is shown below:
We can see that the difference between home and away performance is, in general, not very large for bowling averages, but in some cases there is a noticeable difference in batting averages. Against the West Indies, Botham’s statistics at home are much better than his away statistics, suggesting that his main struggles against the strong West Indies team were in the Caribbean. This might be due to his swing bowling being better suited to English conditions than to pitches in the West Indies.
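The size of the home and away gaps can be tabulated as well as plotted. The sketch below rebuilds itb.opp.ha.sum from the printed values and subtracts the away average from the home average for each opposition and discipline; the names home, away, gap and HomeMinusAway are assumptions introduced for this example.

```r
opposition <- c("Australia", "India", "New Zealand",
                "Pakistan", "Sri Lanka", "West Indies")

# Values copied from the printed itb.opp.ha.sum output, in the same row order
itb.opp.ha.sum <- data.frame(
  Opposition = rep(opposition, times = 4),
  Venue      = rep(rep(c("Away", "Home"), each = 6), times = 2),
  Discipline = rep(c("Batting", "Bowling"), each = 12),
  Average    = c(30.22581, 61.55556, 50.44444, 16.00000, 13.00000, 14.17647,
                 28.30769, 80.87500, 35.63636, 34.16667, 14.00000, 27.55000,
                 28.44928, 25.53333, 27.44444, 45.00000, 21.66667, 39.50000,
                 26.96203, 27.31034, 20.51351, 31.07895, 30.62500, 31.97143)
)

# Put the home and away averages side by side and take the difference
home <- subset(itb.opp.ha.sum, Venue == "Home", select = -Venue)
away <- subset(itb.opp.ha.sum, Venue == "Away", select = -Venue)
gap  <- merge(home, away, by = c("Opposition", "Discipline"),
              suffixes = c(".home", ".away"))
gap$HomeMinusAway <- gap$Average.home - gap$Average.away

# Batting: positive means better at home. Bowling: negative means better at home.
gap[order(gap$Discipline, gap$Opposition), ]
```

Because a lower bowling average is better, the sign of HomeMinusAway has the opposite interpretation for the two disciplines, so the rows are best read one discipline at a time.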
To round off this brief look at the career of IT Botham, let us consider some other important statistics, in particular games where he performed with both bat and ball.
- Overall, Botham scored 14 hundreds and 22 fifties in 161 innings, so he reached fifty roughly once every four to five innings.
- He also took 27 five-wicket hauls and 17 four-wicket hauls, so he took four or more wickets about once every four bowling innings.
- He took 120 catches.
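The rates quoted in the list above follow from simple division. This is a minimal sketch with variable names chosen for this example; the bowling innings total is taken as the sum of the Bowl Inns column in the opposition table, and the four-wicket hauls are assumed to be counted separately from the five-wicket hauls.

```r
hundreds    <- 14
fifties     <- 22
bat_innings <- 161

five_fors <- 27
four_fors <- 17
# Sum of the Bowl Inns column from the opposition summary table
bowl_innings <- sum(c(66, 23, 28, 18, 6, 27))

# Innings per score of fifty or more, and per haul of four or more wickets
innings_per_fifty_plus <- bat_innings / (hundreds + fifties)
innings_per_four_plus  <- bowl_innings / (five_fors + four_fors)

round(c(innings_per_fifty_plus, innings_per_four_plus), 2)
# [1] 4.47 3.82
```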
Individual matches of excellence include five games with a century and at least five wickets:
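| Year | Opposition | Ground | Venue | Runs | Wicket |
|---|---|---|---|---|---|
| 1978 | New Zealand | Christchurch | Away | 133 | 8 |
| 1978 | Pakistan | Lord's | Home | 108 | 8 |
| 1980 | India | Mumbai | Away | 114 | 13 |
| 1981 | Australia | Leeds | Home | 199 | 7 |
| 1984 | New Zealand | Wellington | Away | 138 | 6 |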
These performances and others show why Botham was considered such a great player: he produced sustained periods of excellent all-round cricket rather than having one discipline dominate over a long period of time.