cricketr digs the Ashes!
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
In some circles the Ashes is considered the ‘mother of all cricketing battles’. But, being a staunch supporter of all things Indian, cricket or otherwise, I have to say that the Ashes pales in comparison against a India-Pakistan match. After all, what are a few frowns and raised eyebrows at the Ashes in comparison to the seething emotions and reckless exuberance of Indian fans.
Anyway, the Ashes are an interesting duel and I have decided to do some cricketing analysis using my R package cricketr. For this analysis I have chosen the top 2 batsman and top 2 bowlers from both the Australian and English sides.
Batsmen
- Steven Smith (Aus) – Innings – 58 , Ave: 58.52, Strike Rate: 55.90
- David Warner (Aus) – Innings – 76, Ave: 46.86, Strike Rate: 73.88
- Alistair Cook (Eng) – Innings – 208 , Ave: 46.62, Strike Rate: 46.33
- J E Root (Eng) – Innings – 53, Ave: 54.02, Strike Rate: 51.30
Bowlers
- Mitchell Johnson (Aus) – Innings-131, Wickets – 299, Econ Rate : 3.28
- Peter Siddle (Aus) – Innings – 104 , Wickets- 192, Econ Rate : 2.95
- James Anderson (Eng) – Innings – 199 , Wickets- 406, Econ Rate : 3.05
- Stuart Broad (Eng) – Innings – 148 , Wickets- 296, Econ Rate : 3.08
It is my opinion if any 2 of the 4 in either team click then they will be able to swing the match in favor of their team.
I have interspered the plots with a few comments. Feel free to draw your conclusions!
The analysis is included below. Note: This post has also been hosted at Rpubs as cricketr digs the Ashes!
You can also download this analysis as a PDF file from cricketr digs the Ashes!
library(devtools) install_github("tvganesh/cricketr") library(cricketr)
Analyses of Batsmen
The following plots gives the analysis of the 2 Australian and 2 English batsmen. It must be kept in mind that Cooks has more innings than all the rest put together. Smith has the best average, and Warner has the best strike rate
Box Histogram Plot
This plot shows a combined boxplot of the Runs ranges and a histogram of the Runs Frequency
batsmanPerfBoxHist("./smith.csv","S Smith")
batsmanPerfBoxHist("./warner.csv","D Warner")
batsmanPerfBoxHist("./cook.csv","A Cook")
batsmanPerfBoxHist("./root.csv","JE Root")
Plot os 4s, 6s and the type of dismissals
A. Steven Smith
par(mfrow=c(1,3)) par(mar=c(4,4,2,2)) batsman4s("./smith.csv","S Smith") batsman6s("./smith.csv","S Smith") batsmanDismissals("./smith.csv","S Smith")
dev.off() ## null device ## 1
B. David Warner
par(mfrow=c(1,3)) par(mar=c(4,4,2,2)) batsman4s("./warner.csv","D Warner") batsman6s("./warner.csv","D Warner") batsmanDismissals("./warner.csv","D Warner")
dev.off() ## null device ## 1
C. Alistair Cook
par(mfrow=c(1,3)) par(mar=c(4,4,2,2)) batsman4s("./cook.csv","A Cook") batsman6s("./cook.csv","A Cook") batsmanDismissals("./cook.csv","A Cook")
dev.off() ## null device ## 1
D. J E Root
par(mfrow=c(1,3)) par(mar=c(4,4,2,2)) batsman4s("./root.csv","JE Root") batsman6s("./root.csv","JE Root") batsmanDismissals("./root.csv","JE Root")
dev.off() ## null device ## 1
Relative Mean Strike Rate
In this first plot I plot the Mean Strike Rate of the batsmen. It can be Warner’s has the best strike rate (hit outside the plot!) followed by Smith in the range 20-100. Root has a good strike rate above hundred runs. Cook maintains a good strike rate.
par(mar=c(4,4,2,2)) frames <- list("./smith.csv","./warner.csv","cook.csv","root.csv") names <- list("Smith","Warner","Cook","Root") relativeBatsmanSR(frames,names)
Relative Runs Frequency Percentage
The plot below show the percentage contribution in each 10 runs bucket over the entire career.It can be seen that Smith pops up above the rest with remarkable regularity.COok is consistent over the entire range.
frames <- list("./smith.csv","./warner.csv","cook.csv","root.csv") names <- list("Smith","Warner","Cook","Root") relativeRunsFreqPerf(frames,names)
Moving Average of runs over career
The moving average for the 4 batsmen indicate the following 1. S Smith is the most promising. There is a marked spike in Performance. Cook maintains a steady pace and is consistent over the years averaging 50 over the years.
par(mfrow=c(2,2)) par(mar=c(4,4,2,2)) batsmanMovingAverage("./smith.csv","S Smith") batsmanMovingAverage("./warner.csv","D Warner") batsmanMovingAverage("./cook.csv","A Cook") batsmanMovingAverage("./root.csv","JE Root")
dev.off() ## null device ## 1
Runs forecast
The forecast for the batsman is shown below. As before Cooks’s performance is really consistent across the years and the forecast is good for the years ahead. In Cook’s case it can be seen that the forecasted and actual runs are reasonably accurate
par(mfrow=c(2,2)) par(mar=c(4,4,2,2)) batsmanPerfForecast("./smith.csv","S Smith") batsmanPerfForecast("./warner.csv","D Warner") batsmanPerfForecast("./cook.csv","A Cook") ## Warning in HoltWinters(ts.train): optimization difficulties: ERROR: ## ABNORMAL_TERMINATION_IN_LNSRCH batsmanPerfForecast("./root.csv","JE Root")
dev.off() ## null device ## 1
3D plot of Runs vs Balls Faced and Minutes at Crease
The plot is a scatter plot of Runs vs Balls faced and Minutes at Crease. A prediction plane is fitted
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) battingPerf3d("./smith.csv","S Smith") battingPerf3d("./warner.csv","D Warner")
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) battingPerf3d("./cook.csv","A Cook") battingPerf3d("./root.csv","JE Root")
dev.off() ## null device ## 1
Predicting Runs given Balls Faced and Minutes at Crease
A multi-variate regression plane is fitted between Runs and Balls faced +Minutes at crease.
BF <- seq( 10, 400,length=15) Mins <- seq(30,600,length=15) newDF <- data.frame(BF,Mins) smith <- batsmanRunsPredict("./smith.csv","S Smith",newdataframe=newDF) warner <- batsmanRunsPredict("./warner.csv","D Warner",newdataframe=newDF) cook <- batsmanRunsPredict("./cook.csv","A Cook",newdataframe=newDF) root <- batsmanRunsPredict("./root.csv","JE Root",newdataframe=newDF)
The fitted model is then used to predict the runs that the batsmen will score for a given Balls faced and Minutes at crease. It can be seen that Warner sets a searing pace in the predicted runs for a given Balls Faced and Minutes at crease while Smith and Root are neck to neck in the predicted runs
batsmen <-cbind(round(smith$Runs),round(warner$Runs),round(cook$Runs),round(root$Runs)) colnames(batsmen) <- c("Smith","Warner","Cook","Root") newDF <- data.frame(round(newDF$BF),round(newDF$Mins)) colnames(newDF) <- c("BallsFaced","MinsAtCrease") predictedRuns <- cbind(newDF,batsmen) predictedRuns ## BallsFaced MinsAtCrease Smith Warner Cook Root ## 1 10 30 9 12 6 9 ## 2 38 71 25 33 20 25 ## 3 66 111 42 53 33 42 ## 4 94 152 58 73 47 59 ## 5 121 193 75 93 60 75 ## 6 149 234 91 114 74 92 ## 7 177 274 108 134 88 109 ## 8 205 315 124 154 101 125 ## 9 233 356 141 174 115 142 ## 10 261 396 158 195 128 159 ## 11 289 437 174 215 142 175 ## 12 316 478 191 235 155 192 ## 13 344 519 207 255 169 208 ## 14 372 559 224 276 182 225 ## 15 400 600 240 296 196 242
Highest runs likelihood
The plots below the runs likelihood of batsman. This uses K-Means. It can be seen Smith has the best likelihood around 40% of scoring around 41 runs, followed by Root who has 28.3% likelihood of scoring around 81 runs
A. Steven Smith
batsmanRunsLikelihood("./smith.csv","S Smith") ## Summary of S Smith 's runs scoring likelihood ## ************************************************** ## ## There is a 40 % likelihood that S Smith will make 41 Runs in 73 balls over 101 Minutes ## There is a 36 % likelihood that S Smith will make 9 Runs in 21 balls over 27 Minutes ## There is a 24 % likelihood that S Smith will make 139 Runs in 237 balls over 338 Minutes
B. David Warner
batsmanRunsLikelihood("./warner.csv","D Warner") ## Summary of D Warner 's runs scoring likelihood ## ************************************************** ## ## There is a 11.11 % likelihood that D Warner will make 134 Runs in 159 balls over 263 Minutes ## There is a 63.89 % likelihood that D Warner will make 17 Runs in 25 balls over 37 Minutes ## There is a 25 % likelihood that D Warner will make 73 Runs in 105 balls over 156 Minutes
C. Alastair Cook
batsmanRunsLikelihood("./cook.csv","A Cook") ## Summary of A Cook 's runs scoring likelihood ## ************************************************** ## ## There is a 27.72 % likelihood that A Cook will make 64 Runs in 140 balls over 195 Minutes ## There is a 59.9 % likelihood that A Cook will make 15 Runs in 32 balls over 46 Minutes ## There is a 12.38 % likelihood that A Cook will make 141 Runs in 300 balls over 420 Minutes
D. J E Root
batsmanRunsLikelihood("./root.csv","JE Root") ## Summary of JE Root 's runs scoring likelihood ## ************************************************** ## ## There is a 28.3 % likelihood that JE Root will make 81 Runs in 158 balls over 223 Minutes ## There is a 7.55 % likelihood that JE Root will make 179 Runs in 290 balls over 425 Minutes ## There is a 64.15 % likelihood that JE Root will make 16 Runs in 39 balls over 59 Minutes
Average runs at ground and against opposition
A. Steven Smith
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) batsmanAvgRunsGround("./smith.csv","S Smith") batsmanAvgRunsOpposition("./smith.csv","S Smith")
dev.off() ## null device ## 1
B. David Warner
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) batsmanAvgRunsGround("./warner.csv","D Warner") batsmanAvgRunsOpposition("./warner.csv","D Warner")
dev.off() ## null device ## 1
C. Alistair Cook
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) batsmanAvgRunsGround("./cook.csv","A Cook") batsmanAvgRunsOpposition("./cook.csv","A Cook")
dev.off() ## null device ## 1
D. J E Root
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) batsmanAvgRunsGround("./root.csv","JE Root") batsmanAvgRunsOpposition("./root.csv","JE Root")
dev.off() ## null device ## 1
Analysis of bowlers
- Mitchell Johnson (Aus) – Innings-131, Wickets – 299, Econ Rate : 3.28
- Peter Siddle (Aus) – Innings – 104 , Wickets- 192, Econ Rate : 2.95
- James Anderson (Eng) – Innings – 199 , Wickets- 406, Econ Rate : 3.05
- Stuart Broad (Eng) – Innings – 148 , Wickets- 296, Econ Rate : 3.08
Anderson has the highest number of inning and wickets followed closely by Broad and Mitchell who are in a neck to neck race with respect to wickets. Johnson is on the more expensive side though. Siddle has fewer innings but a good economy rate.
Wicket Frequency percentage
This plot gives the percentage of wickets for each wickets (1,2,3…etc)
par(mfrow=c(1,4)) par(mar=c(4,4,2,2)) bowlerWktsFreqPercent("./johnson.csv","Johnson") bowlerWktsFreqPercent("./siddle.csv","Siddle") bowlerWktsFreqPercent("./broad.csv","Broad") bowlerWktsFreqPercent("./anderson.csv","Anderson")
dev.off() ## null device ## 1
Wickets Runs plot
The plot below gives a boxplot of the runs ranges for each of the wickets taken by the bowlers
par(mfrow=c(1,4)) par(mar=c(4,4,2,2)) bowlerWktsRunsPlot("./johnson.csv","Johnson") bowlerWktsRunsPlot("./siddle.csv","Siddle") bowlerWktsRunsPlot("./broad.csv","Broad") bowlerWktsRunsPlot("./anderson.csv","Anderson")
dev.off() ## null device ## 1
Average wickets in different grounds and opposition
A. Mitchell Johnson
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) bowlerAvgWktsGround("./johnson.csv","Johnson") bowlerAvgWktsOpposition("./johnson.csv","Johnson")
dev.off() ## null device ## 1
B. Peter Siddle
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) bowlerAvgWktsGround("./siddle.csv","Siddle") bowlerAvgWktsOpposition("./siddle.csv","Siddle")
dev.off() ## null device ## 1
C. Stuart Broad
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) bowlerAvgWktsGround("./broad.csv","Broad") bowlerAvgWktsOpposition("./broad.csv","Broad")
dev.off() ## null device ## 1
D. James Anderson
par(mfrow=c(1,2)) par(mar=c(4,4,2,2)) bowlerAvgWktsGround("./anderson.csv","Anderson") bowlerAvgWktsOpposition("./anderson.csv","Anderson")
dev.off() ## null device ## 1
Relative bowling performance
The plot below shows that Mitchell Johnson is the mopst effective bowler among the lot with a higher wickets in the 3-6 wicket range. Broad and Anderson seem to perform well in 2 wickets in comparison to Siddle but in 3 wickets Siddle is better than Broad and Anderson.
frames <- list("./johnson.csv","./siddle.csv","broad.csv","anderson.csv") names <- list("Johnson","Siddle","Broad","Anderson") relativeBowlingPerf(frames,names)
Relative Economy Rate against wickets taken
Anderson followed by Siddle has the best economy rates. Johnson is fairly expensive in the 4-8 wicket range.
frames <- list("./johnson.csv","./siddle.csv","broad.csv","anderson.csv") names <- list("Johnson","Siddle","Broad","Anderson") relativeBowlingER(frames,names)
Moving average of wickets over career
Johnson is on his second peak while Siddle is on the decline with respect to bowling. Broad and Anderson show improving performance over the years.
par(mfrow=c(2,2)) par(mar=c(4,4,2,2)) bowlerMovingAverage("./johnson.csv","Johnson") bowlerMovingAverage("./siddle.csv","Siddle") bowlerMovingAverage("./broad.csv","Broad") bowlerMovingAverage("./anderson.csv","Anderson")
dev.off() ## null device ## 1
Key findings
Here are some key conclusions
- Cook has the most number of innings and has been extremly consistent in his scores
- Warner has the best strike rate among the lot followed by Smith and Root
- The moving average shows a marked improvement over the years for Smith
- Johnson is the most effective bowler but is fairly expensive
- Anderson has the best economy rate followed by Siddle
- Johnson is at his second peak with respect to bowling while Broad and Anderson maintain a steady line and length in their career bowling performance
Also see my other posts in R
- Introducing cricketr! : An R package to analyze performances of cricketers
- Taking cricketr for a spin – Part 1
- A peek into literacy in India: Statistical Learning with R
- A crime map of India in R – Crimes against women
- Analyzing cricket’s batting legends – Through the mirage with R
- Masters of Spin: Unraveling the web with R
- Mirror, mirror . the best batsman of them all?
You may also like
- A crime map of India in R: Crimes against women
- What’s up Watson? Using IBM Watson’s QAAPI with Bluemix, NodeExpress – Part 1
- Bend it like Bluemix, MongoDB with autoscaling – Part 2
- Informed choices through Machine Learning : Analyzing Kohli, Tendulkar and Dravid
- Thinking Web Scale (TWS-3): Map-Reduce – Bring compute to data
- Deblurring with OpenCV:Weiner filter reloaded
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.