Cricketr analyzes Ind-Aus faceoff in WTC 2023!!
“The unexamined life is not worth living.” – Socrates
“There is no easy way from the earth to the stars.” – Seneca
“If you want to go fast, go alone. If you want to go far, go together.” – African Proverb
1. Introduction
In this post, I use my R package cricketr to analyze the Indian and Australian squads ahead of the World Test Championship (WTC) 2023 final. My R package cricketr was born on Jul 4, 2015. Cricketr uses data from Cricinfo.
Indian squad
Rohit Sharma (Captain), Shubman Gill, Cheteshwar Pujara, Virat Kohli, Ajinkya Rahane, Ravindra Jadeja, Shardul Thakur, Mohd. Shami, Mohd. Siraj, Ishan Kishan (wk).
In my view, Ishan Kishan has more experience than KS Bharat, though Rishabh Pant would have been the ideal wicket-keeper/left-handed batsman. I think Shardul Thakur would be a handful in English conditions. For a spinner it is either Ashwin or Jadeja. Maybe the balance shifts in favor of Jadeja.
Australian squad
Pat Cummins (capt), Alex Carey (wk), Cameron Green, Josh Hazlewood, Usman Khawaja, Marnus Labuschagne, Nathan Lyon, Todd Murphy, Steven Smith (vice-capt), Mitchell Starc, David Warner.
Not sure if Scott Boland would fill in instead of Todd Murphy.
Let me give you a lay of the land for the post below.
The post is organized into the following parts:
- Analysis of Indian WTC batsmen from Jan 2016 – May 2023
- Analysis of Indian WTC batsmen against Australia from Jan 2016 – May 2023
- Analysis of Australian WTC batsmen from Jan 2016 – May 2023
- Analysis of Australian WTC batsmen against India from Jan 2016 – May 2023
- Analysis of Indian WTC bowlers from Jan 2016 – May 2023
- Analysis of Indian WTC bowlers against Australia from Jan 2016 – May 2023
- Analysis of Australian WTC bowlers from Jan 2016 – May 2023
- Analysis of Australian WTC bowlers against India from Jan 2016 – May 2023
- Team analysis of India and Australia
All the above analyses use data from ESPN Statsguru and my R package cricketr.
The data for the different players have been obtained using calls such as the ones below.
# Get Shubman Gill's batting data
#shubman <- getPlayerData(1070173,dir=".",file="shubman.csv",type="batting",homeOrAway=c(1,2), result=c(1,2,4))
#shubmansp <- getPlayerDataSp(1070173,tdir=".",tfile="shubmansp.csv",ttype="batting")

# Get Shubman Gill's data from Jan 2016 - May 2023
#df <- getPlayerDataHA(1070173,tfile="shubman1.csv",type="batting", matchType="Test")
#df1 <- getPlayerDataOppnHA(infile="shubman1.csv",outfile="shubmanTest.csv",startDate="2016-01-01",endDate="2023-05-01")

# Get Shubman Gill's data from Jan 2016 - May 2023, against Australia
#df <- getPlayerDataHA(1070173,tfile="shubman1.csv",type="batting", matchType="Test")
#df1 <- getPlayerDataOppnHA(infile="shubman1.csv",outfile="shubmanTestAus.csv",opposition="Australia",startDate="2016-01-01",endDate="2023-05-01")
Note: To get data for bowlers, use the corresponding profile number and type="bowling". Details are in my posts below.
To do a similar analysis, please go through the following posts:
- Re-introducing cricketr! : An R package to analyze performances of cricketers
- Cricketr learns new tricks : Performs fine-grained analysis of players
- Cricketr adds team analytics to its repertoire!!!
Note 1: I will not be analysing each and every chart as the charts are quite self-explanatory
Note 2: I have had to tile charts together, otherwise this would become a very, very long post. You are free to use my R package cricketr and check it out for yourself.
Findings
- Kohli has the best average at 48+. India has won when Rohit and Rahane played well.
- Kohli tops the list in cumulative average runs, followed by Pujara, with Rohit 3rd. Gill is on the upswing.
- Against Australia, Pujara has the best cumulative average runs record, followed by Rahane, with Gill in hot pursuit. In the strike rate department Gill tops, followed by Rohit and Rahane.
- Since 2016, Smith and Labuschagne have averaged 53+!! Warner & Khawaja are at ~46.
- Australia has won matches when Smith, Warner and Khawaja have played well.
- Labuschagne, Smith and C Green have good records against India. Indian bowlers will need to contain them.
- Ashwin has the most wickets against all teams, followed by Jadeja. Ashwin’s performance has dropped over the years, while Siraj has been getting better.
- Jadeja has the best economy rate, followed by Ashwin.
- Against Australia specifically, Jadeja has the best record, followed by Ashwin. Jadeja has the best economy rate against Australia, followed by Siraj, then Ashwin.
- Cummins, Starc and Lyon are the best performers for Australia. Hazlewood and Cummins have the best economy rates against all opposition.
- Against India, Lyon, Cummins and Hazlewood have performed well.
- Hazlewood and Lyon have good economy rates against India.
- Against Australia, India has won 17 times, lost 60 and drawn 22 in Australia. At home, India has won 42, tied 2, lost 28 and drawn 24.
- At the Oval, where the World Test Championship final is going to be held, India has won 4, lost 10 and drawn 10.
Note 3: You can also read this post at Rpubs at ind-aus-WTC!! The formatting will be nicer!
Note 4: You can download this post as PDF to read at your leisure ind-aus-WTC.pdf
2. Install the cricketr package
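# Install cricketr from CRAN into the default library if it is missing,
# so that the library() call below can find it
if (!requireNamespace("cricketr", quietly = TRUE)) {
    install.packages("cricketr")
}
library(cricketr)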
3. Analysis of India WTC batsmen from Jan 2016 – May 2023
3a. Basic analysis
The analyses below include:
- Runs frequency plot
- Mean strike rate
- Run ranges
Kohli’s strike rate increases with increasing runs, while Gill’s seems to drop. So it is with Pujara & Rahane
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("kohliTest.csv","Kohli")
batsmanMeanStrikeRate("kohliTest.csv","Kohli")
batsmanRunsRanges("kohliTest.csv","Kohli")
batsmanRunsFreqPerf("rohitTest.csv","Rohit")
batsmanMeanStrikeRate("rohitTest.csv","Rohit")
batsmanRunsRanges("rohitTest.csv","Rohit")
batsmanRunsFreqPerf("shubmanTest.csv","S Gill")
batsmanMeanStrikeRate("shubmanTest.csv","S Gill")
batsmanRunsRanges("shubmanTest.csv","S Gill")

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("rahaneTest.csv","Rahane")
batsmanMeanStrikeRate("rahaneTest.csv","Rahane")
batsmanRunsRanges("rahaneTest.csv","Rahane")
batsmanRunsFreqPerf("pujaraTest.csv","Pujara")
batsmanMeanStrikeRate("pujaraTest.csv","Pujara")
batsmanRunsRanges("pujaraTest.csv","Pujara")

3b. More analyses
Kohli hits roughly five 4s in a score of 50, versus Gill and Pujara, who smash about six.
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsman4s("kohliTest.csv","Kohli")
batsman6s("kohliTest.csv","Kohli")
batsmanMeanStrikeRate("kohliTest.csv","Kohli")
batsman4s("rohitTest.csv","Rohit")
batsman6s("rohitTest.csv","Rohit")
batsmanMeanStrikeRate("rohitTest.csv","Rohit")
batsman4s("shubmanTest.csv","S Gill")
batsman6s("shubmanTest.csv","S Gill")
batsmanMeanStrikeRate("shubmanTest.csv","S Gill")

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("rahaneTest.csv","Rahane")
batsman6s("rahaneTest.csv","Rahane")
batsmanMeanStrikeRate("rahaneTest.csv","Rahane")
batsman4s("pujaraTest.csv","Pujara")
batsman6s("pujaraTest.csv","Pujara")
batsmanMeanStrikeRate("pujaraTest.csv","Pujara")

3c. Boxplot histogram plot
This plot shows a combined boxplot of the runs ranges and a histogram of the runs frequency. Kohli’s average is 48, while Rohit and Pujara are at 40, with Rahane and Gill around 33.
batsmanPerfBoxHist("kohliTest.csv","Kohli")

batsmanPerfBoxHist("rohitTest.csv","Rohit")

batsmanPerfBoxHist("shubmanTest.csv","S Gill")

batsmanPerfBoxHist("rahaneTest.csv","Rahane")

batsmanPerfBoxHist("pujaraTest.csv","Pujara")

3d. Contribution to won and lost matches
For the functions below you will have to use the getPlayerDataSp() function. When Rohit Sharma and Pujara have played well India have tended to win more often
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("kohlisp.csv","Kohli")
batsmanContributionWonLost("rohitsp.csv","Rohit")
batsmanContributionWonLost("rahanesp.csv","Rahane")
batsmanContributionWonLost("pujarasp.csv","Pujara")

3e. Performance at home and overseas
This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("kohlisp.csv","Kohli")
batsmanPerfHomeAway("rohitsp.csv","Rohit")
batsmanPerfHomeAway("rahanesp.csv","Rahane")
batsmanPerfHomeAway("pujarasp.csv","Pujara")

3f. Batsman average at different venues
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("kohliTest.csv","Kohli")
batsmanAvgRunsGround("rohitTest.csv","Rohit")
batsmanAvgRunsGround("rahaneTest.csv","Rahane")
batsmanAvgRunsGround("pujaraTest.csv","Pujara")

3g. Batsman average against different opposition
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("kohliTest.csv","Kohli")
batsmanAvgRunsOpposition("rohitTest.csv","Rohit")
batsmanAvgRunsOpposition("rahaneTest.csv","Rahane")
batsmanAvgRunsOpposition("pujaraTest.csv","Pujara")

3h. Runs Likelihood of batsman
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanRunsLikelihood("kohli.csv","Kohli")
## Summary of  Kohli 's runs scoring likelihood
## **************************************************
## 
## There is a 52.91 % likelihood that Kohli  will make  12 Runs in  26 balls over 35  Minutes 
## There is a 30.81 % likelihood that Kohli  will make  52 Runs in  100 balls over  139  Minutes 
## There is a 16.28 % likelihood that Kohli  will make  142 Runs in  237 balls over 335  Minutes
batsmanRunsLikelihood("rohit.csv","Rohit")
## Summary of  Rohit 's runs scoring likelihood
## **************************************************
## 
## There is a 43.24 % likelihood that Rohit  will make  10 Runs in  21 balls over 32  Minutes 
## There is a 45.95 % likelihood that Rohit  will make  46 Runs in  85 balls over  124  Minutes 
## There is a 10.81 % likelihood that Rohit  will make  110 Runs in  199 balls over 282  Minutes
batsmanRunsLikelihood("rahane.csv","Rahane")
## Summary of  Rahane 's runs scoring likelihood
## **************************************************
## 
## There is a 7.75 % likelihood that Rahane  will make  124 Runs in  224 balls over 318  Minutes 
## There is a 62.02 % likelihood that Rahane  will make  12 Runs in  26 balls over  37  Minutes 
## There is a 30.23 % likelihood that Rahane  will make  55 Runs in  113 balls over 162  Minutes
batsmanRunsLikelihood("pujara.csv","Pujara")
## Summary of  Pujara 's runs scoring likelihood
## **************************************************
## 
## There is a 60.49 % likelihood that Pujara  will make  15 Runs in  38 balls over 55  Minutes 
## There is a 31.48 % likelihood that Pujara  will make  62 Runs in  142 balls over  204  Minutes 
## There is a 8.02 % likelihood that Pujara  will make  153 Runs in  319 balls over 445  Minutes
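The three percentages printed for each batsman add up to about 100, which is what you would see if every innings were assigned to one of three groups. A minimal sketch of that idea, assuming k-means clustering on runs, balls faced and minutes (the post does not show how batsmanRunsLikelihood() works internally, and the innings below are synthetic, not Statsguru data):

```r
# Sketch only: synthetic innings, not Statsguru data
set.seed(42)
n    <- 120
runs <- round(rexp(n, rate = 1 / 40))           # skewed scores, mostly low
bf   <- round(runs * runif(n, 1.5, 2.2)) + 1    # balls faced
mins <- round(bf * runif(n, 1.3, 1.6))          # minutes at crease
innings <- data.frame(Runs = runs, BF = bf, Mins = mins)

# Group the innings into 3 clusters (assumed: low, medium and high scores)
fit <- kmeans(innings, centers = 3, nstart = 10)

# Share of innings in each cluster, and the rounded cluster centres
pct     <- round(100 * fit$size / nrow(innings), 2)
centres <- round(fit$centers)

# Report the clusters from the lowest-scoring to the highest-scoring
for (i in order(centres[, "Runs"])) {
  cat(sprintf("%.2f%% of innings look like %.0f runs in %.0f balls over %.0f minutes\n",
              pct[i], centres[i, "Runs"], centres[i, "BF"], centres[i, "Mins"]))
}
```

Read this way, each "likelihood" is simply the fraction of past innings that fall in that cluster.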

3h1. Moving average of batsman
Kohli’s moving average in Tests seems to have dropped after a peak in 2017 and 2018. So it is with Rahane.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("kohli.csv","Kohli")
batsmanMovingAverage("rohit.csv","Rohit")
batsmanMovingAverage("rahane.csv","Rahane")
batsmanMovingAverage("pujara.csv","Pujara")
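For readers new to the idea, a moving average smooths innings-by-innings scores over a sliding window. A minimal base-R sketch with hypothetical scores and an assumed window of 3 innings; the smoothing used inside batsmanMovingAverage() may well differ:

```r
# Hypothetical scores from 8 consecutive innings
runs <- c(10, 50, 0, 120, 35, 4, 77, 61)

# Trailing 3-innings moving average: each value averages the current
# innings and the two before it, so the first two entries are NA
k  <- 3
ma <- as.numeric(stats::filter(runs, rep(1 / k, k), sides = 1))

print(round(ma, 2))
```

stats::filter() is called with its namespace because packages such as dplyr mask filter() when loaded.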

3i. Cumulative Average runs of batsman in career
Kohli’s cumulative average settles at ~48. Shubman Gill’s cumulative average is on the rise.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("kohliTest.csv","Kohli")

batsmanCumulativeAverageRuns("rohitTest.csv","Rohit")

batsmanCumulativeAverageRuns("rahaneTest.csv","Rahane")

batsmanCumulativeAverageRuns("pujaraTest.csv","Pujara")

batsmanCumulativeAverageRuns("shubmanTest.csv","S Gill")

3j. Cumulative Average strike rate of batsman in career
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("kohliTest.csv","Kohli")

batsmanCumulativeStrikeRate("rohitTest.csv","Rohit")

batsmanCumulativeStrikeRate("rahaneTest.csv","Rahane")

batsmanCumulativeStrikeRate("pujaraTest.csv","Pujara")

batsmanCumulativeStrikeRate("shubmanTest.csv","S Gill")
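The cumulative plots in 3i and 3j track how a running figure evolves innings by innings. A minimal sketch with hypothetical numbers, assuming the cumulative average is total runs divided by innings played so far and the cumulative strike rate is total runs per 100 balls so far (cricketr's exact definitions are not shown in this post):

```r
# Hypothetical runs and balls faced over 4 innings
runs  <- c(10, 50, 0, 120)
balls <- c(20, 80, 5, 200)

innings_no <- seq_along(runs)

# Running average of runs per innings
cum_avg <- cumsum(runs) / innings_no

# Running strike rate: runs per 100 balls over the career so far
cum_sr <- 100 * cumsum(runs) / cumsum(balls)

print(data.frame(innings_no, cum_avg, cum_sr = round(cum_sr, 2)))
```

Note that a conventional batting average divides by dismissals rather than innings, so not-outs would change the first column.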

3k. Future Runs forecast
Here are plots that forecast how the batsmen will perform in future. In this case 90% of the career runs trend is used as the training set. The remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast matches the actual runs.
Take a look at the runs forecasted for the batsmen below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("kohli.csv","Kohli")
batsmanPerfForecast("rohit.csv","Rohit")
batsmanPerfForecast("rahane.csv","Rahane")
batsmanPerfForecast("pujara.csv","Pujara")
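The 90/10 split described above can be sketched in base R. This is an illustration on synthetic scores, not the internals of batsmanPerfForecast(): it assumes a plain innings-indexed series and switches off the trend and seasonal terms of HoltWinters(), which reduces it to simple exponential smoothing.

```r
# Synthetic career of 100 innings (not real data)
set.seed(1)
runs <- round(rexp(100, rate = 1 / 40))

# First 90% of innings for training, last 10% held out for testing
n_train <- floor(0.9 * length(runs))
train   <- ts(runs[1:n_train])
test    <- runs[(n_train + 1):length(runs)]

# Holt-Winters with trend (beta) and seasonality (gamma) switched off
fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)

# Forecast as many innings ahead as there are in the test set
fc <- as.numeric(predict(fit, n.ahead = length(test)))

# Compare forecast with what actually happened
print(data.frame(actual = test, forecast = round(fc, 1)))
cat("Mean absolute error:", round(mean(abs(test - fc)), 1), "\n")
```

With both terms off the forecast is a flat line at the smoothed level, which is a useful reminder of how little an innings-by-innings series can say about any single future score.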

3l. Relative Mean Strike Rate plot
The plot below compares the mean strike rate of the batsmen for each runs range of 10.
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanSR(frames,names)
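To make the "runs ranges of 10" concrete, here is a minimal sketch that buckets hypothetical innings by score and averages the strike rate within each bucket. The numbers and the left-closed bucket boundaries are assumptions for illustration only:

```r
# Hypothetical innings: runs scored and strike rate in each
runs <- c(4, 12, 18, 25, 33, 37, 41, 56)
sr   <- c(30, 45, 55, 50, 60, 48, 52, 65)

# Buckets of 10 runs: [0,10), [10,20), ... [50,60)
bucket <- cut(runs, breaks = seq(0, 60, by = 10), right = FALSE)

# Mean strike rate within each bucket
mean_sr <- sapply(split(sr, bucket), mean)
print(mean_sr)
```

Doing this per batsman and overlaying the resulting lines gives a relative plot of the kind shown here.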

3m. Relative Runs Frequency plot
The plot below gives the relative runs frequency percentages for each 10-run bucket.
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeRunsFreqPerf(frames,names)
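A runs frequency percentage is just the share of innings that end in each bucket. A minimal sketch on hypothetical scores, assuming left-closed buckets of 10 runs:

```r
# Hypothetical scores from 10 innings
runs <- c(0, 4, 7, 12, 18, 25, 33, 37, 41, 56)

# Buckets of 10 runs: [0,10), [10,20), ... [50,60)
bucket <- cut(runs, breaks = seq(0, 60, by = 10), right = FALSE)

# Count innings per bucket and convert to percentages
counts   <- sapply(split(runs, bucket), length)
freq_pct <- 100 * counts / sum(counts)
print(freq_pct)
```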

3n. Relative cumulative average runs in career
Kohli tops the list, followed by Pujara, with Rohit 3rd. Gill is on the upswing. Hope he performs well.
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeAvgRuns(frames,names)

3o. Relative cumulative average strike rate in career
Rohit has the best strike rate, followed by Kohli, with Shubman Gill catching up fast.
frames <- list("kohliTest.csv","rohitTest.csv","pujaraTest.csv","rahaneTest.csv","shubmanTest.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeStrikeRate(frames,names)

3p. Check Batsman In-Form or Out-of-Form
The computation below uses null hypothesis testing and the p-value to determine whether a batsman is in-form or out-of-form. For this, 90% of the career runs is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and sample standard deviation are calculated.
The null hypothesis (H0) assumes that the batsman continues to stay in form, i.e. the sample mean is within the 95% confidence interval of the population mean. The alternative hypothesis (Ha) assumes that the batsman is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.
A significance level of 0.05 is chosen and the p-value is computed:
- If p-value >= 0.05: Batsman In-Form
- If p-value < 0.05: Batsman Out-of-Form
Note: Ideally, the p-value should be computed for a population that follows the Normal distribution. But the runs population is usually right skewed (many low scores and a few big ones), so some correction may be needed. I will revisit this later.
This is done for the top 5 batsmen.
checkBatsmanInForm("kohli.csv","Kohli")
## [1] "**************************** Form status of Kohli ****************************\n\n Population size: 154  Mean of population: 47.03 \n Sample size: 18  Mean of sample: 32.22 SD of sample: 42.45 \n\n Null hypothesis H0 : Kohli 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Kohli 's sample average is below the 95% confidence interval of population average\n\n Kohli 's Form Status: In-Form because the p value: 0.078058  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("rohit.csv","Rohit")
## [1] "**************************** Form status of Rohit ****************************\n\n Population size: 66  Mean of population: 37.03 \n Sample size: 8  Mean of sample: 37.88 SD of sample: 35.38 \n\n Null hypothesis H0 : Rohit 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Rohit 's sample average is below the 95% confidence interval of population average\n\n Rohit 's Form Status: In-Form because the p value: 0.526254  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("rahane.csv","Rahane")
## [1] "**************************** Form status of Rahane ****************************\n\n Population size: 116  Mean of population: 34.78 \n Sample size: 13  Mean of sample: 21.38 SD of sample: 21.96 \n\n Null hypothesis H0 : Rahane 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Rahane 's sample average is below the 95% confidence interval of population average\n\n Rahane 's Form Status: Out-of-Form because the p value: 0.023244  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("pujara.csv","Pujara")
## [1] "**************************** Form status of Pujara ****************************\n\n Population size: 145  Mean of population: 41.93 \n Sample size: 17  Mean of sample: 33.24 SD of sample: 31.74 \n\n Null hypothesis H0 : Pujara 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Pujara 's sample average is below the 95% confidence interval of population average\n\n Pujara 's Form Status: In-Form because the p value: 0.137319  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("shubman.csv","S Gill")
## [1] "**************************** Form status of S Gill ****************************\n\n Population size: 23  Mean of population: 30.43 \n Sample size: 3  Mean of sample: 51.33 SD of sample: 66.88 \n\n Null hypothesis H0 : S Gill 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : S Gill 's sample average is below the 95% confidence interval of population average\n\n S Gill 's Form Status: In-Form because the p value: 0.687033  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
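The printed p-values can be sanity-checked from the summary figures alone. A minimal sketch, assuming the test is a one-sided, one-sample t-test of the sample mean against the population mean (the post does not state the exact statistic, so treat this as an assumption); form_p_value() is a hypothetical helper, not a cricketr function:

```r
# Hypothetical helper: lower-tail p-value of a one-sample t-test,
# computed from summary statistics only
form_p_value <- function(pop_mean, sample_mean, sample_sd, n) {
  t_stat <- (sample_mean - pop_mean) / (sample_sd / sqrt(n))
  pt(t_stat, df = n - 1)
}

# Summary figures copied from the output printed above
p_kohli  <- form_p_value(pop_mean = 47.03, sample_mean = 32.22,
                         sample_sd = 42.45, n = 18)
p_rahane <- form_p_value(pop_mean = 34.78, sample_mean = 21.38,
                         sample_sd = 21.96, n = 13)

alpha <- 0.05
cat("Kohli : p =", round(p_kohli, 4),
    ifelse(p_kohli >= alpha, "-> In-Form", "-> Out-of-Form"), "\n")
cat("Rahane: p =", round(p_rahane, 4),
    ifelse(p_rahane >= alpha, "-> In-Form", "-> Out-of-Form"), "\n")
```

Under that assumption the values should land close to the ones reported above. With samples this small (3 innings for Gill), the verdicts are best read as rough indicators.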
3q. Predicting Runs given Balls Faced and Minutes at Crease
A multivariate regression plane is fitted between Runs and Balls Faced + Minutes at Crease.
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
kohli1 <- batsmanRunsPredict("kohli.csv","Kohli",newdataframe=newDF)
rohit1 <- batsmanRunsPredict("rohit.csv","Rohit",newdataframe=newDF)
pujara1 <- batsmanRunsPredict("pujara.csv","Pujara",newdataframe=newDF)
rahane1 <- batsmanRunsPredict("rahane.csv","Rahane",newdataframe=newDF)
sgill1 <- batsmanRunsPredict("shubman.csv","S Gill",newdataframe=newDF)
batsmen <-cbind(round(kohli1$Runs),round(rohit1$Runs),round(pujara1$Runs),round(rahane1$Runs),round(sgill1$Runs))
colnames(batsmen) <- c("Kohli","Rohit","Pujara","Rahane","S Gill")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease Kohli Rohit Pujara Rahane S Gill
## 1          10           30     6     3      3      2      7
## 2          38           71    24    19     16     17     24
## 3          66          111    41    35     29     31     40
## 4          94          152    58    51     42     45     56
## 5         121          193    76    66     55     59     73
## 6         149          234    93    82     68     74     89
## 7         177          274   110    98     80     88    106
## 8         205          315   128   114     93    102    122
## 9         233          356   145   129    106    116    139
## 10        261          396   163   145    119    130    155
## 11        289          437   180   161    132    145    171
## 12        316          478   197   177    144    159    188
## 13        344          519   215   192    157    173    204
## 14        372          559   232   208    170    187    221
## 15        400          600   249   224    183    202    237
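The same kind of table can be produced with base R alone. A minimal sketch, assuming an ordinary least-squares fit with lm() on synthetic innings; the actual model inside batsmanRunsPredict() is not shown in this post, and the coefficients used to generate the data are made up:

```r
# Synthetic innings (not real data): runs depend mostly on balls faced
set.seed(7)
n    <- 150
BF   <- round(runif(n, 5, 350))
Mins <- round(BF * runif(n, 1.3, 1.6))
Runs <- round(pmax(0, 0.55 * BF + 0.02 * Mins + rnorm(n, sd = 8)))
innings <- data.frame(Runs, BF, Mins)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict runs over the same grid of balls faced and minutes as above
newDF <- data.frame(BF   = seq(10, 400, length.out = 15),
                    Mins = seq(30, 600, length.out = 15))
newDF$Runs <- round(predict(fit, newdata = newDF))
print(newDF)
```

Because balls faced and minutes at the crease move together, the individual coefficients are hard to interpret; the predictions along a realistic grid are the useful part.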
4. Analysis of India WTC batsmen from Jan 2016 – May 2023 against Australia
4a. Relative cumulative average
Against Australia specifically between 2016 – 2023, Pujara has the best record followed by Rahane, with Gill in hot pursuit. Kohli and Rohit trail behind
frames <- list("kohliTestAus.csv","rohitTestAus.csv","pujaraTestAus.csv","rahaneTestAus.csv","shubmanTestAus.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeAvgRuns(frames,names)

4b. Relative cumulative average strike rate in career
In the Strike Rate department Gill tops followed by Rohit and Rahane
frames <- list("kohliTestAus.csv","rohitTestAus.csv","pujaraTestAus.csv","rahaneTestAus.csv","shubmanTestAus.csv")
names <- list("Kohli","Rohit","Pujara","Rahane","S Gill")
relativeBatsmanCumulativeStrikeRate(frames,names)

5. Analysis of Australia WTC batsmen from Jan 2016 – May 2023
5a. Basic analyses
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("stevesmithTest.csv","S Smith")
batsmanMeanStrikeRate("stevesmithTest.csv","S Smith")
batsmanRunsRanges("stevesmithTest.csv","S Smith")
batsmanRunsFreqPerf("warnerTest.csv","Warner")
batsmanMeanStrikeRate("warnerTest.csv","Warner")
batsmanRunsRanges("warnerTest.csv","Warner")
batsmanRunsFreqPerf("labuschagneTest.csv","M Labuschagne")
batsmanMeanStrikeRate("labuschagneTest.csv","M Labuschagne")
batsmanRunsRanges("labuschagneTest.csv","M Labuschagne")

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsmanRunsFreqPerf("cgreenTest.csv","C Green")
batsmanMeanStrikeRate("cgreenTest.csv","C Green")
batsmanRunsRanges("cgreenTest.csv","C Green")
batsmanRunsFreqPerf("khwajaTest.csv","Khwaja")
batsmanMeanStrikeRate("khwajaTest.csv","Khwaja")
batsmanRunsRanges("khwajaTest.csv","Khwaja")

5b. More analyses
par(mfrow=c(3,3))
par(mar=c(4,4,2,2))
batsman4s("stevesmithTest.csv","S Smith")
batsman6s("stevesmithTest.csv","S Smith")
batsmanMeanStrikeRate("stevesmithTest.csv","S Smith")
batsman4s("warnerTest.csv","Warner")
batsman6s("warnerTest.csv","Warner")
batsmanMeanStrikeRate("warnerTest.csv","Warner")
batsman4s("labuschagneTest.csv","M Labuschagne")
batsman6s("labuschagneTest.csv","M Labuschagne")
batsmanMeanStrikeRate("labuschagneTest.csv","M Labuschagne")

par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
batsman4s("cgreenTest.csv","C Green")
batsman6s("cgreenTest.csv","C Green")
batsmanMeanStrikeRate("cgreenTest.csv","C Green")
batsman4s("khwajaTest.csv","Khwaja")
batsman6s("khwajaTest.csv","Khwaja")
batsmanMeanStrikeRate("khwajaTest.csv","Khwaja")

5c. Boxplot histogram plot
This plot shows a combined boxplot of the runs ranges and a histogram of the runs frequency.
Smith and Labuschagne have averaged 53+ since 2016!! Warner & Khawaja are at ~46.
batsmanPerfBoxHist("stevesmithTest.csv","S Smith")

batsmanPerfBoxHist("warnerTest.csv","Warner")

batsmanPerfBoxHist("labuschagneTest.csv","M Labuschagne")

batsmanPerfBoxHist("cgreenTest.csv","C Green")

batsmanPerfBoxHist("khwajaTest.csv","Khwaja")

5d. Contribution to won and lost matches
For the 2 functions below you will have to use the getPlayerDataSp() function. Australia has won matches when Smith, Warner and Khawaja have played well.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanContributionWonLost("stevesmithsp.csv","S Smith")
batsmanContributionWonLost("warnersp.csv","Warner")
batsmanContributionWonLost("labuschagnesp.csv","M Labuschagne")
batsmanContributionWonLost("cgreensp.csv","C Green")

batsmanContributionWonLost("khwajasp.csv","Khwaja")

5e. Performance at home and overseas
This function also requires the use of getPlayerDataSp() as shown above. This can only be used for Test matches
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfHomeAway("stevesmithsp.csv","S Smith")
batsmanPerfHomeAway("warnersp.csv","Warner")
batsmanPerfHomeAway("labuschagnesp.csv","M Labuschagne")
batsmanPerfHomeAway("cgreensp.csv","C Green")

batsmanPerfHomeAway("khwajasp.csv","Khwaja")

5f. Batsman average at different venues
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsGround("stevesmithTest.csv","S Smith")
batsmanAvgRunsGround("warnerTest.csv","Warner")
batsmanAvgRunsGround("labuschagneTest.csv","M Labuschagne")
batsmanAvgRunsGround("cgreenTest.csv","C Green")

batsmanAvgRunsGround("khwajaTest.csv","Khwaja")

5g. Batsman average against different opposition
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanAvgRunsOpposition("stevesmithTest.csv","S Smith")
batsmanAvgRunsOpposition("warnerTest.csv","Warner")
batsmanAvgRunsOpposition("labuschagneTest.csv","M Labuschagne")
batsmanAvgRunsOpposition("khwajaTest.csv","Khwaja")

5h. Runs Likelihood of batsman
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanRunsLikelihood("stevesmithTest.csv","S Smith")
## Summary of  S Smith 's runs scoring likelihood
## **************************************************
## 
## There is a 58.76 % likelihood that S Smith  will make  21 Runs in  38 balls over 56  Minutes 
## There is a 24.74 % likelihood that S Smith  will make  70 Runs in  148 balls over  210  Minutes 
## There is a 16.49 % likelihood that S Smith  will make  148 Runs in  268 balls over 398  Minutes
batsmanRunsLikelihood("warnerTest.csv","Warner")
## Summary of  Warner 's runs scoring likelihood
## **************************************************
## 
## There is a 7.22 % likelihood that Warner  will make  155 Runs in  253 balls over 372  Minutes 
## There is a 62.89 % likelihood that Warner  will make  14 Runs in  21 balls over  32  Minutes 
## There is a 29.9 % likelihood that Warner  will make  65 Runs in  94 balls over 135  Minutes
batsmanRunsLikelihood("labuschagneTest.csv","M Labuschagne")
## Summary of  M Labuschagne 's runs scoring likelihood
## **************************************************
## 
## There is a 32.76 % likelihood that M Labuschagne  will make  74 Runs in  144 balls over 206  Minutes 
## There is a 55.17 % likelihood that M Labuschagne  will make  22 Runs in  37 balls over  54  Minutes 
## There is a 12.07 % likelihood that M Labuschagne  will make  168 Runs in  297 balls over 420  Minutes

batsmanRunsLikelihood("khwajaTest.csv","Khwaja")
## Summary of  Khwaja 's runs scoring likelihood
## **************************************************
## 
## There is a 64.94 % likelihood that Khwaja  will make  14 Runs in  29 balls over 42  Minutes 
## There is a 27.27 % likelihood that Khwaja  will make  79 Runs in  148 balls over  210  Minutes 
## There is a 7.79 % likelihood that Khwaja  will make  165 Runs in  351 balls over 515  Minutes
5i. Moving average of batsman
Smith and Warner’s moving averages have been on a downward trend lately. Khawaja is playing well.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanMovingAverage("stevesmith.csv","S Smith")
batsmanMovingAverage("warner.csv","Warner")
batsmanMovingAverage("labuschagne.csv","M Labuschagne")
batsmanMovingAverage("khwaja.csv","Khwaja")

5j. Cumulative Average runs of batsman in career
Labuschagne, Smith and Warner have very good cumulative averages.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeAverageRuns("stevesmithTest.csv","S Smith")

batsmanCumulativeAverageRuns("warnerTest.csv","Warner")

batsmanCumulativeAverageRuns("labuschagneTest.csv","M Labuschagne")

batsmanCumulativeAverageRuns("khwajaTest.csv","Khwaja")

5k. Cumulative Average strike rate of batsman in career
Warner towers over the others in the cumulative strike rate, followed by Labuschagne and Smith
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanCumulativeStrikeRate("stevesmithTest.csv","S Smith")

batsmanCumulativeStrikeRate("warnerTest.csv","Warner")

batsmanCumulativeStrikeRate("labuschagneTest.csv","M Labuschagne")

batsmanCumulativeStrikeRate("khwajaTest.csv","Khwaja")

5l. Future Runs forecast
Here are plots that forecast how the batsmen will perform in future. In this case 90% of the career runs trend is used as the training set. The remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted runs trend is plotted. The test set is also plotted to see how closely the forecast matches the actual runs.
Take a look at the runs forecasted for the batsmen below.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
batsmanPerfForecast("stevesmithTest.csv","S Smith")
batsmanPerfForecast("warnerTest.csv","Warner")
batsmanPerfForecast("labuschagneTest.csv","M Labuschagne")
batsmanPerfForecast("khwajaTest.csv","Khwaja")

5m. Relative Mean Strike Rate plot
The plot below compares the mean strike rate of the batsmen for each runs range of 10.
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanSR(frames,names)

5n. Relative Runs Frequency plot
The plot below gives the relative runs frequency percentages for each 10-run bucket.
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeRunsFreqPerf(frames,names)

5o. Relative cumulative average runs in career
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeAvgRuns(frames,names)

5p. Relative cumulative average strike rate in career
frames <- list("stevesmithTest.csv","warnerTest.csv","khwajaTest.csv","labuschagneTest.csv","cgreenTest.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeStrikeRate(frames,names)

5q. Check Batsman In-Form or Out-of-Form
The computation below uses null hypothesis testing and the p-value to determine whether a batsman is in-form or out-of-form. For this, 90% of the career runs is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and sample standard deviation are calculated.
The null hypothesis (H0) assumes that the batsman continues to stay in form, i.e. the sample mean is within the 95% confidence interval of the population mean. The alternative hypothesis (Ha) assumes that the batsman is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.
A significance level of 0.05 is chosen and the p-value is computed:
- If p-value >= 0.05: Batsman In-Form
- If p-value < 0.05: Batsman Out-of-Form
Note: Ideally, the p-value should be computed for a population that follows the Normal distribution. But the runs population is usually right skewed (many low scores and a few big ones), so some correction may be needed. I will revisit this later.
This is done for the top 4 batsmen.
checkBatsmanInForm("stevesmith.csv","S Smith")
## [1] "**************************** Form status of S Smith ****************************\n\n Population size: 144  Mean of population: 53.76 \n Sample size: 17  Mean of sample: 45.65 SD of sample: 56.4 \n\n Null hypothesis H0 : S Smith 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : S Smith 's sample average is below the 95% confidence interval of population average\n\n S Smith 's Form Status: In-Form because the p value: 0.280533  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("warner.csv","Warner")
## [1] "**************************** Form status of Warner ****************************\n\n Population size: 164  Mean of population: 45.2 \n Sample size: 19  Mean of sample: 26.63 SD of sample: 44.62 \n\n Null hypothesis H0 : Warner 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Warner 's sample average is below the 95% confidence interval of population average\n\n Warner 's Form Status: Out-of-Form because the p value: 0.042744  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("labuschagne.csv","M Labuschagne")
## [1] "**************************** Form status of M Labuschagne ****************************\n\n Population size: 52  Mean of population: 59.56 \n Sample size: 6  Mean of sample: 29.67 SD of sample: 19.96 \n\n Null hypothesis H0 : M Labuschagne 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : M Labuschagne 's sample average is below the 95% confidence interval of population average\n\n M Labuschagne 's Form Status: Out-of-Form because the p value: 0.005239  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBatsmanInForm("khwaja.csv","Khwaja")
## [1] "**************************** Form status of Khwaja ****************************\n\n Population size: 89  Mean of population: 41.62 \n Sample size: 10  Mean of sample: 53.1 SD of sample: 76.34 \n\n Null hypothesis H0 : Khwaja 's sample average is within 95% confidence interval of population average\n Alternative hypothesis Ha : Khwaja 's sample average is below the 95% confidence interval of population average\n\n Khwaja 's Form Status: In-Form because the p value: 0.677691  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
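The p-values printed above can be approximated from the summary statistics alone. The sketch below is not cricketr's source code; `formPValue` is a hypothetical helper that computes a one-sided t statistic for the sample mean against the population mean. The reported figures appear to correspond to `n` degrees of freedom, so that is the default here; a textbook one-sample t-test would use `n - 1`.

```r
# One-sided test of the sample mean against the population mean,
# computed from the summary statistics in the output above.
# Assumption: df defaults to n because that seems to match the printed
# p-values; a textbook one-sample t-test would use n - 1.
formPValue <- function(popMean, sampleMean, sampleSD, n, df = n) {
  # Number of standard errors between sample mean and population mean
  tStat <- (sampleMean - popMean) / (sampleSD / sqrt(n))
  # Lower-tail probability, since Ha says the sample mean is below the population mean
  pt(tStat, df = df)
}

# S Smith: population mean 53.76; sample of 17 with mean 45.65 and SD 56.4
pSmith <- formPValue(53.76, 45.65, 56.4, 17)
# M Labuschagne: population mean 59.56; sample of 6 with mean 29.67 and SD 19.96
pLabuschagne <- formPValue(59.56, 29.67, 19.96, 6)

pValues <- c("S Smith" = pSmith, "M Labuschagne" = pLabuschagne)
print(round(pValues, 4))
print(ifelse(pValues >= 0.05, "In-Form", "Out-of-Form"))
```

Because the inputs are rounded to two decimals, the results should land close to, but not exactly on, the values in the output above. Note also how small some samples are (6 innings for Labuschagne), which limits how much weight the verdict can carry.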
5r. Predicting Runs given Balls Faced and Minutes at Crease
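A multivariate regression plane is fitted with Runs as the response and Balls Faced and Minutes at Crease as the predictors. The fitted plane is then used to predict runs for a grid of Balls Faced and Minutes at Crease values.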
BF <- seq( 10, 400,length=15)
Mins <- seq(30,600,length=15)
newDF <- data.frame(BF,Mins)
ssmith1 <- batsmanRunsPredict("stevesmith.csv","S Smith",newdataframe=newDF)
warner1 <- batsmanRunsPredict("warner.csv","Warner",newdataframe=newDF)
khwaja1 <- batsmanRunsPredict("khwaja.csv","Khwaja",newdataframe=newDF)
labuschagne1 <- batsmanRunsPredict("labuschagne.csv","Labuschagne",newdataframe=newDF)
cgreen1 <- batsmanRunsPredict("cgreen.csv","C Green",newdataframe=newDF)
batsmen <-cbind(round(ssmith1$Runs),round(warner1$Runs),round(khwaja1$Runs),round(labuschagne1$Runs),round(cgreen1$Runs))
colnames(batsmen) <- c("S Smith","Warner","Khwaja","Labuschagne","C Green")
newDF <- data.frame(round(newDF$BF),round(newDF$Mins))
colnames(newDF) <- c("BallsFaced","MinsAtCrease")
predictedRuns <- cbind(newDF,batsmen)
predictedRuns
##    BallsFaced MinsAtCrease S Smith Warner Khwaja Labuschagne C Green
## 1          10           30       7     10     10           9      13
## 2          38           71      23     30     24          24      29
## 3          66          111      38     50     38          40      44
## 4          94          152      53     70     53          55      60
## 5         121          193      69     90     67          70      75
## 6         149          234      84    110     81          85      91
## 7         177          274     100    130     95         100     106
## 8         205          315     115    150    109         116     122
## 9         233          356     130    170    123         131     137
## 10        261          396     146    190    137         146     153
## 11        289          437     161    210    151         161     168
## 12        316          478     177    230    165         176     184
## 13        344          519     192    250    179         192     199
## 14        372          559     207    270    193         207     215
## 15        400          600     223    290    207         222     230
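For readers curious about what such a fit looks like, here is a minimal sketch using base R's `lm()`. The innings data is synthetic, and the assumption that the prediction amounts to a plain linear model of Runs on Balls Faced and Minutes is mine, not something confirmed by the post.

```r
# Synthetic innings data (hypothetical values, not taken from Cricinfo)
set.seed(42)
BF   <- round(runif(60, min = 5, max = 300))          # balls faced
Mins <- round(BF * 1.4 + rnorm(60, sd = 15))          # minutes at crease, correlated with BF
Runs <- round(0.5 * BF + 0.05 * Mins + rnorm(60, sd = 8))
innings <- data.frame(Runs, BF, Mins)

# Fit the regression plane Runs ~ BF + Mins
fit <- lm(Runs ~ BF + Mins, data = innings)

# Predict on the same grid of Balls Faced and Minutes used in the post
grid <- data.frame(BF   = seq(10, 400, length.out = 15),
                   Mins = seq(30, 600, length.out = 15))
predicted <- round(predict(fit, newdata = grid))

print(cbind(round(grid), Runs = predicted))
```

Balls Faced and Minutes at Crease are strongly correlated, so the individual coefficients are hard to interpret; the predictions along a realistic grid such as this one are more stable than the coefficients themselves.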
6. Analysis of Australia WTC batsmen from Jan 2016 – May 2023 against India
6a. Relative cumulative average runs in career
Labuschagne, Smith and C Green have good records against India
frames <- list("stevesmithTestInd.csv","warnerTestInd.csv","khwajaTestInd.csv","labuschagneTestInd.csv","cgreenTestInd.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeAvgRuns(frames,names)

6b. Relative cumulative average strike rate in career
Warner, Labuschagne and Smith have a good strike rate against India
frames <- list("stevesmithTestInd.csv","warnerTestInd.csv","khwajaTestInd.csv","labuschagneTestInd.csv","cgreenTestInd.csv")
names <- list("S Smith","Warner","Khwaja","Labuschagne","C Green")
relativeBatsmanCumulativeStrikeRate(frames,names)

7. Analysis of India WTC bowlers from Jan 2016 – May 2023
7a Wickets frequency chart
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("shamiTest.csv","Shami")
bowlerWktsFreqPercent("sirajTest.csv","Siraj")
bowlerWktsFreqPercent("ashwinTest.csv","Ashwin")
bowlerWktsFreqPercent("jadejaTest.csv","Jadeja")
bowlerWktsFreqPercent("shardulTest.csv","Shardul")

7b Wickets Runs chart
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("shamiTest.csv","Shami")
bowlerWktsRunsPlot("sirajTest.csv","Siraj")
bowlerWktsRunsPlot("ashwinTest.csv","Ashwin")
bowlerWktsRunsPlot("jadejaTest.csv","Jadeja")
bowlerWktsRunsPlot("shardulTest.csv","Shardul")

7c. Average wickets at different venues
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("shamiTest.csv","Shami")
bowlerAvgWktsGround("sirajTest.csv","Siraj")
bowlerAvgWktsGround("ashwinTest.csv","Ashwin")
bowlerAvgWktsGround("jadejaTest.csv","Jadeja")
bowlerAvgWktsGround("shardulTest.csv","Shardul")

7d Average wickets against different opposition
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsOpposition("shamiTest.csv","Shami")
bowlerAvgWktsOpposition("sirajTest.csv","Siraj")
bowlerAvgWktsOpposition("ashwinTest.csv","Ashwin")
bowlerAvgWktsOpposition("jadejaTest.csv","Jadeja")
bowlerAvgWktsOpposition("shardulTest.csv","Shardul")

7e Cumulative average wickets taken
Ashwin’s cumulative average wickets have dropped over the years, while Siraj’s have been improving.
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("shamiTest.csv","Shami")

bowlerCumulativeAvgWickets("sirajTest.csv","Siraj")

bowlerCumulativeAvgWickets("ashwinTest.csv","Ashwin")

bowlerCumulativeAvgWickets("jadejaTest.csv","Jadeja")

bowlerCumulativeAvgWickets("shardulTest.csv","Shardul")
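As a rough illustration of what a cumulative average tracks, the sketch below computes a running mean over a made-up sequence of wickets per innings. The helper `cumulativeAverage` is hypothetical, and cricketr may compute its curve differently (for example, over a window of innings).

```r
# Running (cumulative) average: mean of all values up to and including each innings
cumulativeAverage <- function(x) cumsum(x) / seq_along(x)

# Hypothetical wickets per innings for one bowler
wickets <- c(4, 3, 5, 2, 1, 2, 0, 1)
cumAvg <- cumulativeAverage(wickets)

print(round(cumAvg, 2))
```

A falling curve means recent innings are below the career average so far, which is the pattern described for Ashwin above.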

7g Cumulative average economy rate
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("shamiTest.csv","Shami")

bowlerCumulativeAvgEconRate("sirajTest.csv","Siraj")

bowlerCumulativeAvgEconRate("ashwinTest.csv","Ashwin")

bowlerCumulativeAvgEconRate("jadejaTest.csv","Jadeja")

bowlerCumulativeAvgEconRate("shardulTest.csv","Shardul")

7h Wicket forecast
Here are plots that forecast how the bowler will perform in future. In this case, 90% of the career wickets trend is used as the training set and the remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast matches the actual values.
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("shamiTest.csv","Shami")
#bowlerPerfForecast("sirajTest.csv","Siraj")
bowlerPerfForecast("ashwinTest.csv","Ashwin")
bowlerPerfForecast("jadejaTest.csv","Jadeja")
bowlerPerfForecast("shardulTest.csv","Shardul")
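The train/test idea can be sketched with base R's `HoltWinters()`. The series below is synthetic, and the trend and seasonal components are switched off, which reduces the model to simple exponential smoothing; whether cricketr keeps the trend component is not stated here, so treat this as an assumption.

```r
# Hypothetical wickets-per-innings series (synthetic, for illustration only)
set.seed(1)
wickets <- rpois(60, lambda = 2)

# 90% training set; the remaining 10% is held out as the test set
nTrain <- floor(0.9 * length(wickets))
train  <- ts(wickets[1:nTrain])
test   <- wickets[(nTrain + 1):length(wickets)]

# Exponential smoothing via HoltWinters(), with trend and seasonality switched off
fit <- HoltWinters(train, beta = FALSE, gamma = FALSE)

# Forecast as many innings ahead as the test set holds
fc <- predict(fit, n.ahead = length(test))

# Compare the forecast with the held-out actual values
print(data.frame(actual = test, forecast = round(as.numeric(fc), 2)))
```

With no trend term the forecast is a flat line at the last smoothed level, so the comparison mainly shows whether recent innings sit above or below that level.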

7i Relative Wickets Frequency Percentage
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlingPerf(frames,names)

7j Relative Economy Rate against wickets taken
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlingER(frames,names)

7k Relative cumulative average wickets of bowlers in career
Ashwin has the highest cumulative average wickets against all teams, followed by Jadeja
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgWickets(frames,names)

7l Relative cumulative average economy rate of bowlers
Jadeja has the best economy rate followed by Ashwin
frames <- list("shamiTest.csv","sirajTest.csv","ashwinTest.csv","jadejaTest.csv","shardulTest.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgEconRate(frames,names)

7m Check for bowler in-form/out-of-form
The computation below uses null hypothesis testing and a p-value to determine whether a bowler is in-form or out-of-form. For this, 90% of the career wickets is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and the sample standard deviation are calculated.
The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, i.e. the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the bowler is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.
A significance level of 0.05 is chosen and the p-value is computed:
- If p-value >= 0.05: Bowler In-Form
- If p-value < 0.05: Bowler Out-of-Form
Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the wickets population is usually skewed, so some correction may be needed. I will revisit this later.
Note: The check for the form status of the bowlers indicates that Shami, Ashwin and Jadeja are in-form, while Siraj and Shardul are out-of-form. The samples for Siraj (4 innings) and Shardul (2 innings) are very small.
checkBowlerInForm("shami.csv","Shami")
## [1] "**************************** Form status of Shami ****************************\n\n Population size: 106  Mean of population: 1.93 \n Sample size: 12  Mean of sample: 1.33 SD of sample: 1.23 \n\n Null hypothesis H0 : Shami 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Shami 's sample average is below the 95% confidence\n        interval of population average\n\n Shami 's Form Status: In-Form because the p value: 0.058427  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("siraj.csv","Siraj")
## [1] "**************************** Form status of Siraj ****************************\n\n Population size: 29  Mean of population: 1.59 \n Sample size: 4  Mean of sample: 0.25 SD of sample: 0.5 \n\n Null hypothesis H0 : Siraj 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Siraj 's sample average is below the 95% confidence\n        interval of population average\n\n Siraj 's Form Status: Out-of-Form because the p value: 0.002923  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("ashwin.csv","Ashwin")
## [1] "**************************** Form status of Ashwin ****************************\n\n Population size: 154  Mean of population: 2.77 \n Sample size: 18  Mean of sample: 2.44 SD of sample: 1.76 \n\n Null hypothesis H0 : Ashwin 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Ashwin 's sample average is below the 95% confidence\n        interval of population average\n\n Ashwin 's Form Status: In-Form because the p value: 0.218345  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("jadeja.csv","Jadeja")
## [1] "**************************** Form status of Jadeja ****************************\n\n Population size: 108  Mean of population: 2.22 \n Sample size: 12  Mean of sample: 1.92 SD of sample: 2.35 \n\n Null hypothesis H0 : Jadeja 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Jadeja 's sample average is below the 95% confidence\n        interval of population average\n\n Jadeja 's Form Status: In-Form because the p value: 0.333095  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("shardul.csv","Shardul")
## [1] "**************************** Form status of Shardul ****************************\n\n Population size: 13  Mean of population: 2 \n Sample size: 2  Mean of sample: 0.5 SD of sample: 0.71 \n\n Null hypothesis H0 : Shardul 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Shardul 's sample average is below the 95% confidence\n        interval of population average\n\n Shardul 's Form Status: Out-of-Form because the p value: 0.04807  is less than alpha=  0.05 \n *******************************************************************************************\n\n"
8. Analysis of India WTC bowlers from Jan 2016 – May 2023 against Australia
8a Relative cumulative average wickets of bowlers in career
Against Australia specifically Jadeja has the best record followed by Ashwin
frames <- list("shamiTestAus.csv","sirajTestAus.csv","ashwinTestAus.csv","jadejaTestAus.csv","shardulTestAus.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgWickets(frames,names)

8b Relative cumulative average economy rate of bowlers
Jadeja has the best economy followed by Siraj, then Ashwin
frames <- list("shamiTestAus.csv","sirajTestAus.csv","ashwinTestAus.csv","jadejaTestAus.csv","shardulTestAus.csv")
names <- list("Shami","Siraj","Ashwin","Jadeja","Shardul")
relativeBowlerCumulativeAvgEconRate(frames,names)

8. Analysis of Australia WTC bowlers from Jan 2016 – May 2023
8a. Wickets frequency chart
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsFreqPercent("cumminsTest.csv","Cummins")
bowlerWktsFreqPercent("starcTest.csv","Starc")
bowlerWktsFreqPercent("hazzlewoodTest.csv","Hazzlewood")
bowlerWktsFreqPercent("todd.csv","Todd")
bowlerWktsFreqPercent("lyonTest.csv","N Lyon")

8b. Wickets Runs chart
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerWktsRunsPlot("cumminsTest.csv","Cummins")
bowlerWktsRunsPlot("starcTest.csv","Starc")
bowlerWktsRunsPlot("hazzlewoodTest.csv","Hazzlewood")
bowlerWktsRunsPlot("todd.csv","Todd")
bowlerWktsRunsPlot("lyonTest.csv","N Lyon")

8c. Average wickets at different venues
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsGround("cumminsTest.csv","Cummins")
bowlerAvgWktsGround("starcTest.csv","Starc")
bowlerAvgWktsGround("hazzlewoodTest.csv","Hazzlewood")
bowlerAvgWktsGround("todd.csv","Todd")
bowlerAvgWktsGround("lyonTest.csv","N Lyon")

8d Average wickets against different opposition
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerAvgWktsOpposition("cumminsTest.csv","Cummins")
bowlerAvgWktsOpposition("starcTest.csv","Starc")
bowlerAvgWktsOpposition("hazzlewoodTest.csv","Hazzlewood")
bowlerAvgWktsOpposition("todd.csv","Todd")
bowlerAvgWktsOpposition("lyonTest.csv","N Lyon")

8e Cumulative average wickets taken
par(mfrow=c(2,2))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgWickets("cumminsTest.csv","Cummins")

bowlerCumulativeAvgWickets("starcTest.csv","Starc")

bowlerCumulativeAvgWickets("hazzlewoodTest.csv","Hazzlewood")

bowlerCumulativeAvgWickets("todd.csv","Todd")

bowlerCumulativeAvgWickets("lyonTest.csv","N Lyon")

8g Cumulative average economy rate
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerCumulativeAvgEconRate("cumminsTest.csv","Cummins")

bowlerCumulativeAvgEconRate("starcTest.csv","Starc")

bowlerCumulativeAvgEconRate("hazzlewoodTest.csv","Hazzlewood")

bowlerCumulativeAvgEconRate("todd.csv","Todd")

bowlerCumulativeAvgEconRate("lyonTest.csv","N Lyon")

8f. Future Wickets forecast
Here are plots that forecast how the bowler will perform in future. In this case, 90% of the career wickets trend is used as the training set and the remaining 10% is the test set.
A Holt-Winters forecasting model is used to forecast future performance based on the 90% training set. The forecasted wickets trend is plotted. The test set is also plotted to see how closely the forecast matches the actual values.
par(mfrow=c(2,3))
par(mar=c(4,4,2,2))
bowlerPerfForecast("cumminsTest.csv","Cummins")
bowlerPerfForecast("starcTest.csv","Starc")
bowlerPerfForecast("hazzlewoodTest.csv","Hazzlewood")
bowlerPerfForecast("lyonTest.csv","N Lyon")

8i. Relative Wickets Frequency Percentage
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlingPerf(frames,names)

8j Relative Economy Rate against wickets taken
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlingER(frames,names)

8k Relative cumulative average wickets of bowlers in career
Cummins, Starc and Lyon are the best performers
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlerCumulativeAvgWickets(frames,names)

8l Relative cumulative average economy rate of bowlers
Hazzlewood and Cummins have the best economy rate against all opposition
frames <- list("cumminsTest.csv","starcTest.csv","hazzlewoodTest.csv","todd.csv","lyonTest.csv")
names <- list("Cummins","Starc","Hazzlewood","Todd","N Lyon")
relativeBowlerCumulativeAvgEconRate(frames,names)

8o Check for bowler in-form/out-of-form
The computation below uses null hypothesis testing and a p-value to determine whether a bowler is in-form or out-of-form. For this, 90% of the career wickets is chosen as the population and its mean is computed. The last 10% is chosen as the sample set, and the sample mean and the sample standard deviation are calculated.
The Null Hypothesis (H0) assumes that the bowler continues to stay in-form, i.e. the sample mean is within the 95% confidence interval of the population mean. The Alternative (Ha) assumes that the bowler is out of form, i.e. the sample mean is below the 95% confidence interval of the population mean.
A significance level of 0.05 is chosen and the p-value is computed:
- If p-value >= 0.05: Bowler In-Form
- If p-value < 0.05: Bowler Out-of-Form
Note: Ideally the p-value should be computed for a population that follows the Normal Distribution. But the wickets population is usually skewed, so some correction may be needed. I will revisit this later.
Note: The check for the form status of the bowlers indicates that Cummins, Starc, Hazzlewood and Lyon are all in-form.
checkBowlerInForm("cummins.csv","Cummins")
## [1] "**************************** Form status of Cummins ****************************\n\n Population size: 81  Mean of population: 2.46 \n Sample size: 9  Mean of sample: 2 SD of sample: 1.5 \n\n Null hypothesis H0 : Cummins 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Cummins 's sample average is below the 95% confidence\n        interval of population average\n\n Cummins 's Form Status: In-Form because the p value: 0.190785  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("starc.csv","Starc")
## [1] "**************************** Form status of Starc ****************************\n\n Population size: 126  Mean of population: 2.18 \n Sample size: 15  Mean of sample: 1.67 SD of sample: 1.18 \n\n Null hypothesis H0 : Starc 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Starc 's sample average is below the 95% confidence\n        interval of population average\n\n Starc 's Form Status: In-Form because the p value: 0.057433  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("hazzlewood.csv","Hazzlewood")
## [1] "**************************** Form status of Hazzlewood ****************************\n\n Population size: 99  Mean of population: 2.04 \n Sample size: 12  Mean of sample: 1.67 SD of sample: 1.5 \n\n Null hypothesis H0 : Hazzlewood 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : Hazzlewood 's sample average is below the 95% confidence\n        interval of population average\n\n Hazzlewood 's Form Status: In-Form because the p value: 0.204787  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
checkBowlerInForm("lyon.csv","N Lyon")
## [1] "**************************** Form status of N Lyon ****************************\n\n Population size: 193  Mean of population: 2.08 \n Sample size: 22  Mean of sample: 2.95 SD of sample: 1.96 \n\n Null hypothesis H0 : N Lyon 's sample average is within 95% confidence interval \n        of population average\n Alternative hypothesis Ha : N Lyon 's sample average is below the 95% confidence\n        interval of population average\n\n N Lyon 's Form Status: In-Form because the p value: 0.975407  is greater than alpha=  0.05 \n *******************************************************************************************\n\n"
9. Analysis of Australia WTC bowlers from Jan 2016 – May 2023 against India
9a Relative cumulative average wickets of bowlers in career
Against India Lyon, Cummins and Hazzlewood have performed well
frames <- list("cumminsTestInd.csv","starcTestInd.csv","hazzlewoodTestInd.csv","lyonTestInd.csv")
names <- list("Cummins","Starc","Hazzlewood","N Lyon")
relativeBowlerCumulativeAvgWickets(frames,names)

9b Relative cumulative average economy rate of bowlers
Hazzlewood, Lyon have a good economy rate against India
frames <- list("cumminsTestInd.csv","starcTestInd.csv","hazzlewoodTestInd.csv","lyonTestInd.csv")
names <- list("Cummins","Starc","Hazzlewood","N Lyon")
relativeBowlerCumulativeAvgEconRate(frames,names)

10 Analysis of teams – India, Australia
# The data for India & Australia teams were obtained with the following calls
#indiaTest <- getTeamDataHomeAway(dir=".",teamView="bat",matchType="Test",file="indiaTest.csv",save=TRUE,teamName="India")
#australiaTest <- getTeamDataHomeAway(matchType="Test",file="australiaTest.csv",save=TRUE,teamName="Australia")
10a. Win-loss of India against all oppositions in Test cricket
teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

10b. Win-loss of Australia against all oppositions in Test cricket
teamWinLossStatusVsOpposition("australiaTest.csv",teamName="Australia",opposition=c("all"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

10c. Win-loss of India against Australia in Test cricket
Against Australia, India has won 17 times, lost 60 and drawn 22 in Australia. At home, India has won 42, tied 2, lost 28 and drawn 24.
teamWinLossStatusVsOpposition("indiaTest.csv",teamName="India",opposition=c("Australia"),homeOrAway=c("all"),matchType="Test",plot=TRUE)

10d. Win-loss of India at all away venues
At the Oval, where the WTC final is going to be held, India has won 4, lost 10 and drawn 10.
teamWinLossStatusAtGrounds("indiaTest.csv",teamName="India",opposition=c("all"),homeOrAway=c("away"),matchType="Test",plot=TRUE)

10e. Timeline of win-loss of India against Australia in Test cricket
plotTimelineofWinsLosses("indiaTest.csv",team="India",opposition=c("Australia"),
                         homeOrAway=c("away","neutral"), startDate="2016-01-01",endDate="2023-05-01")

11. Conclusion
The above post analyzes the players and teams of India and Australia in home and away matches. While we know how the players have performed in India or Australia, we cannot judge how the match will progress in the neutral, swinging conditions of the Oval. Let us hope for a good match!
Feel free to try out your own analysis with cricketr. Have fun with cricketr!!
