Revitalizing R package yorkr
There is nothing so useless as doing efficiently that which should not be done at all. Peter Drucker
The most important thing in communication is to hear what isn’t being said. Peter Drucker
“Work expands to fill the time available for its completion.” Corollary: “Expenditure rises to meet income.” Parkinson’s law
Introduction
“Operation successful!!!” the Programmer Surgeon in me thought to himself. What should have been a routine surgery turned out to be a major operation in the end, involving several grueling hours. The surgeon looked at the large chunks of programming logic in the operation tray, which had been surgically removed because they had outlived their utility and had partly become dysfunctional. The surgeon glanced at the new, concise code logic, which had replaced the earlier, somewhat convoluted logic, with a smile of satisfaction.
For those who tuned in late, I am referring to my R package yorkr, which I created many years ago, in early 2016. The package had worked well for quite some time on data from Cricsheet. Cricsheet went into a hiatus in late 2017-2018 and came back to life in 2019. Unfortunately, a key function in the package then started to malfunction. The diagnosis was that the format of the YAML files had changed in the newer files, which caused the problem. I had received mails from users mentioning that yorkr was not converting the new YAML files. This was on my to-do list for a long time, and a week or two back I decided to “bite the bullet” and fix the issue. I hoped the fix would be trivial, but it was anything but. Finally, I took the hard decision of re-designing the core of the yorkr package, the part that converts YAML files to RData (dataframes). Also, since it had been a while since I wrote R code, having done more Python in recent times, I had to jog my memory with my 2 earlier posts, Essential R and R vs Python.
I spent many hours tweaking and fixing the new logic so that it worked on both the older and the newer files. Finally, I am happy to say that the new code is much more compact and probably less error prone.
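One general way to make a converter tolerate two file layouts is to map every parsed record onto a fixed set of columns and fill whatever is absent with NA. The sketch below is only an illustration of that idea in base R, not the code inside yorkr: the field names, the `deliveryToRow` helper, the `matchDF` name and the "missing field" scenario are all assumptions made for the example.

```r
# Hypothetical parsed deliveries: each is a named list, as a YAML parser
# might return. The second record lacks 'wicketKind', standing in for a
# field that is present in one file layout but not in another (assumption).
deliveries <- list(
  list(ball = 0.1, batsman = "A", bowler = "X", runs = 1, wicketKind = "caught"),
  list(ball = 0.2, batsman = "B", bowler = "X", runs = 4)
)

# Fixed set of output columns, so every file yields the same schema.
cols <- c("ball", "batsman", "bowler", "runs", "wicketKind")

# Build a one-row data frame per delivery, filling absent fields with NA.
deliveryToRow <- function(d) {
  row <- lapply(cols, function(nm) if (is.null(d[[nm]])) NA else d[[nm]])
  names(row) <- cols
  as.data.frame(row, stringsAsFactors = FALSE)
}

# Stack the rows into a single data frame for the match.
matchDF <- do.call(rbind, lapply(deliveries, deliveryToRow))

# Save as .RData so later analysis can load() it without re-parsing.
outFile <- file.path(tempdir(), "sample-match.RData")
save(matchDF, file = outFile)
print(matchDF)
```

Because the column list is fixed up front, downstream functions see the same data frame shape whichever layout the source file used.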
I also had to ensure that the converted files worked exactly as before with all the other yorkr functions. I ran all the yorkr functions from my yorkr posts on ODI, Intl. T20 and IPL and made sure the results were identical. (Phew!!)
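A check of this kind can be scripted by saving a result before the change and comparing it with the result of the same call afterwards. The following is a minimal base R sketch of that idea, not the procedure I actually followed; the data frames and the `baseline.rds` file name are made up for illustration.

```r
# Stand-in for the data frame a function returned before the rewrite
# (hypothetical values, not real results).
before <- data.frame(batsman = c("A", "B"), meanRuns = c(41.5, 38.2),
                     stringsAsFactors = FALSE)

# Persist the baseline, as one would before changing the conversion code.
baselineFile <- file.path(tempdir(), "baseline.rds")
saveRDS(before, baselineFile)

# Stand-in for the result of the same call after the rewrite.
after <- data.frame(batsman = c("A", "B"), meanRuns = c(41.5, 38.2),
                    stringsAsFactors = FALSE)

baseline <- readRDS(baselineFile)

# all.equal() tolerates tiny floating-point differences; it returns TRUE
# on a match and a character vector describing the differences otherwise.
check <- all.equal(baseline, after)
if (isTRUE(check)) {
  message("Results match the baseline")
} else {
  print(check)
}
```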
The changes will be available on CRAN in yorkr_0.0.8.
Do take a look at my yorkr posts. All the functions work correctly. Do use help, as I have changed a few functions. I will have my posts reflect the correct usage, but some function or other may slip through the cracks.
- One Day Internationals: ODI-Part1, ODI-Part2, ODI-Part3, ODI-Part4
- International T20s: T20-Part1, T20-Part2, T20-Part3, T20-Part4
- Indian Premier League: IPL-Part1, IPL-Part2, IPL-Part3, IPL-Part4
While making the changes, I also touched up some functions and made them more user friendly (added additional arguments etc.). But by and large, yorkr is still yorkr and is intact. It just sports some spanking new YAML conversion logic.
Note:
- The code is available on GitHub at yorkr
- This RMarkdown has been published at RPubs as Revitalizing yorkr
- I have already converted the YAML files for ODI, Intl. T20 and IPL. You can access and download the converted data from GitHub at yorkrData2020
setwd("/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrgit")
install.packages("yorkr_0.0.8.tar.gz",repos = NULL, type="source")
library(yorkr)
Below I rank batsmen and bowlers in ODIs, T20 and IPL based on the data from Cricsheet.
1a. Rank ODI Batsmen
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiMenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiBattingBowlingDetails"
rankODIBatsmen(dir=dir,odir=odir)
## # A tibble: 151 x 4
##    batsman        matches meanRuns meanSR
##    <chr>            <int>    <dbl>  <dbl>
##  1 Babar Azam          52     50.2   87.2
##  2 SD Hope             51     48.7   71.0
##  3 V Kohli            207     48.4   79.4
##  4 HM Amla            159     46.6   82.4
##  5 DA Warner          114     46.1   88.0
##  6 AB de Villiers     190     45.5   94.5
##  7 JE Root            108     44.9   82.5
##  8 SR Tendulkar        96     43.9   77.1
##  9 IJL Trott           63     43.1   68.9
## 10 Q de Kock          106     42.0   82.7
## # … with 141 more rows
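To give a feel for what a table like this represents, here is a small base R sketch that builds the same four columns (batsman, matches, meanRuns, meanSR) from per-innings records. It is not the implementation of rankODIBatsmen: the input data, the `minMatches` cut-off and the use of strike rate as a tie-breaker are assumptions made for the example.

```r
# Hypothetical per-innings batting records (not Cricsheet data).
batting <- data.frame(
  batsman = c("A", "A", "A", "B", "B", "C"),
  runs    = c(50, 30, 70, 20, 40, 90),
  SR      = c(90, 80, 100, 70, 75, 120),
  stringsAsFactors = FALSE
)

# Hypothetical threshold: drop players with too few matches so that a
# single big innings does not top the table.
minMatches <- 2

# Count innings per batsman.
matches <- aggregate(runs ~ batsman, data = batting, FUN = length)
names(matches)[2] <- "matches"

# Average runs and strike rate per batsman.
means <- aggregate(cbind(runs, SR) ~ batsman, data = batting, FUN = mean)
names(means)[2:3] <- c("meanRuns", "meanSR")

ranked <- merge(matches, means, by = "batsman")
ranked <- ranked[ranked$matches >= minMatches, ]

# Highest average first; strike rate breaks ties (a choice of this sketch).
ranked <- ranked[order(-ranked$meanRuns, -ranked$meanSR), ]
rownames(ranked) <- NULL
print(ranked)
```

Batsman C has the best single score here but is filtered out for having played only one match, which is the purpose of the cut-off.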
1b. Rank ODI Bowlers
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiMenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/odi/odiBattingBowlingDetails"
rankODIBowlers(dir=dir,odir=odir)
## # A tibble: 265 x 4
##    bowler           matches totalWickets meanER
##    <chr>              <int>        <dbl>  <dbl>
##  1 SL Malinga           191          308   5.25
##  2 MG Johnson           142          238   4.73
##  3 Shakib Al Hasan      157          214   4.72
##  4 Shahid Afridi        166          213   4.69
##  5 JM Anderson          143          207   4.96
##  6 KMDN Kulasekara      161          190   4.94
##  7 SCJ Broad            115          189   5.31
##  8 DW Steyn             114          188   4.96
##  9 Mashrafe Mortaza     139          180   4.97
## 10 Saeed Ajmal          106          180   4.17
## # … with 255 more rows
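The bowling table can be read the same way: wickets are summed across matches and the economy rate is averaged. The base R sketch below shows one way such columns could be derived. It is not the code behind rankODIBowlers; the input figures are invented, and computing meanER as the mean of per-match economy rates is an assumption.

```r
# Hypothetical per-match bowling figures (not Cricsheet data).
bowling <- data.frame(
  bowler  = c("X", "X", "Y", "Y", "Y"),
  overs   = c(10, 10, 8, 10, 10),
  runs    = c(40, 50, 48, 45, 55),
  wickets = c(3, 1, 2, 0, 1),
  stringsAsFactors = FALSE
)

# Economy rate for each match: runs conceded per over bowled.
bowling$ER <- bowling$runs / bowling$overs

# tapply() returns groups in sorted order, matching sort(unique(...)).
perBowler <- data.frame(
  bowler       = sort(unique(bowling$bowler)),
  matches      = as.vector(tapply(bowling$wickets, bowling$bowler, length)),
  totalWickets = as.vector(tapply(bowling$wickets, bowling$bowler, sum)),
  meanER       = as.vector(tapply(bowling$ER, bowling$bowler, mean)),
  stringsAsFactors = FALSE
)

# Most wickets first.
rankedBowlers <- perBowler[order(-perBowler$totalWickets), ]
rownames(rankedBowlers) <- NULL
print(rankedBowlers)
```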
2a. Rank T20 Batsmen
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20MenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20BattingBowlingDetails"
rankT20Batsmen(dir=dir,odir=odir)
## # A tibble: 43 x 4
##    batsman          matches meanRuns meanSR
##    <chr>              <int>    <dbl>  <dbl>
##  1 V Kohli               61     39.0   132.
##  2 Mohammad Shahzad      52     31.8   123.
##  3 CH Gayle              50     31.1   124.
##  4 BB McCullum           69     30.7   126.
##  5 PR Stirling           66     29.6   116.
##  6 MJ Guptill            70     29.6   125.
##  7 DA Warner             75     29.1   128.
##  8 AD Hales              50     28.1   120.
##  9 TM Dilshan            78     26.7   105.
## 10 RG Sharma             72     26.4   120.
## # … with 33 more rows
2b. Rank T20 Bowlers
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20MenMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/t20/t20BattingBowlingDetails"
rankT20Bowlers(dir=dir,odir=odir)
## # A tibble: 153 x 4
##    bowler          matches totalWickets meanER
##    <chr>             <int>        <dbl>  <dbl>
##  1 SL Malinga           78          115   7.39
##  2 Shahid Afridi        89           98   6.80
##  3 Saeed Ajmal          62           92   6.30
##  4 Umar Gul             56           87   7.40
##  5 KMDN Kulasekara      56           72   7.25
##  6 TG Southee           55           69   8.68
##  7 DJ Bravo             60           69   8.41
##  8 DW Steyn             47           69   7.00
##  9 Shakib Al Hasan      57           69   6.82
## 10 SCJ Broad            55           68   7.83
## # … with 143 more rows
3a. Rank IPL Batsmen
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails"
rankIPLBatsmen(dir=dir,odir=odir)
## # A tibble: 69 x 4
##    batsman        matches meanRuns meanSR
##    <chr>            <int>    <dbl>  <dbl>
##  1 DA Warner          130     37.9   128.
##  2 CH Gayle           125     36.2   134.
##  3 SE Marsh            67     35.9   120.
##  4 MEK Hussey          59     33.8   105.
##  5 KL Rahul            59     33.5   128.
##  6 V Kohli            175     31.6   119.
##  7 AM Rahane          116     30.7   108.
##  8 AB de Villiers     141     30.3   135.
##  9 F du Plessis        65     29.4   117.
## 10 S Dhawan           140     29.0   114.
## # … with 59 more rows
3b. Rank IPL Bowlers
dir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplMatches"
odir="/Users/tvganesh/backup/software/cricket-package/yorkr-cricsheet/yorkrData2020/ipl/iplBattingBowlingDetails"
rankIPLBowlers(dir=dir,odir=odir)
## # A tibble: 143 x 4
##    bowler          matches totalWickets meanER
##    <chr>             <int>        <dbl>  <dbl>
##  1 SL Malinga          120          184   6.99
##  2 SP Narine           108          137   6.71
##  3 Harbhajan Singh     131          134   7.11
##  4 DJ Bravo             85          118   8.18
##  5 B Kumar              86          116   7.43
##  6 YS Chahal            82          102   7.85
##  7 R Ashwin             92           98   6.81
##  8 JJ Bumrah            76           91   7.47
##  9 PP Chawla            85           87   8.02
## 10 RA Jadeja            89           85   7.93
## # … with 133 more rows
Conclusion
Go ahead and give yorkr a spin once yorkr_0.0.8 is available on CRAN. I hope you have fun. Do get back to me if you have any issues.
I’ll be back. Watch this space!!
You may also like
- The mechanics of Convolutional Neural Networks in Tensorflow and Keras
- Big Data-5: kNiFi-ing through cricket data with yorkpy
- Using Linear Programming (LP) for optimizing bowling change or batting lineup in T20 cricket
- Re-introducing cricketr! : An R package to analyze performances of cricketers
- Deep Learning from first principles in Python, R and Octave – Part 6
- A primer on Qubits, Quantum gates and Quantum Operations
- Practical Machine Learning with R and Python – Part 3
- Pitching yorkpy … short of good length to IPL – Part 1
To see all posts click Index of posts