Trying to Win with R
A common competition run by vendors of fishing equipment is 'guess the weight and win': an image of someone holding a fish is posted, and it is up to you to guess its weight, with the closest guess winning a prize.
The 'law of large numbers' implies that the average of many guesses is a more reliable estimate than the average of a few (at least when the guesses are roughly independent and not systematically biased), so the 'best guess' should be close to the average of all guesses…
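To make that intuition concrete, here is a small simulation sketch. It assumes a true weight of 17 kg and normally distributed, unbiased, independent guesses with a standard deviation of 5 kg; all of these numbers are made up for illustration, and real competition guesses may well be biased or influenced by earlier comments.

```r
set.seed(42)                      # make the simulation reproducible
true_weight <- 17                 # assumed true weight in kg (illustrative)
noise_sd    <- 5                  # assumed spread of individual guesses (illustrative)

# Average of n noisy guesses, repeated `reps` times
crowd_means <- function(n, reps = 2000) {
  replicate(reps, mean(rnorm(n, mean = true_weight, sd = noise_sd)))
}

few  <- crowd_means(5)            # crowds of 5 guessers
many <- crowd_means(200)          # crowds of 200 guessers

# Typical distance of the crowd average from the true weight
err_few  <- mean(abs(few  - true_weight))
err_many <- mean(abs(many - true_weight))
print(c(few = err_few, many = err_many))
```

Under these assumptions the average of 200 guesses lands much closer to the true weight, on average, than the average of 5.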
Motivated by the possibility of winning some fishing tackle, I set about messing about with R's regular expressions to create a tool that would let me make an informed guess based on the guesses of many.
The function below reads in a text file containing each person's guess (provided via a comment), extracts and cleans the guesses, and converts them to a common unit (kilograms). It then prints summary statistics and draws a histogram, which together suggest the best guess you could make. Of course, this function could also be adapted to suit a 'how many jelly beans in the jar?' competition!
Here is the output of one such competition:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   4.00   12.50   17.00   17.35   19.90   85.00
In this case, I would guess the weight of the fish to be around 17 kilograms!
guess_weight = function(posts){
  # Read in the guesses, one comment per line
  guesses = readLines(posts)
  # Match the first guess in kilograms on each line (e.g. "17kg", "10.5 Kgs").
  # [0-9]+ rather than [1-9]+ so that values such as "10.5kg" or "0.5kg"
  # are captured whole instead of being truncated at the zero.
  kgmatch = regmatches(guesses, regexpr('[0-9]+\\.?[0-9]* *(k|K) *(g|G)(s|S)?', guesses))
  # Strip the unit and convert to numeric
  kgnumerics = as.numeric(regmatches(kgmatch, regexpr('[0-9]+\\.?[0-9]*', kgmatch)))
  # Match the first guess in pounds on each line (e.g. "40lb", "40 Lbs")
  lbmatch = regmatches(guesses, regexpr('[0-9]+\\.?[0-9]* *(l|L) *(b|B)(s|S)?', guesses))
  # Strip the unit and convert to numeric
  lbnumerics = as.numeric(regmatches(lbmatch, regexpr('[0-9]+\\.?[0-9]*', lbmatch)))
  # Convert lbs to kgs
  lbnumericsK = lbnumerics*0.453592
  # Combine both measures
  totalGuesses = append(kgnumerics, lbnumericsK)
  # Mean of the guesses (also reported by summary() below)
  myGuess = mean(totalGuesses)
  # Summary statistics
  print(summary(totalGuesses))
  # Histogram of guesses
  hist(totalGuesses)
  # Return the vector of cleaned guesses invisibly
  invisible(totalGuesses)
}
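The extract-and-convert steps can be tried on a handful of comments without needing a file. The sketch below is not the function above: the comments are invented, `extract_guess` is a hypothetical helper, and its pattern is a simplified variant that uses `ignore.case = TRUE` and does not allow a space between the letters of the unit.

```r
# Made-up comments, for illustration only
comments <- c("I reckon 17kg", "about 40 lbs", "10.5 Kgs for sure", "nice fish!")

# Hypothetical helper: pull the first "<number><unit>" from each comment
# and return just the numbers; comments without a match are dropped.
extract_guess <- function(text, unit_pattern) {
  pattern <- paste0("[0-9]+\\.?[0-9]* *", unit_pattern)
  hits <- regmatches(text, regexpr(pattern, text, ignore.case = TRUE))
  as.numeric(regmatches(hits, regexpr("[0-9]+\\.?[0-9]*", hits)))
}

kg <- extract_guess(comments, "kgs?")   # guesses given in kilograms
lb <- extract_guess(comments, "lbs?")   # guesses given in pounds

# Convert pounds to kilograms and combine
all_kg <- c(kg, lb * 0.453592)
print(all_kg)
```

Here the two kilogram guesses (17 and 10.5) are kept as they are, the 40 lb guess becomes roughly 18.14 kg, and the comment with no number is ignored. The full function would be called on a text file of comments, for example `guess_weight("guesses.txt")`, where the file name is only a placeholder.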
