The Battle of Bayesian Home Run Models

[This article was first published on R – MAD-Stat, and kindly contributed to R-bloggers].

The regular Major League Baseball season is coming to an end. Next week, we move into the playoffs and eventually the World Series. In this last week, however, a nice statistical modeling question is playing out. Giancarlo Stanton of the Miami Marlins may be on pace to match or beat Roger Maris’ record of 61 home runs in a season, the most hit without the help of steroids or other performance-enhancing drugs (PEDs).

For baseball purists, this is a big deal. Maris’ record of 61 home runs stood from 1961 until the PED era at the turn of the century. Since the PED scandal broke and MLB cracked down on drug use by its players, neither Maris’ original record nor Barry Bonds’ mark of 73 home runs has been approached. Giancarlo Stanton threatens to do that this year.

What makes this interesting for statisticians and R scholars is that Stanton’s push to 61 homers has set off a wave of social media posts predicting his chances of reaching or breaking 61. Two notable ones use an empirical Bayes method to arrive at an estimate. Both come from well-known groups of Bayesian statisticians: Bob Carpenter of Andrew Gelman’s Stan group and Jim Albert of Bowling Green State University. Their methods are fairly similar, which makes the difference in their predictions interesting. The latest post from each group is located here (Stan Group) and here (Albert).

As of today, September 25, Stanton has hit 57 home runs and has 7 games left to play. Four more ties the record; five breaks it. Carpenter’s model is more optimistic than Albert’s: across 100,000 simulations, it has Stanton hitting 61 or more home runs 52% of the time. Albert’s model, on the other hand, has Stanton breaking the record in only 20% of the simulations. Here is the R Notebook with my execution of the two algorithms: Stanton_HR_Prediction.

Both models are beta-binomial: each combines a beta prior on Stanton’s home run rate with a binomial likelihood for the number of home runs to arrive at the posterior. The Carpenter model adds one more source of randomness in the middle: a Poisson draw for the number of at bats that Stanton will have in his remaining games.
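
To make that structure concrete, here is a minimal sketch of a beta-binomial simulation with a Poisson number of remaining at bats. It is not Carpenter’s or Albert’s code, and it is written in Python rather than the R of my notebook. Only the 57 home runs and 7 remaining games come from this post. The at-bat total to date (580), the mean of 4 at bats per game, and the flat Beta(1, 1) prior are placeholder assumptions, so the printed probabilities should not be expected to match either published figure.

```python
import math
import random


def draw_poisson(rng, mean):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def prob_reaching(target_more, hr, ab, games_left, ab_per_game,
                  prior_a, prior_b, n_sims=20000, seed=1):
    """Estimate P(at least `target_more` additional home runs) by simulation."""
    rng = random.Random(seed)
    # Conjugate update: a beta prior plus binomial data gives a beta posterior.
    post_a = prior_a + hr
    post_b = prior_b + (ab - hr)
    hits = 0
    for _ in range(n_sims):
        rate = rng.betavariate(post_a, post_b)                # home run rate per at bat
        n_ab = draw_poisson(rng, games_left * ab_per_game)    # remaining at bats
        more = sum(rng.random() < rate for _ in range(n_ab))  # binomial draw
        if more >= target_more:
            hits += 1
    return hits / n_sims


# hr=57 and games_left=7 come from the post; ab, ab_per_game and the
# prior parameters are ASSUMED placeholder values for illustration only.
assumed = dict(hr=57, ab=580, games_left=7, ab_per_game=4.0,
               prior_a=1.0, prior_b=1.0)
p_tie_or_break = prob_reaching(target_more=4, **assumed)
p_break = prob_reaching(target_more=5, **assumed)
print(f"P(61 or more) ~ {p_tie_or_break:.3f}")
print(f"P(62 or more) ~ {p_break:.3f}")
```

Dropping the Poisson step and fixing the number of remaining at bats gives the simpler two-stage version of the same idea. The choice of prior and of the at-bat assumption are the kind of inputs that can move the final probability noticeably.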

Baseball fans like me are rooting for Stanton to have the success that Carpenter predicts for him, especially those of us who remember the chase to break Babe Ruth’s 1927 record of 60 home runs. Watching Mickey Mantle and Roger Maris try for 61 in 1961 kept us going all through August and September of that year.

Now that I work with statistics, it’s great to see leaders in the field apply their art to Stanton’s attempt to break the record. I will post at least one more update this week to report on Stanton’s evolving probability, bearing in mind that every at bat changes it. He’s next up tonight at hitter-friendly Coors Field in Colorado.

Go Giancarlo!
