Maximum Likelihood Estimation and the Origin of Life

Francis Smart
9 years ago
[This article was first published on Econometrics by Simulation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Maximum likelihood Estimation (MLE) is a powerful tool in econometrics which allows for the consistent and asymptotically efficient estimation of parameters given a correct identification (in terms of distribution) of the random variable.
 
# It is also a proceedure which seems superficially quite complex and intractible, but in reality is very intuitive and easy to use.
 
# To introduct MLE, lets think about observing a repeated coin flip.
 
# Let's say we know it is not a fair coin and we would like to figure out what the likehood of heads parameter popping up is.
 
# Let's define out likelihood function this way:
 
lk <- function(responses, theta) {
  # This likelihood function returns the likelihood of (a priori) observing a 
  # particular response pattern given the likelihood of seeing a head on the
  # coin is theta.
 
  returner <- 0*theta
  # Define a vector to hold likelihoods in case theta is a vector
 
  for (i in 1:length(theta)) returner[i] <- 
    prod(theta[i]^responses) * # The likelihood of seeing the head's pattern
    prod((1-theta[i])^(1-responses)) # The likelihood of seeing the tails's pattern
 
  returner
}
 
# Let's see how our likelihood function is working.
# Let's see we flip the coin once and we observe a heads
# and our initial guess is that the coin is fair.
lk(c(1), .5)
# [1] 0.5
# Our likelihood of this being true is .5 (we know this is right)
 
# What is we get a tail?
lk(c(0), .5)
# [1] 0.5
# .5 as well.  This seems right.
 
# Now let's try something more interesting.
# Let's say we get two heads:
lk(c(1,1), .5)
# [1] 0.25
# .5^2=.25
 
# This is standard probability. But, let's now ask:
# What if the coin was not fair (as we originally guessed)?
 
# Is there a better parameter set that would lead to our observed outcome?
# Let's say, there is a 75% chance of getting heads.
# What is the likelihood of seeing our 2 heads.
lk(c(1,1), .75)
# [1] 0.5625
# So if the coin was unfair at 75% then our likelihood of observing the
# outcome we did would increase from 25% to 56%.
 
# How about is the coin were 95% in favor of heads.
lk(c(1,1), .95)
# [1] 0.9025
# Now we are up to 90%.
 
# You probably see where this is going.  
# If we assume we know nothing about the underlying distribution of the coin
# parameter, then the parameter which fits best is 1 (100%).
# Thus if we see only positive outcomes then the most likely guess at the
# underlying distribution of outcomes is that every time you flip the coin it
# will land heads.
 
# We can see this mapped out.
theta.vect <- seq(0,1,.05)
 
plot(theta.vect, lk(c(1,1), theta.vect), main="HH", 
     ylab="Probability", xlab="Theta")
 


 
# It is worth noting that though there are different probabilities
# for each unfairness parameter. We can only definitively rule out
# one option after flipping the coin twice and getting heads. 
# That is that it is impossible for the likelihood of getting a head=0.
 
# However, all other options are available. How do we choose from these options?
 
# Well... naturally, we choose the option which maximizes the probability
# of seeing the observed outcome (as the name implies).
 
# Now let's make things a little more interesting. Let's say the third 
# time we flip the coin it ends up tails.
 
# If we assume it is a fair coin then:
lk(c(1,1,0), .5)
# [1] 0.125
 
# How about our guess of .75 heads?
lk(c(1,1,0), .75)
# [1] 0.140625
# More likely than that of a fair coin but not as dramatic an improvement from
# when we saw only two heads.
 
# Let's see how the graph looks now.
plot(theta.vect, lk(c(1,1,0), theta.vect), main="HHT", 
     ylab="Probability", xlab="Theta")
 


 
# We can see that our graph is finally beginning to look a bit more interesting.
# We can see that the most likely outcome is around 65%.  For those of us
# a little ahead of the game the most likely probability is the success rate
# or 2/3 (66.6%).
 
# But the importance of the exercise to think about why 66.6% is parameter
# we select as the most likely.
lk(c(1,1,0), 2/3)
# [1] 0.1481481
# Not because it is overwhelmingly the best choice.
 
# It is only 2.3% (0.148-0.125) more likely to occur than if it were a fair coin.
# So we really are not very confident with our parameter choice at this point.
 
# However, imagine instead for one moment, if we observed the same ratio but with
# 300 coins.
cpattern <- c(rep(1,200), rep(0,100))
 
lk(cpattern, 2/3)
# [1] 1.173877e-83
 
lk(cpattern, 1/2)
# [1] 4.909093e-91
 
# Now, in terms of percentages the differences are extremely small.
# So small that it is hard to compare.  The plot can be useful:
 
plot(theta.vect, lk(cpattern, theta.vect), 
     main="(HHT)^100", ylab="Probability", xlab="Theta")
 


 
# Let's see what happens if we increase our number of coins to
# 3000
plot(theta.vect, lk(rep(cpattern,10), theta.vect), 
     main="(HHT)^1000", ylab="Probability", xlab="Theta")
 


 
# In this graph we can see the first major computational problem
# when dealing with likelihoods. They get so small, they are hard
# to manage in raw probabilities.  In this case the digits get
# rounded into 0 so that all R sees is 0.
 
# Fortunately, the maximum of a function is the same maximum 
# (in terms of parameter choices) as a monotonic transformation
# of a function.  Thus we can rescale our probabilities
# before multiplication using logs creating the log likelihood
# function which produces parameters which vary in scale much less
# dramatically.
 
# I won't say anything more about this right now except that this is why
# MLE functions always maximizes and reports the "log likelihood"
# value rather than the "likelihood".
 
# However, in this discussion it is worth noting that there is
# a somewhat useful statistic that we can produce to compare
# the likelihoods of the fair coin hypothesis with that
# of the 2/3 biased hypothesis.
 
# That is the odds ratio of the two outcomes.  How much more
# likely (multiplicatively) is our outcome to be observed
# if the coin is unfair towards 2/3 heads rather than fair?
 
lk(cpattern, 2/3)/lk(cpattern, 1/2)
# [1] 23912304
# That is to say, the outcome in which 2/3 rds of the time 
# we get heads for a coin flip of 300 coins is 23 million
# times more likely to occur if our coin
# is unfair (66.6%) over that of being fair (50%).
 
# This is a pretty big number and thus very unlikely to occur
# relative to that of a fair coin. Comparing accross all possible
# outcomes, we would find that while this ratio is not always as
# large, for example if we are comparing 2/3s to .6
lk(cpattern, 2/3)/lk(cpattern, .75)
# [1] 183.3873
# But it can still be quite large.  In this case, we are 183 times
# more likely to see the outcome we saw if we chose 2/3s as our parameter 
# choice compared with 3/4ths.
 
# Looking at the raw probability we see
lk(cpattern, 2/3)
# [1] 1.173877e-83 
# or 1 out of 8*10^82 outcomes.
 
# Thus the likelihood of a particular event ever occurring is very small, even
# given the most likely hypothesis (theta=2/3).
# However, compared nearly all other hypothesis such as (theta=1/2 or 3/4)
# the event is much more likely to have occurred.
 
# And THAT is why creationists are right to say it very unlikely
# in absolute terms that evolution brought about the origin of
# life on earth yet are also completely wrong because compared 
# with all other available hypotheses that is the only one
# which is remotely likely (at least from the series of
# outcomes that I have observed) to have occurred makes its odds
# ratio very high in my mind.
To leave a comment for the author, please follow the link and comment on their blog: Econometrics by Simulation.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.