Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a package gmm that does MM, actually the Generalized Method of Moments, and in this post I’ll explain how to use it (on the elementary level, at least).
Our data set will be bodyfat, which is included in the mfp package, with measurements on 252 men. The first column of this data frame, brozek, is the percentage of body fat, which when converted to a proportion is in (0,1). That makes it a candidate for fitting a beta distribution.
Denoting the parameters of that family by α and β, the mean and variance are α / (α + β) and α β / ((α + β)2 (α + β + 1)), respectively. MM is a model of simplicity — we match these to the sample mean and variance of our data, and then solve for α and β. Of course, the solution part may not be so simple, due to nonlinearities, but gmm worries about all that for us. Yet another tribute to the vast variety of packages available on CRAN!
In our elementary usage here, the call form is
gmm(data,momentftn,start)
where data is our data in matrix or vector form, momentftn specifies the moments, and start is our initial guess for the iterative solution process. Let’s specify momentftn:
g <- function(th,x) { t1 <- th[1] t2 <- th[2] t12 <- t1 + t2 meanb <- t1 / t12 m1 <- meanb - x m2 <- t1*t2 / (t12^2 * (t12+1)) - (x - meanb)^2 f <- cbind(m1,m2) return(f) }
This function equates population moments to sample ones, by specifying expressions that gmm() is to set to 0. The argument th here (“theta”) will be the MM estimates (at any given iteration) of the population parameters, in this case of α and β.
The function is required to specify quantities whose averages are to be set to 0. So, in the line
m1 <- meanb – x
we are saying that we want the average of the right-hand side to be 0, where x is our data. That has the effect of setting our current iteration’s estimate of α / (α + β) to our sample mean — exactly what MM is supposed to do. The line assigning to m2 then does the same thing for variance.
So, let’s try all this out on the body fat data:
> gmm(g,x,c(alpha=0.1,beta=0.1)) Method twoStep Objective function value: 2.559645e-10 alpha beta 4.6714 19.9969 Convergence code = 0 > hist(bodyfat$brozek/100,xlim=c(0,1), probability=TRUE) > curve(dbeta(x,4.67,20.00),add=TRUE)
The result looks like this (apologies for the lack of refinement in this quick graph, cutting off part of the top):
x
At least visually, it seems to be a pretty good fit.
For standard errors etc., a method for the generic function vcov() is provided:
> gmmout <- gmm(g,x,c(alpha=0.1,beta=0.1)) > vcov(gmmout) alpha beta alpha 0.2809361 0.9606354 beta 0.9606354 3.9266874
Happy GMM-ing!
Lots more posts coming, when I have time.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.