Stata-like Marginal Effects for Logit and Probit Models in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Although this blog’s primary focus is time series, one feature I missed from Stata was the simple marginal effects command, ‘mfx compute’, for cross-sectional work, and I could not find an adequate replacement in R. To bridge this gap, I’ve written a (rather messy) R function to produce marginal effects readout for logit and probit models.
If we want to analyse the effect of a change in some explanatory variable, say , on the probability that a binary dependent variable is equal to 1, simple calculus will show that the probability density function evaluated at the sample mean times the estimated coefficient of will give us the marginal effect. More formally, let be the sum of coefficients times their respective explanatory variables that describe a binary dependent variable, such as employment or unemployment, etc., where . The ‘logit’ model is given by
where is the cumulative distribution function of the logistic distribution with mean zero and . Similarly, the ‘probit’ model is given by
the standard normal cumulative distribution function. Taking the derivative of either model with respect to the jth explanatory variable is a simple application of the chain rule; for the probit model:
, i.e., the standard normal probability density function, and . We can use this to calculate the marginal effects from a glm object. The R code is below; all it requires is an estimated logit or probit model from the glm function. The code is a little messy, but it should work. Feel free to email me with any suggestions (see contact tab above).
For example, using a dataset provided by Jeff Wooldridge, MROZ.dta, we can compare results from Stata and R.
R code:
require(foreign) # a<-read.dta("MROZ.dta",convert.factors = F) # logit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="logit"), data=a) probit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="probit"), data=a) # mfx(logit1) mfx(probit1) #
Stata code and output:
R output:
‘mfx’ function:
mfx<-function(x){ bb<-data.frame(x$coefficients) bb2<-data.frame(mean(x$data,na.rm=T)) bbh<-as.character(rownames(bb)) bbmeans<-bb2[bbh[2:nrow(bb)],] jkj <-bb[2:nrow(bb),]*bbmeans # M3<-sum(jkj) + bb[1,] # if (ll<-x$family$link=="probit"){ probitmfx <- data.frame(mfx=dnorm(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)]) } else{probitmfx <- data.frame(mfx=dlogis(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])} # bbse<-bb/summary(x)$coef[,2] mfxse<-probitmfx/bbse[2:nrow(bb),] # probitmfxfull <- data.frame(mfx=probitmfx,SE=mfxse,bbmeans,summary(x)$coef[2:nrow(bb),3],summary(x)$coef[2:nrow(bb),4],row.names=bbh[2:nrow(bb)]) colnames(probitmfxfull) <- c("mfx","SE","Mean Value","z","Pr(>|z|)") # logl <- 0.5*(-x$aic + 2*nrow(bb)) #McFadden's R2 depen <- x$data[,1] depenglm <- glm(depen ~ 1, family=binomial(link=x$family$link),data=x$data) logldepen <- 0.5*(-depenglm$aic + 2) psr2<- 1 - (logl/logldepen) # obs<-nrow(x$data) #SBC/BIC sbc<- -2*logl + log(obs)*nrow(bb) #HQIC HIC<- -2*logl + 2*log(log(obs))*nrow(bb) #Logit? if (qq<-x$family$link=="logit"){ } else{} #CDF of Mean Model if (ll == TRUE){ BigProb<-pnorm(M3) } else {BigProb<-plogis(M3)} #LR Test LRTest<- -2*(logl - logldepen) dfLR<-nrow(bb) - 1 LRp<-dchisq(LRTest,df=dfLR) LRdata <- data.frame(LRTest, dfLR,LRp) colnames(LRdata) <- c("Test Statistic","DF","P-Value") rownames(LRdata) <- "LR Test" # tests<-rbind(BigProb,logl,psr2,x$aic,HIC,sbc) tests<-data.frame(tests) colnames(tests)<-"" rownames(tests)<-c("CDF(Evaluated at the Mean):","Log-Likelihood:","McFadden's R2:","Akaike Information Criterion:","Hannan-Quinn Criterion:","Schwarz's Bayesian Criterion:") # cat("MFX Function for Logit and Probit", "\n") cat("", "\n") if (qq<-x$family$link=="logit"){ cat("This is a Logit Model","\n") } else if (qq<-x$family$link=="probit"){ cat("This is a Probit Model","\n") } else {cat("","\n")} cat("", "\n") cat("Reporting Marginal Effects, Evaluated at the Mean", "\n") cat("", "\n") printCoefmat(probitmfxfull, P.value=TRUE, has.Pvalue=TRUE) cat("", "\n") cat("Observations:", obs, "\n") cat("", "\n") printCoefmat(tests, P.value=F, has.Pvalue=F) cat("", "\n") cat("Likelihood-Ratio Test:", "\n") printCoefmat(LRdata, P.value=T, has.Pvalue=T) cat("", "\n") }
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.