Site icon R-bloggers

Stata-like Marginal Effects for Logit and Probit Models in R

[This article was first published on TimeSeriesIreland » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Although this blog’s primary focus is time series, one feature I missed from Stata was the simple marginal effects command, ‘mfx compute’, for cross-sectional work, and I could not find an adequate replacement in R. To bridge this gap, I’ve written a (rather messy) R function to produce marginal effects readout for logit and probit models.

If we want to analyse the effect of a change in some explanatory variable, say , on the probability that a binary dependent variable is equal to 1, simple calculus will show that the probability density function evaluated at the sample mean times the estimated coefficient of will give us the marginal effect. More formally, let be the sum of coefficients times their respective explanatory variables that describe a binary dependent variable, such as employment or unemployment, etc., where . The ‘logit’ model is given by

where is the cumulative distribution function of the logistic distribution with mean zero and . Similarly, the ‘probit’ model is given by

the standard normal cumulative distribution function. Taking the derivative of either model with respect to the jth explanatory variable is a simple application of the chain rule; for the probit model:

, i.e., the standard normal probability density function, and . We can use this to calculate the marginal effects from a glm object. The R code is below; all it requires is an estimated logit or probit model from the glm function. The code is a little messy, but it should work. Feel free to email me with any suggestions (see contact tab above).

For example, using a dataset provided by Jeff Wooldridge, MROZ.dta, we can compare results from Stata and R.

R code:

require(foreign)
#
a<-read.dta("MROZ.dta",convert.factors = F)
#
logit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="logit"), data=a)
probit1 <- glm(inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6, family=binomial(link="probit"), data=a)
#
mfx(logit1)
mfx(probit1)
#

Stata code and output:

R output:

‘mfx’ function:

mfx<-function(x){
	bb<-data.frame(x$coefficients)
	bb2<-data.frame(mean(x$data,na.rm=T))
	bbh<-as.character(rownames(bb))
	bbmeans<-bb2[bbh[2:nrow(bb)],]
	jkj <-bb[2:nrow(bb),]*bbmeans
	#
	M3<-sum(jkj) + bb[1,]
	#
	if (ll<-x$family$link=="probit"){
		probitmfx <- data.frame(mfx=dnorm(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])
	}
		else{probitmfx <- data.frame(mfx=dlogis(M3)*bb[2:nrow(bb),],row.names=bbh[2:nrow(bb)])}
	#
	bbse<-bb/summary(x)$coef[,2]
	mfxse<-probitmfx/bbse[2:nrow(bb),]
	#
	probitmfxfull <- data.frame(mfx=probitmfx,SE=mfxse,bbmeans,summary(x)$coef[2:nrow(bb),3],summary(x)$coef[2:nrow(bb),4],row.names=bbh[2:nrow(bb)])
	colnames(probitmfxfull) <- c("mfx","SE","Mean Value","z","Pr(>|z|)")
	#
	logl <- 0.5*(-x$aic + 2*nrow(bb))
	#McFadden's R2
	depen <- x$data[,1]
	depenglm <- glm(depen ~ 1, family=binomial(link=x$family$link),data=x$data)
	logldepen <- 0.5*(-depenglm$aic + 2)
	psr2<- 1 - (logl/logldepen)
	#
	obs<-nrow(x$data)
	#SBC/BIC
	sbc<- -2*logl + log(obs)*nrow(bb)
	#HQIC
	HIC<- -2*logl + 2*log(log(obs))*nrow(bb)
	#Logit?
		if (qq<-x$family$link=="logit"){
		}
		else{}
	#CDF of Mean Model
		if (ll == TRUE){
		BigProb<-pnorm(M3)
		}
		else {BigProb<-plogis(M3)}
	#LR Test
	LRTest<- -2*(logl - logldepen)
	dfLR<-nrow(bb) - 1
	LRp<-dchisq(LRTest,df=dfLR)
	LRdata <- data.frame(LRTest, dfLR,LRp)
	colnames(LRdata) <- c("Test Statistic","DF","P-Value")
	rownames(LRdata) <- "LR Test"
	#
	tests<-rbind(BigProb,logl,psr2,x$aic,HIC,sbc)
	tests<-data.frame(tests)
	colnames(tests)<-""
	rownames(tests)<-c("CDF(Evaluated at the Mean):","Log-Likelihood:","McFadden's R2:","Akaike Information Criterion:","Hannan-Quinn Criterion:","Schwarz's Bayesian Criterion:")
	#
	cat("MFX Function for Logit and Probit", "\n")
	cat("", "\n")
		if (qq<-x$family$link=="logit"){ cat("This is a Logit Model","\n")
		}
		else if (qq<-x$family$link=="probit"){ cat("This is a Probit Model","\n")
		}
		else {cat("","\n")}
	cat("", "\n")
	cat("Reporting Marginal Effects, Evaluated at the Mean", "\n")
	cat("", "\n")
	printCoefmat(probitmfxfull, P.value=TRUE, has.Pvalue=TRUE)
	cat("", "\n")
	cat("Observations:", obs, "\n")
	cat("", "\n")
	printCoefmat(tests, P.value=F, has.Pvalue=F)
	cat("", "\n")
	cat("Likelihood-Ratio Test:", "\n")
	printCoefmat(LRdata, P.value=T, has.Pvalue=T)
	cat("", "\n")
}

To leave a comment for the author, please follow the link and comment on their blog: TimeSeriesIreland » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.