Microeconomic Theory and Linear Regression (Part 2)
Introduction
In the first part of this article I explained and compared four different functional forms to model apple production.
This is what I need to reproduce the examples explained there:
# Libraries
#install.packages(c("micEcon","lmtest","bbmle","miscTools"))
library(micEcon)
library(lmtest)
library(stats4) # this is a base package so I don't install it
library(bbmle)
library(miscTools)

# Load data
data("appleProdFr86", package = "micEcon")
data = appleProdFr86
rm(appleProdFr86)

data$qCap = data$vCap/data$pCap
data$qLab = data$vLab/data$pLab
data$qMat = data$vMat/data$pMat

# Fit models
## Linear specification
prodLin = lm(qOut ~ qCap + qLab + qMat, data = data)
data$qOutLin = fitted(prodLin)

## Quadratic specification
prodQuad = lm(qOut ~ qCap + qLab + qMat
              + I(0.5*qCap^2) + I(0.5*qLab^2) + I(0.5*qMat^2)
              + I(qCap*qLab) + I(qCap*qMat) + I(qLab*qMat),
              data = data)
data$qOutQuad = fitted(prodQuad)

## Cobb-Douglas specification
prodCD = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat), data = data)
data$qOutCD = fitted(prodCD)

## Translog specification
prodTL = lm(log(qOut) ~ log(qCap) + log(qLab) + log(qMat)
            + I(0.5*log(qCap)^2) + I(0.5*log(qLab)^2) + I(0.5*log(qMat)^2)
            + I(log(qCap)*log(qLab)) + I(log(qCap)*log(qMat)) + I(log(qLab)*log(qMat)),
            data = data)
data$qOutTL = fitted(prodTL)
These four specifications presented different problems that suggested model misspecification. Ideally, I should find a model with positive predicted output and positive factor elasticity to make sense of the context of production I’m working with.
Model misspecification
The RESET test (the name stands for Regression Equation Specification Error Test) is a test for functional form misspecification. As a general test, RESET helps detect omitted variables. Package lmtest provides resettest, a function that implements one of the many versions of the test, and by default it fits this model:

\[ y_i = \beta_0 + \sum_j \beta_j x_{ij} + \gamma_1 \hat{y}_i^2 + \gamma_2 \hat{y}_i^3 + \varepsilon_i \]
The key idea behind fitting another model that includes powers of the fitted output is to detect whether the original model ignores an important nonlinearity.
The hypothesis test is:

\[ H_0: \gamma_1 = \gamma_2 = 0 \quad \text{vs.} \quad H_1: \gamma_1 \neq 0 \text{ or } \gamma_2 \neq 0 \]

If there is evidence to reject the null hypothesis, the test does not provide an alternative model; it just states that the model is misspecified.
The statistic is a t statistic if we include only \(\gamma_1 \hat{y}_i^2\), and it becomes an F statistic if we also include \(\gamma_2 \hat{y}_i^3\) or higher powers.
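To make the mechanics concrete, here is a minimal sketch of what resettest() computes for the linear specification, assuming the default powers 2 and 3; the names yHatLin and prodLinAug are mine, introduced only for illustration:

# Augment the model with powers of its own fitted values,
# then F-test the joint significance of the added terms
data$yHatLin = fitted(prodLin)
prodLinAug = lm(qOut ~ qCap + qLab + qMat + I(yHatLin^2) + I(yHatLin^3),
                data = data)
anova(prodLin, prodLinAug) # F test of H0: gamma1 = gamma2 = 0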
In the context of the worked specifications:
resettest(prodLin)

	RESET test

data:  prodLin
RESET = 17.639, df1 = 2, df2 = 134, p-value = 1.584e-07

resettest(prodQuad)

	RESET test

data:  prodQuad
RESET = 7.3663, df1 = 2, df2 = 128, p-value = 0.0009374

resettest(prodCD)

	RESET test

data:  prodCD
RESET = 2.9224, df1 = 2, df2 = 134, p-value = 0.05724

resettest(prodTL)

	RESET test

data:  prodTL
RESET = 1.2811, df1 = 2, df2 = 128, p-value = 0.2813
\(\Longrightarrow\) The null hypothesis is rejected for both the Linear and Quadratic specifications, while the Cobb-Douglas specification is rejected only at the 10% significance level. The null hypothesis can’t be rejected for the Translog specification.
I can also modify the powers in the model:
resettest(prodTL, power = 2)

	RESET test

data:  prodTL
RESET = 2.358, df1 = 1, df2 = 129, p-value = 0.1271

resettest(prodTL, power = 2:4)

	RESET test

data:  prodTL
RESET = 1.3475, df1 = 3, df2 = 127, p-value = 0.262
\(\Longrightarrow\) The null hypothesis can’t be rejected for the Translog specification, not even when adding higher powers.
Quasiconcavity
A function \(f\) is quasiconcave if, for \(0 \leq t \leq 1\),

\[ f(tx + (1-t)x') \geq \min\{f(x), f(x')\} \]
In particular, any concave and twice continuously differentiable function is quasiconcave.
In very rough terms, quasiconcavity guarantees that there is a production plan where the company minimizes cost. Adding differentiability, you can find an optimal production plan by using derivatives.
For the details, please check Advanced Microeconomic Theory by Geoffrey A. Jehle.
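As a quick illustration of the check applied below: a 3x3 Hessian is negative definite (hence the function is locally concave, which implies quasiconcavity there) when its leading principal minors alternate in sign, starting negative. A minimal sketch, with a helper name (isNegDef3) I made up for illustration:

# Leading principal minor test for negative definiteness of a 3x3 matrix
isNegDef3 = function(H) {
  H[1,1] < 0 && det(H[1:2, 1:2]) > 0 && det(H) < 0
}
isNegDef3(matrix(c(-2, 1, 0, 1, -2, 1, 0, 1, -2), nrow = 3)) # TRUE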
I’ll check quasiconcavity in an indirect way, by checking concavity at each observation using the Hessian matrix:
# Quadratic Specification

## Regression coefficients
b1 = coef(prodQuad)["qCap"]
b2 = coef(prodQuad)["qLab"]
b3 = coef(prodQuad)["qMat"]
b11 = coef(prodQuad)["I(0.5 * qCap^2)"]
b22 = coef(prodQuad)["I(0.5 * qLab^2)"]
b33 = coef(prodQuad)["I(0.5 * qMat^2)"]
b12 = b21 = coef(prodQuad)["I(qCap * qLab)"]
b13 = b31 = coef(prodQuad)["I(qCap * qMat)"]
b23 = b32 = coef(prodQuad)["I(qLab * qMat)"]

## Marginal productivity
data$mpCapQuad = with(data, b1 + b11*qCap + b12*qLab + b13*qMat)
data$mpLabQuad = with(data, b2 + b21*qCap + b22*qLab + b23*qMat)
data$mpMatQuad = with(data, b3 + b31*qCap + b32*qLab + b33*qMat)

## Quasiconcavity (leading principal minors of the Hessian)
data$quasiConcQuad = NA

for(obs in 1:nrow(data)) {
  hmLoop = matrix(0, nrow = 3, ncol = 3)
  hmLoop[1,1] = b11
  hmLoop[1,2] = hmLoop[2,1] = b12
  hmLoop[1,3] = hmLoop[3,1] = b13
  hmLoop[2,2] = b22
  hmLoop[2,3] = hmLoop[3,2] = b23
  hmLoop[3,3] = b33
  data$quasiConcQuad[obs] = hmLoop[1,1] < 0 & det(hmLoop[1:2,1:2]) > 0 & det(hmLoop) < 0
}

sum(!data$quasiConcQuad)
[1] 140

# Translog Specification

## Regression coefficients
b1 = coef(prodTL)["log(qCap)"]
b2 = coef(prodTL)["log(qLab)"]
b3 = coef(prodTL)["log(qMat)"]
b11 = coef(prodTL)["I(0.5 * log(qCap)^2)"]
b22 = coef(prodTL)["I(0.5 * log(qLab)^2)"]
b33 = coef(prodTL)["I(0.5 * log(qMat)^2)"]
b12 = b21 = coef(prodTL)["I(log(qCap) * log(qLab))"]
b13 = b31 = coef(prodTL)["I(log(qCap) * log(qMat))"]
b23 = b32 = coef(prodTL)["I(log(qLab) * log(qMat))"]

## Factor elasticities
data$eCapTL = with(data, b1 + b11*log(qCap) + b12*log(qLab) + b13*log(qMat))
data$eLabTL = with(data, b2 + b21*log(qCap) + b22*log(qLab) + b23*log(qMat))
data$eMatTL = with(data, b3 + b31*log(qCap) + b32*log(qLab) + b33*log(qMat))

## Marginal productivity
data$mpCapTL = with(data, eCapTL * qOutTL / qCap)
data$mpLabTL = with(data, eLabTL * qOutTL / qLab)
data$mpMatTL = with(data, eMatTL * qOutTL / qMat)

## Derivatives of marginal productivity
data$dmpCapCapTL = with(data, (qOutTL / qCap^2) * (b11 + eCapTL^2 - eCapTL))
data$dmpLabLabTL = with(data, (qOutTL / qLab^2) * (b22 + eLabTL^2 - eLabTL))
data$dmpMatMatTL = with(data, (qOutTL / qMat^2) * (b33 + eMatTL^2 - eMatTL))
data$dmpCapLabTL = with(data, (qOutTL / (qCap * qLab)) * (b12 + eCapTL * eLabTL))
data$dmpCapMatTL = with(data, (qOutTL / (qCap * qMat)) * (b13 + eCapTL * eMatTL))
data$dmpLabMatTL = with(data, (qOutTL / (qLab * qMat)) * (b23 + eLabTL * eMatTL))

## Quasiconcavity
data$quasiConcTL = NA

for(obs in 1:nrow(data)) {
  hmLoop = matrix(0, nrow = 3, ncol = 3)
  hmLoop[1,1] = data$dmpCapCapTL[obs]
  hmLoop[1,2] = hmLoop[2,1] = data$dmpCapLabTL[obs]
  hmLoop[1,3] = hmLoop[3,1] = data$dmpCapMatTL[obs]
  hmLoop[2,2] = data$dmpLabLabTL[obs]
  hmLoop[2,3] = hmLoop[3,2] = data$dmpLabMatTL[obs]
  hmLoop[3,3] = data$dmpMatMatTL[obs]
  data$quasiConcTL[obs] = hmLoop[1,1] < 0 & det(hmLoop[1:2,1:2]) > 0 & det(hmLoop) < 0
}

sum(!data$quasiConcTL)
[1] 126
With this information I can extend the last table from the first part of the post:
| | Quadratic | Translog |
|---|---|---|
| \(R^2\) on \(y\) | 0.85 | 0.77 |
| \(R^2\) on \(\ln(y)\) | 0.55 | 0.63 |
| Obs. with negative predicted output | 0 | 0 |
| Obs. that violate the monotonicity condition | 39 | 48 |
| Obs. with negative scale elasticity | 22 | 0 |
| Obs. that violate the quasiconcavity condition | 140 | 126 |
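A sketch of how the violation counts in the table can be recomputed from the quantities derived in the quasiconcavity section above (monotonicity requires all marginal productivities to be non-negative); these counts should match the table:

# Monotonicity violations
sum(with(data, mpCapQuad < 0 | mpLabQuad < 0 | mpMatQuad < 0)) # Quadratic
sum(with(data, mpCapTL < 0 | mpLabTL < 0 | mpMatTL < 0))       # Translog

# Quasiconcavity violations, as computed above
sum(!data$quasiConcQuad)
sum(!data$quasiConcTL)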
\(\Longrightarrow\) Now I have more evidence in favour of a different approach, as mentioned at the end of the first part of the post.
Non-parametric regression
A non-parametric model is of the form:

\[ y_i = m(x_i) + \varepsilon_i \]

where \(m\) is an unknown smooth function to be estimated from the data.
The advantages are:

* The functional form is not defined beforehand
* A best-fit function \(m\) is obtained from the data
* Marginal effects are obtained
The disadvantages are:

* Higher computational cost
* There are no regression coefficients (the focus is the regression function)
* The results are harder to interpret
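To make the form \(y_i = m(x_i) + \varepsilon_i\) concrete before turning to specialized packages, here is a minimal Nadaraya-Watson (local-constant) sketch in base R; the Gaussian kernel, the bandwidth choice, and the names nw, xGrid, and mHat are assumptions for illustration only, not the method used below:

# Local-constant kernel regression: m(x0) is a weighted mean of y,
# with weights decreasing in the distance between each x and x0
nw = function(x0, x, y, h) {
  w = dnorm((x - x0) / h)
  sum(w * y) / sum(w)
}
xGrid = seq(min(data$qCap), max(data$qCap), length.out = 100)
mHat = sapply(xGrid, nw, x = data$qCap, y = data$qOut, h = sd(data$qCap) / 2)
plot(xGrid, mHat, type = "l")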
Single variable non-parametric regression
I can predict the output by using capital as the only input:
library(KernSmooth)

qCap = as.vector(data$qCap)
qOut = as.vector(data$qOut)

bw = dpill(qCap, qOut)
fit1 = locpoly(qCap, qOut, bandwidth = bw, degree = 1) # 1st degree polynomial
plot(fit1$x, fit1$y, type = "l")
Here are different specifications:
# 2nd degree polynomial
fit2.1 <- locpoly(qCap, qOut, bandwidth = bw,   degree = 2)
fit2.2 <- locpoly(qCap, qOut, bandwidth = bw*2, degree = 2) # less variance / more bias
fit2.3 <- locpoly(qCap, qOut, bandwidth = bw/2, degree = 2) # more variance / less bias

# linear regression
fit.lm <- lm(qOut ~ qCap)

# plots
par(mfrow = c(2,2))
plot(fit1$x, fit1$y, type = "l"); lines(fit2.1, col = "blue")
plot(fit1$x, fit1$y, type = "l"); lines(fit2.2, col = "red")
plot(fit1$x, fit1$y, type = "l"); lines(fit2.3, col = "green")
plot(fit1$x, fit1$y, type = "l"); abline(fit.lm, col = "purple")
Checking the models:
# model 1
sum(is.na(fit1$y))
[1] 50

# models 2, 3 and 4
sum(is.na(fit2.1$y))
[1] 50
sum(is.na(fit2.2$y))
[1] 9
sum(is.na(fit2.3$y))
[1] 84
\(\Longrightarrow\) I should use model 2.2, the one with the fewest undefined fitted values, and adjust it.
I'll try a neoclassical function with diminishing marginal returns:
# adjust the bandwidth and the degree
fit.sqrt = locpoly(qCap, qOut, bandwidth = bw*3, degree = 0.5)

sum(is.na(fit.sqrt$y))
[1] 0
sum(fit.sqrt$y < 0)
[1] 0
\(\Longrightarrow\) In this model there is no undefined or negative predicted output.
plot(fit.sqrt$x, fit.sqrt$y, type = "l")
I can obtain the median productivity:
median(fit.sqrt$y)/median(fit.sqrt$x)
[1] 18.05637
\(\Longrightarrow\) In median terms, each unit of capital increases output by about 18 units.
If I increase the bandwidth, the function looks more and more like a neoclassical production function from textbooks: an initial segment of increasing marginal returns and, from the point where marginal and average productivity are equal (the same point where average productivity reaches its maximum), output keeps increasing but at a decreasing rate.
Here is an example:
fit.sqrt.2 = locpoly(qCap, qOut, bandwidth = bw*5, degree = 0.5)
plot(fit.sqrt.2$x, fit.sqrt.2$y, type = "l")
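To locate the crossing point mentioned above, marginal productivity can be approximated by finite differences on the fitted grid and compared with average productivity; a sketch under that assumption (the names mpGrid and apGrid are mine, and it assumes no missing fitted values on the grid):

# Marginal productivity (dy/dx) by finite differences, and
# average productivity (y/x), both evaluated on the fitted grid
mpGrid = diff(fit.sqrt.2$y) / diff(fit.sqrt.2$x)
apGrid = (fit.sqrt.2$y / fit.sqrt.2$x)[-1]
fit.sqrt.2$x[-1][which.min(abs(mpGrid - apGrid))] # input level where MP crosses AP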
Multivariable non-parametric regression
Among many alternatives, kernel regression with the Epanechnikov kernel is a non-parametric method that weights the contribution of each observation and, by iterating, obtains a best-fit function.
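For reference, the Epanechnikov kernel is \(K(u) = \frac{3}{4}(1 - u^2)\) for \(|u| \leq 1\) and 0 otherwise; a quick sketch to visualize it:

# Epanechnikov kernel: parabolic weights on [-1, 1], zero outside
epan = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
curve(epan(x), from = -1.5, to = 1.5, xlab = "u", ylab = "K(u)")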
I'll fit a regression applying logs to the variables:
library(np)
options(np.messages = FALSE)

prodNP = npreg(log(qOut) ~ log(qCap) + log(qLab) + log(qMat),
               regtype = "ll", bwmethod = "cv.aic",
               ckertype = "epanechnikov", data = data,
               gradients = TRUE)
summary(prodNP)

Regression Data: 140 training points, in 3 variable(s)
              log(qCap) log(qLab) log(qMat)
Bandwidth(s):  1.039647    332644 0.8418465

Kernel Regression Estimator: Local-Linear
Bandwidth Type: Fixed
Residual standard error: 0.6227669
R-squared: 0.6237078

Continuous Kernel Type: Second-Order Epanechnikov
No. Continuous Explanatory Vars.: 3
These are the regression plots; in each plot, the variables that do not appear are held fixed at their median values.
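The plots can be reproduced with the np package's plot method (a wrapper around npplot); a sketch assuming asymptotic standard errors for the error bands:

# Partial regression plots: each panel varies one regressor,
# holding the others at their medians (np's default)
plot(prodNP, plot.errors.method = "asymptotic")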
I can test the significance for this model:
options(np.messages = FALSE)
npsigtest(prodNP)

Kernel Regression Significance Test
Type I Test with IID Bootstrap (399 replications, Pivot = TRUE, joint = FALSE)
Explanatory variables tested for significance:
log(qCap) (1), log(qLab) (2), log(qMat) (3)

              log(qCap) log(qLab) log(qMat)
Bandwidth(s):  1.039647    332644 0.8418465

Individual Significance Tests
P Value:
log(qCap)   0.11779
log(qLab) < 2e-16 ***
log(qMat) < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(\Longrightarrow\) At the 10% significance level, the only variable without a statistically significant effect is capital.
Finally, I can obtain information about the product-factor elasticities and the scale elasticity:
summary(gradients(prodNP)[,1]) # capital elasticity
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
-0.10300  0.08598  0.14360  0.14500  0.18810  0.30060

summary(gradients(prodNP)[,2]) # labour elasticity
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02208 0.63050 0.69880 0.70970 0.80670 0.97890

summary(gradients(prodNP)[,3]) # materials elasticity
  Min. 1st Qu.  Median    Mean 3rd Qu.   Max.
0.3873  0.4546  0.5669  0.5850  0.6502 1.3480

summary(rowSums(gradients(prodNP))) # scale elasticity
 Min. 1st Qu. Median   Mean 3rd Qu.  Max.
1.267   1.363  1.416  1.440   1.479 1.880

par(mfrow = c(2,2))
hist(gradients(prodNP)[,1], main = "Capital")
hist(gradients(prodNP)[,2], main = "Labour")
hist(gradients(prodNP)[,3], main = "Materials")
hist(rowSums(gradients(prodNP)), main = "Scale")