Fractional Logit Model with Python
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers.]
In [1]: import pandas as pd

In [2]: import statsmodels.api as sm

In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')

In [4]: Y = data.LEV_LT3

In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])

In [6]: # Discrete Dependent Variable Models with Logit Link

In [7]: mod = sm.Logit(Y, X)

In [8]: res = mod.fit()
Optimization terminated successfully.
         Current function value: 882.448249
         Iterations 8

In [9]: print res.summary()
                           Logit Regression Results
==============================================================================
Dep. Variable:                LEV_LT3   No. Observations:                 4421
Model:                          Logit   Df Residuals:                     4415
Method:                           MLE   Df Model:                            5
Date:                Sun, 16 Dec 2012   Pseudo R-squ.:                 0.04022
Time:                        23:40:40   Log-Likelihood:                -882.45
converged:                       True   LL-Null:                       -919.42
                                        LLR p-value:                 1.539e-14
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
COLLAT1        1.2371      0.260      4.756      0.000         0.727     1.747
SIZE1          0.3590      0.037      9.584      0.000         0.286     0.432
PROF2         -3.1431      0.739     -4.254      0.000        -4.591    -1.695
LIQ           -1.3825      0.357     -3.867      0.000        -2.083    -0.682
IND3A          0.5466      0.141      3.867      0.000         0.270     0.824
const         -7.2498      0.567    -12.779      0.000        -8.362    -6.138
==============================================================================

In [10]: # Print Marginal Effects

In [11]: print pd.DataFrame(res.margeff(), index = X.columns[:(len(X.columns) - 1)], columns = ['MargEffects'])
         MargEffects
COLLAT1     0.096447
SIZE1       0.027988
PROF2      -0.245035
LIQ        -0.107778
IND3A       0.042611

In [12]: # Address the same type of model with R through PypeR

In [13]: import pyper as pr

In [14]: r = pr.R(use_pandas = True)

In [15]: r.r_data = data

In [16]: # Indirect Estimation of Discrete Dependent Variable Models

In [17]: r('data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))')
Out[17]: 'try({data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))})\n'

In [18]: r('mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)')
Out[18]: 'try({mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)})\nWarning message:\nIn eval(expr, envir, enclos) : non-integer #successes in a binomial glm!\n'

In [19]: print r('summary(mod)')
try({summary(mod)})

Call:
glm(formula = y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, family = binomial,
    data = data, weights = wt, subset = (wt > 0))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0129  -0.4483  -0.3173  -0.1535   2.5379

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.24979    0.56734 -12.779  < 2e-16 ***
COLLAT1      1.23715    0.26012   4.756 1.97e-06 ***
SIZE1        0.35901    0.03746   9.584  < 2e-16 ***
PROF2       -3.14313    0.73895  -4.254 2.10e-05 ***
LIQ         -1.38249    0.35749  -3.867  0.00011 ***
IND3A        0.54658    0.14136   3.867  0.00011 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2692.0  on 5536  degrees of freedom
Residual deviance: 2456.4  on 5531  degrees of freedom
AIC: 1995.4

Number of Fisher Scoring iterations: 6
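A quick note on why the two routes above agree: with a fractional response y_i in [0, 1] and G( ) the logistic CDF, the fractional logit maximizes the Bernoulli quasi-likelihood

ln L(b) = sum_i [ y_i * ln G(x_i'b) + (1 - y_i) * ln(1 - G(x_i'b)) ]

Splitting each record into a y = 1 copy with weight y_i and a y = 0 copy with weight 1 - y_i leaves this objective unchanged, which is exactly the rbind()/weights construction passed to glm() above. That is why the two summaries report identical coefficients, and why R warns about "non-integer #successes" without affecting the estimates.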
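The same expanded-data estimation can also be done without round-tripping through PypeR. Below is a minimal sketch, not part of the original session: it assumes a statsmodels release that exposes the freq_weights argument of sm.GLM, and it reuses the file path and column names from In [3] through In [5].

import pandas as pd
import statsmodels.api as sm

data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')

# duplicate each record: y = 1 weighted by the observed fraction,
# y = 0 weighted by its complement (the rbind/cbind step in R above)
ones = data.copy()
ones['y'], ones['wt'] = 1, data['LEV_LT3']
zeros = data.copy()
zeros['y'], zeros['wt'] = 0, 1 - data['LEV_LT3']
stacked = pd.concat([ones, zeros], ignore_index = True)
stacked = stacked[stacked['wt'] > 0]  # counterpart of subset = (wt > 0)

Xw = sm.add_constant(stacked[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])
mod_w = sm.GLM(stacked['y'], Xw, family = sm.families.Binomial(),
               freq_weights = stacked['wt'].values)
print(mod_w.fit().summary())

The point estimates should match the weighted glm() fit above; the reported standard errors may differ somewhat, since freq_weights treats the fractional weights as replication counts.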
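For completeness, the fractional logit of Papke and Wooldridge (1996) can also be fit in a single step as a binomial-family GLM on the fractional response itself, with no data expansion at all. A minimal sketch under the same assumptions, reusing Y and X from In [4] and In [5]:

import statsmodels.api as sm

# binomial family with the default logit link; the statsmodels Binomial
# family accepts a fractional response in [0, 1]
frac_res = sm.GLM(Y, X, family = sm.families.Binomial()).fit()
print(frac_res.summary())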
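One last API note, an assumption about later statsmodels releases rather than anything in the original post: the res.margeff() call in In [11] was eventually superseded by res.get_margeff(), so on a current install the marginal-effects step reads:

# average marginal effects are the default in newer statsmodels
me = res.get_margeff()
print(me.summary())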