[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
In [1]: import pandas as pd
In [2]: import statsmodels.api as sm
In [3]: data = pd.read_table('/home/liuwensui/Documents/data/csdata.txt')
In [4]: Y = data.LEV_LT3
In [5]: X = sm.add_constant(data[['COLLAT1', 'SIZE1', 'PROF2', 'LIQ', 'IND3A']])
In [6]: # Discrete dependent variable model with a logit link; the response LEV_LT3 is a proportion bounded in [0, 1]
In [7]: mod = sm.Logit(Y, X)
In [8]: res = mod.fit()
Optimization terminated successfully.
Current function value: 882.448249
Iterations 8
In [9]: print(res.summary())
                           Logit Regression Results
==============================================================================
Dep. Variable:                LEV_LT3   No. Observations:                 4421
Model:                          Logit   Df Residuals:                     4415
Method:                           MLE   Df Model:                            5
Date:                Sun, 16 Dec 2012   Pseudo R-squ.:                 0.04022
Time:                        23:40:40   Log-Likelihood:                -882.45
converged:                       True   LL-Null:                       -919.42
                                        LLR p-value:                 1.539e-14
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
COLLAT1        1.2371      0.260      4.756      0.000         0.727     1.747
SIZE1          0.3590      0.037      9.584      0.000         0.286     0.432
PROF2         -3.1431      0.739     -4.254      0.000        -4.591    -1.695
LIQ           -1.3825      0.357     -3.867      0.000        -2.083    -0.682
IND3A          0.5466      0.141      3.867      0.000         0.270     0.824
const         -7.2498      0.567    -12.779      0.000        -8.362    -6.138
==============================================================================
In [10]: # Print Marginal Effects
In [11]: print(pd.DataFrame(res.margeff(), index = X.columns[:-1], columns = ['MargEffects']))
         MargEffects
COLLAT1     0.096447
SIZE1       0.027988
PROF2      -0.245035
LIQ        -0.107778
IND3A       0.042611
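Note that margeff() was the marginal-effects API in statsmodels at the time of writing; in current releases it has been replaced by get_margeff(). A minimal equivalent sketch, assuming statsmodels 0.6 or later:

mfx = res.get_margeff()  # average marginal effects (dy/dx) across observations
print(mfx.summary())     # adds standard errors and confidence intervals to the point estimates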
In [12]: # Fit the same type of model in R through PypeR
In [13]: import pyper as pr
In [14]: r = pr.R(use_pandas = True)
In [15]: r.r_data = data
In [16]: # Indirect estimation of the discrete dependent variable model: expand each record into a y = 1 row weighted by LEV_LT3 and a y = 0 row weighted by 1 - LEV_LT3
In [17]: r('data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))')
Out[17]: 'try({data <- rbind(cbind(r_data, y = 1, wt = r_data$LEV_LT3), cbind(r_data, y = 0, wt = 1 - r_data$LEV_LT3))})\n'
In [18]: r('mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)')
Out[18]: 'try({mod <- glm(y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, weights = wt, subset = (wt > 0), data = data, family = binomial)})\nWarning message:\nIn eval(expr, envir, enclos) : non-integer #successes in a binomial glm!\n'
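The warning above is expected rather than a problem: the row expansion feeds fractional weights into a binomial likelihood, so R complains about non-integer "successes". The fit itself is the intended quasi-likelihood estimate, which is why the coefficients and standard errors below reproduce the statsmodels results.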
In [19]: print(r('summary(mod)'))
try({summary(mod)})

Call:
glm(formula = y ~ COLLAT1 + SIZE1 + PROF2 + LIQ + IND3A, family = binomial,
    data = data, weights = wt, subset = (wt > 0))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.0129  -0.4483  -0.3173  -0.1535   2.5379

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -7.24979    0.56734 -12.779  < 2e-16 ***
COLLAT1      1.23715    0.26012   4.756 1.97e-06 ***
SIZE1        0.35901    0.03746   9.584  < 2e-16 ***
PROF2       -3.14313    0.73895  -4.254 2.10e-05 ***
LIQ         -1.38249    0.35749  -3.867  0.00011 ***
IND3A        0.54658    0.14136   3.867  0.00011 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2692.0  on 5536  degrees of freedom
Residual deviance: 2456.4  on 5531  degrees of freedom
AIC: 1995.4

Number of Fisher Scoring iterations: 6
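As a cross-check back in Python, the same model can be fit without the row-expansion trick; a minimal sketch, assuming statsmodels' GLM accepts a fractional 0-1 response with the Binomial family:

glm_mod = sm.GLM(Y, X, family = sm.families.Binomial())  # fractional logit as a binomial GLM
glm_res = glm_mod.fit()
print(glm_res.summary())  # coefficients should mirror both fits above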
