Run R Code Within Python On The Fly
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Below is an example of running R code from within Python through the rpy2 package, an extremely attractive feature for hardcore R programmers.
In [1]: import rpy2.robjects as ro
In [2]: _null_ = ro.r('data <- read.table("/home/liuwensui/data/credit_count.txt", header = TRUE, sep = ",")')
In [3]: print(ro.r('str(data)'))
'data.frame': 13444 obs. of 14 variables:
 $ CARDHLDR: int  0 0 1 1 1 1 1 1 1 1 ...
 $ DEFAULT : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AGE     : num  27.2 40.8 37.7 42.5 21.3 ...
 $ ACADMOS : int  4 111 54 60 8 78 25 6 20 162 ...
 $ ADEPCNT : int  0 3 3 3 0 1 1 0 3 7 ...
 $ MAJORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ MINORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ OWNRENT : int  0 1 1 1 0 0 1 0 0 1 ...
 $ INCOME  : num  1200 4000 3667 2000 2917 ...
 $ SELFEMPL: int  0 0 0 0 0 0 0 0 0 0 ...
 $ INCPER  : num  18000 13500 11300 17250 35000 ...
 $ EXP_INC : num  0.000667 0.000222 0.03327 0.048427 0.016523 ...
 $ SPENDING: num  NA NA 122 96.9 48.2 ...
 $ LOGSPEND: num  NA NA 4.8 4.57 3.88 ...
NULL
In [4]: _null_ = ro.r('sample <- data[data$CARDHLDR == 1,]')
In [5]: print(ro.r('summary(sample)'))
 CARDHLDR        DEFAULT            AGE           ACADMOS         ADEPCNT
 Min.   :1   Min.   :0.00000   Min.   : 0.00   Min.   :  0.0   Min.   :0.0000
 1st Qu.:1   1st Qu.:0.00000   1st Qu.:25.75   1st Qu.: 12.0   1st Qu.:0.0000
 Median :1   Median :0.00000   Median :31.67   Median : 30.0   Median :0.0000
 Mean   :1   Mean   :0.09487   Mean   :33.67   Mean   : 55.9   Mean   :0.9904
 3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.:39.75   3rd Qu.: 72.0   3rd Qu.:2.0000
 Max.   :1   Max.   :1.00000   Max.   :88.67   Max.   :564.0   Max.   :9.0000
    MAJORDRG         MINORDRG         OWNRENT          INCOME
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :  50
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1750
 Median :0.0000   Median :0.0000   Median :0.0000   Median :2292
 Mean   :0.1433   Mean   :0.2207   Mean   :0.4791   Mean   :2606
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:3042
 Max.   :6.0000   Max.   :7.0000   Max.   :1.0000   Max.   :8333
    SELFEMPL           INCPER           EXP_INC            SPENDING
 Min.   :0.00000   Min.   :   700   Min.   :0.000096   Min.   :   0.111
 1st Qu.:0.00000   1st Qu.: 12900   1st Qu.:0.025998   1st Qu.:  58.753
 Median :0.00000   Median : 20000   Median :0.058957   Median : 139.992
 Mean   :0.05362   Mean   : 22581   Mean   :0.090744   Mean   : 226.983
 3rd Qu.:0.00000   3rd Qu.: 28337   3rd Qu.:0.116123   3rd Qu.: 284.440
 Max.   :1.00000   Max.   :150000   Max.   :2.037728   Max.   :4810.309
    LOGSPEND
 Min.   :-2.197
 1st Qu.: 4.073
 Median : 4.942
 Mean   : 4.729
 3rd Qu.: 5.651
 Max.   : 8.479
In [6]: print(ro.r('summary(glm(DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, data = sample, family = binomial))'))
Call:
glm(formula = DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME,
family = binomial, data = sample)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9587  -0.5003  -0.4351  -0.3305   3.1928
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.204e+00  9.084e-02 -13.259  < 2e-16 ***
MAJORDRG     2.031e-01  6.926e-02   2.933  0.00336 **
MINORDRG     2.027e-01  4.798e-02   4.225 2.38e-05 ***
OWNRENT     -2.012e-01  7.163e-02  -2.809  0.00496 **
INCOME      -4.422e-04  4.044e-05 -10.937  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6586.1 on 10498 degrees of freedom
Residual deviance: 6376.2 on 10494 degrees of freedom
AIC: 6386.2
Number of Fisher Scoring iterations: 6
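The estimates are on the log-odds scale, so the fitted model is logit P(DEFAULT = 1) = b0 + b1·MAJORDRG + b2·MINORDRG + b3·OWNRENT + b4·INCOME. The plain-Python sketch below needs no R. It uses the printed, rounded estimates, so its results are approximate.

The sketch does three things:

* converts the coefficients to odds ratios;
* computes a predicted default probability for a hypothetical profile (MAJORDRG = MINORDRG = 0, OWNRENT = 1, INCOME at the sample median of 2292);
* checks two identities in the footer of the output.

For a 0/1 response, AIC = residual deviance + 2 × (number of coefficients) = 6376.2 + 2 × 5 = 6386.2. The likelihood-ratio statistic is null deviance - residual deviance = 209.9, on 10498 - 10494 = 4 degrees of freedom.

```python
import math

# Estimates copied from the printed glm summary (rounded as printed).
COEF = {
    "(Intercept)": -1.204e+00,
    "MAJORDRG": 2.031e-01,
    "MINORDRG": 2.027e-01,
    "OWNRENT": -2.012e-01,
    "INCOME": -4.422e-04,
}


def default_probability(profile):
    """Inverse logit of the linear predictor for one (hypothetical) profile."""
    eta = COEF["(Intercept)"] + sum(COEF[k] * v for k, v in profile.items())
    return 1.0 / (1.0 + math.exp(-eta))


def odds_ratios():
    """exp(b): multiplicative change in the odds per one-unit increase."""
    return {k: math.exp(b) for k, b in COEF.items() if k != "(Intercept)"}


def chisq4_sf(x):
    """Upper-tail probability of a chi-square with 4 df (closed form for even df)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Footer identities: AIC = residual deviance + 2 * number of coefficients,
# and the likelihood-ratio test of the 4 predictors against the null model.
null_dev, resid_dev = 6586.1, 6376.2
aic = resid_dev + 2 * len(COEF)
lr_stat = null_dev - resid_dev
lr_pvalue = chisq4_sf(lr_stat)

# Hypothetical profile; INCOME is the median from the summary above.
p = default_probability({"MAJORDRG": 0, "MINORDRG": 0, "OWNRENT": 1, "INCOME": 2292})
```

With these inputs the predicted probability comes out at about 0.082, somewhat below the 0.09487 default rate among all card holders. An extra 1000 of INCOME multiplies the odds of default by roughly exp(-0.4422) ≈ 0.64.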
To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.