Run R Code Within Python On The Fly
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
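Below is an example of running R code from within Python through the rpy2 package. Each R statement is passed as a string to ro.r(), which evaluates it in an R session embedded in the Python process. This is an attractive feature for hardcore R programmers.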
In [1]: import rpy2.robjects as ro

In [2]: _null_ = ro.r('data <- read.table("/home/liuwensui/data/credit_count.txt", header = TRUE, sep = ",")')

In [3]: print ro.r('str(data)')
'data.frame': 13444 obs. of  14 variables:
 $ CARDHLDR: int  0 0 1 1 1 1 1 1 1 1 ...
 $ DEFAULT : int  0 0 0 0 0 0 0 0 0 0 ...
 $ AGE     : num  27.2 40.8 37.7 42.5 21.3 ...
 $ ACADMOS : int  4 111 54 60 8 78 25 6 20 162 ...
 $ ADEPCNT : int  0 3 3 3 0 1 1 0 3 7 ...
 $ MAJORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ MINORDRG: int  0 0 0 0 0 0 0 0 0 0 ...
 $ OWNRENT : int  0 1 1 1 0 0 1 0 0 1 ...
 $ INCOME  : num  1200 4000 3667 2000 2917 ...
 $ SELFEMPL: int  0 0 0 0 0 0 0 0 0 0 ...
 $ INCPER  : num  18000 13500 11300 17250 35000 ...
 $ EXP_INC : num  0.000667 0.000222 0.03327 0.048427 0.016523 ...
 $ SPENDING: num  NA NA 122 96.9 48.2 ...
 $ LOGSPEND: num  NA NA 4.8 4.57 3.88 ...
NULL

In [4]: _null_ = ro.r('sample <- data[data$CARDHLDR == 1,]')

In [5]: print ro.r('summary(sample)')
    CARDHLDR    DEFAULT             AGE           ACADMOS         ADEPCNT
 Min.   :1   Min.   :0.00000   Min.   : 0.00   Min.   :  0.0   Min.   :0.0000
 1st Qu.:1   1st Qu.:0.00000   1st Qu.:25.75   1st Qu.: 12.0   1st Qu.:0.0000
 Median :1   Median :0.00000   Median :31.67   Median : 30.0   Median :0.0000
 Mean   :1   Mean   :0.09487   Mean   :33.67   Mean   : 55.9   Mean   :0.9904
 3rd Qu.:1   3rd Qu.:0.00000   3rd Qu.:39.75   3rd Qu.: 72.0   3rd Qu.:2.0000
 Max.   :1   Max.   :1.00000   Max.   :88.67   Max.   :564.0   Max.   :9.0000
    MAJORDRG         MINORDRG         OWNRENT           INCOME
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :  50
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:1750
 Median :0.0000   Median :0.0000   Median :0.0000   Median :2292
 Mean   :0.1433   Mean   :0.2207   Mean   :0.4791   Mean   :2606
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:1.0000   3rd Qu.:3042
 Max.   :6.0000   Max.   :7.0000   Max.   :1.0000   Max.   :8333
    SELFEMPL           INCPER          EXP_INC            SPENDING
 Min.   :0.00000   Min.   :   700   Min.   :0.000096   Min.   :   0.111
 1st Qu.:0.00000   1st Qu.: 12900   1st Qu.:0.025998   1st Qu.:  58.753
 Median :0.00000   Median : 20000   Median :0.058957   Median : 139.992
 Mean   :0.05362   Mean   : 22581   Mean   :0.090744   Mean   : 226.983
 3rd Qu.:0.00000   3rd Qu.: 28337   3rd Qu.:0.116123   3rd Qu.: 284.440
 Max.   :1.00000   Max.   :150000   Max.   :2.037728   Max.   :4810.309
    LOGSPEND
 Min.   :-2.197
 1st Qu.: 4.073
 Median : 4.942
 Mean   : 4.729
 3rd Qu.: 5.651
 Max.   : 8.479

In [6]: print ro.r('summary(glm(DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, data = sample, family = binomial))')

Call:
glm(formula = DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME,
    family = binomial, data = sample)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.9587  -0.5003  -0.4351  -0.3305   3.1928

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.204e+00  9.084e-02 -13.259  < 2e-16 ***
MAJORDRG     2.031e-01  6.926e-02   2.933  0.00336 **
MINORDRG     2.027e-01  4.798e-02   4.225 2.38e-05 ***
OWNRENT     -2.012e-01  7.163e-02  -2.809  0.00496 **
INCOME      -4.422e-04  4.044e-05 -10.937  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6586.1  on 10498  degrees of freedom
Residual deviance: 6376.2  on 10494  degrees of freedom
AIC: 6386.2

Number of Fisher Scoring iterations: 6
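The session above uses the Python 2 print statement. As a rough sketch only, the same steps could be arranged as a Python 3 script along the following lines. It assumes R and rpy2 are installed and relies only on the ro.r() call shown in the session. The helper names r_string, build_r_steps and run_steps are hypothetical and not part of rpy2. Instead of printing the value returned by ro.r(), the sketch asks R to print its own results with print(), since the Python-side representation of returned R objects may differ between rpy2 versions.

```python
def r_string(value):
    """Quote a Python string as an R string literal.

    Hypothetical helper: escapes backslashes and double quotes so that
    a file path can be embedded safely in R source code.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_r_steps(csv_path):
    """Return the R statements from the session above, in order."""
    return [
        # Load the comma-separated file into an R data frame.
        'data <- read.table(%s, header = TRUE, sep = ",")' % r_string(csv_path),
        # str() writes its report itself, so no print() is needed.
        "str(data)",
        # Keep only the card holders.
        "sample <- data[data$CARDHLDR == 1, ]",
        # Code evaluated through ro.r() is not auto-printed, so call print().
        "print(summary(sample))",
        "print(summary(glm(DEFAULT ~ MAJORDRG + MINORDRG + OWNRENT + INCOME, "
        "data = sample, family = binomial)))",
    ]


def run_steps(steps):
    """Evaluate each R statement in the embedded R session."""
    # Imported here so the helpers above can be used without rpy2 installed.
    import rpy2.robjects as ro

    for step in steps:
        ro.r(step)


# Usage (needs R, rpy2, and a readable data file):
#   run_steps(build_r_steps("/home/liuwensui/data/credit_count.txt"))
```

Keeping the R statements in a plain list makes it easy to inspect or log exactly what is sent to R before anything is evaluated.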