Another Way to Access R from Python – PypeR
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Different from RPy2, PypeR provides another simple way to access R from Python through pipes (http://www.jstatsoft.org/v35/c02/paper). This handy feature enables data analysts to do the data munging with python and the statistical analysis with R by passing objects interactively between two computing systems.
Below is a simple demonstration on how to call R within Python through RypeR, estimate a Beta regression, and then return the model prediction from R back to Python.
In [1]: # LOAD PYTHON PACKAGES In [2]: import pandas as pd In [3]: import pyper as pr In [4]: # READ DATA In [5]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0) In [6]: # CREATE A R INSTANCE WITH PYPER In [7]: r = pr.R(use_pandas = True) In [8]: # PASS DATA FROM PYTHON TO R In [9]: r.assign("rdata", data) In [10]: # SHOW DATA SUMMARY In [11]: print r("summary(rdata)") try({summary(rdata)}) LEV_LT3 TAX_NDEB COLLAT1 SIZE1 Min. :0.00000 Min. : 0.0000 Min. :0.0000 Min. : 7.738 1st Qu.:0.00000 1st Qu.: 0.3494 1st Qu.:0.1241 1st Qu.:12.317 Median :0.00000 Median : 0.5666 Median :0.2876 Median :13.540 Mean :0.09083 Mean : 0.8245 Mean :0.3174 Mean :13.511 3rd Qu.:0.01169 3rd Qu.: 0.7891 3rd Qu.:0.4724 3rd Qu.:14.751 Max. :0.99837 Max. :102.1495 Max. :0.9953 Max. :18.587 PROF2 GROWTH2 AGE LIQ Min. :0.0000158 Min. :-81.248 Min. : 6.00 Min. :0.00000 1st Qu.:0.0721233 1st Qu.: -3.563 1st Qu.: 11.00 1st Qu.:0.03483 Median :0.1203435 Median : 6.164 Median : 17.00 Median :0.10854 Mean :0.1445929 Mean : 13.620 Mean : 20.37 Mean :0.20281 3rd Qu.:0.1875148 3rd Qu.: 21.952 3rd Qu.: 25.00 3rd Qu.:0.29137 Max. :1.5902009 Max. :681.354 Max. :210.00 Max. :1.00018 IND2A IND3A IND4A IND5A Min. :0.0000 Min. :0.0000 Min. :0.00000 Min. :0.00000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.00000 1st Qu.:0.00000 Median :1.0000 Median :0.0000 Median :0.00000 Median :0.00000 Mean :0.6116 Mean :0.1902 Mean :0.02692 Mean :0.09907 3rd Qu.:1.0000 3rd Qu.:0.0000 3rd Qu.:0.00000 3rd Qu.:0.00000 Max. :1.0000 Max. :1.0000 Max. :1.00000 Max. :1.00000 In [12]: # LOAD R PACKAGE In [13]: r("library(betareg)") Out[13]: 'try({library(betareg)})\nLoading required package: Formula\n' In [14]: # ESTIMATE A BETA REGRESSION In [15]: r("m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 > 0)") Out[15]: 'try({m <- betareg(LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 > 0)})\n' In [16]: # OUTPUT MODEL SUMMARY In [17]: print r("summary(m)") try({summary(m)}) Call: betareg(formula = LEV_LT3 ~ SIZE1 + PROF2 + GROWTH2 + AGE + IND3A, data = rdata, subset = LEV_LT3 > 0) Standardized weighted residuals 2: Min 1Q Median 3Q Max -7.2802 -0.5194 0.0777 0.6037 5.8777 Coefficients (mean model with logit link): Estimate Std. Error z value Pr(>|z|) (Intercept) 1.229773 0.312990 3.929 8.53e-05 *** SIZE1 -0.105009 0.021211 -4.951 7.39e-07 *** PROF2 -2.414794 0.377271 -6.401 1.55e-10 *** GROWTH2 0.003306 0.001043 3.169 0.00153 ** AGE -0.004999 0.001795 -2.786 0.00534 ** IND3A 0.688314 0.074069 9.293 < 2e-16 *** Phi coefficients (precision model with identity link): Estimate Std. Error z value Pr(>|z|) (phi) 3.9362 0.1528 25.77 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Type of estimator: ML (maximum likelihood) Log-likelihood: 266.7 on 7 Df Pseudo R-squared: 0.1468 Number of iterations: 25 (BFGS) + 2 (Fisher scoring) In [18]: # CALCULATE MODEL PREDICTION In [19]: r("beta_fit <- predict(m, link = 'response')") Out[19]: "try({beta_fit <- predict(m, link = 'response')})\n" In [20]: # SHOW PREDICTION SUMMARY IN R In [21]: print r("summary(beta_fit)") try({summary(beta_fit)}) Min. 1st Qu. Median Mean 3rd Qu. Max. 0.1634 0.3069 0.3465 0.3657 0.4007 0.6695 In [22]: # PASS DATA FROM R TO PYTHON In [23]: pydata = pd.DataFrame(r.get("beta_fit"), columns = ["y_hat"]) In [24]: # SHOW PREDICTION SUMMARY IN PYTHON In [25]: pydata.y_hat.describe() Out[25]: count 1116.000000 mean 0.365675 std 0.089804 min 0.163388 25% 0.306897 50% 0.346483 75% 0.400656 max 0.669489
To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.