A Light Touch on RPy2
[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
For a statistical analyst, the first step in a data analysis project is to import the data and then review its descriptive statistics. In Python, the pandas package makes this easy.
    In [1]: import pandas as pd

    In [2]: data = pd.read_table("/home/liuwensui/Documents/data/csdata.txt", header = 0)

    In [3]: pd.set_printoptions(precision = 5)

    In [4]: print data.describe().to_string()
            LEV_LT3  TAX_NDEB   COLLAT1     SIZE1     PROF2   GROWTH2       AGE       LIQ     IND2A     IND3A     IND4A     IND5A
    count 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000 4421.0000
    mean     0.0908    0.8245    0.3174   13.5109    0.1446   13.6196   20.3664    0.2028    0.6116    0.1902    0.0269    0.0991
    std      0.1939    2.8841    0.2272    1.6925    0.1109   36.5177   14.5390    0.2333    0.4874    0.3925    0.1619    0.2988
    min      0.0000    0.0000    0.0000    7.7381    0.0000  -81.2476    6.0000    0.0000    0.0000    0.0000    0.0000    0.0000
    25%      0.0000    0.3494    0.1241   12.3170    0.0721   -3.5632   11.0000    0.0348    0.0000    0.0000    0.0000    0.0000
    50%      0.0000    0.5666    0.2876   13.5396    0.1203    6.1643   17.0000    0.1085    1.0000    0.0000    0.0000    0.0000
    75%      0.0117    0.7891    0.4724   14.7511    0.1875   21.9516   25.0000    0.2914    1.0000    0.0000    0.0000    0.0000
    max      0.9984  102.1495    0.9953   18.5866    1.5902  681.3542  210.0000    1.0002    1.0000    1.0000    1.0000    1.0000
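The session above uses Python 2 `print` syntax and `pd.set_printoptions`, which is no longer available in current pandas releases. Below is a minimal sketch of the same steps for Python 3. It assumes a recent pandas and, so that it runs without the original file, reads a small made-up tab-delimited sample from memory instead of `csdata.txt`; the values and the three-column subset are hypothetical.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample standing in for csdata.txt.
# The numbers are made up for illustration only.
sample = io.StringIO(
    "LEV_LT3\tSIZE1\tAGE\n"
    "0.00\t12.5\t11\n"
    "0.10\t13.5\t17\n"
    "0.20\t14.5\t25\n"
    "0.90\t15.5\t47\n"
)

# sep="\t" reads tab-delimited text; header=0 takes column names from the first row.
data = pd.read_csv(sample, sep="\t", header=0)

# Display precision is now set through the options API.
pd.set_option("display.precision", 5)

# describe() returns a DataFrame with count, mean, std, min, quartiles, and max.
summary = data.describe()
print(summary.to_string())
```

To run this against the real file, pass its path to `pd.read_csv` in place of the in-memory sample.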
Tonight, I’d like to add some spice to my Python learning experience and do the same work in a different flavor with the rpy2 package, which allows me to call R functions from Python.
    In [5]: import rpy2.robjects as ro

    In [6]: rdata = ro.packages.importr('utils').read_table("/home/liuwensui/Documents/data/csdata.txt", header = True)

    In [7]: print ro.r.summary(rdata)
        LEV_LT3          TAX_NDEB         COLLAT1           SIZE1
    Min.   :0.00000  Min.   :  0.0000  Min.   :0.0000  Min.   : 7.738
    1st Qu.:0.00000  1st Qu.:  0.3494  1st Qu.:0.1241  1st Qu.:12.317
    Median :0.00000  Median :  0.5666  Median :0.2876  Median :13.540
    Mean   :0.09083  Mean   :  0.8245  Mean   :0.3174  Mean   :13.511
    3rd Qu.:0.01169  3rd Qu.:  0.7891  3rd Qu.:0.4724  3rd Qu.:14.751
    Max.   :0.99837  Max.   :102.1495  Max.   :0.9953  Max.   :18.587
          PROF2            GROWTH2           AGE              LIQ
    Min.   :0.0000158  Min.   :-81.248  Min.   :  6.00  Min.   :0.00000
    1st Qu.:0.0721233  1st Qu.: -3.563  1st Qu.: 11.00  1st Qu.:0.03483
    Median :0.1203435  Median :  6.164  Median : 17.00  Median :0.10854
    Mean   :0.1445929  Mean   : 13.620  Mean   : 20.37  Mean   :0.20281
    3rd Qu.:0.1875148  3rd Qu.: 21.952  3rd Qu.: 25.00  3rd Qu.:0.29137
    Max.   :1.5902009  Max.   :681.354  Max.   :210.00  Max.   :1.00018
        IND2A           IND3A            IND4A            IND5A
    Min.   :0.0000  Min.   :0.0000  Min.   :0.00000  Min.   :0.00000
    1st Qu.:0.0000  1st Qu.:0.0000  1st Qu.:0.00000  1st Qu.:0.00000
    Median :1.0000  Median :0.0000  Median :0.00000  Median :0.00000
    Mean   :0.6116  Mean   :0.1902  Mean   :0.02692  Mean   :0.09907
    3rd Qu.:1.0000  3rd Qu.:0.0000  3rd Qu.:0.00000  3rd Qu.:0.00000
    Max.   :1.0000  Max.   :1.0000  Max.   :1.00000  Max.   :1.00000
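The quartiles in the two summaries agree (for example, the first quartile of TAX_NDEB is 0.3494 in both) because pandas `describe()` and R `summary()` both default to linear interpolation between order statistics. The sketch below reproduces that rule with Python's standard library only. The sample values are made up rather than taken from csdata.txt, and `five_number_summary` is a hypothetical helper name.

```python
import statistics

# Hypothetical sample values, not taken from csdata.txt.
values = [6.0, 11.0, 17.0, 25.0, 210.0, 9.0, 14.0, 30.0]


def five_number_summary(xs):
    """Return min, quartiles, and max using linear interpolation."""
    xs = sorted(xs)
    # method="inclusive" interpolates at position (n - 1) * p, the same
    # rule as R's default quantile type 7 and the pandas default.
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return {"min": xs[0], "25%": q1, "50%": q2, "75%": q3, "max": xs[-1]}


summary = five_number_summary(values)
summary["mean"] = statistics.fmean(values)
# Sample standard deviation (n - 1 denominator), the convention pandas uses for std.
summary["std"] = statistics.stdev(values)

for name, value in summary.items():
    print(f"{name:>5}: {value:.4f}")
```

The remaining differences between the two outputs are in presentation: pandas adds count and std rows, while R formats each column separately, so the number of digits shown varies by column.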
As shown above, a similar analysis can be carried out by calling R functions from Python. This lets us extract and process the data in Python without giving up the graphical and statistical functionality of R.