[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Rmagic (http://ipython.org/ipython-doc/dev/config/extensions/rmagic.html) is the IPython extension that uses rpy2 as its back end and provides a convenient interface for accessing R from IPython. Compared with using rpy2 directly, the rmagic extension lets users exchange objects between IPython and R more flexibly and run a single R function or a block of R code conveniently.
Below is a simple example showing how to push a pandas DataFrame object into R, convert it to an R data.frame, and then transfer it back into a new pandas DataFrame object.
    In [1]: import pandas as pd

    In [2]: # READ DATA INTO PANDAS DATAFRAME

    In [3]: pydf1 = pd.read_table('../data/csdata.txt', header = 0)

    In [4]: print pydf1.describe()
               LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
    count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000
    mean      0.090832     0.824537     0.317354    13.510870     0.144593
    std       0.193872     2.884129     0.227150     1.692520     0.110908
    min       0.000000     0.000000     0.000000     7.738052     0.000016
    25%       0.000000     0.349381     0.124094    12.316970     0.072123
    50%       0.000000     0.566577     0.287613    13.539574     0.120344
    75%       0.011689     0.789128     0.472355    14.751119     0.187515
    max       0.998372   102.149483     0.995346    18.586632     1.590201

               GROWTH2          AGE          LIQ        IND2A        IND3A  \
    count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000
    mean     13.619633    20.366433     0.202813     0.611626     0.190228
    std      36.517739    14.538997     0.233256     0.487435     0.392526
    min     -81.247627     6.000000     0.000000     0.000000     0.000000
    25%      -3.563235    11.000000     0.034834     0.000000     0.000000
    50%       6.164303    17.000000     0.108544     1.000000     0.000000
    75%      21.951632    25.000000     0.291366     1.000000     0.000000
    max     681.354187   210.000000     1.000182     1.000000     1.000000

                 IND4A        IND5A
    count  4421.000000  4421.000000
    mean      0.026917     0.099073
    std       0.161859     0.298793
    min       0.000000     0.000000
    25%       0.000000     0.000000
    50%       0.000000     0.000000
    75%       0.000000     0.000000
    max       1.000000     1.000000

    In [5]: # CONVERT PANDAS DATAFRAME TO R DATA.FRAME

    In [6]: %load_ext rmagic

    In [7]: col = pydf1.columns

    In [8]: %R -i pydf1,col colnames(pydf1) <- unlist(col); print(is.matrix(pydf1))
    [1] TRUE

    In [9]: %R rdf <- data.frame(pydf1); print(is.data.frame(rdf))
    [1] TRUE

    In [10]: %R print(summary(rdf))
        LEV_LT3           TAX_NDEB           COLLAT1           SIZE1
     Min.   :0.00000   Min.   :  0.0000   Min.   :0.0000   Min.   : 7.738
     1st Qu.:0.00000   1st Qu.:  0.3494   1st Qu.:0.1241   1st Qu.:12.317
     Median :0.00000   Median :  0.5666   Median :0.2876   Median :13.540
     Mean   :0.09083   Mean   :  0.8245   Mean   :0.3174   Mean   :13.511
     3rd Qu.:0.01169   3rd Qu.:  0.7891   3rd Qu.:0.4724   3rd Qu.:14.751
     Max.   :0.99837   Max.   :102.1495   Max.   :0.9953   Max.   :18.587
         PROF2              GROWTH2             AGE              LIQ
     Min.   :0.0000158   Min.   :-81.248   Min.   :  6.00   Min.   :0.00000
     1st Qu.:0.0721233   1st Qu.: -3.563   1st Qu.: 11.00   1st Qu.:0.03483
     Median :0.1203435   Median :  6.164   Median : 17.00   Median :0.10854
     Mean   :0.1445929   Mean   : 13.620   Mean   : 20.37   Mean   :0.20281
     3rd Qu.:0.1875148   3rd Qu.: 21.952   3rd Qu.: 25.00   3rd Qu.:0.29137
     Max.   :1.5902009   Max.   :681.354   Max.   :210.00   Max.   :1.00018
         IND2A            IND3A            IND4A             IND5A
     Min.   :0.0000   Min.   :0.0000   Min.   :0.00000   Min.   :0.00000
     1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.00000   1st Qu.:0.00000
     Median :1.0000   Median :0.0000   Median :0.00000   Median :0.00000
     Mean   :0.6116   Mean   :0.1902   Mean   :0.02692   Mean   :0.09907
     3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:0.00000   3rd Qu.:0.00000
     Max.   :1.0000   Max.   :1.0000   Max.   :1.00000   Max.   :1.00000

    In [11]: # CONVERT R DATA.FRAME BACK TO PANDAS DATAFRAME

    In [12]: %R -d rdf

    In [13]: pydf2 = pd.DataFrame(rdf)

    In [14]: print pydf2.describe()
               LEV_LT3     TAX_NDEB      COLLAT1        SIZE1        PROF2  \
    count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000
    mean      0.090832     0.824537     0.317354    13.510870     0.144593
    std       0.193872     2.884129     0.227150     1.692520     0.110908
    min       0.000000     0.000000     0.000000     7.738052     0.000016
    25%       0.000000     0.349381     0.124094    12.316970     0.072123
    50%       0.000000     0.566577     0.287613    13.539574     0.120344
    75%       0.011689     0.789128     0.472355    14.751119     0.187515
    max       0.998372   102.149483     0.995346    18.586632     1.590201

               GROWTH2          AGE          LIQ        IND2A        IND3A  \
    count  4421.000000  4421.000000  4421.000000  4421.000000  4421.000000
    mean     13.619633    20.366433     0.202813     0.611626     0.190228
    std      36.517739    14.538997     0.233256     0.487435     0.392526
    min     -81.247627     6.000000     0.000000     0.000000     0.000000
    25%      -3.563235    11.000000     0.034834     0.000000     0.000000
    50%       6.164303    17.000000     0.108544     1.000000     0.000000
    75%      21.951632    25.000000     0.291366     1.000000     0.000000
    max     681.354187   210.000000     1.000182     1.000000     1.000000

                 IND4A        IND5A
    count  4421.000000  4421.000000
    mean      0.026917     0.099073
    std       0.161859     0.298793
    min       0.000000     0.000000
    25%       0.000000     0.000000
    50%       0.000000     0.000000
    75%       0.000000     0.000000
    max       1.000000     1.000000
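The session pushes `col` alongside `pydf1` because, as the `is.matrix` check in `In [8]` suggests, the frame arrives in R as a bare numeric matrix and the column labels have to be reattached by hand with `colnames()`. The sketch below imitates that split-and-reassemble step in plain Python, without rpy2 or R. It is only an analogy: the helper names `to_matrix_and_names` and `from_matrix_and_names` are hypothetical, and the values are made up rather than taken from `csdata.txt`.

```python
def to_matrix_and_names(table):
    """Split a column-oriented table into (row-major matrix, column names).

    This mimics what happens on the push: the numbers travel as a bare
    matrix, and the labels have to be carried separately.
    """
    names = list(table)
    columns = [table[name] for name in names]
    if len({len(column) for column in columns}) > 1:
        raise ValueError("all columns must have the same length")
    matrix = [list(row) for row in zip(*columns)]
    return matrix, names


def from_matrix_and_names(matrix, names):
    """Reattach labels to a bare matrix, analogous to colnames(x) <- names in R."""
    for row in matrix:
        if len(row) != len(names):
            raise ValueError("each row needs exactly one value per column name")
    return {name: [row[j] for row in matrix] for j, name in enumerate(names)}


# Made-up toy values; only the column names are borrowed from the post.
table = {"LEV_LT3": [0.0, 0.25, 0.5], "AGE": [6.0, 17.0, 25.0]}

matrix, names = to_matrix_and_names(table)   # labels are no longer in `matrix`
restored = from_matrix_and_names(matrix, names)

print(matrix)
print(names)
print(restored == table)
```

In the session itself the return trip does not need this manual step, since `pd.DataFrame(rdf)` in `In [13]` yields a frame whose `describe()` output already carries the original column names.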
To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.