Generating Text From An R DataFrame using PyTracery, Pandas and Reticulate
In a couple of recent posts (Textualisation With Tracery and Database Reporting 2.0 and More Tinkering With PyTracery) I’ve started exploring various ways of using the pytracery port of the tracery story generation tool to generate a variety of texts from Python pandas data frames.
For my F1DataJunkie tinkerings I’ve been using R + SQL as the base languages, with some hardcoded Rdata2text constructions for rendering text from R dataframes (example).
Whilst there is a basic port of tracery to R, I want to make use of the various extensions to pytracery that I’ve been doodling with, so it seemed like a good opportunity to start exploring the R reticulate package.
It was a bit of a faff trying to get things to work the first time, so here are some notes on what I had to consider to get a trivial demo working in my RStudio/Rmd/knitr environment.
Python Environment
My first attempt was to use python blocks in an Rmd document:
```{python}
import sys
print(sys.executable)
```
but R insisted on using the base Python on my Mac, which was not the one I wanted to use… The fix turned out to be setting the engine path in the chunk options…
```{python, engine.path='/Users/f1dj/anaconda3/bin/python'}
import sys
print(sys.executable)
```
This could also be done via a setting: opts_chunk$set(engine.path = '/Users/f1dj/anaconda3/bin/python')
One of the problems with this approach is that a Python environment is created for each chunk – so you can’t easily carry state over from one Python chunk to another.
So I had a look at a workaround using reticulate instead.
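The lost state is easy to reproduce outside knitr. The sketch below assumes that each chunk is run as its own interpreter process, which is my reading of the behaviour described above rather than something checked against the knitr source. The `run_chunk` helper is a hypothetical name used only for illustration.

```python
import subprocess
import sys

def run_chunk(code):
    """Run `code` in a fresh interpreter process, roughly as a standalone chunk would be."""
    return subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True)

# The first "chunk" defines and prints a variable
first = run_chunk("x = 41 + 1; print(x)")

# The second "chunk" runs in a different process, so x no longer exists there
second = run_chunk("print(x)")

print(first.stdout.strip())
print(second.returncode != 0)   # non-zero exit: the second process raised an error
print(second.stderr.strip().splitlines()[-1])
```

Anything that needs to survive from one step to the next has to live in a single long-running Python session, which is what reticulate provides.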
Calling pytracery from R using reticulate
The solution I think I’m going for is to put the Python code into a file, load that file into R, then pass an R dataframe as an argument to one of the Python functions it defines and get the response back into R as an R dataframe.
For example, here’s a simple python test file:
    import tracery
    from tracery.modifiers import base_english
    import pandas as pd

    def pandas_row_mapper(row, rules, root, modifiers=base_english):
        ''' Function to parse single row of dataframe '''
        row=row.to_dict()
        rules=rules.copy()

        for k in row:
            rules[k] = str(row[k])

        grammar = tracery.Grammar(rules)
        if modifiers is not None:
            if isinstance(modifiers,list):
                for modifier in modifiers:
                    grammar.add_modifiers(modifier)
            else:
                grammar.add_modifiers(modifiers)

        return grammar.flatten(root)

    def pandas_tracery(df, rules, root, modifiers=base_english):
        return df.apply(lambda row: pandas_row_mapper(row, rules, root, modifiers), axis=1)

    def pdt_inspect(df):
        return(df)

    def pdt_test1(df):
        return type(df)

    def pdt_demo(df):
        return pandas_tracery(df, _demo_rules, "#origin#", modifiers=base_english)

    #Create example rule to apply to each row of dataframe
    _demo_rules = {'origin': "#code# was placed #position#!",
                   'position': "#pos.uppercase#"}
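To make the row-mapping idea concrete without needing tracery or pandas installed, here is a minimal standard-library sketch of the same pattern: each row's values are stringified and merged into a copy of the rules, and the root symbol is then expanded. The `toy_flatten` expander is a deliberately tiny stand-in written for this illustration, not the pytracery implementation. It only handles single-string rules and an `uppercase` modifier, and all the helper names are hypothetical.

```python
import re

# Matches "#symbol#" or "#symbol.modifier#" (toy syntax subset, an assumption of this sketch)
_TAG = re.compile(r"#([A-Za-z_]\w*)((?:\.\w+)*)#")

# Only the one modifier used in the demo rules is supported here
_MODIFIERS = {"uppercase": str.upper}

def toy_flatten(rules, text):
    """Recursively replace #symbol# tags in `text` with the matching rule."""
    def expand(match):
        symbol, mods = match.group(1), match.group(2)
        value = toy_flatten(rules, rules[symbol])
        for mod in filter(None, mods.split(".")):
            value = _MODIFIERS[mod](value)
        return value
    return _TAG.sub(expand, text)

def row_to_text(row, rules, root="#origin#"):
    """Merge one row (a dict) into a copy of the rules, then expand the root."""
    merged = dict(rules)  # copy, so the shared rules are left untouched
    merged.update({k: str(v) for k, v in row.items()})
    return toy_flatten(merged, root)

demo_rules = {"origin": "#code# was placed #position#!",
              "position": "#pos.uppercase#"}
rows = [{"code": "Jo", "pos": "first"}, {"code": "Sam", "pos": "Second"}]

results = [row_to_text(r, demo_rules) for r in rows]
for line in results:
    print(line)
```

Copying the rules for every row matters: without the copy, values from one row would leak into the shared rules and could show up in the text generated for later rows.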
We can access a python environment using reticulate:
    library(reticulate)

    #Show conda environments
    conda_list("auto")

    #Use a particular, named conda environment
    use_condaenv(condaenv='anaconda3', required=TRUE)

    #Check the availability of a particular module in the environment
    py_module_available("tracery")
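The same sort of check can be made from the Python side, which may help when R and Python seem to disagree about which interpreter is in use. This is a minimal sketch using only the standard library. `module_available` and `environment_report` are hypothetical helper names, not part of reticulate or pytracery, and would need adding to the sourced Python file to be callable from R.

```python
import importlib.util
import sys

def module_available(name):
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None

def environment_report(modules=("tracery", "pandas")):
    """Summarise which interpreter is running and which modules it can see."""
    return {"executable": sys.executable,
            "modules": {m: module_available(m) for m in modules}}

report = environment_report()
print(report["executable"])
for module_name, found in report["modules"].items():
    print(module_name, found)
```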
Now we can load in the python file – and the functions it defines – and then call one of the loaded Python functions.
Note that I seemed to have to force the casting of the R dataframe to a python/pandas dataframe using r_to_py(), although I’d expected the type mapping to be handled automatically? (Perhaps there is a setting somewhere?)
```{r}
source_python("pd_tracery_demo.py")
df1 = data.frame(code=c('Jo','Sam'), pos=c('first','Second'))
df1$result = pdt_demo(r_to_py(df1, convert=TRUE))
df1
```

    #Displays:
    Jo   first   Jo was placed FIRST!
    Sam  Second  Sam was placed SECOND!
(Note: I also spotted a gotcha – things don’t work so well if you have an R dataframe column called name…)
So now I can start looking at converting sports reporting tropes like these:
into tracery story models I can call using my pandas/pytracery hacks:-)