Site icon R-bloggers

Generating Text From An R DataFrame using PyTracery, Pandas and Reticulate

[This article was first published on Rstats – OUseful.Info, the blog…, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a couple of recent posts (Textualisation With Tracery and Database Reporting 2.0 and More Tinkering With PyTracery) I’ve started exploring various ways of using the pytracery port of the tracery story generation tool to generate variety of texts from Python pandas data frames.

For my F1DataJunkie tinkerings I’ve been using R + SQL as the base languages, with some hardcoded Rdata2text constructions for rendering text from R dataframes (example).

Whilst there is a basic port of tracery to R, I want to make use of the various extensions I’ve been doodling with to pytracery, so it seemed like a good opportunity to start exploring the R reticulate package.

It was a bit of a faff trying to get things to work the first time, so here on some notes on what I had to consider to get a trivial demo working in my RStudio/Rmd/knitr environment.

Python Environment

My first attempt was to use python blocks in an Rmd document:

```{python}
import sys
print(sys.executable)
````

but R insisted on using the base Python path on my Mac that was not the path I wanted to use… The fix turned out to be setting the engine…

```{python, engine.path ='/Users/f1dj/anaconda3/bin/python' }
import sys
print(sys.executable)
````

This could also be done via a setting: opts_chunk$set(engine.path = '/Users/f1dj/anaconda3/bin/python')

One of the problems with this approach is that a Python environment is created for each chunk – so you can’t easily carry state over from one Python chunk to another.

So I had a look at a workaround using reticulate instead.

Calling pytracery from R using reticulate

The solution I think I’m going for is to put Python code into a file, call that into R, then pass an R dataframe as an argument to a called Python function and gett a response back into R as an R dataframe.

For example, here’s a simple python test file:

import tracery
from tracery.modifiers import base_english
import pandas as pd

def pandas_row_mapper(row, rules, root,  modifiers=base_english):
    ''' Function to parse single row of dataframe '''
    row=row.to_dict()
    rules=rules.copy()

    for k in row:
        rules[k] = str(row[k])
        grammar = tracery.Grammar(rules)
        if modifiers is not None:
            if isinstance(modifiers,list):
                for modifier in modifiers:
                    grammar.add_modifiers(modifier)
            else:
                grammar.add_modifiers(modifiers)

    return grammar.flatten(root)

def pandas_tracery(df, rules, root, modifiers=base_english):
  return df.apply(lambda row: pandas_row_mapper(row, rules, root, modifiers), axis=1)

def pdt_inspect(df):
  return(df)

def pdt_test1(df):
  return type(df)

def pdt_demo(df):
  return pandas_tracery(df, _demo_rules, "#origin#",  modifiers=base_english)

#Create example rule to apply to each row of dataframe
_demo_rules = {'origin': "#code# was placed #position#!",
         'position': "#pos.uppercase#"}

We can access a python environment using reticulate:

library(reticulate)

#Show conda environments
conda_list("auto")

#Use a particular, name conda environment
use_condaenv(condaenv='anaconda3', required=T)

#Check the availability of a particular module in the environment
py_module_available("tracery")

Now we can load in the python file – and the functions it defines – and then call one of the  loaded Python functions.

Note that I seemed to have to force the casting of the R dataframe to a python/pandas dataframe using r_to_py(), although I’d expected the type mapping to be handled automatically? (Perhaps there is a setting somewhere?)

```{r}
source_python("pd_tracery_demo.py")
df1=data.frame(code=c('Jo','Sam'), pos=c('first','Second'))
df1$result = pdt_demo(r_to_py(df1, convert=T))
df1
```

#Displays:
Jo	first	Jo was placed FIRST!
Sam	Second	Sam was placed SECOND!

(Note: I also spotted a gotcha – things don’t work so well if you define an R column name called name… )

So now I can start looking at converting sports reporting tropes like these:

into tracery story models I can call using my pandas/pytracery hacks:-)

To leave a comment for the author, please follow the link and comment on their blog: Rstats – OUseful.Info, the blog….

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.