Separating Code from Presentation in Jupyter Notebooks

[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers.]

One of the great conveniences of performing a data-science-style analysis in Jupyter is that notebooks are literate containers that combine code, text, results, and graphs. That same bundling is also a pain point when sharing notebooks with partners or keeping them in source control: Jupyter notebooks are JSON (which quickly becomes hard to read and hard to diff), and many notebook viewing tools alter the notebook even on opening.

There are tools for dealing with this, such as git hooks that strip output data, but they have not met my needs in the past.
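To make the output-stripping idea concrete, here is a minimal sketch using only the Python standard library. It assumes the version 4 notebook layout, where each code cell carries an `outputs` list and an `execution_count`. The function name `strip_outputs` and the sample notebook are hypothetical, and this is not the implementation of any particular git hook.

```python
import json


def strip_outputs(notebook_json: str) -> str:
    """Return notebook JSON with code-cell outputs and execution counts cleared."""
    nb = json.loads(notebook_json)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # Stable key order and fixed indentation keep later diffs small.
    return json.dumps(nb, indent=1, sort_keys=True) + "\n"


# A tiny hand-written notebook in the version 4 layout, for illustration only.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 3,
            "source": ["1 + 1"],
            "outputs": [{
                "output_type": "execute_result",
                "execution_count": 3,
                "data": {"text/plain": ["2"]},
                "metadata": {},
            }],
        },
    ],
})

cleaned = json.loads(strip_outputs(example))
```

Even with outputs removed, what lands in source control is still JSON, which is part of why stripping alone may not make a notebook pleasant to review.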

The above differs from the knitr/rmarkdown strategy pursued by RStudio. In that scheme “.Rmd” files are purely code and text (user-produced inputs), and are processed to produce outputs (typically markdown, HTML, PDF, and others).

As I switch back and forth between R and Python projects for various clients and partners, I got to thinking: “is there an easy way to separate code from presentation in Jupyter notebooks?”

The answer turns out to be yes. Jupyter itself exposes a rich application programming interface in Python. So it is very easy to organize Jupyter’s power into tools that give me a great data science and analysis workflow in Python.

All of the steps I am going to demonstrate can be found here.

What we do is start with an ‘.ipynb’ Jupyter worksheet or notebook: plot.ipynb. I can edit and execute this worksheet using JupyterLab, Visual Studio Code, PyCharm, or many other interactive tools. As usual, the sheet’s input cells are a mixture of code cells and markdown cells.

Obviously Jupyter itself can export the notebook to Python:

    jupyter nbconvert --to script --stdout plot.ipynb > plot_nbconvert.py

However, this is pretty much one way: there isn’t a quick way to convert plot_nbconvert.py back to an ‘.ipynb’ notebook, or to execute the Python in such a way that we also get the markdown formatting and the implicit printing and plotting that notebooks provide for the last value seen in each cell. The converted “.py” file doesn’t preserve enough of our expressed intent.
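The "implicit printing" lost in a plain script can be illustrated with a small standard-library sketch. It is a simplified stand-in for what a notebook kernel does, not Jupyter's actual code: the hypothetical helper `run_cell` executes a cell and, if the last statement is a bare expression, returns that expression's value so a front end could display it.

```python
import ast


def run_cell(source: str, namespace: dict):
    """Execute a cell's source; return the value of a trailing expression, if any."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so it can be evaluated separately.
        last = tree.body.pop()
        exec(compile(tree, "<cell>", "exec"), namespace)
        return eval(compile(ast.Expression(body=last.value), "<cell>", "eval"), namespace)
    # No trailing expression: run everything as statements, nothing to display.
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None


ns = {}
first = run_cell("x = 2\ny = 3", ns)       # ends in an assignment
second = run_cell("z = x * y\nz + 1", ns)  # ends in a bare expression
```

Run as an ordinary script, the line `z + 1` computes a value and silently discards it. That is the behavior gap between the exported “.py” file and the notebook.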

Suppose, instead, we export our notebook with the following command (supplied by the wvpy package):

    python -m wvpy.pysheet plot.ipynb

This creates the file plot.py. This export uses the convention that free text is taken to be Python code, and markdown is placed in special quote-blocks. When code blocks are adjacent, an annotation marks the boundary between them so we don’t lose the block structure.

Once we have this file we can do one of two things:

One can share, edit, and diff the ‘.py’ file. All one has to do is mark markdown in line-initial “''' begin text” and “''' # end text” blocks. Multiple code blocks are separated by “'''end code'''” lines.
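As a rough sketch of how these markers could be read back into cells, the code below splits such a file into (kind, body) pairs. It is not wvpy's implementation: the function name `split_sheet` is hypothetical, and the exact marker spelling and spacing are assumed to be as quoted above.

```python
from typing import List, Tuple


def split_sheet(text: str) -> List[Tuple[str, str]]:
    """Split marked-up '.py' text into ('markdown' | 'code', body) pairs."""
    cells: List[Tuple[str, str]] = []
    kind = "code"  # free text is taken to be Python code
    buf: List[str] = []

    def flush() -> None:
        # Emit the buffered lines as one cell, skipping blank-only buffers.
        body = "\n".join(buf).strip("\n")
        if body.strip():
            cells.append((kind, body))
        buf.clear()

    for line in text.splitlines():
        if kind == "code" and line.startswith("''' begin text"):
            flush()
            kind = "markdown"
        elif kind == "markdown" and line.startswith("''' # end text"):
            flush()
            kind = "code"
        elif kind == "code" and line.strip() == "'''end code'''":
            flush()  # boundary between two neighboring code cells
        else:
            buf.append(line)
    flush()
    return cells


# Hypothetical sample content, built line by line to avoid nested triple quotes.
sample = "\n".join([
    "''' begin text",
    "# A plot",
    "''' # end text",
    "x = 1",
    "'''end code'''",
    "x + 1",
])
cells = split_sheet(sample)
```

Because the markdown sits inside ordinary triple-quoted string literals, the ‘.py’ file remains valid Python, so linters and diff tools can treat it like any other source file.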

The design is: work however you want (definitely prototype in ‘.ipynb’ files, using whatever tools you like), but save only converted ‘.py’ files in source control. Automate re-running sheets (even multiple runs taking external parameters) to reproduce results at will.

This is a workflow we intend to use and teach a lot.
