
Jupyter And R Markdown: Notebooks With R

[This article was first published on DataCamp Blog, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When working on data science problems, you might want to set up an interactive environment in which you can work on the code of a project and share it with others. You can easily set this up with a notebook.

In other cases, you’ll just want to communicate about the workflow and the results that you have gathered for the analysis of your data science problem. For a transparent and reproducible report, a notebook can also come in handy.

That’s right; notebooks are perfect for situations where you want to combine plain text with rich text elements such as graphics, calculations, etc.

Today’s blog post focuses on the two notebooks that are popular with R users, namely the Jupyter Notebook and, even though it’s still quite new, the R Markdown Notebook. You’ll discover how to use these notebooks, how they compare to one another, and what other alternatives exist.

R And The Jupyter Notebook

Contrary to what you might think, Jupyter doesn’t limit you to working solely with Python: the notebook application is language agnostic, which means that you can also work with other languages. 

There are two general ways to get started on using R with Jupyter: by using a kernel or by setting up an R environment that has all the essential tools to get started on doing data science.

Running R in Jupyter With The R Kernel

As described above, the first way to run R is by using a kernel. If you want to have a complete list of all the available kernels in Jupyter, go here.

To work with R, you’ll need to install the IRkernel and register it with Jupyter before you can start working with R in the notebook environment.

First, you’ll need to install some packages. Make sure that you run the following steps not in your RStudio console but in a regular R session started from a terminal; otherwise you’ll get an error like this:

Error in IRkernel::installspec() :
Jupyter or IPython 3.0 has to be installed but could neither run “jupyter” nor “ipython”, “ipython2” or “ipython3”.
(Note that “ipython2” is just IPython for Python 2, but still may be IPython 3.0)

$ R

> install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))

This command will prompt you to type in a number to select a CRAN mirror to install the necessary packages. Enter a number and the installation will continue.

> devtools::install_github('IRkernel/IRkernel')

Then, you still need to make the R kernel visible to Jupyter:

# Install IRkernel for the current user

> IRkernel::installspec()

# Or install IRkernel system-wide

> IRkernel::installspec(user = FALSE)

Now open up the notebook application with jupyter notebook. You’ll see R appearing in the list of kernels when you create a new notebook. 
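If you want to check that the registration worked, jupyter kernelspec list prints the kernels Jupyter knows about. The Python sketch below does roughly the same by hand, as an illustration of where the kernel ends up. It assumes that each kernel lives in its own subdirectory containing a kernel.json file with a "display_name" field, which is how Jupyter stores kernel specs; the function name find_kernels is hypothetical, and the directory you pass in (for a per-user install on Linux, typically ~/.local/share/jupyter/kernels) depends on your platform.

```python
import json
from pathlib import Path


def find_kernels(kernel_dirs):
    """Map kernel names to display names by scanning kernelspec directories.

    Hypothetical helper: every kernel is assumed to be a subdirectory that
    contains a kernel.json file with a "display_name" field.
    """
    kernels = {}
    for base in kernel_dirs:
        base = Path(base).expanduser()
        if not base.is_dir():
            continue  # skip locations that don't exist on this machine
        for spec_file in sorted(base.glob("*/kernel.json")):
            spec = json.loads(spec_file.read_text(encoding="utf-8"))
            # The directory name is the kernel's identifier (e.g. "ir");
            # the first directory that defines a kernel wins.
            kernels.setdefault(spec_file.parent.name, spec.get("display_name", ""))
    return kernels
```

For example, after running installspec() for the current user, find_kernels(["~/.local/share/jupyter/kernels"]) should list an entry for the R kernel on a typical Linux setup.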

Using An R Essentials Environment In Jupyter

The second option to quickly work with R is to install the R essentials in your current environment:

conda install -c r r-essentials

These “essentials” include the packages dplyr, shiny, ggplot2, tidyr, caret, and nnet. If you don’t want to install the essentials in your current environment, you can use the following command to create a new environment just for the R essentials:

conda create -n my-r-env -c r r-essentials

If you created a new environment, activate it first (conda activate my-r-env). Then open up the notebook application to start working with R.

You might wonder what you need to do if you want to install additional packages as your data science project grows. After all, these packages might be enough to get you started, but you might need other tools.

Well, you can either build a Conda R package by running, for example:

conda skeleton cran ldavis
conda build r-ldavis/

Or you can install the package from inside of R via install.packages() or devtools::install_github() (to install packages from GitHub). You just have to make sure to add the new package to the correct R library used by Jupyter. Keep in mind that R package names are case-sensitive, so the CRAN package is called LDAvis:

install.packages("LDAvis", lib = "/home/user/anaconda3/lib/R/library")

If you want to know more about kernels or about running R in a Docker environment, check out this page.

Adding Some R Magic To Jupyter

A huge advantage of working with notebooks is that they provide you with an interactive environment. Part of that flexibility comes from the so-called “magic commands”.

These commands allow you to switch from Python to command line instructions or to write code in another language such as R, Julia, Scala, …

To switch from Python to R, you first need to install the rpy2 package (for example with pip install rpy2) and then load its IPython extension:

%load_ext rpy2.ipython

After that, you can get started with R: the %R line magic runs a single line of R code, and the %%R cell magic runs a whole cell as R, which makes it easy to switch from Python to R in your data analysis.

Let’s demonstrate how the R magic works with a small example. Note that a cell magic such as %%R has to be the first line of its cell, so the example is split over two cells:

# Cell 1 (Python)
# Hide warnings if there are any
import warnings
warnings.filterwarnings('ignore')
# Load in the R magic (requires the rpy2 package)
%load_ext rpy2.ipython
# We need ggplot2 in the embedded R session
%R require(ggplot2)
# Load in the pandas library
import pandas as pd
# Make a pandas DataFrame
df = pd.DataFrame({'Alphabet': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'],
                   'A': [4, 3, 5, 2, 1, 7, 7, 5, 9],
                   'B': [0, 4, 3, 6, 7, 10, 11, 9, 13],
                   'C': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

%%R -i df
# Cell 2 (R): -i df copies the Python variable df into an R variable of the same name
# Plot the DataFrame df
ggplot(data=df) + geom_point(aes(x=A, y=B, color=C))

If you want more details about Jupyter, on how to set up a notebook, where to download the application, how you can run the notebook application (via Docker, pip install or with the Anaconda distribution) or other details, check out our Definitive Guide.

The R Notebook

Until recently, Jupyter seemed to be a popular solution for R users, next to notebooks such as Apache Zeppelin or Beaker.

Also, other alternatives to report results of data analyses, such as R Markdown, Knitr or Sweave, have been hugely popular in the R community.

However, this might change with the recent release of the R or R Markdown Notebook by RStudio.

As you can see, the context of the R Markdown Notebook is complex, and it’s worth looking into the history of reproducible research in R to understand what drove the creation and development of this notebook. Ultimately, you will also realize that this notebook is different from others.

R And The History of Reproducible Research

In his talk, J.J. Allaire explains that the efforts in R itself for reproducible research, the efforts of Emacs to combine text, code, and input, the Pandoc, Markdown and knitr projects, and computational notebooks have been evolving in parallel and influencing each other for many years. He argues that all of these factors eventually led to the creation and development of notebooks for R.

Firstly, computational notebooks have quite a history: since the late 80s, when Mathematica’s front end was released, there have been a lot of advancements. In 2001, Fernando Pérez started developing IPython, but it was only in 2011 that the team released version 0.12 of IPython, which introduced the notebook. The SageMath project began in 2004. After that, many notebooks followed. The most notable ones for the data science community are Beaker (2013), Jupyter (2014) and Apache Zeppelin (2015).

Then, there are also the markup languages and text editors that have influenced the creation of RStudio’s notebook application, namely Emacs, Markdown, and Pandoc. Org-mode was released in 2003. It’s an editing and organizing mode for notes, planning and authoring in the free software text editor Emacs. Six years later, Emacs org-R was there to provide support for R users. Markdown, on the other hand, was released in 2004 as a markup language that allows you to format your plain text in such a way that it can be converted to HTML or other formats. Fast forward another couple of years, and Pandoc was released: a document converter that serves both as a writing tool and as a basis for publishing workflows.

Lastly, the efforts of the R community to make sure that research can be reproducible and transparent have also contributed to the rise of a notebook for R. Sweave was introduced in 2002 to allow the embedding of R code within LaTeX documents to generate PDF files. These PDF files combined the narrative and analysis, graphics, code, and the results of computations. Ten years later, knitr was developed to solve long-standing problems in Sweave and to combine features that were present in other add-on packages into one single package. It’s a transparent engine for dynamic report generation in R, designed to work with any input language and any output markup language.

Also in 2012, R Markdown was created as a variant of Markdown that can embed R code chunks and that can be used with knitr to create reproducible web-based reports. The big advantage was, and still is, that it’s no longer necessary to use LaTeX, which has a steep learning curve. The syntax of R Markdown is very similar to the regular Markdown syntax but adds a few extensions: you can, for example, include LaTeX equations.

R Markdown Versus Computational Notebooks

R Markdown is probably one of the most popular options in the R community to report on data analyses. It’s no surprise whatsoever that it is still a core component in the R Markdown Notebook. 

And there are some things that R Markdown and notebooks share, such as delivering a reproducible workflow, weaving code, output, and text together in a single document, supporting interactive widgets, and outputting to multiple formats. However, they differ in their emphases: R Markdown focuses on reproducible batch execution, plain text representation, version control, and production output, and it offers the same editor and tools that you use for R scripts.

On the other hand, the traditional computational notebooks focus on outputting inline with code, caching the output across sessions, sharing code and outputting in a single file. Notebooks have an emphasis on an interactive execution model. They don’t use a plain text representation, but a structured data representation, such as JSON.
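To make the difference concrete, here is a Python sketch of the kind of structure Jupyter saves to disk: an .ipynb file is a JSON document (nbformat version 4) in which every cell carries its source and metadata, and code cells also carry their outputs and an execution count. The cell contents and the "ir" kernelspec below are illustrative assumptions, not taken from a real notebook.

```python
import json

# A minimal notebook in the nbformat 4 layout (cell contents are illustrative)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {"kernelspec": {"name": "ir", "display_name": "R", "language": "R"}},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A tiny example"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            # Outputs are stored inside the same file as the code
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["[1] 2\n"]}],
        },
    ],
}

# On disk, the .ipynb file is this structure serialized as JSON
serialized = json.dumps(notebook, indent=1)
restored = json.loads(serialized)
for cell in restored["cells"]:
    print(cell["cell_type"], sorted(cell))
# markdown ['cell_type', 'metadata', 'source']
# code ['cell_type', 'execution_count', 'metadata', 'outputs', 'source']
```

The equivalent R Markdown source would be a few lines of plain text with no outputs or execution counts in it, which is exactly the contrast drawn here.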

That all explains the purpose of RStudio’s notebook application: it combines all the advantages of R Markdown with the good things that computational notebooks have to offer.

That’s why R Markdown is a core component of the R Markdown Notebook: RStudio defines its notebook as “an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input”.

How To Work With R Notebooks

If you’ve ever worked with Jupyter or any other computational notebook, you’ll see that the workflow is very similar. One thing that might seem very different is that, by default, you’re no longer working with code cells: you’re working in a sort of text editor in which you mark your code chunks with R Markdown syntax.
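To see what marking code chunks means in practice, the Python sketch below splits the text of an .Rmd file into prose and code chunks. It assumes the usual fence syntax: a line that starts with three backticks followed by {engine, options} opens a chunk, and a line with only three backticks closes it. The function name split_chunks and the simplified regular expression are our own; knitr’s real parser handles more cases, such as indented fences and inline code.

```python
import re

FENCE = "`" * 3  # built programmatically so this example can live inside Markdown
# Chunk header such as {r}, {r setup} or {r, error=TRUE}: the engine name comes first
HEADER = re.compile(r"^`{3}\{(\w+)[^}]*\}\s*$")


def split_chunks(rmd_text):
    """Split R Markdown text into ("text", body) and (engine, body) pieces.

    Simplified sketch: only non-indented fences are recognized.
    """
    pieces, buffer, engine = [], [], None
    for line in rmd_text.splitlines():
        match = HEADER.match(line) if engine is None else None
        if match:
            # A chunk header ends the current run of prose
            if buffer:
                pieces.append(("text", "\n".join(buffer)))
            buffer, engine = [], match.group(1)
        elif engine is not None and line.strip() == FENCE:
            # A closing fence ends the current chunk
            pieces.append((engine, "\n".join(buffer)))
            buffer, engine = [], None
        else:
            buffer.append(line)
    if buffer:
        # Flush trailing prose (or an unclosed chunk)
        pieces.append(("text" if engine is None else engine, "\n".join(buffer)))
    return pieces
```

Because the engine name is read from the header, the same sketch also recognizes chunks in other languages, such as {bash} or {sql}.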

How To Install And Use The R Markdown Notebook

The first requirement to use the notebook is that you have the newest version of RStudio available on your PC. Since notebooks are a new feature of RStudio, they are only available in version 1.0 or higher of RStudio. So, it’s important to check if you have a correct version installed.

If you don’t have version 1.0 or higher of RStudio, you can download the latest version here.

Then, to make a new notebook, go to the File menu, select “New File”, and you’ll see the option to create a new R Notebook. If RStudio prompts you to update some packages, just accept the offer and eventually a new file will appear.

Tip: double-check whether you’re working with a notebook by looking at the YAML header at the top of your document: the output field should be html_notebook.

You’ll see that the default text that appears in the document is in R Markdown. R Markdown should feel pretty familiar to you, but if you’re not yet quite proficient, you can always check out our Reporting With R Markdown course or go through the material that is provided by RStudio.

Note that you can always use the gear icon to adjust the notebook’s working space: you have the option to expand, collapse, and remove the output of your code, to change the preview options and to modify the output options.

This last option can come in handy if you want to change the syntax highlighting, apply another theme, adjust the default width and height of the figures appearing in your output, etc.

From there onwards, you can start inserting code chunks and text!

You can add code chunks in two ways: through the keyboard shortcut Ctrl + Alt + I (Windows and Linux) or Cmd + Option + I (macOS), or with the Insert button that you find in the toolbar.

What’s great about working with these R Markdown notebooks is the fact that you can follow up on the execution of your code chunks, thanks to the little green bar that appears on the left when you’re executing large code chunks or multiple code chunks at once. Also, note that there’s a progress bar on the bottom.

You can see the green progress bar appearing in the gif below:

Talking about code execution: there are multiple ways in which you can execute your R code chunks.

You can run the current code chunk, run the next chunk, or run all code chunks above or below the current one; you can also choose to restart R and run all chunks, or to restart R and clear the output.

Note that when you execute the notebook’s code, you will also see the output appearing on your console! That might be a rather big difference for those who usually work with other computational notebooks such as Jupyter.

If there are any errors while the notebook’s code chunks are being executed, the execution will stop, and a red bar will appear alongside the chunk that produced the error.

You can prevent the execution from halting by adding error = TRUE to the chunk options, just like this:

```{r, error=TRUE}
iris <- read.csv(url("http://mlr.cs.umass.edu/ml/machine-leaning-databases/"), header = FALSE)
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")
```

Note that the error will still appear, but that the notebook’s code execution won’t be halted!
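The execution model described here (chunks run in order in one shared session, and the run halts at the first error unless that chunk sets error = TRUE) can be mimicked in a few lines of Python. This is only an analogy under our own assumptions: the (code, options) pairs, the run_chunks function and Python’s exec stand in for R chunks and for RStudio’s actual machinery.

```python
def run_chunks(chunks):
    """Run (code, options) pairs in one shared namespace, like a notebook session.

    Execution halts at the first failing chunk unless that chunk's options
    contain error=True, in which case the error is recorded and execution goes on.
    Returns a list of (index, status) pairs and the shared namespace.
    """
    namespace = {}
    log = []
    for index, (code, options) in enumerate(chunks):
        try:
            exec(code, namespace)
            log.append((index, "ok"))
        except Exception as exc:
            log.append((index, f"error: {exc}"))
            if not options.get("error", False):
                break  # default behavior: stop at the failing chunk
    return log, namespace


chunks = [
    ("x = 2", {}),
    ("y = x / 0", {"error": True}),  # fails, but execution continues
    ("z = x * 10", {}),              # sees x, defined by the first chunk
]
log, env = run_chunks(chunks)
print(log)
print(env["z"])  # 20
```

Dropping {"error": True} from the second chunk makes the run stop there, so z is never defined.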

How To Use R Markdown Notebook’s Magic

Just like with Jupyter, you can also work interactively with your R Markdown notebooks. It works a bit differently from Jupyter, as there are no real magic commands: to work with other languages, you need to add separate Bash, Stan, Python, SQL or Rcpp chunks to the notebook.

These options might seem quite limited to you, but that is compensated by how easily you can add these types of code chunks with the toolbar’s Insert button.

Working with these code chunks is easy too: you can see an example of SQL chunks in this document, published by J.J. Allaire. For Bash commands, you just type the command. There’s no need for extra characters such as ‘!’ to signal that you’re working in Bash, as you would need in Jupyter.

How To Output Your R Markdown Notebooks 

Before you render the final version of a notebook, you might want to preview what you have been doing. There’s a handy feature that allows you to do this: you’ll find it in your toolbar. 

Click on the “preview” button and the provisional version of your document will pop up on the right-hand side, in the “Viewer” tab.

By adding some lines to the YAML header at the top of the notebook, you can adjust your output options, like this:

---
title: "Notebook with KNN Example"
output:
  pdf_document:
    highlight: tango
    toc: yes
  html_notebook:
    toc: yes
---

Rendering to pdf_document requires a LaTeX distribution. To see where you can get one, you can just try to knit: the console output will give you the sites where you can download the necessary software.

Note that this is just one of the many options that you have to export a notebook: you can also render GitHub documents, Word documents, Beamer presentations, etc. These are the output options that you already had with regular R Markdown files. You can find more info here.
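If you want to check from a script which output formats a document declares, for instance whether html_notebook is among them, a few lines of Python are enough for simple headers like the one shown earlier in this section. This sketch is our own and deliberately avoids a YAML library: it assumes the front matter is delimited by --- lines and that each format is either given inline (output: html_notebook) or as a key indented one level under output:. For anything more complex, use a real YAML parser.

```python
def output_formats(rmd_text):
    """Return the output format names declared in an R Markdown YAML header.

    Simplified sketch: handles "output: html_notebook" as well as a nested
    block where each format is a key indented one level under "output:".
    """
    lines = rmd_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no YAML front matter
    formats, in_output, indent = [], False, None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the front matter
        if line.startswith("output:"):
            value = line[len("output:"):].strip()
            if value:
                return [value]  # single format given on one line
            in_output = True
            continue
        if in_output:
            stripped = line.lstrip()
            if not stripped:
                continue
            current = len(line) - len(stripped)
            if current == 0:
                break  # a new top-level key ends the output block
            if indent is None:
                indent = current  # indentation level of the format names
            if current == indent:
                formats.append(stripped.split(":")[0].strip())
    return formats
```

With this helper, "html_notebook" in output_formats(text) tells you whether a file will also produce the notebook output, in line with the tip about the html_notebook field.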

Tips And Tricks To Work With R Notebook

Besides the general coding practices that you should keep in mind, such as documenting your code and applying a consistent naming scheme, code grouping and name length, you can also use the following tips to make a notebook awesome for others to use and read:

The R Notebook Versus The Jupyter Notebook

Besides the differences between the Jupyter and R Markdown notebooks that you have already read about above, there are some more.

Let’s compare Jupyter with the R Markdown Notebook!

There are four aspects that you will find interesting to consider: notebook sharing, code execution, version control, and project management.

Notebook Sharing

The source code for an R Markdown notebook is an .Rmd file. But when you save a notebook, an .nb.html file is created alongside it. This HTML file is an associated file that includes a copy of the R Markdown source code and the generated output. 

That means that you need no special viewer to see the file (any web browser will do), whereas you might need one to view notebooks made with the Jupyter application, which are JSON documents, or other computational notebooks that store their output in a structured format. You can publish your R Markdown notebook on any web server, on GitHub or as an email attachment.

There are also APIs to render and parse R Markdown notebooks: they give other frontend tools the ability to create notebook authoring modes for R Markdown, and they can be used to create conversion utilities to and from different notebook formats.
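As an illustration of what such a conversion utility does, here is a hypothetical Python sketch that turns the cells of a Jupyter notebook into R Markdown text: markdown cells are copied as they are and code cells become chunks. It is our own simplification, not the R Markdown API: it assumes an R notebook in the nbformat 4 layout, drops all outputs, skips raw cells, and does not write a YAML header.

```python
import json

FENCE = "`" * 3  # built programmatically so the example can sit inside Markdown


def ipynb_to_rmd(notebook_json, engine="r"):
    """Convert nbformat-4 notebook JSON into R Markdown text (outputs are dropped)."""
    notebook = json.loads(notebook_json)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)  # nbformat allows a list of lines
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            # Produces a chunk header such as {r} after the opening fence
            parts.append(f"{FENCE}{{{engine}}}\n{source}\n{FENCE}")
        # raw cells are skipped in this sketch
    return "\n\n".join(parts) + "\n"
```

A complete converter would also have to add a YAML header with output: html_notebook and decide what to do with the stored outputs.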

To share the notebooks you make in the Jupyter application, you can export the notebooks as slideshows, blogs, dashboards, etc. You can find more information in this tutorial. In addition, there are the default options to generate Python scripts, HTML files, Markdown files, PDF files or reStructuredText files.

Code Execution

R Markdown Notebooks have options to run the current code chunk, run the next chunk, or run all code chunks above or below the current one. In addition to these options, you can also choose to restart R and run all chunks, or to restart R and clear the output.

These options are interesting when you’re working with R because the R Markdown Notebook allows all R code chunks to share the same environment. However, this can prove to be a big disadvantage if you’re working with non-R chunks, as these don’t share environments.

All in all, these code execution options add a considerable amount of flexibility for users who have been struggling with the code execution options that Jupyter offers, even though the two aren’t very different: in the Jupyter application, you have the option to run a single cell, to run the selected cells or to run all cells. You can also choose to clear the current or all outputs. The code environment is shared between code cells.

Version control

There have been claims that Jupyter messes up the version control of notebooks or that it’s hard to use git with these notebooks. Common solutions are to export the notebook as a script, or to set up a git filter that strips the run counts and outputs and fixes the parts of the metadata that shouldn’t change when you commit.
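Stripping run counts and outputs is what tools such as nbstripout automate, typically as a git filter. The Python sketch below shows only the core idea on the JSON structure of an .ipynb file; the function name strip_notebook is our own, and a real tool also cleans more metadata than this.

```python
import json


def strip_notebook(notebook_json):
    """Return notebook JSON with outputs and execution counts removed.

    Sketch of what an output-stripping git filter does: only the parts that
    change on every run are cleared, so diffs show real source changes.
    """
    notebook = json.loads(notebook_json)
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    # sort_keys gives a stable key order, which also keeps diffs small
    return json.dumps(notebook, indent=1, sort_keys=True, ensure_ascii=False) + "\n"
```

Wired up as a git clean filter, a small script around this function would read the notebook on standard input and write the stripped version to standard output; running it twice gives the same result, which is what a filter needs.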

The R Markdown notebooks seem to make this issue a bit easier to handle: the output of your code is saved in the associated HTML file, and the notebook source itself is essentially a plain text file, which makes version control much easier. You can choose to put only your .Rmd file on GitHub or your other version control system, or you can also include the .nb.html file.

Project Management

As the R Markdown Notebook is native to the RStudio IDE, the notebooks integrate seamlessly with your R projects. Also, these notebooks support other languages, including Python, C, and SQL.

On the other hand, the Jupyter project is not native to any IDE: in that sense, it will cost some effort to integrate this notebook seamlessly with your projects. But this notebook still supports more languages and will be a more suitable companion if you’re looking to use Scala (for example via Apache Toree), Julia, or another language.

Alternatives to Jupyter or R Markdown Notebooks

Apart from the notebooks that you can use as interactive data science environments which make it easy for you to share your code with colleagues, peers, and friends, there are also other alternatives to consider.

Because sometimes you don’t need a notebook, but a dashboard, an interactive learning platform or a book, for example.

You have already read about options such as Sweave and knitr in the second section. Some other options that are out there are:
