Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data Science has (unnecessarily) divided the world into two halves – R users and Python users. Regardless of the group you belong to, there’s one thing you have to admit – each language has libraries far superior to anything available in the other. For example, R Shiny is much easier for beginners than anything Python offers. But what about basic data visualization? That’s where this Matplotlib vs. ggplot article comes in.
Today we’ll see how R and Python compare in basic data visualization. We’ll compare their standard plotting libraries – Matplotlib and ggplot – to see which one is easier to use and which looks better in the end. We’ll also show you how to include Matplotlib charts in R Shiny dashboards, as that’s been a common pain point for Python users. What’s even better, the chart will react to user input.
Want to use R and Python together? Here are 2 packages to get you started.
Table of contents:
- Matplotlib vs. ggplot – Which is Better for Basic Plots?
- Matplotlib vs. ggplot – Which is easier to customize?
- How to Include ggplot Charts in R Shiny
- How to Use Matplotlib Charts in R Shiny
- Summary of Matplotlib vs. ggplot
Matplotlib vs. ggplot – Which is Better for Basic Plots?
There’s no denying that neither Matplotlib nor ggplot looks its best by default. There’s a lot you can change, of course, but we’ll get to that later. The aim of this section is to compare Matplotlib and ggplot in the realm of unstyled visualizations.
To keep things simple, we’ll only make a scatter plot of the well-known mtcars dataset, in which the X-axis shows miles per gallon and the Y-axis shows the corresponding horsepower.
Are you new to scatter plots? Here’s our complete guide to get you started.
There’s not a lot you have to do to produce this visualization in R ggplot:
    library(ggplot2)

    ggplot(data = mtcars, aes(x = mpg, y = hp)) +
      geom_point()
It’s a bit dull by default, but is Matplotlib better?
The mtcars dataset doesn’t ship with Python, so we have to download and parse it from GitHub. After doing so, a simple call to ax.scatter() puts both variables on their respective axes:
    import pandas as pd
    import matplotlib.pyplot as plt

    mtcars = pd.read_csv("https://gist.githubusercontent.com/ZeccaLehn/4e06d2575eb9589dbe8c365d61cb056c/raw/898a40b035f7c951579041aecbfb2149331fa9f6/mtcars.csv", index_col=[0])

    fig, ax = plt.subplots(figsize=(13, 8))
    ax.scatter(x=mtcars["mpg"], y=mtcars["hp"])
It would be unfair to call ggplot superior to Matplotlib purely because the dataset ships with R, while Python requires an extra step to load it.
From the visual point of view, things are highly subjective. Matplotlib figures have a lower resolution by default, so the whole thing looks blurry. Other than that, declaring a winner is near impossible.
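The resolution difference comes down to simple arithmetic: the pixel size of a figure is its size in inches multiplied by the DPI. The sketch below assumes Matplotlib's default figure DPI of 100 and uses 300 as an illustrative higher setting; the helper name pixel_size is hypothetical and not part of Matplotlib.

```python
def pixel_size(figsize, dpi):
    """Return (width, height) in pixels for a figure size given in inches."""
    width_in, height_in = figsize
    return round(width_in * dpi), round(height_in * dpi)


figsize = (13, 8)  # same figure size as in the scatter plot above

# 100 is assumed here to be Matplotlib's default "figure.dpi"
print(pixel_size(figsize, dpi=100))  # (1300, 800)

# 300 is an illustrative higher setting
print(pixel_size(figsize, dpi=300))  # (3900, 2400)
```

Tripling the DPI triples both dimensions, so the same chart is drawn with nine times as many pixels.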
Do you prefer Matplotlib or ggplot2 default stylings? Let us know in the comment section below.
Let’s add some styles to see which one is easier to customize.
Matplotlib vs. ggplot – Which is easier to customize?
To keep things simple, we’ll modify only a couple of things:
- Change the point sizing by the qsec variable
- Change the point color by the cyl variable
- Add a custom color palette for three distinct color factors
- Change the theme
- Remove the legend
- Add title
In R ggplot, that boils down to adding a couple of lines of code:
    ggplot(data = mtcars, aes(x = mpg, y = hp)) +
      geom_point(aes(size = qsec, color = factor(cyl))) +
      scale_color_manual(values = c("#3C6E71", "#70AE6E", "#BEEE62")) +
      theme_classic() +
      theme(legend.position = "none") +
      labs(title = "Miles per Gallon vs. Horse Power")
The chart now actually looks usable, both for reporting and dashboarding purposes.
But how difficult is it to produce the same chart in Python? Let’s take a look. For starters, we’ll increase the DPI to get rid of the blurriness, and also remove the top and right lines around the figure.
Changing point size and color is a bit trickier to do in Matplotlib, but it’s just a matter of experience and preference. Also, Matplotlib doesn’t place labels on axes by default – consider this as a pro or a con. We’ll add them manually:
    plt.rcParams["figure.dpi"] = 300
    plt.rcParams["axes.spines.top"] = False
    plt.rcParams["axes.spines.right"] = False

    fig, ax = plt.subplots(figsize=(13, 8))
    ax.scatter(
        x=mtcars["mpg"],
        y=mtcars["hp"],
        s=[s**1.8 for s in mtcars["qsec"].to_numpy()],
        c=["#3C6E71" if cyl == 4 else "#70AE6E" if cyl == 6 else "#BEEE62" for cyl in mtcars["cyl"].to_numpy()]
    )
    ax.set_title("Miles per Gallon vs. Horse Power", size=18, loc="left")
    ax.set_xlabel("mpg", size=14)
    ax.set_ylabel("hp", size=14)
The figures look almost identical, so what’s the verdict? Is it better to use Python’s Matplotlib or R’s ggplot2?
Objectively speaking, Python’s Matplotlib requires more code to do the same thing when compared to R’s ggplot2. Further, Python’s code is harder to read, due to bracket notation for variable access and inline conditional statements.
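The inline conditionals are only one way to assign colors. As a minimal sketch (not the code used for the chart above), a dictionary lookup does the same mapping and may be easier to read. The names PALETTE and colors_for are hypothetical helpers introduced for illustration.

```python
# Hypothetical lookup table: cylinder count -> hex color (same palette as above)
PALETTE = {4: "#3C6E71", 6: "#70AE6E", 8: "#BEEE62"}


def colors_for(cyl_values, default="#BEEE62"):
    """Map each cylinder count to its color, falling back to `default`."""
    return [PALETTE.get(int(cyl), default) for cyl in cyl_values]


print(colors_for([6, 4, 8]))  # ['#70AE6E', '#3C6E71', '#BEEE62']
```

The resulting list could then be passed to the scatter call as c=colors_for(mtcars["cyl"]). The default mirrors the final else branch of the original expression.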
So, does ggplot2 take the win here? Well, no. If you’re a Python user it will take you less time to create a chart in Matplotlib than it would to learn a whole new language/library. The same goes the other way.
Up next, we’ll see how easy it is to include this chart in an interactive dashboard.
How to Include ggplot Charts in R Shiny
Shiny is an R package for creating dashboards around your data. It’s built for the R programming language, and hence integrates nicely with most other R packages – ggplot2 included.
We’ll now create a simple R Shiny dashboard that allows you to select columns for the X and Y axis and then updates the figure automatically. If you have more than 30 minutes of R Shiny experience, the code snippet below shouldn’t be difficult to read:
    library(shiny)
    library(ggplot2)

    ui <- fluidPage(
      tags$h3("Scatter plot generator"),
      selectInput(inputId = "x", label = "X Axis", choices = names(mtcars), selected = "mpg"),
      selectInput(inputId = "y", label = "Y Axis", choices = names(mtcars), selected = "hp"),
      plotOutput(outputId = "scatterPlot")
    )

    server <- function(input, output, session) {
      data <- reactive({mtcars})

      output$scatterPlot <- renderPlot({
        ggplot(data = data(), aes_string(x = input$x, y = input$y)) +
          geom_point(aes(size = qsec, color = factor(cyl))) +
          scale_color_manual(values = c("#3C6E71", "#70AE6E", "#BEEE62")) +
          theme_classic() +
          theme(legend.position = "none")
      })
    }

    shinyApp(ui = ui, server = server)
Put simply, we’re rerendering the chart every time one of the inputs changes. All computations are done in R, and the update is almost instant. Makes sense, since mtcars is a tiny dataset.
But how about rendering a Matplotlib chart in R Shiny? Let’s see if it’s even possible.
How to Use Matplotlib Charts in R Shiny
There are several ways to combine R and Python – reticulate being one of them. However, we won’t use that kind of bridging library today.
Instead, we’ll opt for a simpler solution – calling a Python script from R. The mentioned Python script will be responsible for saving a Matplotlib figure in JPG form. In Shiny, the image will be rendered with the renderImage() reactive function.
Let’s write the script – generate_scatter_plot.py. It leverages the argparse module to accept arguments when executed from the command line. As you would expect, the script accepts column names for the X and Y axis as command line arguments. The rest of the script should feel familiar, as we explored it in the previous section:
    import argparse
    import pandas as pd
    import matplotlib.pyplot as plt

    # Tweak matplotlib defaults
    plt.rcParams["figure.dpi"] = 300
    plt.rcParams["axes.spines.top"] = False
    plt.rcParams["axes.spines.right"] = False

    # Get and parse the arguments from the command line
    parser = argparse.ArgumentParser()
    parser.add_argument("--x", help="X-axis column name", type=str, required=True)
    parser.add_argument("--y", help="Y-axis column name", type=str, required=True)
    args = parser.parse_args()

    # Fetch the dataset
    mtcars = pd.read_csv("https://gist.githubusercontent.com/ZeccaLehn/4e06d2575eb9589dbe8c365d61cb056c/raw/898a40b035f7c951579041aecbfb2149331fa9f6/mtcars.csv", index_col=[0])

    # Create the plot
    fig, ax = plt.subplots(figsize=(13, 7))
    ax.scatter(
        x=mtcars[args.x],
        y=mtcars[args.y],
        s=[s**1.8 for s in mtcars["qsec"].to_numpy()],
        c=["#3C6E71" if cyl == 4 else "#70AE6E" if cyl == 6 else "#BEEE62" for cyl in mtcars["cyl"].to_numpy()]
    )

    # Save the figure
    fig.savefig("scatterplot.jpg", bbox_inches="tight")
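To see how the argument parsing behaves in isolation, here is a minimal sketch that feeds argparse a list instead of the real command line. The COLUMNS list (a subset of the mtcars column names) and the use of choices to reject unknown columns are assumptions added for illustration; the script above does not validate column names.

```python
import argparse

# Hypothetical subset of mtcars column names, used only for this illustration
COLUMNS = ["mpg", "cyl", "disp", "hp", "wt", "qsec"]


def build_parser():
    """Build a parser with the same --x / --y arguments as the script."""
    parser = argparse.ArgumentParser()
    # `choices` makes argparse exit with an error for any unknown column name
    parser.add_argument("--x", help="X-axis column name", type=str, required=True, choices=COLUMNS)
    parser.add_argument("--y", help="Y-axis column name", type=str, required=True, choices=COLUMNS)
    return parser


# Passing a list simulates the command line: --x mpg --y hp
args = build_parser().parse_args(["--x", "mpg", "--y", "hp"])
print(args.x, args.y)  # mpg hp
```

With choices in place, a misspelled column fails with a readable argparse message rather than a pandas KeyError later in the script.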
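You can run the script from the command line for verification, for example with python generate_scatter_plot.py --x mpg --y hp (assuming python points to an environment with pandas and Matplotlib installed).
If all went well, it should have saved a scatterplot.jpg to disk.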
Everything looks as it should, but what’s the procedure in R Shiny? Here’s a list of things we have to do:
- Replace plotOutput() with imageOutput() – we’re rendering an image after all
- Construct a shell command as a reactive expression – it will run the generate_scatter_plot.py file and pass in the command line arguments gathered from the currently selected dropdown values
- Use the renderImage() reactive function to execute the shell command and load in the image
It sounds like a lot, but it doesn’t require much more code than the previous R example. Just remember to specify a full path to the Python executable when constructing a shell command.
Here’s the entire code snippet:
    library(shiny)

    ui <- fluidPage(
      tags$head(
        tags$style(HTML("
          #scatterPlot > img {
            max-width: 800px;
          }
        "))
      ),
      tags$h3("Scatter plot generator"),
      selectInput(inputId = "x", label = "X Axis", choices = names(mtcars), selected = "mpg"),
      selectInput(inputId = "y", label = "Y Axis", choices = names(mtcars), selected = "hp"),
      imageOutput(outputId = "scatterPlot")
    )

    server <- function(input, output, session) {
      # Construct a shell command to run Python script from the user input
      # (replace the interpreter path with the full path on your machine)
      shell_command <- reactive({
        paste0("/Users/dradecic/miniforge3/bin/python generate_scatter_plot.py --x ", input$x, " --y ", input$y)
      })

      # Render the matplotlib plot as an image
      output$scatterPlot <- renderImage({
        # Run the shell command to generate image - saved as "scatterplot.jpg"
        system(shell_command())

        # Show the image
        list(src = "scatterplot.jpg")
      }, deleteFile = FALSE)  # keep the file; it is overwritten on the next render
    }

    shinyApp(ui = ui, server = server)
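The hard-coded interpreter path in the shell command above is specific to the author’s machine. One way to find the equivalent path on your own machine, sketched here with the standard library only, is to ask Python itself. Run it from the environment that has pandas and Matplotlib installed; the python3 executable name is an assumption and may differ on your system.

```python
import shutil
import sys

# Absolute path of the interpreter running this snippet
print(sys.executable)

# First "python3" found on PATH, or None if there is none
print(shutil.which("python3"))
```

The first value is usually the safer one to paste into the Shiny app, since it points at the interpreter whose packages you just verified.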
The dashboard takes some extra time to rerender the chart, which is expected. After all, R needs to call a Python script, which then constructs the chart and saves it to disk. It’s an extra step, so the refresh isn’t as instant as with ggplot2.
Summary of Matplotlib vs. ggplot
To conclude, you can definitely use Python’s Matplotlib library in R Shiny dashboards. There are a couple of extra steps involved, but nothing you can’t manage. If you’re a heavy Python user and want to try R Shiny, this could be the fastest way to get started.
What do you think of Matplotlib in R Shiny? What do you generally prefer – Matplotlib or ggplot2? Please let us know in the comment section below. Also, don’t hesitate to reach out on Twitter if you use another approach to render Matplotlib charts in Shiny – @appsilon. We’d love to hear your comments.
R Shiny and Tableau? Learn to create custom Tableau extensions from R Shiny.
The post Matplotlib vs. ggplot: How to Use Both in R Shiny Apps appeared first on Appsilon | Enterprise R Shiny Dashboards.