Upgrading to plotly 4.0 (and above)
[This article was first published on R – Modern Data, and kindly contributed to R-bloggers].
By Carson Sievert, lead Plotly R developer
I’m excited to announce that plotly’s R package just sent its first CRAN update in nearly four months. To install the update, run install.packages("plotly").
This update has breaking changes, enables new features, fixes numerous bugs, and takes us from version 3.6.0 to 4.5.2. To see all the changes, I encourage you to read the NEWS file. In this post, I’ll highlight the most important changes, explain why they needed to happen, and provide some tips for fixing errors brought about by this update. As you’ll see, this update is mostly about improving the plot_ly() interface, so ggplotly() users won’t notice much (if any) change. I’ve also started a plotly for R book, which provides more narrative than the documentation on https://plot.ly/r (which is now updated to 4.0), more recent examples, and features exclusive to the R package. The first three chapters are nearly finished and replace the package vignettes. The later chapters are still in their beginning stages; they discuss features that are still under development, but I plan to add stability and more documentation in the coming months.
Formula mappings
In the past, you could use an expression to reference variable(s) in a data frame, but this no longer works. Consequently, you might see an error like this when you update:
    library(plotly)
    plot_ly(mtcars, x = mpg, y = sqrt(wt))
    ## Error in plot_ly(mtcars, x = mpg, y = sqrt(wt)): object 'wt' not found
plot_ly() now requires a formula (which is basically an expression, but with a ~ prefixed) when referencing variables. You do not have to use a formula to reference objects that exist in the namespace, but I recommend it, since it helps populate sensible axis/guide title defaults (e.g., compare the output of plot_ly(z = volcano) with plot_ly(z = ~volcano)).
    plot_ly(mtcars, x = ~mpg, y = ~sqrt(wt))
There are a number of technical reasons why imposing this change from expressions to formulas is a good idea. If you’re interested in the details, I recommend reading Hadley Wickham’s notes on non-standard evaluation, but here’s the gist of the situation:
- Since formulas capture the environment in which they are created, we can be confident that evaluation rules are always correct, no matter the context.
- Compared to expressions/symbols, formulas are easier to program with, which makes writing custom functions around plot_ly() easier.
    myPlot <- function(x, y, ...) {
      plot_ly(mtcars, x = x, y = y, color = ~factor(cyl), ...)
    }
    myPlot(~mpg, ~disp, colors = "Dark2")
Also, it’s fairly easy to convert a string to a formula (e.g., as.formula("~sqrt(wt)")). This trick can be quite useful when programming in shiny, where a variable mapping often depends on an input value.
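As a rough, base-R-only illustration of the two points above (a formula remembers the environment it was created in, and a string can be turned into a formula), consider the sketch below. The helper names make_mapping() and string_to_formula() are hypothetical and not part of plotly, and the eval() calls only mimic what a plotting function could do with a formula and a data frame; they are not a description of plot_ly()'s internals.

```r
# Hypothetical helper: the formula is created inside a function, so it
# captures that function's environment (including `scale`).
make_mapping <- function(scale) {
  ~ wt * scale
}

f <- make_mapping(1000)

# The right-hand side of a one-sided formula is its second element.
# Evaluate it with mtcars as the data and the captured environment as fallback,
# so `wt` comes from the data frame and `scale` from where the formula was made.
weights_lb <- eval(f[[2]], mtcars, environment(f))
head(weights_lb)

# Hypothetical helper: turn a string (e.g., a shiny input value)
# into a one-sided formula.
string_to_formula <- function(expr_string) {
  as.formula(paste0("~", expr_string))
}

g <- string_to_formula("sqrt(wt)")
sqrt_wt <- eval(g[[2]], mtcars, environment(g))
head(sqrt_wt)
```

In a shiny app, the string would typically come from an input (say, a hypothetical input$xvar), and the resulting formula could then be passed to plot_ly() in the same way as ~sqrt(wt).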
Smarter defaults
Instead of always defaulting to a “scatter” trace, plot_ly() now infers a sensible trace type (and other attribute defaults) based on the information provided. These defaults are determined by inspecting the vector type (e.g., numeric/character/factor/etc.) of positional attributes (e.g., x/y). For example, if we supply a discrete variable to x (or y), we get a vertical (or horizontal) bar chart:
    subplot(
      plot_ly(diamonds, y = ~cut, color = ~clarity),
      plot_ly(diamonds, x = ~cut, color = ~clarity),
      margin = 0.07
    ) %>% hide_legend()
Or, if we supply discrete variables to both x and y:
    plot_ly(diamonds, x = ~cut, y = ~clarity)
Also, the order of categories on a discrete axis, by default, is now either alphabetical (for character strings) or matches the ordering of factor levels. This makes it easier to sort categories according to something meaningful, rather than the order in which the categories appear (the old default). If you prefer the old default, use
layout(xaxis = list(categoryorder = "trace")) (categoryorder is an axis attribute, so use yaxis instead for a discrete y-axis).
    library(dplyr)
    # order the clarity levels by their median price
    d <- diamonds %>%
      group_by(clarity) %>%
      summarise(m = median(price)) %>%
      arrange(m)
    diamonds$clarity <- factor(diamonds$clarity, levels = d[["clarity"]])
    plot_ly(diamonds, x = ~price, y = ~clarity, type = "box")
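Returning to the inferred trace types described at the start of this section, one way to see what was inferred is to build the plot and look at the type of its first trace. This is a minimal sketch, assuming plotly (>= 4.0) is installed; inferred_type() is a hypothetical helper, the built-in iris and mtcars data stand in for diamonds, and the sketch relies on plotly_build() returning an htmlwidget whose "x" element holds the spec (discussed later in this post). No output is shown, since the exact type chosen may differ across plotly versions.

```r
library(plotly)

# Hypothetical helper: report the trace type of the first trace after building.
inferred_type <- function(p) {
  # plotly_build() may emit messages about the defaults it picked
  built <- suppressMessages(plotly_build(p))
  built$x$data[[1]]$type
}

# one discrete variable
inferred_type(plot_ly(iris, x = ~Species))

# two numeric variables
inferred_type(plot_ly(iris, x = ~Sepal.Length, y = ~Sepal.Width))

# two discrete variables
inferred_type(plot_ly(mtcars, x = ~factor(cyl), y = ~factor(gear)))
```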
plot_ly() now initializes a plot
Previously, plot_ly() always produced at least one trace, even when using add_trace() to add on more traces (if you’re familiar with ggplot2 lingo, a trace is similar to a layer). From now on, you’ll have to specify the type in plot_ly() if you want it to always produce a trace:
    subplot(
      plot_ly(economics, x = ~date, y = ~psavert, type = "scatter") %>%
        add_trace(y = ~uempmed) %>%
        layout(yaxis = list(title = "Two Traces")),
      plot_ly(economics, x = ~date, y = ~psavert) %>%
        add_trace(y = ~uempmed) %>%
        layout(yaxis = list(title = "One Trace")),
      titleY = TRUE, shareX = TRUE, nrows = 2
    ) %>% hide_legend()
Why enforce this change? Often, when composing a plot with multiple traces, you have attributes that are shared across traces (i.e., global) and attributes that are not. Allowing plot_ly() to simply initialize the plot and define global attributes makes for a much more natural way to describe such a plot. Consider the next example, where we declare x/y (longitude/latitude) attributes and alpha transparency globally, but alter trace-specific attributes in add_trace()-like functions. This example also takes advantage of a few other new features:
- The group_by() function, which defines “groups” within a trace (described in more detail in the next section).
- New add_*() functions, which behave like add_trace() but are higher-level since they assume a trace type, might set some attribute values (e.g., add_markers() sets the scatter trace mode to markers), and might trigger other data processing (e.g., add_lines() is essentially the same as add_paths(), but guarantees values are sorted along the x-axis).
- Scaling is avoided for “AsIs” values (i.e., values wrapped with I()), which makes it easier to directly specify a constant value for a visual attribute (as opposed to mapping data values to visuals).
- More support for R’s graphical parameters, such as pch for symbols and lty for linetypes.
    map_data("world", "canada") %>%
      group_by(group) %>%
      plot_ly(x = ~long, y = ~lat, alpha = 0.1) %>%
      add_polygons(color = I("black"), hoverinfo = "none") %>%
      add_markers(
        color = I("red"), symbol = I(17),
        text = ~paste(name, "<br />", pop), hoverinfo = "text",
        data = maps::canada.cities
      ) %>%
      hide_legend()
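To make the “AsIs” idea from the list above a bit more concrete, here is a small base-R sketch of how a function could tell a constant apart from a data mapping. The function resolve_color() and its palette are hypothetical and only illustrate the concept; this is not plotly's implementation.

```r
# Hypothetical helper: values wrapped in I() carry the "AsIs" class, so they
# can be passed through untouched; anything else is mapped onto a palette.
resolve_color <- function(value, palette = c("navy", "orange", "darkgreen")) {
  if (inherits(value, "AsIs")) {
    return(as.character(unclass(value)))
  }
  level_idx <- as.integer(factor(value))
  # recycle the palette if there are more levels than colours
  palette[(level_idx - 1) %% length(palette) + 1]
}

resolve_color(I("black"))         # treated as a literal colour
resolve_color(c("a", "b", "a"))   # treated as data and mapped to the palette
```

The design point is simply that I() gives the caller an explicit way to say "use this value directly", so the function never has to guess whether "black" is a colour or a category.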
New interpretation of group
The group argument in plot_ly() has been removed in favor of the group_by() function. In the past, the group argument incorrectly created multiple traces. If you want that same behavior, use the new split argument; groups are now used to define “gaps” within a trace. This is more consistent with how ggplot2’s group aesthetic is translated in ggplotly(), and is much more efficient than plotting a trace for each group.
    txhousing %>%
      group_by(city) %>%
      plot_ly(x = ~date, y = ~median) %>%
      add_lines(alpha = 0.3)
The default hovermode (compare data on hover) isn’t super useful here since we have only 1 trace to compare, so you may want to add layout(hovermode = "closest") when using group_by(). If your group sizes aren’t that large, you may want to use split to generate one trace per group, then set a constant color (using the I() function to avoid scaling).
    txhousing %>%
      plot_ly(x = ~date, y = ~median) %>%
      add_lines(split = ~city, color = I("steelblue"), alpha = 0.3)
In the coming months, we will have better ways to identify/highlight groups to help combat overplotting (see here for example). This same interface can be used to coordinate multiple linked plots, which is a powerful tool for exploring multivariate data and presenting multivariate results (see here and here for examples).
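The idea of a group as a “gap” within a single trace can be sketched in base R: if a row of missing values is placed between consecutive groups, a single line drawn through the data breaks at each missing value. The helper insert_gaps() and the toy data below are hypothetical and only illustrate the concept; they are not a description of how plotly handles group_by() internally.

```r
# Hypothetical helper: put one all-NA row between consecutive groups.
insert_gaps <- function(data, group) {
  pieces <- split(data, data[[group]], drop = TRUE)
  gap <- data[NA_integer_, , drop = FALSE]   # a single row of NAs
  with_gaps <- lapply(pieces, function(piece) rbind(piece, gap))
  out <- do.call(rbind, with_gaps)
  rownames(out) <- NULL
  out[-nrow(out), , drop = FALSE]            # drop the trailing gap
}

# toy data with two groups
d <- data.frame(
  city = c("A", "A", "B", "B"),
  date = c(2000, 2001, 2000, 2001),
  median = c(100, 110, 90, 95)
)

gapped <- insert_gaps(d, "city")
gapped
```

Drawn as one line, the gapped data gives two disconnected segments, which is why one trace with gaps can stand in for many traces.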
New plotly object representation
Prior to version 4.0, plotly functions returned a data frame with special attributes attached (needed to track the plot’s attributes). At the time, I thought this was the right way to enable a “data-plot-pipeline” where a plot is described as a sequence of visual mappings and data manipulations. For a number of reasons, I’ve been convinced otherwise, and decided the central plotly object should inherit from an htmlwidget object instead. This change does not destroy our ability to implement a “data-plot-pipeline”, but it does, in a sense, constrain the set of manipulations we can perform on a plotly object. Below is a simple example of transforming the data underlying a plotly object using dplyr’s mutate() and slice() verbs (the plotly book has a whole section on the data-plot-pipeline, if you’d like to learn more).
    library(dplyr)
    economics %>%
      plot_ly(x = ~date, y = ~unemploy / pop, showlegend = FALSE) %>%
      add_lines(linetype = I(22)) %>%
      mutate(rate = unemploy / pop) %>%
      slice(which.max(rate)) %>%
      add_markers(symbol = I(10), size = I(50)) %>%
      add_annotations("peak")
In this context, I’ve often found it helpful to inspect the (most recent) data associated with a particular plot, which you can do via plotly_data():
    diamonds %>%
      group_by(cut) %>%
      plot_ly(x = ~price) %>%
      plotly_data()
    ## Source: local data frame [53,940 x 10]
    ## Groups: cut [5]
    ##
    ##    carat       cut color clarity depth table price     x     y     z
    ##    <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
    ## 1   0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
    ## 2   0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
    ## 3   0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
    ## 4   0.29   Premium     I     VS2  62.4    58   334  4.20  4.23  2.63
    ## 5   0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
    ## 6   0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
    ## 7   0.24 Very Good     I    VVS1  62.3    57   336  3.95  3.98  2.47
    ## 8   0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
    ## 9   0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
    ## 10  0.23 Very Good     H     VS1  59.4    61   338  4.00  4.05  2.39
    ## # ... with 53,930 more rows
To keep up to date with currently supported data manipulation verbs, please consult the help(reexports) page, and for more examples, check out the examples section under help(plotly_data).
This change in the representation of a plotly object also has important implications for folks using plotly_build() to “manually” access or modify a plot’s underlying spec. Previously, this function returned the JSON spec as an R list, but it now returns more “meta” information about the htmlwidget, so in order to access that same list, you have to grab the “x” element. The new as_widget() function (different from the now deprecated as.widget() function) is designed to turn a plotly spec into an htmlwidget object.
    # the style() function provides a more elegant way to do this sort of thing,
    # but I know some people like to work with the list object directly...
    pl <- plotly_build(qplot(1:10))[["x"]]
    pl$data[[1]]$hoverinfo <- "none"
    as_widget(pl)
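For comparison, the style() route mentioned in the comment above might look like the following. This is a minimal sketch, assuming plotly (>= 4.0) is installed; it uses a plot_ly() scatterplot of mtcars rather than the qplot() example, purely for illustration.

```r
library(plotly)

p <- plot_ly(mtcars, x = ~mpg, y = ~wt, type = "scatter", mode = "markers")

# modify trace attributes without editing the underlying list by hand
p2 <- style(p, hoverinfo = "none")

# the spec still lives in the "x" element, so the change can be checked there
hover_setting <- plotly_build(p2)$x$data[[1]]$hoverinfo
hover_setting
```

style() also accepts a traces argument, so the change can be limited to particular traces rather than applied to all of them.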
Conclusion
The latest CRAN release upgrades plotly’s R package from version 3.6.0 to 4.5.2. This upgrade includes a number of breaking changes, as well as a ton of new features and bug fixes. The time spent upgrading your code will be worth it, as it unlocks those new features. It also provides a better foundation for advancing the plot_ly() interface (not to mention the linked highlighting stuff we have on tap). This post should provide the information necessary to fix these breaking changes, but if you have any trouble upgrading, please let us know on http://community.plot.ly. Happy plotting!