My RStudio::Conf 2020 / TidyDevDay Roundup & Reflections!

R by R(yo)

2 years ago

[This article was first published on R by R(yo), and kindly contributed to R-bloggers].

RStudio::Conference 2020 was held in San Francisco, California and kick-started a new decade for the R community with a bang! Following some great workshops on a wide variety of topics, from JavaScript for Shiny Users to Designing the Data Science Classroom, there were two days full of great talks, as well as the Tidyverse Dev Day the day after the conference.

This was my third consecutive RStudio::Conf and I am delighted to have gone again, especially as I grew up in the Bay Area. For the second year running I am writing a roundup blog post on the conference (see last year’s here).

Another great resource for the conference (besides the #rstats / #RStudioConf2020 Twitter) is the RStudioConf 2020 Slides Github repo curated by Emil Hvitfeldt.

Unfortunately, I arrived at the conference late because United Airlines, being the fantastic service that they are, cancelled my flight. I came into SFO right around when JJ Allaire was giving his fantastic keynote on RStudio becoming a Public Benefit Corporation (read more about that here), among other morning talks. I missed the first half of Day 1 and arrived extremely tired at the conference venue for lunch.

Grousing about United Airlines aside, let’s get started!

NOTE: As with any conference there were a lot of great talks competing against each other in the same time slot and I also wasn’t able to write about all the talks I saw in this blog post.

Making the Shiny Contest!

Mine Cetinkaya-Rundel talked about last year’s first Shiny Contest. For those who don’t know, this was a contest created by RStudio and hosted on RStudio Community for creating the best Shiny apps. You can check out all of last year’s (and soon this year’s) submissions from the “shiny-contest” tag on RStudio Community here. Last year’s winners included Kevin Rue, Charlotte Soneson, Federico Marini, and Aaron Lun’s iSEE (Interactive Summarized Experiment Explorer) app for Most Technically Impressive; David Smale’s 69 Love Songs: A Lyrical Analysis for Best Design; Victor Perrier’s Hex Memory Game for Most Fun; and Jenna Allen’s Pet Records app for The "Awww" Award.

Mine gave a bit of background on the creation of the contest, mainly on how it was inspired by the {bookdown} contest that Yihui Xie organized a few years prior.

For this year’s contest, taking place from January 29th to March 20th, Mine talked about some of the requirements and nice-to-haves from feedback and reflection on last year’s contest.

Requirements:

Code (Github repo)
RStudio Cloud hosted project
Deployed App

Nice To Have:

Summary (in submission post)
Highlights
Screenshot

Mine also mentioned that participants should categorize themselves by experience level, that the organizers should start reviewing submissions earlier (there was a HUGE uptick in submissions in the last weeks of the contest), and that the organizers should give out better guidelines for what makes a “good” submission. For evaluations, apps will be judged based on technical and/or artistic achievement while also considering the feedback and reaction from the RStudio Community submission post.

Slides (Not yet available)
More Shiny apps have been added to a revamped Shiny Gallery page!

Technical Debt is a Social Problem

Gordon Shotwell talked about how technical debt is a social problem and not just a technical one. Technical debt can be defined as taking shortcuts that make a product/tool less stable and the social aspect of this is that it can also be accumulated through bad documentation and/or bad project organization. With some experience being in a position where he lacked the necessary decision-making power to fix certain problems, Gordon’s talk was heavily influenced by how he had to be strategic about creating robust solutions and how he came to realize that technical debt was both a failure of communication and consideration.

Communication:

Documentation (Use/Purpose of the Code)
Testing (Correctness of the Code)
Coding Style (Consistency/Legibility of the Code)
Project Organization (Organization of the Code)

Consideration:

Robustness (Accurate/Unambiguous Error Messages?)
Updated Easily
Solves Future Problems
Dependencies (Dependency Management)
Scale (Up/Down?)

From the above, Gordon asked everyone: How well do you think about how other people might use your code/software to fix problems?

The key to answering this question is to build what Gordon called “delightful products”:

One-step Setup
Clear Problem
Obvious First Action
Path to Mastery
Help is Available
Never Breaks

To build these “delightful products” Gordon went over 3 concepts: find the right beachhead, separate users and maintainers, and empathize with the debtor.

Find the Right Beachhead: You should pick the correct battle to fight and position yourself for smart deployment. If you can then accomplish the project, you build trust! To do this, make a big improvement in a small area by working on a small, contained project.
Separate Users & Maintainers: Users Get Coddled & Maintainers Get Opinions!: A good maintainer has responsibility and authority for the “delightful” product. He/she does not blame the user or ask users to maintain the tool. A good user defines what makes the product “delightful” and asks maintainers about the problem rather than the solution.
Empathize with the Debtor: Technical debt can make people/you uncomfortable and it’s important to be supportive so everyone can learn from the experience!

In conclusion, Gordon said that you should thank your code and that technical debt is a good thing. This is because even if there are some problems there are people who care enough to have done something about it in the first place and from there maintainers and users can work together to review and improve your tools!

Object of Type Closure is Not Subsettable

In her keynote talk to kick-start Day 2, Jenny Bryan talked about debugging your R code. Jenny went over four key concepts: hard reset, minimal reprex, debugging, and deterring.

Hard resetting is basically the old “have you tried turning it off and then on again?” In the context of R, it means restarting your R session (you can set a keyboard shortcut via the “Tools” button in the menu bar). It’s also very important to make sure you are not saving your .RData on exit, so that you have a completely clean slate when you restart your session. Another warning Jenny gave was that you should not rely on rm(list = ls()), as it only removes objects from your workspace: it doesn’t undo any library() calls or Sys.setenv() calls, among a few other important things that might be causing the issue.
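To make that last warning concrete, here is a minimal base R sketch (not from Jenny’s slides) of session state that survives rm(list = ls()). The helper name demo_soft_reset(), the option demo.flag and the environment variable DEMO_VAR are all made up for this illustration.

```r
# Hypothetical helper: shows which kinds of state rm(list = ls()) leaves behind.
demo_soft_reset <- function() {
  x <- 1                               # an ordinary object
  options(demo.flag = TRUE)            # a session option (made-up name)
  Sys.setenv(DEMO_VAR = "still here")  # an environment variable (made-up name)

  rm(list = ls())                      # the "soft reset" people reach for

  survived <- c(
    object  = exists("x", inherits = FALSE),
    option  = isTRUE(getOption("demo.flag")),
    env_var = identical(Sys.getenv("DEMO_VAR"), "still here")
  )

  # Tidy up the demo state before returning
  options(demo.flag = NULL)
  Sys.unsetenv("DEMO_VAR")
  survived
}

survived <- demo_soft_reset()
print(survived)  # the object is gone; the option and the env var are not
```

Only restarting the R session clears all three kinds of state at once.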

Making a minimal reprex means making a small, concrete, reproducible example that reveals, confirms, or eliminates something about your issue. Jenny also described it as making a beautiful version of your pain that other people are more likely to engage with. Making one is very easy now with the {reprex} package, and it helps the people trying to help you with your problem on Stack Overflow, RStudio Community, etc. Still, asking and formulating your question can be very hard, as you need to work on bridging the gap between what you think is happening and what is actually happening.

Debugging comprises a few different steps, and Jenny used death as a metaphor. “Death certificate” was used for functions like traceback(), which shows the sequence of function calls that led to the error. “Autopsy” was used for options(error = recover), for when you need to look into the actual environment of certain function calls when they errored. Finally, “reanimation/resuscitation” was used for browser() and debug()/debugonce(), as they allow you to re-enter the code and its environment, either right before the error or at the very top of the function (depending on where you insert the browser() call).
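As a rough sketch of how these tools fit together (my own toy example, not Jenny’s), the functions f(), g() and h() below are made up, and the interactive steps are left as comments because they only make sense at the console. The last part is a non-interactive stand-in that captures the call stack at the moment of the error.

```r
# A toy call chain where the innermost function fails (all names are made up)
h <- function(x) stop("something went wrong in h()")
g <- function(x) h(x)
f <- function(x) g(x)

# "Death certificate": at the console, run the failing call and then ask
# for the call stack.
# f(1)
# traceback()

# "Autopsy": browse the frames of the failed call (interactive sessions only).
# options(error = recover)
# f(1)
# options(error = NULL)  # back to the default behaviour

# "Reanimation": step through h() from the top the next time it is called.
# debugonce(h)
# f(1)

# Non-interactive stand-in: record the call stack when the error is raised.
calls <- NULL
result <- tryCatch(
  withCallingHandlers(f(1), error = function(e) calls <<- sys.calls()),
  error = function(e) conditionMessage(e)
)
print(result)
print(length(calls) > 0)  # the stack was captured before it unwound
```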

Deterring means that once you have fixed your code, you want to keep it fixed. The best way to do this is to add tests and assertions using packages like {testthat} and {assertr}. You can also automate these checks using continuous integration services such as Travis CI or GitHub Actions. Lastly, Jenny stressed that when writing code it’s important to write error messages that are easy for humans to understand!
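Here is a small base R sketch of the idea, assuming a hypothetical helper called safe_mean(): the assertions fail early with a message a human can act on, and a couple of lightweight checks pin the behaviour down. In a package, the same checks would normally live in {testthat} test_that() blocks using expect_equal() and expect_error().

```r
# Hypothetical helper: the name and the specific checks are illustrative only.
safe_mean <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be a numeric vector, not an object of class <",
         class(x)[1], ">.", call. = FALSE)
  }
  if (anyNA(x)) {
    stop("`x` contains missing values; remove or impute them first.",
         call. = FALSE)
  }
  mean(x)
}

# Lightweight regression checks: these stay quiet while the code keeps working.
stopifnot(safe_mean(c(1, 2, 3)) == 2)

msg <- tryCatch(safe_mean("a"), error = function(e) conditionMessage(e))
stopifnot(grepl("must be a numeric vector", msg, fixed = TRUE))
```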

Other resources for debugging:

{renv}: Project Environments for R

The {renv} package (an RStudio project led by the presenter, Kevin Ushey) is the spiritual successor to {packrat} for dependency management in R. When we talk about libraries in R, the most basic definition is “a directory into which packages are installed”. Each R session is configured to use multiple library paths, which you can see for yourself using the .libPaths() function; a “user” and a “system” library are usually the defaults that everybody has. You can also use find.package() to find where exactly a package is located on your system. The challenge with package management in R is that, by default, every R session uses the same set of library paths. Different R projects might have different package dependencies, but if you install a new version of a package, it changes the version of that package in every project.
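You can poke at this yourself with nothing but base R. The output depends entirely on your machine, so I haven’t listed any here; {stats} is used only because it ships with every R installation.

```r
# Library paths searched by the current R session (machine-specific)
lib_paths <- .libPaths()
print(lib_paths)

# Where is a given package installed?
stats_dir <- find.package("stats")
print(stats_dir)

# The parent directory of a package is the library it lives in;
# compare it with the entries printed by .libPaths().
print(dirname(stats_dir))
```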

The three key concepts of {renv} work to make your R projects more:

Isolated: Each project gets its own library of R packages. This allows you to upgrade and change packages within your project without breaking your other projects!
Portable: {renv} captures the state of R packages in a project within a LOCKFILE. This LOCKFILE is easy to share and makes collaborating with others easy, as everybody is working from a common base of R packages.
Reproducible: Using the renv::snapshot() function lets you save the state of your R library to the LOCKFILE, called “renv.lock”.

The basic workflow for {renv} is as follows:

renv::init(): This function activates {renv} for your project. It forks the state of your default R libs into a project-local library and prepares infrastructure to use {renv} for the project. A project-local .Rprofile is created/amended for new R sessions for that project.
renv::snapshot(): This function captures the state of your project library and writes it to a LOCKFILE, “renv.lock”.
renv::restore(): This function shows all the packages that are updated/installed in the project-local library as a list. It restores your project library with the state specified in the LOCKFILE that you created previously with renv::snapshot().
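The three steps above can be sketched as follows, assuming {renv} is installed. The calls are wrapped in a list of functions (renv_steps is a name I made up) so that nothing runs until you call a step yourself from inside a project; the steps happen at different points in a project’s life rather than back to back.

```r
# Hypothetical wrapper around the three {renv} calls from the talk.
renv_steps <- list(
  # 1. Once per project: set up the project-local library and .Rprofile
  init = function() renv::init(),
  # 2. Whenever you install or upgrade packages: record the state in renv.lock
  snapshot = function() renv::snapshot(),
  # 3. On another machine (or later): rebuild the library from renv.lock
  restore = function() renv::restore()
)

print(names(renv_steps))
```

For example, you would run renv_steps$init() when starting the project and renv_steps$snapshot() after adding a dependency, while a collaborator who has just cloned the repo would run renv_steps$restore().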

{renv} can install packages from many sources: GitHub, GitLab, CRAN, Bioconductor, Bitbucket, and private repositories with authentication. As there may be many duplicates of identical packages across projects, {renv} also uses a global package cache to save disk space and to lower installation times when restoring your packages.

RStudio v1.3 Preview

In another talk by an RStudio employee, Jonathan McPherson talked about the exciting new features in store for RStudio version 1.3.

For this version one of the key concepts was to increase accessibility of RStudio for those with disabilities by targeting the WCAG 2.1 Level AA:

Increased compatibility with screen readers (annotations, landmarks, navigation panel options)
Better focus management (where the keyboard focus is at any given moment)
Better keyboard navigation and usability (no more tab traps!)
Better UI readability (improve contrast ratios)

The Spell Check tool has been significantly revamped. In the past it did not provide real-time feedback, it could only check the entire document at once, and the button itself was hard to find. The new version adds features many R users have been crying out for:

Real-time spell checking
Pre-loaded dictionary (“RStudio” is also included now!)
Right-click for suggestions
Works on comments and roxygen comments

In addition, the global replace tool is improved: you can replace everything found with a new string, use regular expressions, and preview your changes in real time.

Although RStudio already worked on the iPad in prior versions, the new iPadOS 13 makes everything much smoother and gives the user a much better experience. Now that RStudio and the iPad work much better together, version 1.3 will have much better keyboard support on the iPad (shortcuts and arrow keys)!

Another exciting feature is the ability to script all your RStudio configurations. Every configurable option available from “Global Options”, workbench key bindings, themes, and templates is saved as a plain-text (JSON) file in the ~/.config/rstudio folder, and these options can also be set by admins for all users on RStudio Server.

You can try out a preview version of 1.3 today here!

Slides

Tidyverse 2019-2020

In this afternoon talk, Hadley Wickham talked about the big tidyverse hits of 2019 and what’s to come in 2020.

Some of the highlights for 2019 included:

citation("tidyverse"): Which allows people to cite the tidyverse in their academic papers
The {{ }} (read: curly-curly) operator: Which reduces the cognitive load for many users confused about all of the new {rlang} syntax
{vctrs}: A developer-focused package for creating new classes of S3 vectors
{vroom}: A fast delimited reader for R, using C++11 for multi-threaded computations, as well as the ALTREP framework to lazy-load the data only when the user needs it
{tidymodels}: A group of packages for modelling with tidy principles ({rsample}, {recipes}, {parsnip}, etc.)
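For those who haven’t tried curly-curly yet, here is a minimal sketch of what it looks like in practice, assuming {dplyr} (0.8.3 or later, with rlang 0.4.0 or later) is installed. The helper mean_by() and the output column name avg are my own inventions for this example.

```r
# Hypothetical helper: pass bare column names through to dplyr verbs with {{ }}
mean_by <- function(data, group, value) {
  grouped <- dplyr::group_by(data, {{ group }})
  dplyr::summarise(grouped, avg = mean({{ value }}))
}

# Only run the demo if {dplyr} is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  res <- mean_by(mtcars, cyl, mpg)
  print(res)  # one row per value of cyl
}
```

Without curly-curly, the same helper would need the enquo() and !! pair, which is the extra vocabulary that tripped up a lot of users.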

For 2020, Hadley was most excited about:

{dplyr} 1.0.0
A movement toward more problem oriented documentation.
Less {purrr}: replacement with more functions to handle tasks like importing multiple files at once or tuning multiple models.
{googlesheets4}: Reboot of the old {googlesheets}, improved R interface via the Sheets API v4.

Hadley then talked about some of the lessons learned from the new tidyeval functionalities last year. Some of the mistakes he discussed were that a lot of previous tidyeval solutions were too partial/piecemeal and that there was too much focus on theory relative to the realities of most data science end-users. Most of the problems identified could be traced to the fact that there was way too much new vocabulary to learn regarding tidyeval. Going forward, Hadley says that they want to gather feedback from a specific set of test users before exposing a lot of new functionality at once, while also indicating more clearly when new concepts are still Experimental.

To make it easier for users to keep track of all the changes happening around the tidyverse packages, the documentation now puts much more emphasis on the exact “life cycle” stage of each function. A function starts out as “Experimental” and then moves to “Stable”; if it goes under review it becomes “Questioning”; and finally it may end up “Deprecated”, “Defunct”, or “Superseded”. These tags inform the user about the working status of functions in a package.

The main differences between deprecated and superseded are:

Deprecated: A function that’s on its way out (at least in the near future, < 2 years) and gives a warning when used. Ex. dplyr::tbl_df() & dplyr::do()
Superseded: There is an alternative approach that is easier to use/more powerful/faster. It is recommended that you learn the new approach if you have spare time, but the old function is not going anywhere (however, only critical bug fixes will be made to it). Ex. spread() & gather()

As we head into another decade of the {tidyverse} this presentation provided a good overview of what’s been going on and what’s more to come!

Slides (Not yet available)

ggplot2 section

In the afternoon session of Day 2 there was an entire section of talks devoted to {ggplot2}.

First, there was Dewey Dunnington’s presentation on best practices for programming with {ggplot2}, much of which you might be familiar with from his Using ggplot2 in packages vignette from last year (only available in the not-yet-released version 3.3.0 documentation). In the first section Dewey talked about using tidy evaluation in your mappings and facet specifications when you’re creating custom functions and programming with {ggplot2}. For using {ggplot2} in packages he talked about proper NAMESPACE specifications and getting rid of pesky “undefined variable” notes in your R CMD check results. Lastly, Dewey went over a demo on regression testing your {ggplot2} output with the {vdiffr} package.

Next, Claus Wilke talked about his {ggtext} package, which adds formatted text to {ggplot2}. With {ggtext} you can use markdown or HTML syntax to format the text in any place where text can appear in a ggplot. This is done by specifying element_markdown() instead of element_text() in the theme() function in your ggplot2 code. Combined with the {glue} package, you can refer to values inside columns of your dataframe and style them in a very easy way.
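A minimal sketch of the element_markdown() swap, assuming {ggplot2} and {ggtext} are installed. The function name markdown_title_plot(), the title text and the colour are mine, not taken from Claus’ slides.

```r
# Hypothetical example: a plot title styled with markdown and an HTML <span>
markdown_title_plot <- function() {
  ggplot2::ggplot(mtcars, ggplot2::aes(wt, mpg)) +
    ggplot2::geom_point() +
    ggplot2::labs(
      title = "Fuel economy drops as **weight** <span style='color:#D55E00;'>increases</span>"
    ) +
    # element_markdown() replaces element_text() for the title
    ggplot2::theme(plot.title = ggtext::element_markdown())
}

# Only build the plot if both packages are available
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggtext", quietly = TRUE)) {
  p <- markdown_title_plot()
}
```

Printing p renders the title with “weight” in bold and “increases” in colour; with a plain element_text() the markup would show up as literal text.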

{ggtext} also allows you to insert images into the axes by supplying the HTML tag for the picture into the label.

In conclusion, Claus mentioned that another package of his, {gridtext}, does a lot of the heavy lifting to render formatted text in the ‘grid’ graphics system. He also warned everybody to not get too carried away with the awesome new functionalities to ggplot that {ggtext} introduces!

Next, Dana Seidel presented on version 1.1.0 of the {scales} package, which provides the internal scaling infrastructure for {ggplot2} (but also works with base R graphics). The {scales} package focuses on five key aspects of data scales: transformations, bounds & rescaling, breaks, labels, and palettes.

Some of the notable changes in this new version include the renaming of functions for better consistency and to make it easier to tab complete functions in the package as well as providing more examples via the demo_*() functions.

While most of the conventional data transformations are included in the package, such as arc-sine square root (asn_trans()), Box-Cox (boxcox_trans()), exponential (exp_trans()), pseudo-log (pseudo_log_trans()), etc., users can now define and build their own transformations using the trans_new() function!
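As a quick sketch of what building your own transformation looks like, assuming {scales} is installed: the signed cube-root transformation below is just an illustrative choice of mine, not something shown in the talk.

```r
# Only run if {scales} is available
if (requireNamespace("scales", quietly = TRUE)) {
  # A custom transformation needs a name, a forward function and its inverse
  cuberoot_trans <- scales::trans_new(
    name      = "cuberoot",
    transform = function(x) sign(x) * abs(x)^(1 / 3),
    inverse   = function(x) x^3
  )

  # The two functions are stored on the object and can be called directly
  print(cuberoot_trans$transform(c(-8, 0, 27)))
  print(cuberoot_trans$inverse(c(-2, 0, 3)))
}
```

The resulting object can then be handed to a continuous {ggplot2} scale in the same way as the built-in transformations.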

Next, Dana introduced some rescaling functions:

One of the things that got the crowd really excited was the scales::show_col() function which allows you to check out colors inside a palette. This function is well-known to a lot of {ggplot2} color palette package creators (including myself for {tvthemes}) to showcase all the great new sets of colors we’ve made without having to draw up an example plot!

Last but certainly not least, Thomas Pedersen presented on extending {ggplot2} and really looking underneath the hood of a package that most people can use but whose internals have been quite a challenge to work with.

This presentation got quite technical and a bit too advanced for me so I’ll refrain from doing a bad explanation on it here. However, it was still educational as developing my own {ggplot2} extension (creating actual new geoms/stats rather than just using existing {ggplot2} code) is something that I’ve been dying to do. There’s going to be a new chapter in the new and under development version 3 of the {ggplot2} book and more materials in the future on this so keep an eye out!

All slides

Lightning Talks

Mexican electoral quick count night with R

In this lightning talk, Maria Teresa Ortiz talked about her {quickcountmx} package, which provides functions to help estimate election results using Stan. As official counts from elections can take weeks to process, this package was used to take a proportional stratified sample of polling stations to provide quick-count estimates (based on a Bayesian hierarchical model) for the 2018 Mexican presidential election.

Maria Teresa was part of one of the teams on the committee that provided the official quick-count results. The package was created to make it easy to share code amongst the team members, to provide some transparency about the estimation process, and to get feedback from other teams (as all the teams in the committee used R!).

Analyzing the `R Ladies` Twitter Account

Katherine Simeon talked about analyzing the R Ladies Official Twitter account using {tidytext}. The @WeAreRLadies account started in August 2018 and since then has accumulated over 15.5K followers and has had 56 different curators from 19 different countries. These curators rotate on a weekly basis and discuss via tweets their R experiences with the community on a variety of topics such as how they use R, tips, tricks, favorite resources, learning experiences, and other ways to engage with the #rstats/#RLadies community online. Katherine then took a look at the sentiments expressed in @WeAreRLadies tweets and made some nice ggplots to highlight certain emotions while also providing some commentary on some of the best tweets from the account.

It was very cool seeing the different curators and the different things the RLadies account talked about and Katherine provided a link to those who want to curate the account in the slides!

Tidyverse Developer Day

As this was the third Tidy Dev Day, there were some improvements following lessons learnt from the first two iterations in Austin 2019 (my blog post on it here) and in Toulouse 2019. This time around, post-it notes were stuck on a wall divided into groups for the different {tidyverse} packages, and you had to take a post-it to “claim” the issue. After you PR’d the issue you were working on, you placed your post-it under the “review” section of the wall and waited while a tidyverse dev looked at your changes. Once approved, they moved your post-it into the “merged” section. Once your post-it was in the “merged” section you could SMASH the gong to raucous applause from everybody in the room. Congratulations, you’ve contributed to a {tidyverse} package!

What I also learned from this event was using the pr_*() family of functions in the {usethis} package. I’m quite familiar with {usethis} as I use it at work often but until TidyDev Day I hadn’t used the newer Pull Request (PR) functions as I normally just did that manually on Github. The {usethis} workflow, which was also posted on the Tidy Dev Day README in more detail, was as follows:

Fork & clone repo with usethis::create_from_github("{username}/{repo}").
Make sure all dependencies are installed with devtools::install_dev_deps(), restart R, then devtools::check() to see if everything is running OK.
Create a new branch of the repo where you’ll make all your fixes with usethis::pr_init("very-brief-description")
Make your changes: Be sure to document, test, and check your code.
Commit your changes
Push & Pull Request from R via usethis::pr_push()
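Put together, the steps above look roughly like this, assuming {usethis} and {devtools} are installed and your GitHub credentials are set up. The wrapper pr_steps, the repo "tidyverse/ggplot2" and the branch name are placeholders of mine; each step is a function so nothing runs until you call it, since forking and restarting R happen between steps.

```r
# Hypothetical wrapper around the Tidy Dev Day workflow; call one step at a time.
pr_steps <- list(
  # 1. Fork & clone the repo (placeholder repo spec)
  fork = function(repo = "tidyverse/ggplot2") usethis::create_from_github(repo),
  # 2. Install development dependencies, then restart R
  deps = function() devtools::install_dev_deps(),
  # 3. Make sure everything passes before you touch anything
  check = function() devtools::check(),
  # 4. Start a branch for your fix (placeholder branch name)
  branch = function(name = "very-brief-description") usethis::pr_init(name),
  # 5. After editing and committing: push and open the pull request
  push = function() usethis::pr_push()
)

print(names(pr_steps))
```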

For naming new branches I normally like to provide ~3 words starting with an action verb (create/modify/edit/refactor/etc.) and then at the end add the Github issue number. Example: edit-count-docs-#2485.

I was working on some documentation improvements for {ggplot2}. Getting help and working next to both Claus Wilke and Thomas Pedersen was delightful as I use {cowplot} and {ggforce} extensively (not to mention {ggplot2}, obviously) but of course, I was very nervous at first!

Conclusion

Shoutout to some great presentations that I missed for a variety of reasons like:

I’m really looking forward to watching them on video soon!

On the evening of the first day of the conference there was a special reception event at the California Academy of Sciences! After a long day everybody got together for food, drink, and a look at all the flora/fauna on exhibit. This was also where I met some of my fellow R Weekly editorial team members for the first time! I also think this might have been the first time that four editors were in the same place at the same time!

Outside of the conference I went to visit Alcatraz for the first time since I was a little kid along with first-time visitors, Mitch and Garth. That night I went to go see my favorite hockey team, the San Jose Sharks, play live at the Shark Tank for what feels like the first time in forever!

This was my third consecutive RStudio::Conf and I enjoyed it quite a lot. As the years have gone by I’ve talked to a lot of people on #rstats, and it’s been a great opportunity to meet more and more of these people in real life at these kinds of events. The next conference is in Orlando, Florida from January 18 to 21 and I’ve already got my tickets, so I hope to see more of you all there!
