The R Factor

Owen Jones

3 years ago

[This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers].

Entering UCL’s packed Darwin lecture theatre on Monday evening, holding a golden ticket so coveted that 400 names remained on the waitlist, the uninitiated could be forgiven for thinking this was the most popular meetup on the planet.

And perhaps, on this occasion at least, LondonR was the focal point of the R universe. Because after all, it’s not every day that one has the opportunity to attend an in-person presentation by the great Hadley Wickham.

If you’ve been living under a rock for the past ten years, Hadley Wickham is Chief Scientist at RStudio as well as an adjunct professor of statistics at the University of Auckland, Stanford University and Rice University. In short, he builds tools (computational and cognitive) that make data science easier, faster and more fun.

As soon as the title slide, “Tidyverse: The Greatest Hits”, was revealed, murmurs began to echo around the lecture theatre. These murmurs grew louder when Hadley dismissed the title as somewhat misleading and instead promised to talk about the “biggest mistakes” that have been made since the tidyverse came into being.

The hushed predictions of many were confirmed when a giant ggplot2 sticker appeared on the screen and the presentation that followed was enlightening, entertaining and introspective in equal measure.

Mistakes Part I

Beginning with his eternal remorse over discovering the magrittr pipe only after the first major wave of ggplot2 uptake, and via a sheepish admission that masking stats::filter() was perhaps overly callous, it wasn’t too long before we arrived at the topic of tidyeval.
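For anyone who hasn’t been bitten by the masking yet: once dplyr is attached, a bare filter() call resolves to dplyr’s version, and the base function has to be reached through its namespace. A minimal sketch, assuming dplyr is installed (the variable names are purely illustrative):

```r
library(dplyr)  # attaching dplyr masks stats::filter()

# Base R's filter() applies a linear filter; here, a 3-point moving average.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))

# dplyr's filter() keeps the rows that match a condition.
four_cyl <- dplyr::filter(mtcars, cyl == 4)

print(moving_avg)
print(nrow(four_cyl))
```

Both functions remain available; the namespace prefix simply makes explicit which one is meant.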

This was where Hadley was able to expand for the first time on one of the core messages of his talk: borrowing a quote from software development coach, GeePaw Hill, he advocated the benefits of making as many mistakes as possible as quickly as possible and described how, over the course of several years, he had experienced several “false epiphanies” which ultimately resulted in the creation of lazyeval and subsequently the tidyeval we all know and tolerate today.

Of course I jest; tidyeval is incredibly powerful, and Hadley was unwavering in his conviction that it will be the source of much future progress in R development, highlighting among other uses its fundamental role in innovative interface packages such as dbplyr and dtplyr.

He went on to acknowledge, however, that the number of people who share his passion for the underlying theory of tidyeval is rather small and subsequently reflected on the decision to reveal it to the world while still in its relative infancy. Although initially disappointed by the slowness of the community to warm to the new concepts, he has come to terms with the fact that not everyone will immediately jump onto the quosure bandwagon. Consequently there have been efforts in recent times to increase the accessibility of tidyeval, and we were proudly shown one of the latest developments: the “interpolation”, or “curly-curly”, or (Jenny Bryan’s wonderful coinage) “embrace” operator {{_}}.
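To give a flavour of what the embrace operator buys you, here is a minimal sketch, assuming dplyr is installed alongside rlang 0.4.0 or later. The helper mean_by() and its argument names are made up for illustration; they are not part of any package.

```r
library(dplyr)

# Hypothetical helper: mean of one column within the groups of another.
# Wrapping an argument in {{ }} passes the caller's unquoted column name
# through to the dplyr verb inside the function.
mean_by <- function(data, group, value) {
  data %>%
    group_by({{ group }}) %>%
    summarise(mean_value = mean({{ value }}))
}

result <- mean_by(mtcars, cyl, mpg)
print(result)
```

The caller writes mean_by(mtcars, cyl, mpg) just as they would write a dplyr pipeline by hand, without needing to know anything about quosures.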

The trend towards user-friendliness and, in particular, self-explanatory functionality, is set to continue: we can look forward to the imminent release of tidyr 1.0.0, where the introduction of pivot_longer() and pivot_wider() is sure to delight those of us who never wrapped our heads around gather() and spread(), and to delight the rest of us, who DID eventually get to grips with them but still had to look up the syntax of both every time we wanted to use either.
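For the curious, a round trip with the new functions might look something like the sketch below. It assumes tidyr 1.0.0 or later, and the scores data frame is invented for the example.

```r
library(tidyr)

# Made-up data: one row per student, one column per subject.
scores <- data.frame(
  student = c("A", "B"),
  maths   = c(70, 85),
  physics = c(65, 90),
  stringsAsFactors = FALSE
)

# Wide to long: one row per student/subject pair
# (the job gather() used to do).
long <- pivot_longer(
  scores,
  cols      = c(maths, physics),
  names_to  = "subject",
  values_to = "score"
)

# Long back to wide: one column per subject
# (the job spread() used to do).
wide <- pivot_wider(long, names_from = subject, values_from = score)

print(long)
print(wide)
```

The argument names say what they do: names_to and values_to describe where the column names and cell values go, and names_from and values_from describe where they come from on the way back.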

But what about Python?

We couldn’t claim to have hosted a top R event if we hadn’t had some mention of Python from the audience. Hadley took the mandatory “R vs Python” question in his stride, perhaps unsurprisingly given the frequency with which he must face it.

In order to use Python, he argued, one must necessarily learn at least a small amount of programming – enough that someone coming from a purely data science perspective might be discouraged from continuing beyond the earliest stages of learning.

It’s possible to do useful data science work in R without learning any programming at all, and then as greater complexity is required, one can start to learn more about programming and about the language itself. Once someone has reached that point though, it is more a question of what is most suitable for the task at hand, in the context at hand. And here Hadley animatedly encouraged us to “use Python!” if that was the sensible option.

Mistakes Part II

This sense of unity was a common theme throughout the presentation. Approaching the conclusion, Hadley expressed some regret for, in his view, one of the largest mistakes of all: the decision to denominate a certain group of packages as “the tidyverse”.

The intention, he elaborated, was never to provide a complete-but-isolated paradigm. Putting aside our human tendency to see conflict where there are merely options, there is no “base vs tidyverse” turf war. A tidyverse package can be used, is designed to be used, in exactly the same way as any other R package, i.e. in whichever context works best, with whatever other packages work best.

Hadley cited specific examples such as the effective combination of data.table and ggplot2, praising the utility and speed of the former in conjunction with the visualisation power of the latter. The name “tidyverse” is a blessing and a curse, he concluded. Powerful as a label for the concepts it represents, but overly evocative of completeness and correctness.
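As a rough illustration of that pairing (not an example from the talk itself), the sketch below lets data.table do the aggregation and hands the result to ggplot2. It assumes both packages are installed; the object names are illustrative.

```r
library(data.table)
library(ggplot2)

dt <- as.data.table(mtcars)

# data.table handles the aggregation: mean mpg per cylinder count.
mpg_by_cyl <- dt[, .(mean_mpg = mean(mpg)), by = cyl]

# A data.table is also a data.frame, so ggplot2 accepts it directly.
p <- ggplot(mpg_by_cyl, aes(x = factor(cyl), y = mean_mpg)) +
  geom_col() +
  labs(x = "Cylinders", y = "Mean mpg")

print(p)
```

No conversion step is needed between the two, which is rather the point: each package does the part it is best at.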

Love for the R Community

In response to a question about how the R community has developed over the years, Hadley described how, at every stage of the community’s slow transition from the original R-Help mailing list, through StackOverflow, and most recently to Twitter and the RStudio Community forums, “asking for help” has gradually become a much easier thing to do.

The openness and friendliness of the community is one of the major strengths of R, and Hadley was quick to praise the community at large, giving RLadies a special mention for the work they have been doing in recent years.

After concluding with some enticing hints about where his efforts might be focused in the near future (look out for new and improved vctrs, maybe…) our time was up and Hadley had to leave for his next engagement, but not before hinting at the possibility of another visit when his “travel budget” allows. But before you rush to put your name down for the next event, breathe steady folks, he’s a busy man and it might be a little while before he’s back over here on the wrong side of the world.

It was refreshing to hear someone like Hadley acknowledge that innovation isn’t a straight line and that forking and dead ends are essential parts of the process. Speaking to attendees afterwards, it was clear that this message was highly prized, and many seemed to leave with increased confidence to go out and try things without fear of failure.

For those of you who were not lucky enough to get a golden ticket this time, don’t worry, all is not lost: a recording of Hadley’s presentation is available online.

And if this inspires you to find out more about the R community, rest assured that spaces aren’t usually quite so keenly fought over. We’re a friendly lot and you’ll probably find you even get a seat at the next event!
