Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
It is the day after rstudio::conf(2022), and I’m sitting in the lobby of the ridiculously oversized conference hotel, trying to collect my thoughts and impressions. It was a densely packed two days of presentations and events (plus the earlier two days of workshops). Lots of new inspiration, ideas, and impressions, all of which take a little time to digest.
And of course there was the community experience, after two years of pandemic. The previous rstudio::conf(2020) in San Francisco was my last in-person conference before Covid hit. And even though Covid is far from over, I have great trust in the R community and made this my first in-person conference in more than two years. I believe that the community is R’s strongest asset, and it was a great joy to see many familiar faces again and to meet new friends. In an ever-changing data science and machine learning landscape, it is the community that will ensure the continued success of R as an integral part of the data scientist’s toolbox.
Much will be written about the big announcements of the week, so I’ll keep my reflections on those brief, and will add a few of my own lessons and impressions on other aspects.
RStudio rebrands as Posit to expand to other languages
I strongly believe that this is the right move. Programming languages are tools, and the best strategy is to choose the right tool for each particular problem, rather than being constrained by the limits of one specific language. R certainly has its limits, as does every other language. But R also has impressive strengths, and in those areas it can deliver remarkable results. Combining the strengths of R with the strengths of other languages is already a winning strategy, and it will be even more so in the future.
Specifically, R and Python have been developing closer ties since the days of rpy (now there’s rpy2), and likely even earlier. The reticulate package is the latest and greatest of these initiatives from the R side. And Jupyter notebooks have supported R robustly and successfully since the project emerged from IPython. (Ju – py – teR = Julia – Python – R.) Now we have Quarto, which extends the Rmarkdown approach to several other languages, including Python.
Multilingual efforts have always been part of the development philosophy of both ecosystems. R and Python each have well-established and reliable interfaces to C that speed up code. And a newer company like Hugging Face, which has exploded onto the scene, uses Rust (via its tokenizers library) to boost the performance of its Python transformers library under the hood.
My recommendation is to learn tools, rather than languages.
As for the new name, I confess that I don’t feel strongly about “Posit” either way. I believe it will grow on us, though, and will quickly become associated with its amazing community.
Everybody talks about Quarto
The second main theme at this conference was the strong focus on Quarto, the new and (even more) multilingual successor of the Rmarkdown framework. I think the seeds for this development were sown several years ago: reticulate had already made it possible to combine R and Python code chunks in the same document. But outside the R community, this approach hadn’t seen much adoption. Jupyter notebooks remain the most common way for Python folks to create notebook-style content.
But while reticulate was one specialised package among many (which may have made its full potential harder to recognise), Quarto is well positioned to change the game. One early success is the collaboration between the nbdev and Quarto projects to create nbdev v2, the next version of the popular Python software development system built around notebooks. Here is a conversation on the topic between Posit CEO J.J. Allaire and fast.ai’s Jeremy Howard.
The Quarto gallery already contains a bunch of high-quality examples for books, websites, blogs, and more. In fact, I’ve been thinking about updating the style & presentation for this blog. Might be a good excuse to move it over to Quarto at the same time.
Workshop musings and network analysis
Just like the workshop that I took back at rstudio::conf(2020), this year’s was well designed and densely packed with useful information. And I’ve only ever heard good things about rstudio::conf workshops from other participants. The topics on offer usually cover a wide range of subjects and experience levels. This year, I decided to attend the R for People Analytics workshop.
My primary interest was to learn about network analysis, which was covered in detail on the second day of the two-day workshop. The first day focused on a general introduction to people analytics methodology, along with different statistical and regression approaches. I had worked with these methods before, but it had been a while, and it was nice to get a systematic refresher and some new perspectives. And I hadn’t really done things like ordinal regression or used the MASS package. So the entire workshop was useful, and I definitely got what I wanted out of the network analysis sections. Expect a related blog post pretty soon.
One interesting aspect of this workshop was that it was taught from a statistical perspective, as opposed to the machine learning mindset that I’m now more used to through Kaggle. These statistical tools were what I started out with when learning R, and it was a helpful experience to encounter them again through the lens of a machine learning practitioner.
For instance, concepts like statistical tests (e.g. the t-test), p-values, or information criteria rarely play a role in machine learning applications. Feature importances are often extracted, but models are usually judged by their performance on a validation set, measured with a particular metric. Maybe there could be a bit more emphasis on distribution tests during data preprocessing. But generally, those differences in philosophy seem to reflect the different goals of building predictive versus explanatory models.
There are a couple of other disconnected thoughts and reflections from rstudio::conf(2022) buzzing around in my head, such as vetiver for MLOps, some spatial ideas, Shiny for Python, and more, and I might follow up on some of them in a future post. But the ones above are the main takeaways for me. These kinds of conferences always give me a boost in motivation and energy. I have several other blog entries planned for the coming weeks. Watch this space 😉
For now, my closing thought is that next year’s posit::conf(2023) is already planned for Orlando, FL, on May 22-25. And I’m absolutely certain that between now and then, the R community will create and share a lot of cool stuff.