Feeling the rstudio::conf ❤️
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I am heading home from my third year of attending rstudio::conf! If you weren’t there, watch for the videos to be released so you can check out the talks. I know I will do the same, so I can catch the talks I missed because of scheduling conflicts. I love this conference, and once again this year, the organizers have succeeded in building an impactful, valuable, inclusive conference. The welcoming values of the conference were made explicit from the very first moments of the first session.
First few things covered at #rstudioconf:
- Code of Conduct
- How to find friendly @rstudio staff to help you out
- Pac-Man rule: stand in a way so that someone can join your conversation (a circle with a hole, not a closed circle!)
— Angela Li (@CivicAngela) January 17, 2019
This was my first year not speaking at rstudio::conf, and I so enjoyed attending more talks, focusing on learning, and meeting so many people. The only “official” thing I did was facilitate a Birds of a Feather session focused on text mining and natural language processing. Thanks to everyone who came to chat there!
As I reflect more broadly on the conference, it’s interesting to see themes emerge that multiple folks addressed across the days. For example, one theme I remember from last year (2018) was how broadly useful and impactful simulations can be. This year, I connected with three main themes:
- Using R as a first-class programming language, often in production
- Best practices for workflows
- Professional competence beyond writing code
R is a real programming language
On Thursday, we heard from RStudio president Tareef Kawaf about code, reproducibility, and RStudio’s business model. He also talked about the two things that have made the most noticeable improvements in the quality of my own work life recently (both of which have been RStudio investments/efforts):
- maturing the R/database interfaces
- API capabilities for R
These are examples of this first theme, treating R like a first-class programming language and thinking about how it can be used in production. CTO Joe Cheng gave a great talk specifically about Shiny in production.
Next, at #rstudioconf we hear from @jcheng about putting Shiny apps into production.
2015: “It's possible”
2019: “It's quite easy”
Only recently have
- automated testing
- load testing
- profiling
- deployment
been addressed.
— Julia Silge (@juliasilge) January 17, 2019
Jacqueline Nolis and Heather Nolis gave another compelling and useful talk about R in production. They told the story of how they took a deep learning model trained in R and actually put it into production at T-Mobile, using R.
- R & Plumber are close to parity with Python & Flask
- R is great for quick data exploration
- Shiny was a game changer for communicating to stakeholders
- Language was never the project failpoint (security, container size, etc was) #rstudioconf
— Julia Silge (@juliasilge) January 17, 2019
Some other standout talks for me around this theme were James Blair’s talk about Plumber APIs and Jim Hester’s talk about package dependencies. (Check out the new package Jim introduces in that talk for measuring how, and how much, your own package depends on others!)
How do I do this again?!
Another theme I saw was thinking explicitly about processes and how we work as data scientists and analysts. Probably my favorite in this category was Kara Woo’s talk about her experience working as an intern on ggplot2 and how she solved a specific problem with how box plots rendered.
Here at #rstudioconf @kara_woo tells us about ⬛️BOX PLOTS⬛️ (but really about debugging and problem solving).
Questions for debugging strategy:
- How do I know what the bug is?
- How do I fix it?
- How do I know when I'm done? pic.twitter.com/siOREoimGj
— Julia Silge (@juliasilge) January 17, 2019
I am also looking forward to applying what I learned from Amelia McNamara’s talk about moving from brittle, fragile code to robust, safe code for categorical data, as well as from the whole afternoon session on machine learning and modeling, with talks by Max Kuhn, Alex Hayes, and more.
A different aspect of this theme of process and workflow is how we can make work public and share knowledge. My coauthor Dave’s keynote about sharing work publicly made the case for increasing the impact of work we do by broadening its audience.
It's the last talk of #rstudioconf! @drob talks about the unreasonable effectiveness of sharing your work publicly via
- blog
- tweet
- open source
- giving talks
- writing a book pic.twitter.com/r35lpYB0EN
— Julia Silge (@juliasilge) January 18, 2019
Code is only part of my job
Friday morning started off with a keynote by Felienne (who apparently goes by one name, like Prince?!? After the talk, I feel like it’s fair), and RStudio is so lucky to have brought her to their conference. Her main point was about how we teach people programming, but the talk was excellent on an even broader level than that. I had watched some videos of her talks before, so I was expecting this to be good, but it was even better than I had hoped.
Programming doesn’t have a tradition of direct instruction, and we have… really weird ideas about what to say to learners. #rstudioconf https://t.co/8MZwTykWjg
— Julia Silge (@juliasilge) January 18, 2019
The third main theme I connected with at this conference was how much of what we do as data scientists or analysts is not code, but teaching (as Felienne addressed), growing teams, and more. Hilary Parker gave a talk about using data more effectively by broadening our focus beyond strictly technical strengths to include collaborative and design skills.
BIGGER PICTURE, we can think more holistically and creatively about the system to
- include data collection
- enable new lines of investigation
Develop design thinking abilities, alongside other data science skills #rstudioconf pic.twitter.com/4JjCv4pJH9
— Julia Silge (@juliasilge) January 18, 2019
Other excellent talks I saw in this category included JD Long’s talk about spreadsheets and bullshit (that’s what his shirt said, and he should probably start selling them) as well as empathy and broadening the tent. Others were Caitlin Hudon’s talk about the different kinds of mistakes she’s made and Angela Bassa’s talk about building data science teams.
The final session of the conference was a panel discussion focusing on data in organizations. I have a tweet thread where I jotted down perspectives the participants shared, covering junior data scientists, management vs. individual contributor work, team values, and data ethics.
Final #Rstudioconf panel making points we haven’t talked about enough yet: slow down, invest in leadership as skill development, and HIRE DIVERSE TEAMS
Feat. @AngeBassa, @hspter, @_inundata, & @tracykteal, moderated by Eduardo Ariño de la Rubia pic.twitter.com/cxCh8gSo7u
— Brooke Watson (@brookLYNevery1) January 18, 2019
BONUS ROUND
On Saturday, I participated in the first ever Tidyverse Developer Day. The goal of this event was to nurture new regular contributors to tidyverse packages, especially people who had not contributed before. I maintain a tidyverse-adjacent R package myself and have managed issues and PRs as a maintainer, but I had never written code for a core tidyverse package or submitted a PR to a package with such an enormous user base. This developer day was a really valuable experience. First off, it was so, so fun to sit in a room with engaged, delightful people excited about what they were working on, chatting together while looking at issues and thinking about what improvements could help users like us. Second, I saw multiple people around me submit their first ever PRs, which is no small feat. I am (mostly) competent in package development and git, so I got to talk through some problems people had, but naturally the real experts were around as well.
My role at Tidyverse Developer Day (which, to be clear, I found very gratifying!)
*art credit to @thomasp85, I think? pic.twitter.com/I206riVIjP
— Jenny Bryan (@JennyBryan) January 20, 2019
I submitted three PRs during the day. Two of them were extremely tiny. The one I am happiest about improves the error message in tidyr when you use spread() and end up with duplicate identifiers. I have run into that situation many times in my real world data life and lost time puzzling over it; the new error message is clearer, aligns with the tidyverse error message style guide, and gives guidance on what to try next.
I so enjoyed rstudio::conf this year. One thing I noticed is that both the speakers and the attendees showed some of the best representation of women I have ever experienced in a technical community. I hope to continue to see improving representation from other groups who are often under-indexed in data science and tech. I look forward to putting what I learned into practice in both my day job and my open source work, and hopefully to being back next year!