
76th Tokyo.R Users Meetup Roundup!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers].

The 76th Tokyo R User Meetup happened on March 2nd, graciously hosted by DeNA (an entertainment and e-commerce company) in their lovely headquarters located in Shibuya.

(Photo courtesy of Takashi Minoda)

On the same day, another R user meetup was happening up in Sapporo, Hokkaido. You can check them out here. Although this was the second Tokyo.R of 2019, I wasn’t able to attend the one in January as I was at rstudio::conf in Austin, Texas… a long way from home! As in my roundup blog post of the talks at Japan.R, I will be going through around half of all the talks. Hopefully, my efforts will help spread the vast knowledge of Japanese R users to the wider R community. Throughout, I will also link to helpful blog posts and other sources in case you are interested in learning more about the topic of a certain talk. You can follow Tokyo.R by searching for the #TokyoR hashtag on Twitter.

Unlike most R Meetups a lot of people present using just their Twitter handles so I’ll mostly be referring to them by those instead. I’ve been going to events here in Japan for about a year but even now sometimes I’m like, “Whoahh that’s what @very_recognizable_twitter_handle_in_the_japan_r_community actually looks like?!” Anyways…

Let’s get started!

Beginner Tutorials

Every Tokyo.R session starts off with three talks given by members of the organizing team, who go over some of the very basic aspects of R for beginner users. These talks are given by very experienced R users and are a way to let newbies feel comfortable before diving into real-world applications of R in the main talks and LTs happening later on.

In this edition of Tokyo.R:

Talks

kato_kohaku: Model-Agnostic Explanations

In the first main talk of the day, @kato_kohaku dived deep into model-agnostic explanations using the DALEX, iml, and mlr packages. One of the problems seen in the ML field is the growing complexity of models, as researchers have been able to push the limits of what they can do with increased computational power and the consequent discovery of new methods. The high performance of these complex models has come at a high cost: interpretability has been reduced dramatically, and many of these newer models are called “black boxes” for that very reason. A model-agnostic method is preferable to model-specific methods mainly due to its flexibility, as data scientists typically evaluate many different types of ML models to solve a task. A model-agnostic method allows you to compare these types of models using the same method in a way that a model-specific method can’t.

@kato_kohaku went over the workflow for performing model-agnostic interpretation and covered partial dependence plots (PDP), individual conditional expectation (ICE), permutation importance, accumulated local effects plot (ALE), feature interaction, LIME, Shapley values, and more.
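To give a flavor of what “model-agnostic” means in practice, here is a minimal base R sketch of permutation importance. It is not taken from the slides and does not use the DALEX or iml APIs: the mtcars data, the lm() model, and the helper names rmse() and permutation_importance() are all my own assumptions for illustration. The only thing the code needs from the model is a predict() method, which is the whole point.

```r
# Permutation importance for any model that has a predict() method.
# Assumption: mtcars (ships with R) and lm() are stand-ins for a real "black box".
set.seed(42)

fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Root mean squared error of the model's predictions on `data`.
rmse <- function(model, data) {
  sqrt(mean((data$mpg - predict(model, newdata = data))^2))
}

# For each feature: shuffle its column, re-score the model, and record
# how much the error grows compared with the unshuffled baseline.
permutation_importance <- function(model, data, features, n_rep = 50) {
  baseline <- rmse(model, data)
  sapply(features, function(feature) {
    increases <- replicate(n_rep, {
      shuffled <- data
      shuffled[[feature]] <- sample(shuffled[[feature]])  # break the link to the target
      rmse(model, shuffled) - baseline
    })
    mean(increases)
  })
}

importance <- permutation_importance(fit, mtcars, c("wt", "hp", "qsec"))
print(sort(importance, decreasing = TRUE))
```

Swapping lm() for any other model with a predict() method should leave the rest of the code unchanged, which is the kind of flexibility the talk was highlighting.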

The topics he covered are well explained in Christoph Molnar’s excellent book, Interpretable Machine Learning, which @kato_kohaku referred to throughout the presentation. There is a HUGE number of slides (146 of them!) filled with a ton of great info that you can read (a lot of the slides have explanations taken straight from the documentation in English), so I highly recommend taking a look through them if you are interested in what the DALEX and iml packages have to offer for interpreting models.

A great code-through explanation of using DALEX with mlr in English can be found here using the same data set as seen in @kato_kohaku's slides.

y_mattu: Operators/Objects in R

One of the organizers of Tokyo.R, @y_mattu, presented on objects in R. Specifically he went over using the pryr and lobstr packages to dig inside R objects and see what is happening “under the hood” of your everyday R operations.

“Every object in R is a function, every function in R is an object”

The above maxim means that even operators such as + are functions: wrapping the operator in backticks lets you call it like any other function with its arguments in parentheses, so 1 + 2 can also be written as `+`(1, 2).
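A quick way to convince yourself of this in base R (my own small example, not code from the slides) is to compare the infix and prefix forms and then pass the operator around like any other function:

```r
# Infix form and prefix (function-call) form of the same operation.
infix  <- 1 + 2
prefix <- `+`(1, 2)
print(identical(infix, prefix))

# Because `+` is an ordinary function object, it can be handed to other functions.
shifted <- sapply(1:3, `+`, 10)   # adds 10 to each element
total   <- Reduce(`+`, 1:4)       # folds the vector with +
print(shifted)
print(total)
```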

Looking deeper, @y_mattu used the ast() function from the lobstr package to see the abstract syntax tree of the R expression 1 + 2.

library(lobstr)
lobstr::ast(1 + 2)

## o-`+` 
## +-1 
## \-2

The tree above shows how R parses the expression: a call to the function + with the arguments 1 and 2. To understand what happens when we run this operation we need to look at R environments. To check which environment holds the + operator, we can use where() from the pryr package:

library(pryr)

## 
## Attaching package: 'pryr'

## The following objects are masked from 'package:lobstr':
## 
##     ast, mem_used

pryr::where("+")

## <environment: base>

And we find that the base package holds this operator and it is called from the base environment. In the final part @y_mattu looked into the + operator itself by looking at .Primitive() as well as pryr::show_c_source() to see the C source code that implements +.
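If you don't have pryr installed, the kind of lookup that where() performs can be approximated in base R. The sketch below is my own (find_env() is a hypothetical helper, not pryr's actual implementation): it simply walks up the chain of enclosing environments until it finds one that holds the name. It assumes nothing on your search path has redefined +.

```r
# Walk up the chain of environments until one of them contains `name`.
# find_env() is a hypothetical helper written for this post.
find_env <- function(name, env = globalenv()) {
  while (!identical(env, emptyenv())) {
    if (exists(name, envir = env, inherits = FALSE)) {
      return(env)
    }
    env <- parent.env(env)
  }
  stop("Can't find ", name)
}

plus_env <- find_env("+")
print(environmentName(plus_env))

# `+` is a primitive, i.e. implemented directly in C rather than in R code.
print(is.primitive(`+`))
plus_fun <- .Primitive("+")   # fetch the primitive by name
print(plus_fun(1, 2))
```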

This was a very technical topic (for me) but it piqued my interest in what’s actually happening whenever you run a line of R code!

bob3bob3: DeNA

At every Tokyo.R the hosting company is given time to talk about itself, how it uses R, and hopefully provide some information for any interested job seekers. For DeNA, @bob3bob3 gave this talk, providing some details on what exactly DeNA does as well as giving his own LT on SEM using lavaan. DeNA is an entertainment/e-commerce firm that is best known for its cellphone platform, Mobage. Interestingly, they also took ownership of MyAnimeList a few years back (probably one of the largest anime/manga database communities in the world). For job seekers he talked about the large variety of positions DeNA has available in the “Kaggler” category as well as open positions in automobile, healthcare, sports analytics, HR analytics, marketing research, and more…!

Following his elevator pitch about DeNA he gave a small talk about using lavaan to plot path diagrams for structural equation modeling. @bob3bob3 explained how he ended up creating his own plotting function using DiagrammeR and Graphviz to visualize the lavaan output, as he did not like the default plotting method.
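I don't have @bob3bob3's function, but the general idea can be sketched in base R: take a table of path estimates and turn it into a Graphviz DOT string. Everything below is an assumption for illustration. The data frame only mimics the lhs/op/rhs/est columns that lavaan::parameterEstimates() returns, the variable names and numbers are invented, and paths_to_dot() is a hypothetical helper.

```r
# Hypothetical path table using the column names of lavaan::parameterEstimates().
# The variables and estimates are invented purely for illustration.
paths <- data.frame(
  lhs = c("satisfaction", "satisfaction", "loyalty"),
  op  = c("~", "~", "~"),
  rhs = c("price", "quality", "satisfaction"),
  est = c(-0.21, 0.58, 0.67),
  stringsAsFactors = FALSE
)

# Convert regression paths (op == "~") into Graphviz DOT.
# In lavaan syntax `y ~ x` means x predicts y, so the arrow runs rhs -> lhs.
paths_to_dot <- function(paths) {
  reg <- paths[paths$op == "~", ]
  edges <- sprintf('  "%s" -> "%s" [label = "%.2f"];', reg$rhs, reg$lhs, reg$est)
  paste(
    c("digraph sem {", "  rankdir = LR;", "  node [shape = box];", edges, "}"),
    collapse = "\n"
  )
}

dot <- paths_to_dot(paths)
cat(dot, "\n")
```

If DiagrammeR is installed, passing the string to DiagrammeR::grViz(dot) should render the diagram; the DOT text can also be pasted into any other Graphviz tool.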

Lightning Talks

flaty13: Tidy Time-Series Analysis

@flaty13, who has also recently presented at Japan.R and SportsAnalyst Meetup on tennis analytics, gave a talk on analyzing time-series data with R. He first talked about how packages like lubridate and dplyr, while useful, may not be the best way to handle time series data. The solution @flaty13 talked about was the tsibble package created by Earo Wang. At rstudio::conf 2019 Earo gave a talk on this package and on using tidy data principles with time series data, which you can watch here.

@flaty13 used his own pedometer data from a healthcare app on his iPhone for his demonstration. After reading the data in and performing the usual tidyverse operations on it, the data frame was turned into a tsibble object and then visualized as a calendar plot using the sugrrants package (also by Earo Wang).
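I don't have the pedometer data, so here is a rough base R stand-in for two things that tsibble and sugrrants take care of for you: making gaps in the time index explicit, and working out where each day sits in a calendar grid. The step counts are randomly generated and the column names are my own assumptions. This is not the tsibble or sugrrants API, just the underlying idea.

```r
# Made-up daily step counts for two weeks; one day is deliberately missing.
set.seed(1)
days  <- seq(as.Date("2019-03-01"), as.Date("2019-03-14"), by = "day")
steps <- data.frame(date = days, steps = sample(3000:12000, length(days)))
steps <- steps[-5, ]   # pretend the app logged nothing on March 5th

# Build the complete daily index and join, so the gap becomes an explicit NA.
full <- data.frame(date = seq(min(steps$date), max(steps$date), by = "day"))
steps_full <- merge(full, steps, by = "date", all.x = TRUE)

# Calendar coordinates: column = ISO weekday (Monday = 1),
# row = ISO week counted from the first week in the data.
steps_full$weekday <- as.integer(format(steps_full$date, "%u"))
steps_full$week    <- as.integer(format(steps_full$date, "%V"))
steps_full$row     <- steps_full$week - min(steps_full$week) + 1

print(head(steps_full, 7))
```

With the real packages, the index handling and the calendar layout are done for you, which is a large part of their appeal.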

saltcooky: Organizing an R Study Group at My Company!

@saltcooky took the time to talk to us about something that doesn’t usually get mentioned at Tokyo.R, as he reported on the success of an intra-company R workshop he hosted. At @saltcooky's company the majority of his co-workers are Pythonistas, with only three other co-workers and him being R users. Hoping to change this dynamic, especially as their company does a lot of data analytics, @saltcooky set out to create some workshops. What he came up with were three separate sessions heavily inspired by the Tokyo.R method that I talked about in the “Beginner Tutorials” section.

Throughout the workshops @saltcooky was asked some peculiar questions like “Is there a difference in using . vs. _ in separating words in a function/object name?” and “Why are there so many packages/functions with the same functionality!?”.
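I don't know how @saltcooky answered the first question, but one concrete difference I'm aware of is that the dot plays a role in S3 method dispatch, while the underscore never does. The tiny example below is my own; the “recipe” class and both function names are hypothetical.

```r
# A small S3 object of a made-up class.
recipe <- structure(list(name = "curry"), class = "recipe")

# The name print.recipe follows the generic.class pattern,
# so print() dispatches to it for objects of class "recipe".
print.recipe <- function(x, ...) {
  cat("Recipe:", x$name, "\n")
  invisible(x)
}
print(recipe)

# An underscore is never used for dispatch: this is just a regular
# function that has to be called by its full name.
print_recipe <- function(x) {
  cat("Recipe:", x$name, "\n")
}
print_recipe(recipe)
```

This is one reason many style guides suggest keeping dots out of ordinary function and object names.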

One of the major hurdles that @saltcooky faced was installing R on all the different OSes that his co-workers used. The solution he came up with was to use RStudio Cloud. This eased the burden for him as he didn’t need to set up or manage any servers, while the students did not need to install any software at all! There was actually a great talk on using “RStudio Cloud for Education” by Mel Gregory at rstudio::conf 2019 a few months ago and it’s a great resource for others thinking about holding workshops.

@saltcooky concluded that his workshops were a mild success as he was able to get a couple more people using R casually at his workplace and although Python remains dominant he looks forward to convincing more people to use R in the future.

moratoriamuo271: Topic Modeling Cooking Recipes!

Continuing the theme of “tidy” data analysis, @moratoriamuo271 applied the concept to text analysis. The motivation for this talk came from the difficulty and hassle of figuring out a nice set of meals to eat over the course of a week. To solve this problem he sought to create a recommendation engine for recipes!

As seen in the above flowchart @moratoriamuo271:

  1. Web scraped recipes using rvest
  2. Created some word-clouds for some EDA
  3. Used RMeCab and tm to create an organized document term matrix (RMeCab is a package specifically for Japanese text analysis)
  4. Ran Latent Dirichlet Allocation (LDA) with the topicmodels and ldatuning packages
  5. Finally, split the recipes into categories with tidytext
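For readers who haven't met a document term matrix before, step 3 above can be sketched in base R. The three “recipes” below are invented English stand-ins for the scraped data, and splitting on spaces replaces the morphological analysis that RMeCab would do for Japanese text, so treat this as an illustration of the data structure rather than of the actual pipeline.

```r
# Three made-up recipe texts stand in for the scraped data.
recipes <- c(
  curry = "onion carrot potato beef curry roux",
  salad = "lettuce tomato cucumber onion dressing",
  stew  = "beef potato carrot onion tomato"
)

# Tokenize: splitting on spaces is enough for this toy English example.
tokens <- strsplit(recipes, " ", fixed = TRUE)

# Long format: one row per (document, term) occurrence.
long <- data.frame(
  doc  = rep(names(tokens), lengths(tokens)),
  term = unlist(tokens, use.names = FALSE),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a documents x terms count matrix.
dtm <- unclass(table(long$doc, long$term))
print(dtm)
```

Each row is a recipe and each column a term, which is the shape that topic modeling functions generally expect as input.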

Before he showed us the results of his work, @moratoriamuo271 took us through a crash course on various topic modeling techniques, from the basic uni-gram model, to the mixture of uni-grams model, and finally to Latent Dirichlet Allocation (LDA).

He also went over the process in which he decided on the optimal number of topics for his recommendation engine. This was done by looking at the perplexity values from the ldatuning package. Here is a great blog post by Peter Ellis on using cross-validation on perplexity to determine the optimal number of topics. Below is the final finished product that gives you recipes for nutritious and balanced meals for seven dinners!

@moratoriamuo271 has also released a blog post with ALL the code that you can check out here!
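If perplexity is new to you, the measure mentioned above can be illustrated with the simplest model from the crash course, the uni-gram model. This is my own toy sketch and not @moratoriamuo271's code: the word lists are invented, and add-one smoothing is an assumption I made so that unseen words don't get zero probability. For an actual LDA fit, the topicmodels package provides a perplexity() function.

```r
# Made-up training and held-out words; "lettuce" never appears in training.
train <- c("onion", "carrot", "potato", "beef", "onion", "tomato", "beef", "onion")
test  <- c("onion", "beef", "tomato", "lettuce")

# Count training words over the full vocabulary (unseen words get a count of 0).
vocab  <- union(train, test)
counts <- table(factor(train, levels = vocab))

# Uni-gram probabilities with add-one (Laplace) smoothing.
prob <- (counts + 1) / (sum(counts) + length(vocab))

# Perplexity = exp(-average log-likelihood per held-out word); lower is better.
perplexity <- exp(-mean(log(as.numeric(prob[test]))))
print(perplexity)
```

Choosing the number of topics by perplexity follows the same logic: fit models with different topic counts and prefer the ones that give held-out documents a lower value.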

Other Talks

I couldn’t go through all of the talks but I will provide their slides below (if/when they become available)

Conclusion

After the talks, everyone got together for a little after-party over food and drinks. Usually pizza is served but this time was a bit more fancy with kara-age and cheese-on-crackers being served. As the night wore on R users from all over Tokyo talked about their successes and struggles with R.

Unfortunately, there is only so much I can do to translate the talks, especially as Tokyo.R doesn’t do recordings anymore, but I hope that I could be of some help and maybe you’ll be inspired by a code snippet there or a certain package name elsewhere, etc.! Tokyo.R happens almost monthly and it’s a great way to mingle with Japanese R users as it is the largest regular meetup here in Japan. Talks in English are also welcome so if you’re ever in Tokyo come join us!

