Fastai Collaborative Filtering with R and Reticulate
Jeremy Howard and Rachel Thomas are the founders of fast.ai, whose aim is to make deep learning accessible to all. They offer a course called Practical Deep Learning for Coders (Part 1). The latest session, taught by Jeremy, ran in Fall 2017, and the videos were released in early January 2018. Their approach is top-down: they first show different applications as black boxes, then progressively peel back each black box to teach the details of how things work. The course uses Python, and they have developed a Python library, fastai, that is a wrapper around PyTorch.
I wanted to learn reticulate by creating an R version of one of the Python notebooks from that class. The class covers collaborative filtering in lectures 5 and 6. The dataset is a sample of the MovieLens dataset in which about 670 users have rated about 9,000 movies. The objective is to develop a model that predicts the rating a user will give to a particular movie.
The Jupyter notebook for this topic is divided into 2 portions:
- In the first half, the model is developed using only high-level fastai functions. The R notebook for the first half is located here.
- In the second half, the model is developed from scratch, and 3 different types of models are discussed, progressing from a matrix-factorization model to deep-learning models. The R notebook for the second half is located here.
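To make the matrix-factorization idea concrete, here is a minimal sketch in plain Python. It is not the notebook's code and does not use fastai or PyTorch. The function names (`train_mf`, `predict`, `mse`), the toy ratings, and the hyperparameters are all made up for illustration. Each user and each movie gets a small vector of factors, and the predicted rating is their dot product, fitted by stochastic gradient descent on squared error.

```python
import random


def predict(U, M, u, m):
    """Predicted rating: dot product of user u's and movie m's factor vectors."""
    return sum(a * b for a, b in zip(U[u], M[m]))


def mse(U, M, ratings):
    """Mean squared error over (user, movie, rating) triples."""
    return sum((r - predict(U, M, u, m)) ** 2 for u, m, r in ratings) / len(ratings)


def train_mf(ratings, n_users, n_movies, n_factors=2, lr=0.02, epochs=500, seed=0):
    """Fit user and movie factors by SGD (illustrative settings, not tuned)."""
    rng = random.Random(seed)
    # Small random initial factors for every user and every movie.
    U = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    M = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_movies)]
    for _ in range(epochs):
        for u, m, r in ratings:
            err = r - predict(U, M, u, m)
            for k in range(n_factors):
                uk, mk = U[u][k], M[m][k]
                # Gradient step on squared error for both factor vectors.
                U[u][k] += lr * err * mk
                M[m][k] += lr * err * uk
    return U, M


# Hypothetical toy data: (user index, movie index, rating).
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

U0, M0 = train_mf(toy_ratings, n_users=3, n_movies=3, epochs=0)  # untrained
U, M = train_mf(toy_ratings, n_users=3, n_movies=3)              # trained
print("MSE before training:", mse(U0, M0, toy_ratings))
print("MSE after training: ", mse(U, M, toy_ratings))
```

A deep-learning variant would typically replace the dot product with a small neural network applied to the concatenated user and movie factors. That step is not shown here.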
Since the first half mainly involved Python functions from the fastai library, it seemed like a good use case for reticulate: we could use reticulate just for model development and use R functions for the other pre- and post-processing tasks. The second half involved building models from scratch. In PyTorch, custom models need to be written as Python classes. While it is still possible to use reticulate in this case, it may not be the ideal use case, since somebody developing custom models might be better off doing the whole job in Python. Once the models are wrapped into a Python package, however, they are easier to use from R. Overall, reticulate was great to work with, and it made it very easy to translate a Python function into an equivalent R function. It is a great addition to the R ecosystem.
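As a rough illustration of what "a model written as a Python class" can look like, the sketch below defines a small, dependency-free class with `fit` and `predict` methods. It is a hypothetical stand-in, not a PyTorch module and not code from the course. The class name `BiasBaseline`, its parameters, and the toy data are assumptions made for this example. It models a rating as the global mean plus a per-user and a per-movie offset.

```python
class BiasBaseline:
    """Hypothetical model: rating ~ global mean + user bias + movie bias."""

    def __init__(self, lr=0.05, epochs=200):
        self.lr = lr
        self.epochs = epochs
        self.mean = 0.0
        self.user_bias = {}
        self.movie_bias = {}

    def predict(self, user, movie):
        # Unseen users or movies fall back to a bias of zero.
        return (self.mean
                + self.user_bias.get(user, 0.0)
                + self.movie_bias.get(movie, 0.0))

    def fit(self, ratings):
        """ratings: list of (user, movie, rating) triples."""
        self.mean = sum(r for _, _, r in ratings) / len(ratings)
        for _ in range(self.epochs):
            for user, movie, r in ratings:
                err = r - self.predict(user, movie)
                # Nudge both biases toward reducing the error.
                self.user_bias[user] = self.user_bias.get(user, 0.0) + self.lr * err
                self.movie_bias[movie] = self.movie_bias.get(movie, 0.0) + self.lr * err
        return self


# Hypothetical toy data with purely additive user and movie effects.
toy = [("a", "x", 5.0), ("a", "y", 4.0), ("b", "x", 2.0), ("b", "y", 1.0)]
model = BiasBaseline().fit(toy)
print(model.predict("a", "x"))
```

If a class like this were saved in a Python file or packaged as a module, it could be loaded from R with reticulate's `source_python()` or `import()` and then instantiated and called from R code. The pre- and post-processing could stay in R.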