Connect R with Myrrix – Mahout & Cloudera’s real-time, scalable recommender system
[This article was first published on BNOSAC - Belgium Network of Open Source Analytical Consultants, and kindly contributed to R-bloggers].
Myrrix is probably better known among Java developers and Mahout users than among R users, largely because Java and R developers tend to live in different communities.
If you go to the Myrrix website (http://myrrix.com), you'll find that it is a large-scale recommender system which builds its recommendation model with Alternating Least Squares. When tuned well, that technique is a pretty good benchmark model for getting recommendations to your customers.
It has one setup for building recommendation models from local data and another for building a recommender system from data in Hadoop, be it on CDH or on another Hadoop stack such as HDInsight or your own installation.
Very recently, Cloudera announced its intention to incorporate Myrrix into its product offering (see this press release), and this is getting quite some attention.
Recommendation engines are one of the machine learning techniques that get frequent attention, although they are not used as frequently as other statistical techniques like classification or regression.
This is because recommendation engines usually require a lot of preparatory work: deciding which data to use, handling time-based information, handling new products and products which are no longer sold, making sure the model is up to date, and so forth.
When setting up a recommendation engine, business users also want to compare its behaviour to other business-driven or data-driven logic. In these initial phases of a project, it is key to let statisticians and data scientists use their language of choice to communicate with, test and evaluate the recommendation engine.
To allow this, we have created an interface between R and Myrrix, consisting of two packages which are currently available on GitHub (https://github.com/jwijffels/Myrrix-R-interface). It allows R users to build, fine-tune and evaluate the recommendation engine as well as retrieve recommendations. Future users of Cloudera might be interested in this as well, once Myrrix gets incorporated into their product offering.
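Comparing a recommender against business-driven logic usually comes down to scoring both lists against held-out behaviour. The post does not name a metric, so the sketch below assumes precision at k as one common choice. The function name precision_at_k and the item IDs are hypothetical and not part of the Myrrix packages.

```r
# Precision at k: share of the top-k recommended items that the user actually
# interacted with in a held-out period. Both inputs are vectors of item IDs.
# (Illustrative helper, not part of the Myrrix R interface.)
precision_at_k <- function(recommended, relevant, k = length(recommended)) {
  top_k <- head(recommended, k)
  if (length(top_k) == 0) return(NA_real_)
  mean(top_k %in% relevant)
}

# Made-up data for one user: items bought in the hold-out period,
# a business rule ("push current promotions") and a model-based list.
held_out   <- c("A", "C", "F")
promotions <- c("B", "C", "D", "E")
model_recs <- c("A", "F", "B", "C")

precision_at_k(promotions, held_out, k = 3)  # 1 of the top 3 is relevant
precision_at_k(model_recs, held_out, k = 3)  # 2 of the top 3 are relevant
```

In practice you would average this over many users; the model-based list could come from a call such as recommend() in the example at the end of this post.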
Myrrix deploys a recommender engine technique based on large, sparse matrix factorization. From input data, it learns a small number of “features” that best explain users' and items' observed interactions. This same basic idea goes by many names in machine learning, like principal component analysis or latent factor analysis. Myrrix uses a modified version of an Alternating Least Squares algorithm to factor matrices. More information can be found here: http://www.slideshare.net/srowen/big-practical-recommendations-with-alternating-least-squares and at the Myrrix website.
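To make the alternating idea concrete, here is a minimal base-R sketch of plain, textbook ALS on a tiny rating matrix. It is not Myrrix's modified algorithm, which this post does not detail. The function name als_factorize, its arguments and the toy ratings are all made up for illustration. The n_features and lambda arguments play roughly the role one would expect of the model.features and model.als.lambda hyperparameters in the example code further down, but that correspondence is an assumption.

```r
# Plain ALS on a small user x item matrix; NA marks unobserved interactions.
# Illustrative only: names and defaults are assumptions, not Myrrix API.
als_factorize <- function(R, n_features = 2, lambda = 0.1, n_iter = 20, seed = 42) {
  set.seed(seed)
  n_users <- nrow(R)
  n_items <- ncol(R)
  observed <- !is.na(R)
  # Small random starting factors: one row per user and one row per item
  U <- matrix(rnorm(n_users * n_features, sd = 0.1), n_users, n_features)
  V <- matrix(rnorm(n_items * n_features, sd = 0.1), n_items, n_features)
  ridge <- lambda * diag(n_features)
  for (iter in seq_len(n_iter)) {
    # Hold item factors fixed; each user is then a small ridge regression
    for (u in seq_len(n_users)) {
      idx <- which(observed[u, ])
      if (length(idx) == 0) next
      Vu <- V[idx, , drop = FALSE]
      U[u, ] <- solve(crossprod(Vu) + ridge, crossprod(Vu, R[u, idx]))
    }
    # Hold user factors fixed; solve the same kind of problem per item
    for (i in seq_len(n_items)) {
      idx <- which(observed[, i])
      if (length(idx) == 0) next
      Ui <- U[idx, , drop = FALSE]
      V[i, ] <- solve(crossprod(Ui) + ridge, crossprod(Ui, R[idx, i]))
    }
  }
  list(U = U, V = V, predicted = U %*% t(V))
}

# Toy data: 4 users x 4 items, two loose "taste" groups, some cells missing
ratings <- matrix(c( 5,  4, NA,  1,
                     4, NA,  1,  1,
                     1,  1,  5, NA,
                    NA,  1,  4,  5),
                  nrow = 4, byrow = TRUE)
fit <- als_factorize(ratings, n_features = 2, lambda = 0.1)
round(fit$predicted, 1)  # filled-in matrix, including the NA cells
```

Each inner step is an ordinary least-squares solve with a ridge penalty, which is why the method scales: the per-user and per-item problems are small and independent of each other.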
So if you are interested in setting up a recommendation engine for your application or if you want to improve your existing recommendation toolkit, contact us.
If you are an R user and only interested in the code on how to build a recommendation model and retrieve recommendations, here it is.
## To start building recommendation engines, install the R packages Myrrixjars and Myrrix as follows.
install.packages("devtools")
install.packages("rJava")
install.packages("ffbase")
library(devtools)
## Recent devtools versions expect the repository as "user/repo";
## older versions took the user name as a separate second argument.
install_github("jwijffels/Myrrix-R-interface", subdir="Myrrixjars/pkg")
install_github("jwijffels/Myrrix-R-interface", subdir="Myrrix/pkg")

## The following example shows the basic usage of Myrrix to build a local recommendation
## engine. It uses the audioscrobbler data available on the Myrrix website.
library(Myrrix)

## Download example dataset
inputfile <- file.path(tempdir(), "audioscrobbler-data.subset.csv.gz")
download.file(url="http://dom2bevkhhre1.cloudfront.net/audioscrobbler-data.subset.csv.gz", destfile = inputfile)

## Set hyperparameters
setMyrrixHyperParameters(params=list(model.iterations.max = 2, model.features=10, model.als.lambda=0.1))
x <- getMyrrixHyperParameters(parameters=c("model.iterations.max","model.features","model.als.lambda"))
str(x)

## Build a model which will be stored in getwd() and ingest the data file into it
recommendationengine <- new("ServerRecommender", localInputDir=getwd())
ingest(recommendationengine, inputfile)
await(recommendationengine)

## Get all users/items and score them with the recommendation model
items <- getAllItemIDs(recommendationengine)
users <- getAllUserIDs(recommendationengine)
estimatePreference(recommendationengine, userID=users[1], itemIDs=items[1:20])
estimatePreference(recommendationengine, userID=users[10], itemIDs=items)
mostPopularItems(recommendationengine, howMany=10L)
recommend(recommendationengine, userID=users[5], howMany=10L)