May the XAI be with you!
A new version of the ingredients package (v0.5) was released on CRAN a few days ago. Below I show the largest changes made in this version.
The ingredients package is a part of the DrWhy family of tools for model exploration and explanation. To make this presentation more entertaining, it is based on the Star Wars data from Kaggle.
What makes a Jedi?
Here is the script that reads the Kaggle data, performs some data cleaning, and trains a gbm model that learns whether a character uses a lightsaber (Jedi or Sith) or not. We end up with a simple binary classification problem.
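A minimal sketch of that setup is shown below. The file name, column names, and predict_function wiring are illustrative assumptions, not the original script.

```r
library("gbm")
library("DALEX")

# Hypothetical cleaned Kaggle data with a 0/1 target column
# `uses_lightsaber`; the real script's column names may differ
starwars <- read.csv("StarWars.csv")

# Binary classification: does this character use a lightsaber?
model <- gbm(uses_lightsaber ~ ., data = starwars,
             distribution = "bernoulli", n.trees = 100)

# Wrap the model in a DALEX explainer so it can be used with ingredients
explainer <- explain(model,
                     data = starwars[, setdiff(names(starwars), "uses_lightsaber")],
                     y = starwars$uses_lightsaber,
                     predict_function = function(m, x)
                       predict(m, x, n.trees = 100, type = "response"),
                     label = "gbm")
```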
Let’s see what we can achieve using the ingredients package. We have data for only 87 characters and the model is overfitted (AUC = 0.95), so do not take these results too seriously ;-).
New Feature Importance
The largest change is that the feature_importance() function now calculates B (by default 10) versions of feature importance based on B random permutations of the data. Thanks to this, the plot() and plotD3() functions show not only the average feature importance but also boxplots with the distribution across the individual permutations. For smaller datasets this also means more stable results. The side effect is that feature_importance() with default parameters will be 10x slower. For larger datasets it is recommended to use only a subset of observations.
The feature importance plot for the gbm model is presented below. It turns out that the four most important features for our model are the colours of skin, eyes, and hair, along with height.
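A sketch of the new API, assuming the explainer object defined earlier:

```r
library("ingredients")

# B = 10 permutations is the default; the boxplots on the plot
# show the spread of importances across the permutations
fi <- feature_importance(explainer, B = 10)
plot(fi)
# plotD3(fi)  # interactive D3 version of the same plot
```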
New, smoother Accumulated Dependency
The accumulated_dependency() function now uses Gaussian kernel smoothing to locally adapt to potential correlation between features. It is based on the algorithm introduced in ALEPlot, but with weighted averages, and as a result it is smoother. It also works for categorical variables. A side effect of these changes is that the output from accumulated_dependency() and partial_dependency() needs to be sorted.
The accumulated dependency plot for the gbm Star Wars model and the height variable is presented below. It looks like taller characters have a higher chance of using a lightsaber (with the exception of Yoda).
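A sketch of the corresponding call, again assuming the explainer from the setup snippet:

```r
# Accumulated dependency profile for a single variable
ad <- accumulated_dependency(explainer, variables = "height")
plot(ad)
```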
Aspect importance has moved to DALEXtra
The previous version of ingredients contained an implementation of a LIME-type algorithm called aspect importance. Due to its heavier dependencies, this algorithm has been moved to the DALEXtra package.
Interactive exploration with modelStudio
Use the modelStudio browser https://pbiecek.github.io/explainStarWars/ if you want to play more with the gbm model trained on Star Wars characters.
You can easily create such a Model Studio for your own model with the modelStudio package.
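A minimal sketch, assuming the explainer built earlier:

```r
library("modelStudio")

# Generates an interactive dashboard for the model in the browser
modelStudio(explainer)
```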