The Good, the Bad and the Ugly: how to visualize Machine Learning data
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Below you’ll find the complete code and resources used to create the graphs in my talk The Good, the Bad and the Ugly: how to visualize Machine Learning data at this year’s Minds Mastering machines conference. You can find the German slides here:
You can find Part 1: The Good, the Bad and the Ugly: how (not) to visualize data here.
If you have questions or would like to talk about this article (or something else data-related), you can now book 15-minute timeslots with me (it’s free – one slot available per weekday):
 
If you have been enjoying my content and would like to help me be able to create more, please consider sending me a donation at . Thank you! 🙂
Libraries
library(tidyverse) library(mlbench) library(ggfortify) library(GGally) library(scagnostics) library(mlr)
Dataset
Pima Indians Diabetes dataset from mlbench package.
data(PimaIndiansDiabetes) PimaIndiansDiabetes %>% head() ## pregnant glucose pressure triceps insulin mass pedigree age diabetes ## 1 6 148 72 35 0 33.6 0.627 50 pos ## 2 1 85 66 29 0 26.6 0.351 31 neg ## 3 8 183 64 0 0 23.3 0.672 32 pos ## 4 1 89 66 23 94 28.1 0.167 21 neg ## 5 0 137 40 35 168 43.1 2.288 33 pos ## 6 5 116 74 0 0 25.6 0.201 30 neg
Colors
# The palette with grey:
cbp1 <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
          "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
ggplot <- function(...) ggplot2::ggplot(...) + 
  scale_color_manual(values = cbp1) +
  scale_fill_manual(values = cbp1) + # note: needs to be overridden when using continuous color scales
  theme_bw()
Visualizing Machine Learning models
Visualizing different steps of the machine learning pipeline can help us
- explore the data (EDA),
- understand the data (and identify potential problems),
- pre-process the data in a suitable way for optimal model performance,
- supervise the learning process,
- optimize modeling,
- interpret the model and
- compare and evaluate model predictions.
Visualization also greatly simplifies communication of our model and results to decision-makers or the public.
Exploratory Data Analysis
Exploratory Data Analysis (EDA) is the backbone of data analysis, including those that result in a machine learning model. EDA helps us to understand the data we are working with and put it into context, so that we are able to ask the right questions (or to put our questions into the right frame). It also helps us take appropriate measures for cleaning, normalization/transformation, dealing with missing values, feature preparation and engineering, etc. Particularly if our machine learning model is trained on a limited dataset (but not only then!), appropriate data preparation can vastly improve the machine learning process: models will often train faster and achieve higher accuracy.
An essential part of EDA is data visualization.
Typically, we want to start by exploring potential sources of errors in our data, like
- wrong/useless data types (sometimes data types are automatically set in a way that is not useful for our analysis, like factors versus strings, or wrong/strange entries in an otherwise numeric column will make it categorical)
- missing values (a collection of ways to visualize missingness can be found here),
- outliers (for example by plotting a box-plot of continuous variables)
Depending on the number of features/variables we have, it makes sense to look at them all individually and in correlation with each other. Depending on whether we have a categorical or continuous variable, we might be interested in properties that are shown by
- histograms (frequency distribution of binned continuous variables),
- density distribution (normalized distribution of continuous variables) or
- bar-plots (shows counts of categorical variables).
If our target variable is categorical, we will want to look at potential imbalances between the classes. Class imbalance will strongly affect the machine learning modeling process and will require us to consider up-/downsampling or similar techniques before we train a model.
Correlation analysis can show us, for example
- how our target/dependent variable correlates with the remaining features (often, just by looking at the correlation, we can identify one ore more feature that will have a strong impact on predicting the target because they are strongly correlated) or
- whether some of the independent variables/features correlate with each other (multicolinearity; we might want to consider removing strongly correlated features, so that they won’t contribute the “same” information multiple times to the model and thus lead to overfitting).
Additional methods can be used to visualize groups of related features. These methods are often especially useful if we have a large dataset with a large feature set (highly dimensional data). Some of these methods for visualizing groups of related features and/or for comparing multiple variables and visualizing their relationships are:
- Dimensionality reduction:
- Principal Component Analysis (PCA, linear, shows as much variation in data as possible)
- Multidimensional scaling (MDS, non-linear)
- Sammon mapping (non-linear)
- T-Distributed Stochastic Neighbor Embedding (t-SNE, non-linear)
- Uniform Manifold Approximation and Projection (UMAP, non-linear, faster than T-SNE, often captures global variation better than T-SNE and PCA)
- Isometric Feature Mapping Ordination (Isomap)
 
- Parallel coordinate plots
- scagnostics
# in our dataset,
# continuous variables are
PimaIndiansDiabetes %>%
  dplyr::select(where(is.numeric)) %>%
  head()
##   pregnant glucose pressure triceps insulin mass pedigree age
## 1        6     148       72      35       0 33.6    0.627  50
## 2        1      85       66      29       0 26.6    0.351  31
## 3        8     183       64       0       0 23.3    0.672  32
## 4        1      89       66      23      94 28.1    0.167  21
## 5        0     137       40      35     168 43.1    2.288  33
## 6        5     116       74       0       0 25.6    0.201  30
# 'diabetes' is the only categorical variable is also our target or dependent variable
PimaIndiansDiabetes %>%
  dplyr::select(!where(is.numeric)) %>%
  head()
##   diabetes
## 1      pos
## 2      neg
## 3      pos
## 4      neg
## 5      pos
## 6      neg
# bar plot of target
PimaIndiansDiabetes %>%
  ggplot(aes(x = diabetes, fill = diabetes)) +
    geom_bar(alpha = 0.8) +
    theme(legend.position = "none") +
    labs(x = "Diabetes outcome", 
         y = "count",
        title = "Barplot of categorical features", 
        caption = "Source: Pima Indians Diabetes Database")

# boxplot of continuous features
PimaIndiansDiabetes %>%
  gather("key", "value", pregnant:age) %>%
  ggplot(aes(x = value, fill = diabetes)) +
    facet_wrap(vars(key), ncol = 3, scales = "free") +
    geom_boxplot(alpha = 0.8) +
    theme(axis.text.y = element_blank(),
          axis.ticks.y = element_blank())

# histogram of features
PimaIndiansDiabetes %>%
  gather("key", "value", pregnant:age) %>%
  ggplot(aes(x = value, fill = diabetes)) +
    facet_wrap(vars(key), ncol = 3, scales = "free") +
    geom_histogram(alpha = 0.8) +
    labs(x = "value of feature in facet", 
         y = "count",
         fill = "Diabetes",
        title = "Histogram of features", 
        caption = "Source: Pima Indians Diabetes Database")

# density plot of of features
PimaIndiansDiabetes %>%
  gather("key", "value", pregnant:age) %>%
  ggplot(aes(x = value, fill = diabetes)) +
    facet_wrap(vars(key), ncol = 3, scales = "free") +
    geom_density(alpha = 0.8) +
    labs(x = "value of feature in facet", 
         y = "density",
         fill = "Diabetes",
        title = "Density of continuous features", 
        caption = "Source: Pima Indians Diabetes Database")

# correlation plot of features
mat <- PimaIndiansDiabetes %>%
  dplyr::select(where(is.numeric))
cormat <- round(cor(mat), 2)
cormat <- cormat %>%
  as_data_frame() %>%
  mutate(x = colnames(mat)) %>%
  gather(key = "y", value = "value", pregnant:age)
cormat %>%
    remove_missing() %>%
    arrange(x, y) %>%
    ggplot(aes(x = x, y = y, fill = value)) + 
    geom_tile() +
    scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
     midpoint = 0, limit = c(-1,1), space = "Lab", 
     name = "Pearson\nCorrelation") +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    coord_fixed() +
    labs(x = "feature", 
         y = "feature",
        title = "Correlation between features", 
        caption = "Source: Pima Indians Diabetes Database")

# scatterplot matrix
ggpairs(PimaIndiansDiabetes, 
        columns = c(1:8),
        alpha = 0.7) +
    labs(x = "feature", 
         y = "feature",
        title = "Scatterplot matrix", 
        caption = "Source: Pima Indians Diabetes Database")

# PCA
prep <- PimaIndiansDiabetes %>%
  dplyr::select(where(is.numeric))
pca <- prep %>%
  prcomp(scale. = TRUE)
autoplot(pca, 
                data = PimaIndiansDiabetes, 
                colour = 'diabetes',
                shape = 'diabetes',
                loadings = TRUE, 
                loadings.colour = 'blue',
                loadings.label = TRUE, 
                loadings.label.size = 3) +
      scale_color_manual(values = cbp1) +
  scale_fill_manual(values = cbp1) +
  theme_bw() +
    labs(title = "Principal Component Analysis (PCA)", 
        caption = "Source: Pima Indians Diabetes Database")

# MDS
d <- dist(prep) # euclidean distances between the rows
fit <- cmdscale(d,eig=TRUE, k=2) # k is the number of dim
fit$points %>%
  head()
##        [,1]       [,2]
## 1 -75.71465 -35.950783
## 2 -82.35827  28.908213
## 3 -74.63064 -67.906496
## 4  11.07742  34.898486
## 5  89.74379  -2.746937
## 6 -80.97792  -3.946887
# Sammon mapping
library(MASS)
sam <- sammon(dist(prep))
## Initial stress        : 0.03033
## stress after   0 iters: 0.03033
sam$points %>%
  head()
##        [,1]       [,2]
## 1 -75.71465 -35.950783
## 2 -82.35827  28.908213
## 3 -74.63064 -67.906496
## 4  11.07742  34.898486
## 5  89.74379  -2.746937
## 6 -80.97792  -3.946887
# parallel coordinate plots
ggparcoord(data = PimaIndiansDiabetes, 
           columns = c(1:8), 
           groupColumn = 9,
           scale = "robust",
           order = "skewness",
           alpha = 0.7)

# scagnostics scagnostics_dataset <- scagnostics(PimaIndiansDiabetes) # scagnostics grid scagnostics_grid_dataset <- scagnosticsGrid(scagnostics_dataset) # outliers scagnostics_o_dataset <- scagnosticsOutliers(scagnostics_dataset) scagnostics_o_dataset[scagnostics_o_dataset] ## pregnant * age ## TRUE outlier <- scagnostics_grid_dataset[scagnostics_o_dataset,] # scagnostics exemplars scagnostics_ex_dataset <- scagnosticsExemplars(scagnostics_dataset) scagnostics_ex_dataset[scagnostics_ex_dataset] ## pregnant * triceps mass * age triceps * diabetes ## TRUE TRUE TRUE exemplars <- scagnostics_grid_dataset[scagnostics_ex_dataset,]
Training a machine learning model
(using mlr package)
- create training and test set
set.seed(1000) train_index <- sample(1:nrow(PimaIndiansDiabetes), 0.8 * nrow(PimaIndiansDiabetes)) test_index <- setdiff(1:nrow(PimaIndiansDiabetes), train_index) train <- PimaIndiansDiabetes[train_index,] test <- PimaIndiansDiabetes[test_index,] list( train = summary(train), test = summary(test) ) ## $train ## pregnant glucose pressure triceps ## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00 ## 1st Qu.: 1.000 1st Qu.:100.0 1st Qu.: 64.00 1st Qu.: 0.00 ## Median : 3.000 Median :119.0 Median : 72.00 Median :23.00 ## Mean : 3.894 Mean :123.1 Mean : 68.89 Mean :20.66 ## 3rd Qu.: 6.000 3rd Qu.:143.0 3rd Qu.: 80.00 3rd Qu.:32.75 ## Max. :17.000 Max. :199.0 Max. :114.00 Max. :99.00 ## insulin mass pedigree age diabetes ## Min. : 0.00 Min. : 0.00 Min. :0.0780 Min. :21.00 neg:386 ## 1st Qu.: 0.00 1st Qu.:27.10 1st Qu.:0.2442 1st Qu.:24.00 pos:228 ## Median : 36.50 Median :32.00 Median :0.3780 Median :29.00 ## Mean : 81.65 Mean :31.92 Mean :0.4742 Mean :33.42 ## 3rd Qu.:131.50 3rd Qu.:36.38 3rd Qu.:0.6355 3rd Qu.:41.00 ## Max. :846.00 Max. :59.40 Max. :2.4200 Max. :81.00 ## ## $test ## pregnant glucose pressure triceps ## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00 ## 1st Qu.: 1.000 1st Qu.: 93.0 1st Qu.: 62.00 1st Qu.: 0.00 ## Median : 2.000 Median :108.0 Median : 72.00 Median :23.00 ## Mean : 3.649 Mean :112.3 Mean : 69.96 Mean :20.03 ## 3rd Qu.: 6.000 3rd Qu.:133.8 3rd Qu.: 79.50 3rd Qu.:32.00 ## Max. :14.000 Max. :197.0 Max. :122.00 Max. :56.00 ## insulin mass pedigree age diabetes ## Min. : 0.0 Min. : 0.00 Min. :0.0850 Min. :21.00 neg:114 ## 1st Qu.: 0.0 1st Qu.:27.80 1st Qu.:0.2395 1st Qu.:23.25 pos: 40 ## Median : 20.5 Median :32.40 Median :0.3380 Median :29.00 ## Mean : 72.4 Mean :32.29 Mean :0.4627 Mean :32.54 ## 3rd Qu.:100.0 3rd Qu.:36.88 3rd Qu.:0.6008 3rd Qu.:39.75 ## Max. :744.0 Max. :67.10 Max. :2.3290 Max. :67.00
- create classification task and learner
listLearners() %>%
  head()
##                 class                               name  short.name
## 1         classif.ada                       ada Boosting         ada
## 2  classif.adaboostm1                    ada Boosting M1  adaboostm1
## 3 classif.bartMachine Bayesian Additive Regression Trees bartmachine
## 4    classif.binomial                Binomial Regression    binomial
## 5    classif.boosting                    Adabag Boosting      adabag
## 6         classif.bst                  Gradient Boosting         bst
##        package
## 1    ada,rpart
## 2        RWeka
## 3  bartMachine
## 4        stats
## 5 adabag,rpart
## 6    bst,rpart
##                                                                                                                                                note
## 1                                                                                                  `xval` has been set to `0` by default for speed.
## 2                                                                                       NAs are directly passed to WEKA with `na.action = na.pass`.
## 3                                                               `use_missing_data` has been set to `TRUE` by default to allow missing data support.
## 4  Delegates to `glm` with freely choosable binomial link function via learner parameter `link`. We set 'model' to FALSE by default to save memory.
## 5                                                                                                  `xval` has been set to `0` by default for speed.
## 6 Renamed parameter `learner` to `Learner` due to nameclash with `setHyperPars`. Default changes: `Learner = "ls"`, `xval = 0`, and `maxdepth = 1`.
##      type installed numerics factors ordered missings weights  prob oneclass
## 1 classif     FALSE     TRUE    TRUE   FALSE    FALSE   FALSE  TRUE    FALSE
## 2 classif      TRUE     TRUE    TRUE   FALSE    FALSE   FALSE  TRUE    FALSE
## 3 classif     FALSE     TRUE    TRUE   FALSE     TRUE   FALSE  TRUE    FALSE
## 4 classif      TRUE     TRUE    TRUE   FALSE    FALSE    TRUE  TRUE    FALSE
## 5 classif     FALSE     TRUE    TRUE   FALSE     TRUE   FALSE  TRUE    FALSE
## 6 classif     FALSE     TRUE   FALSE   FALSE    FALSE   FALSE FALSE    FALSE
##   twoclass multiclass class.weights featimp oobpreds functionals
## 1     TRUE      FALSE         FALSE   FALSE    FALSE       FALSE
## 2     TRUE       TRUE         FALSE   FALSE    FALSE       FALSE
## 3     TRUE      FALSE         FALSE   FALSE    FALSE       FALSE
## 4     TRUE      FALSE         FALSE   FALSE    FALSE       FALSE
## 5     TRUE       TRUE         FALSE    TRUE    FALSE       FALSE
## 6     TRUE      FALSE         FALSE   FALSE    FALSE       FALSE
##   single.functional    se lcens rcens icens
## 1             FALSE FALSE FALSE FALSE FALSE
## 2             FALSE FALSE FALSE FALSE FALSE
## 3             FALSE FALSE FALSE FALSE FALSE
## 4             FALSE FALSE FALSE FALSE FALSE
## 5             FALSE FALSE FALSE FALSE FALSE
## 6             FALSE FALSE FALSE FALSE FALSE
(dt_task <- makeClassifTask(data = train, target = "diabetes"))
## Supervised task: train
## Type: classif
## Target: diabetes
## Observations: 614
## Features:
##    numerics     factors     ordered functionals 
##           8           0           0           0 
## Missings: FALSE
## Has weights: FALSE
## Has blocking: FALSE
## Has coordinates: FALSE
## Classes: 2
## neg pos 
## 386 228 
## Positive class: neg
(dt_prob <- makeLearner('classif.gbm', predict.type = "prob"))
## Learner classif.gbm from package gbm
## Type: classif
## Name: Gradient Boosting Machine; Short name: gbm
## Class: classif.gbm
## Properties: twoclass,multiclass,missings,numerics,factors,prob,weights,featimp
## Predict-Type: prob
## Hyperparameters: keep.data=FALSE
Feature Selection
library(FSelector)
listFilterMethods() %>%
  head()
##                           id   package                                     desc
## 1                 anova.test           ANOVA Test for binary and multiclass ...
## 2                        auc           AUC filter for binary classification ...
## 3                   carscore      care                               CAR scores
## 4      FSelector_chi.squared FSelector Chi-squared statistic of independence...
## 5       FSelector_gain.ratio FSelector Chi-squared statistic of independence...
## 6 FSelector_information.gain FSelector Entropy-based information gain betwee...
listFilterEnsembleMethods() %>%
  head()
##         id
## 1  E-Borda
## 2    E-max
## 3   E-mean
## 4 E-median
## 5    E-min
##                                                                                                     desc
## 1                  Borda ensemble filter. Takes the sum across all base filter methods for each feature.
## 2 Maximum ensemble filter. Takes the best maximum value across all base filter methods for each feature.
## 3                  Mean ensemble filter. Takes the mean across all base filter methods for each feature.
## 4              Median ensemble filter. Takes the median across all base filter methods for each feature.
## 5 Minimum ensemble filter. Takes the best minimum value across all base filter methods for each feature.
generateFilterValuesData(dt_task, method = "FSelector_information.gain") %>% 
  plotFilterValues() +
  theme_bw() +
    labs(x = "feature",
         y = "information gain",
         title = "Information gain of features in GBM",
         caption = "Source: Pima Indians Diabetes Database")

feat_imp_tpr <- generateFeatureImportanceData(task = dt_task, 
                              learner = dt_prob,
                              measure = tpr, 
                              interaction = FALSE)
## Distribution not specified, assuming bernoulli ...
feat_imp_tpr$res %>%
  gather() %>%
  ggplot(aes(x = reorder(key, value), y = value)) +
    geom_bar(stat = "identity") +
    labs(x = "feature",
         title = "True positive rate of features in GBM",
         subtitle = "calculated with permutation importance",
         caption = "Source: Pima Indians Diabetes Database")

feat_imp_auc <- generateFeatureImportanceData(task = dt_task, 
                              learner = dt_prob,
                              measure = auc, 
                              interaction = FALSE)
## Distribution not specified, assuming bernoulli ...
feat_imp_auc$res %>%
  gather() %>%
  ggplot(aes(x = reorder(key, value), y = value)) +
    geom_bar(stat = "identity") +
    labs(x = "feature",
         title = "Area under the curve of features in GBM",
         subtitle = "calculated with permutation importance",
         caption = "Source: Pima Indians Diabetes Database")

set.seed(1000) train <- dplyr::select(train, -pedigree, -pressure, -triceps) test <- dplyr::select(test, -pedigree, -pressure, -triceps) list(train = summary(train), test = summary(test)) ## $train ## pregnant glucose insulin mass ## Min. : 0.000 Min. : 0.0 Min. : 0.00 Min. : 0.00 ## 1st Qu.: 1.000 1st Qu.:100.0 1st Qu.: 0.00 1st Qu.:27.10 ## Median : 3.000 Median :119.0 Median : 36.50 Median :32.00 ## Mean : 3.894 Mean :123.1 Mean : 81.65 Mean :31.92 ## 3rd Qu.: 6.000 3rd Qu.:143.0 3rd Qu.:131.50 3rd Qu.:36.38 ## Max. :17.000 Max. :199.0 Max. :846.00 Max. :59.40 ## age diabetes ## Min. :21.00 neg:386 ## 1st Qu.:24.00 pos:228 ## Median :29.00 ## Mean :33.42 ## 3rd Qu.:41.00 ## Max. :81.00 ## ## $test ## pregnant glucose insulin mass ## Min. : 0.000 Min. : 0.0 Min. : 0.0 Min. : 0.00 ## 1st Qu.: 1.000 1st Qu.: 93.0 1st Qu.: 0.0 1st Qu.:27.80 ## Median : 2.000 Median :108.0 Median : 20.5 Median :32.40 ## Mean : 3.649 Mean :112.3 Mean : 72.4 Mean :32.29 ## 3rd Qu.: 6.000 3rd Qu.:133.8 3rd Qu.:100.0 3rd Qu.:36.88 ## Max. :14.000 Max. :197.0 Max. :744.0 Max. :67.10 ## age diabetes ## Min. :21.00 neg:114 ## 1st Qu.:23.25 pos: 40 ## Median :29.00 ## Mean :32.54 ## 3rd Qu.:39.75 ## Max. :67.00 (dt_task <- makeClassifTask(data = train, target = "diabetes")) ## Supervised task: train ## Type: classif ## Target: diabetes ## Observations: 614 ## Features: ## numerics factors ordered functionals ## 5 0 0 0 ## Missings: FALSE ## Has weights: FALSE ## Has blocking: FALSE ## Has coordinates: FALSE ## Classes: 2 ## neg pos ## 386 228 ## Positive class: neg
Hyperparameter Optimization
getParamSet("classif.gbm")
##                       Type len       Def
## distribution      discrete   - bernoulli
## n.trees            integer   -       100
## cv.folds           integer   -         0
## interaction.depth  integer   -         1
## n.minobsinnode     integer   -        10
## shrinkage          numeric   -       0.1
## bag.fraction       numeric   -       0.5
## train.fraction     numeric   -         1
## keep.data          logical   -      TRUE
## verbose            logical   -     FALSE
## n.cores            integer   -         1
##                                                     Constr Req Tunable Trafo
## distribution      gaussian,bernoulli,huberized,adaboost...   -    TRUE     -
## n.trees                                           1 to Inf   -    TRUE     -
## cv.folds                                       -Inf to Inf   -    TRUE     -
## interaction.depth                                 1 to Inf   -    TRUE     -
## n.minobsinnode                                    1 to Inf   -    TRUE     -
## shrinkage                                         0 to Inf   -    TRUE     -
## bag.fraction                                        0 to 1   -    TRUE     -
## train.fraction                                      0 to 1   -    TRUE     -
## keep.data                                                -   -   FALSE     -
## verbose                                                  -   -   FALSE     -
## n.cores                                        -Inf to Inf   -   FALSE     -
dt_param <- makeParamSet( 
  makeIntegerParam("n.trees", lower = 20, upper = 150),
  makeNumericParam("shrinkage", lower = 0.01, upper = 0.1))
ctrl = makeTuneControlGrid()
rdesc = makeResampleDesc("CV", 
                         iters = 3L, 
                         stratify = TRUE)
set.seed(1000) 
(dt_tuneparam <- tuneParams(learner = dt_prob, 
                             resampling = rdesc, 
                             measures = list(tpr,auc, fnr, mmce, tnr, setAggregation(tpr, test.sd)), 
                             par.set = dt_param, 
                             control = ctrl, 
                             task = dt_task, 
                             show.info = FALSE))
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Distribution not specified, assuming bernoulli ...
## Tune result:
## Op. pars: n.trees=20; shrinkage=0.02
## tpr.test.mean=1.0000000,auc.test.mean=0.7878691,fnr.test.mean=0.0000000,mmce.test.mean=0.3713375,tnr.test.mean=0.0000000,tpr.test.sd=0.0000000
data = generateHyperParsEffectData(dt_tuneparam, 
                                   partial.dep = TRUE)
plotHyperParsEffect(data, x = "n.trees", y = "tpr.test.mean", partial.dep.learn = makeLearner("regr.gbm"))

plotHyperParsEffect(data, x = "shrinkage", y = "tpr.test.mean", partial.dep.learn = makeLearner("regr.gbm"))

plotHyperParsEffect(data, 
                    x = "n.trees", 
                    y = "shrinkage",
                    z = "tpr.test.mean", 
                    plot.type = "heatmap",
                    partial.dep.learn = makeLearner("regr.gbm")) +
  theme_bw() +
    labs(title = "Hyperparameter effects data",
         subtitle = "of GBM model with reduced feature set",
         caption = "Source: Pima Indians Diabetes Database")

list( `Optimal HyperParameters` = dt_tuneparam$x, 
      `Optimal Metrics` = dt_tuneparam$y )
## $`Optimal HyperParameters`
## $`Optimal HyperParameters`$n.trees
## [1] 20
## 
## $`Optimal HyperParameters`$shrinkage
## [1] 0.02
## 
## 
## $`Optimal Metrics`
##  tpr.test.mean  auc.test.mean  fnr.test.mean mmce.test.mean  tnr.test.mean 
##      1.0000000      0.7878691      0.0000000      0.3713375      0.0000000 
##    tpr.test.sd 
##      0.0000000
gbm_final <- setHyperPars(dt_prob, par.vals = dt_tuneparam$x)
set.seed(1000) 
gbm_final_train <- train(learner = gbm_final, task = dt_task) 
## Distribution not specified, assuming bernoulli ...
getLearnerModel(gbm_final_train)
## gbm::gbm(formula = f, data = d, n.trees = 20L, shrinkage = 0.02, 
##     keep.data = FALSE)
## A gradient boosted model with bernoulli loss function.
## 20 iterations were performed.
## There were 5 predictors of which 3 had non-zero influence.
Decision Trees
- Recursive Partitioning (rpart&rpart.plot)
library(rpart)
library(rpart.plot)
rpart_tree <- rpart(diabetes ~ .,
                    data = train,
                    method = "class")
rpart.plot(rpart_tree, 
           roundint=FALSE, 
           type = 3, 
           clip.right.labs = FALSE)

rpart.rules(rpart_tree, roundint = FALSE) ## diabetes ## 0.05 when glucose < 128 & mass < 27 & age >= 29 ## 0.10 when glucose < 128 & age < 29 ## 0.17 when glucose is 128 to 146 & mass < 30 ## 0.25 when glucose >= 146 & mass < 30 & age < 29 ## 0.28 when glucose < 128 & mass >= 29 & age >= 29 & insulin < 143 ## 0.38 when glucose is 128 to 158 & mass is 32 to 42 & age < 43 ## 0.62 when glucose >= 146 & mass < 30 & age >= 29 ## 0.63 when glucose < 128 & mass is 27 to 29 & age >= 29 & insulin < 143 ## 0.77 when glucose < 128 & mass >= 27 & age >= 29 & insulin >= 143 ## 0.82 when glucose is 128 to 158 & mass >= 42 & age < 43 ## 0.86 when glucose is 128 to 158 & mass >= 30 & age >= 43 ## 0.86 when glucose >= 158 & mass >= 30 ## 0.88 when glucose is 128 to 158 & mass is 30 to 32 & age < 43
Prediction
set.seed(1000) 
(gbm_final_predict <- predict(gbm_final_train, newdata = test))
## Prediction: 154 observations
## predict.type: prob
## threshold: neg=0.50,pos=0.50
## time: 0.00
##    truth  prob.pos  prob.neg response
## 12   pos 0.4807717 0.5192283      neg
## 18   pos 0.3229851 0.6770149      neg
## 19   neg 0.3229851 0.6770149      neg
## 20   pos 0.3300235 0.6699765      neg
## 34   neg 0.3091184 0.6908816      neg
## 38   pos 0.3229851 0.6770149      neg
## ... (#rows: 154, #cols: 4)
gbm_final_predict %>% calculateROCMeasures()
##      predicted
## true  neg       pos                        
##   neg 114       0        tpr: 1   fnr: 0   
##   pos 40        0        fpr: 1   tnr: 0   
##       ppv: 0.74 for: NaN lrp: 1   acc: 0.74
##       fdr: 0.26 npv: NaN lrm: NaN dor: NaN 
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio
model_performance <- performance(gbm_final_predict, 
                                 measures = list(tpr, auc, mmce, acc, tnr)) %>% 
  as.data.frame(row.names = c("True Positive Rate","Area Under Curve", "Mean Misclassification Error","Accuracy","True Negative Rate")) 
model_performance
##                                      .
## True Positive Rate           1.0000000
## Area Under Curve             0.7695175
## Mean Misclassification Error 0.2597403
## Accuracy                     0.7402597
## True Negative Rate           0.0000000
gbm_final_threshold <- generateThreshVsPerfData(gbm_final_predict, 
                                                 measures = list(tpr, auc, mmce, tnr))
gbm_final_threshold %>% 
   plotROCCurves() + 
   geom_point() +
    theme_bw() +
    labs(title = "ROC curve from predictions",
         subtitle = "of GBM model with reduced feature set",
         caption = "Source: Pima Indians Diabetes Database")

gbm_final_threshold %>% 
   plotThreshVsPerf() + 
   geom_point() +
    theme_bw() +
    labs(title = "Threshold vs. performance",
         subtitle = "for 2-class classification of GBM model with reduced feature set",
         caption = "Source: Pima Indians Diabetes Database")

gbm_final_threshold$data %>%
  head()
##   tpr       auc      mmce tnr  threshold
## 1   1 0.7695175 0.2597403   0 0.00000000
## 2   1 0.7695175 0.2597403   0 0.01010101
## 3   1 0.7695175 0.2597403   0 0.02020202
## 4   1 0.7695175 0.2597403   0 0.03030303
## 5   1 0.7695175 0.2597403   0 0.04040404
## 6   1 0.7695175 0.2597403   0 0.05050505
gbm_final_thr <- gbm_final_predict %>% 
  setThreshold(0.59595960) 
(dt_performance <- gbm_final_thr %>% performance(measures = list(tpr, auc, mmce, tnr)) )
##       tpr       auc      mmce       tnr 
## 0.8070175 0.7695175 0.2727273 0.5000000
(dt_cm <- gbm_final_thr %>% calculateROCMeasures() )
##      predicted
## true  neg       pos                          
##   neg 92        22        tpr: 0.81 fnr: 0.19
##   pos 20        20        fpr: 0.5  tnr: 0.5 
##       ppv: 0.82 for: 0.52 lrp: 1.61 acc: 0.73
##       fdr: 0.18 npv: 0.48 lrm: 0.39 dor: 4.18
## 
## 
## Abbreviations:
## tpr - True positive rate (Sensitivity, Recall)
## fpr - False positive rate (Fall-out)
## fnr - False negative rate (Miss rate)
## tnr - True negative rate (Specificity)
## ppv - Positive predictive value (Precision)
## for - False omission rate
## lrp - Positive likelihood ratio (LR+)
## fdr - False discovery rate
## npv - Negative predictive value
## acc - Accuracy
## lrm - Negative likelihood ratio (LR-)
## dor - Diagnostic odds ratio
performance_threshold <- performance(gbm_final_thr, measures = list(tpr, auc, mmce, acc, tnr)) %>% 
  as.data.frame(row.names = c("True Positive Rate", "Area Under Curve", "Mean Misclassification Error", "Accuracy", "True Negative Rate"))
performance_threshold
##                                      .
## True Positive Rate           0.8070175
## Area Under Curve             0.7695175
## Mean Misclassification Error 0.2727273
## Accuracy                     0.7272727
## True Negative Rate           0.5000000
Decision Boundaries
#remotes::install_github("grantmcdermott/parttree")
library(parsnip)
library(parttree)
set.seed(123) ## For consistent jitter
## Build our tree using parsnip (but with rpart as the model engine)
ti_tree =
  decision_tree() %>%
  set_engine("rpart") %>%
  set_mode("classification") %>%
  fit(diabetes ~ glucose + mass, data = PimaIndiansDiabetes)
## Plot the data and model partitions
PimaIndiansDiabetes %>%
  ggplot(aes(x = glucose, y = mass)) +
  geom_jitter(aes(col = diabetes), alpha = 0.7) +
  geom_parttree(data = ti_tree, aes(fill = diabetes), alpha = 0.1) +
  theme_bw() +
    labs(title = "Decision boundaries",
         subtitle = "for 2-class classification of RPART model (glucose + mass)",
         caption = "Source: Pima Indians Diabetes Database")

Time-series
Artificial Neural Networks (ANNs)
- Li et al, Visualizing the Loss Landscape of Neural Nets, 2018 
- Visualizing Data using the Embedding Projector in TensorBoard 
- Visualizing and Understanding Convolutional Networks, Zeiler & Fergus, 2013 
- Play with Generative Adversarial Networks (GANs) in your browser 
- Whose dream is this? When and how to use the Keras Functional API 
 - Deep Learning with Keras and TensorFlow & Update with TF 2.0: Image classification with Keras and TensorFlow
- Deep Learning with Keras and TensorFlow & Update with TF 2.0: Image classification with Keras and TensorFlow
 
 

Graphical representation of a model in TensorBoard
Word Embeddings
Explainable AI
# session info devtools::session_info() ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.0.4 (2021-02-15) ## os macOS Big Sur 10.16 ## system x86_64, darwin17.0 ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Europe/Berlin ## date 2021-04-27 ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date lib ## assertthat 0.2.1 2019-03-21 [2] ## backports 1.2.1 2020-12-09 [2] ## BBmisc 1.11 2017-03-10 [2] ## blogdown 1.2 2021-03-04 [2] ## bookdown 0.21 2020-10-13 [2] ## broom 0.7.5 2021-02-19 [2] ## bslib 0.2.4 2021-01-25 [2] ## cachem 1.0.4 2021-02-13 [2] ## callr 3.5.1 2020-10-13 [2] ## cellranger 1.1.0 2016-07-27 [2] ## checkmate 2.0.0 2020-02-06 [2] ## cli 2.3.1 2021-02-23 [2] ## colorspace 2.0-0 2020-11-11 [2] ## crayon 1.4.1 2021-02-08 [2] ## data.table 1.14.0 2021-02-21 [2] ## DBI 1.1.1 2021-01-15 [2] ## dbplyr 2.1.0 2021-02-03 [2] ## desc 1.3.0 2021-03-05 [2] ## devtools 2.3.2 2020-09-18 [2] ## digest 0.6.27 2020-10-24 [2] ## dplyr * 1.0.5 2021-03-05 [2] ## ellipsis 0.3.1 2020-05-15 [2] ## entropy 1.2.1 2014-11-14 [1] ## evaluate 0.14 2019-05-28 [2] ## fansi 0.4.2 2021-01-15 [2] ## farver 2.1.0 2021-02-28 [2] ## fastmap 1.1.0 2021-01-25 [2] ## fastmatch 1.1-0 2017-01-28 [2] ## forcats * 0.5.1 2021-01-27 [2] ## fs 1.5.0 2020-07-31 [2] ## FSelector * 0.33 2021-02-16 [1] ## gbm 2.1.8 2020-07-15 [2] ## generics 0.1.0 2020-10-31 [2] ## GGally * 2.1.1 2021-03-08 [1] ## ggfortify * 0.4.11 2020-10-02 [2] ## ggplot2 * 3.3.3 2020-12-30 [2] ## glue 1.4.2 2020-08-27 [2] ## gridExtra 2.3 2017-09-09 [2] ## gtable 0.3.0 2019-03-25 [2] ## haven 2.3.1 2020-06-01 [2] ## highr 0.8 2019-03-20 [2] ## hms 1.0.0 2021-01-13 [2] ## htmltools 0.5.1.1 2021-01-22 [2] ## httr 1.4.2 2020-07-20 [2] ## jquerylib 0.1.3 2020-12-17 [2] ## jsonlite 1.7.2 2020-12-09 [2] ## knitr 1.31 2021-01-27 [2] ## labeling 0.4.2 2020-10-20 [2] ## lattice 0.20-41 2020-04-02 [2] ## lifecycle 1.0.0 2021-02-15 [2] ## lubridate 1.7.10 2021-02-26 [2] ## magrittr 2.0.1 2020-11-17 [2] ## MASS * 7.3-53.1 2021-02-12 [2] ## Matrix 1.3-2 2021-01-06 [2] ## memoise 2.0.0 2021-01-26 [2] ## mlbench * 2.1-3 2021-01-29 [1] ## mlr * 2.19.0 2021-02-22 [2] ## mmpf * 0.0.5 2018-10-24 [2] ## modelr 0.1.8 2020-05-19 [2] ## munsell 0.5.0 2018-06-12 [2] ## parallelMap 1.5.0 2020-03-26 [2] ## ParamHelpers * 1.14 2020-03-24 [2] ## parsnip * 0.1.5 2021-01-19 [2] ## parttree * 0.0.1.9000 2021-03-14 [1] ## pillar 1.5.1 2021-03-05 [2] ## pkgbuild 1.2.0 2020-12-15 [2] ## pkgconfig 2.0.3 2019-09-22 [2] ## pkgload 1.2.0 2021-02-23 [2] ## plyr 1.8.6 2020-03-03 [2] ## prettyunits 1.1.1 2020-01-24 [2] ## processx 3.4.5 2020-11-30 [2] ## ps 1.6.0 2021-02-28 [2] ## purrr * 0.3.4 2020-04-17 [2] ## R6 2.5.0 2020-10-28 [2] ## randomForest 4.6-14 2018-03-25 [2] ## RColorBrewer 1.1-2 2014-12-07 [2] ## Rcpp 1.0.6 2021-01-15 [2] ## readr * 1.4.0 2020-10-05 [2] ## readxl 1.3.1 2019-03-13 [1] ## remotes 2.2.0 2020-07-21 [2] ## reprex 1.0.0 2021-01-27 [2] ## reshape 0.8.8 2018-10-23 [1] ## rJava * 0.9-13 2020-07-06 [2] ## rlang 0.4.10 2020-12-30 [2] ## rmarkdown 2.7 2021-02-19 [2] ## rpart * 4.1-15 2019-04-12 [2] ## rpart.plot * 3.0.9 2020-09-17 [1] ## rprojroot 2.0.2 2020-11-15 [2] ## rstudioapi 0.13 2020-11-12 [2] ## rvest 1.0.0 2021-03-09 [2] ## RWeka 0.4-43 2020-08-23 [1] ## RWekajars 3.9.3-2 2019-10-19 [1] ## sass 0.3.1 2021-01-24 [2] ## scagnostics * 0.2-4.1 2018-04-04 [1] ## scales 1.1.1 2020-05-11 [2] ## sessioninfo 1.1.1 2018-11-05 [2] ## stringi 1.5.3 2020-09-09 [2] ## stringr * 1.4.0 2019-02-10 [2] ## survival 3.2-7 2020-09-28 [2] ## testthat 3.0.2 2021-02-14 [2] ## tibble * 3.1.0 2021-02-25 [2] ## tidyr * 1.1.3 2021-03-03 [2] ## tidyselect 1.1.0 2020-05-11 [2] ## tidyverse * 1.3.0 2019-11-21 [2] ## usethis 2.0.1 2021-02-10 [2] ## utf8 1.2.1 2021-03-12 [2] ## vctrs 0.3.6 2020-12-17 [2] ## withr 2.4.1 2021-01-26 [2] ## xfun 0.22 2021-03-11 [2] ## XML 3.99-0.5 2020-07-23 [2] ## xml2 1.3.2 2020-04-23 [2] ## yaml 2.2.1 2020-02-01 [2] ## source ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.1) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## Github (grantmcdermott/parttree@9d25d2c) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.4) ## CRAN (R 4.0.0) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.0) ## CRAN (R 4.0.2) ## CRAN (R 4.0.4) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.2) ## CRAN (R 4.0.0) ## CRAN (R 4.0.0) ## ## [1] /Users/shiringlander/Library/R/4.0/library ## [2] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
 
 
 
