Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A few weeks ago in #112 and #120, I presented a few Python examples of Deep Quasi-Randomized ‘neural’ networks (QRNs). In this post, I will provide a detailed introduction to this new family of models, with examples in Python and R, and a preprint.
- Link to the preprint
- Link to a Jupyter Python notebook with a benchmark on 14 data sets: https://github.com/Techtonique/nnetsauce/blob/master/nnetsauce/demo/thierrymoudiki_20240519_deep_qrns.ipynb
At the basis of Deep QRNs are QRNs: nnetsauce’s CustomClassifier class objects, which, in turn, depend on a base Machine Learning model. This base learner could be any classifier, and in particular any scikit-learn classifier, or xgboost, or something else. Here is how a CustomClassifier works, with the information flowing from left to right (forward pass only):
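As a rough illustration of that left-to-right flow, here is a minimal pure-Python sketch. It rests on assumptions that are not taken from the figure: that the input features are augmented with a hidden layer g(XW), whose weights W come from a deterministic low-discrepancy sequence, before being handed to the base learner. A Halton sequence and a ReLU activation are used here as stand-ins. The helper names (halton, quasi_random_weights, augment) are hypothetical and are not part of nnetsauce's API. Details such as feature scaling or bias terms are left out.

```python
def halton(index, base):
    """Return the index-th element (1-based) of the Halton sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def quasi_random_weights(n_features, n_hidden):
    """Deterministic n_features x n_hidden weight matrix with entries in (-1, 1).

    Assumption of this sketch: one prime base per input feature.
    """
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    if n_features > len(primes):
        raise ValueError("this sketch supports at most 10 input features")
    return [[2.0 * halton(j + 1, primes[i]) - 1.0 for j in range(n_hidden)]
            for i in range(n_features)]


def relu(z):
    return max(0.0, z)


def augment(X, W, activation=relu):
    """Forward pass: append the hidden features g(XW) to the original features."""
    n_hidden = len(W[0])
    augmented = []
    for row in X:
        hidden = [activation(sum(x * W[i][k] for i, x in enumerate(row)))
                  for k in range(n_hidden)]
        augmented.append(list(row) + hidden)
    return augmented


# Two observations with 4 features each (values in the style of the iris data)
X = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
W = quasi_random_weights(n_features=4, n_hidden=3)
Z = augment(X, W)  # this is what the base learner would be trained on
print(len(Z), len(Z[0]))  # prints: 2 7
```

Because the weights are generated deterministically rather than learned, only the base learner is actually trained. This is consistent with the "forward pass only" remark.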
Deep QRNs arise from the QRN presented in this figure: in the case where we’d like to have a 3-layered deep QRN, the base Machine Learning model depicted in the figure can itself be a QRN, and the resulting QRN can in turn serve as the base model of another QRN. Here are R examples:
Install nnetsauce
install.packages("nnetsauce", repos = c("https://techtonique.r-universe.dev", "https://cran.r-project.org"))
Load nnetsauce
library("nnetsauce")
iris data
library(datasets)

set.seed(123)
X <- as.matrix(iris[, 1:4])
y <- as.integer(iris$Species) - 1L

# split data into training and test sets
(index_train <- base::sample.int(n = nrow(X),
                                 size = floor(0.8*nrow(X)),
                                 replace = FALSE))
X_train <- X[index_train, ]
y_train <- y[index_train]
X_test <- X[-index_train, ]
y_test <- y[-index_train]

# base model is a Logistic Regression
obj2 <- sklearn$linear_model$LogisticRegressionCV()
# there are 3 layers in the deep model
obj <- DeepClassifier(obj2, n_layers = 3L)
# adjust the model
res <- obj$fit(X_train, y_train)

# accuracy, must be 1
print(mean(obj$predict(X_test)==y_test))
palmer penguins data
library(palmerpenguins)
data(penguins)

penguins_ <- as.data.frame(palmerpenguins::penguins)

# replacing NA's in numeric columns by the median
replacement <- median(penguins$bill_length_mm, na.rm = TRUE)
penguins_$bill_length_mm[is.na(penguins$bill_length_mm)] <- replacement

replacement <- median(penguins$bill_depth_mm, na.rm = TRUE)
penguins_$bill_depth_mm[is.na(penguins$bill_depth_mm)] <- replacement

replacement <- median(penguins$flipper_length_mm, na.rm = TRUE)
penguins_$flipper_length_mm[is.na(penguins$flipper_length_mm)] <- replacement

replacement <- median(penguins$body_mass_g, na.rm = TRUE)
penguins_$body_mass_g[is.na(penguins$body_mass_g)] <- replacement

# replacing NA's by the most frequent occurrence
penguins_$sex[is.na(penguins$sex)] <- "male" # most frequent

# one-hot encoding for covariates
penguins_mat <- model.matrix(species ~., data=penguins_)[,-1]
penguins_mat <- cbind.data.frame(penguins_$species, penguins_mat)
penguins_mat <- as.data.frame(penguins_mat)
colnames(penguins_mat)[1] <- "species"

y <- penguins_mat$species
X <- as.matrix(penguins_mat[,2:ncol(penguins_mat)])
n <- nrow(X)
p <- ncol(X)

# split data into training and test sets (only the first 5 test observations are kept)
set.seed(1234)
index_train <- sample(1:n, size=floor(0.8*n))
X_train <- X[index_train, ]
y_train <- factor(y[index_train])
X_test <- X[-index_train, ][1:5, ]
y_test <- factor(y[-index_train][1:5])

# base model is a Logistic Regression
obj2 <- nnetsauce::sklearn$linear_model$LogisticRegressionCV()
# there are 3 layers in the deep model
obj <- DeepClassifier(obj2, n_layers = 3L)
# adjust the model
res <- obj$fit(X_train, y_train)

# accuracy, must be 1
print(mean(obj$predict(X_test) == y_test))
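The layering behind n_layers = 3L can be pictured as nested wrappers: a QRN whose base model is a QRN, whose base model is again a QRN around the actual base learner. The following pure-Python toy is only a sketch of that nesting idea. It is not nnetsauce's implementation, and it assumes that n_layers counts the number of QRN wrappers. ToyQRN, NearestCentroid, toy_deep_classifier and depth are hypothetical names, and a van der Corput sequence stands in for the quasi-random weights.

```python
def van_der_corput(n, base=2):
    """n-th element (1-based) of the van der Corput low-discrepancy sequence."""
    q, denom = 0.0, 1.0
    while n > 0:
        denom *= base
        n, r = divmod(n, base)
        q += r / denom
    return q


class NearestCentroid:
    """Tiny stand-in base learner (hypothetical, not an nnetsauce class)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, v in enumerate(row):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids_ = {c: [s / counts[c] for s in sums[c]] for c in sums}
        return self

    def predict(self, X):
        def dist(a, b):
            return sum((u - v) ** 2 for u, v in zip(a, b))
        return [min(self.centroids_, key=lambda c: dist(row, self.centroids_[c]))
                for row in X]


class ToyQRN:
    """Augments the features with fixed hidden units, then delegates to `base`."""

    def __init__(self, base, n_hidden=3):
        self.base = base
        self.n_hidden = n_hidden

    def _augment(self, X):
        out = []
        for row in X:
            hidden = []
            for k in range(self.n_hidden):
                # weights in (-1, 1), generated deterministically (never trained)
                z = sum(x * (2.0 * van_der_corput(1 + i * self.n_hidden + k) - 1.0)
                        for i, x in enumerate(row))
                hidden.append(max(0.0, z))  # ReLU activation (assumed)
            out.append(list(row) + hidden)
        return out

    def fit(self, X, y):
        self.base.fit(self._augment(X), y)
        return self

    def predict(self, X):
        return self.base.predict(self._augment(X))


def toy_deep_classifier(base, n_layers=3):
    """Wrap `base` in n_layers nested ToyQRN objects."""
    model = base
    for _ in range(n_layers):
        model = ToyQRN(model)
    return model


def depth(model):
    """Count how many ToyQRN wrappers sit around the innermost base learner."""
    n = 0
    while isinstance(model, ToyQRN):
        n += 1
        model = model.base
    return n


X_train = [[0.0, 0.0], [0.01, 0.02], [5.0, 5.0], [5.02, 4.99]]
y_train = [0, 0, 1, 1]
model = toy_deep_classifier(NearestCentroid(), n_layers=3).fit(X_train, y_train)
print(depth(model))                               # prints: 3
print(model.predict([[0.005, 0.01], [5.01, 5.0]]))  # prints: [0, 1]
```

Each wrapper only needs fit and predict from the model it wraps, which is why any classifier exposing that interface can sit at the bottom of the stack.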
It’s worth mentioning that the R version is a bit less stable than the Python version, maybe because I’m not a reticulate superstar. I’m open to any suggestions or pull requests regarding this R port of the Python package.