Decision Boundary for a Series of Machine Learning Models
Machine Learning at the Boundary:
There is nothing new in the fact that machine learning models can outperform traditional econometric models, but as part of my research I want to show why and how some models make their predictions, or in this instance classifications.
I wanted to show the decision boundary that my binary classification models were making, that is, the partition of the feature space that splits the observations into each class. With some tweaks, the problem and code can be extended to a multi-class classification problem.
Initialisation:
I first load in a series of packages and define a logistic function to convert log-odds into probabilities, which I use later on.
library(dplyr)
library(patchwork)
library(ggplot2)
library(knitr)
library(kableExtra)
library(purrr)
library(stringr)
library(tidyr)
library(xgboost)
library(lightgbm)
library(keras)
library(tidyquant)

##################### Pre-define some functions ###################################
###################################################################################

logit2prob <- function(logit){
  odds <- exp(logit)
  prob <- odds / (1 + odds)
  return(prob)
}
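As a quick sanity check (my addition, not in the original post), the helper maps a log-odds of zero to a probability of one half, with large positive and negative values pushed towards one and zero:

logit2prob(0)    # 0.5
logit2prob(2)    # ~0.88
logit2prob(-2)   # ~0.12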
The data:
I use the iris dataset, which contains measurements of 3 different plant species and was made famous by the British statistician Ronald Fisher in 1936. The dataset consists of \(4\) different characteristics which should uniquely distinguish the \(3\) species (Setosa, Virginica and Versicolor). However, my problem required a binary classification and not a multi-classification problem. In the following code I import the iris data and remove one species, virginica, to turn it from a multi-class into a binary classification problem.
###################################################################################
###################################################################################

data(iris)
df <- iris %>%
  filter(Species != "virginica") %>%
  mutate(Species = +(Species == "versicolor"))
str(df)

## 'data.frame':    100 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : int  0 0 0 0 0 0 0 0 0 0 ...

###################################################################################
###################################################################################
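As a small sanity check (my addition, not in the original post), the recoding leaves a balanced binary target, with setosa coded as 0 and versicolor as 1:

table(df$Species)

##  0  1
## 50 50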
I plot the data by first storing the ggplot objects, changing only the x and y variables in each of the plt objects.
plt1 <- df %>%
  ggplot(aes(x = Sepal.Width, y = Sepal.Length, color = factor(Species))) +
  geom_point(size = 4) +
  theme_bw(base_size = 15) +
  theme(legend.position = "none")

plt2 <- df %>%
  ggplot(aes(x = Petal.Length, y = Sepal.Length, color = factor(Species))) +
  geom_point(size = 4) +
  theme_bw(base_size = 15) +
  theme(legend.position = "none")

plt3 <- df %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = factor(Species))) +
  geom_point(size = 4) +
  theme_bw(base_size = 15) +
  theme(legend.position = "none")

plt4 <- df %>%
  ggplot(aes(x = Petal.Length, y = Sepal.Width, color = factor(Species))) +
  geom_point(size = 4) +
  theme_bw(base_size = 15) +
  theme(legend.position = "none")

plt5 <- df %>%
  ggplot(aes(x = Petal.Width, y = Sepal.Width, color = factor(Species))) +
  geom_point(size = 4) +
  theme_bw(base_size = 15) +
  theme(legend.position = "none")

plt6 <- df %>%
  ggplot(aes(x = Petal.Width, y = Sepal.Length, color = factor(Species))) +
  geom_point(size = 4) +
  theme_bw(base_size = 15) +
  theme(legend.position = "none")
I also wanted to use the new patchwork package, which makes arranging ggplot plots very easy: the layout follows the code as it is written. The expression below places one plot across the top of the grid and two plots side by side beneath it (a fuller layout, with one top plot stretching the width of the grid, two middle plots, another single plot and two more plots at the bottom, is sketched after the next chunk).
(plt1) / (plt2 + plt3)
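As a rough sketch of that fuller layout (one plot across the top, two in the middle, another single plot, then two more at the bottom), the same composition syntax extends naturally; this is my illustrative example rather than the original post's code:

(plt1) / (plt2 + plt3) / (plt4) / (plt5 + plt6)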
Alternatively, we can re-arrange the plots in any way we wish and plot them in the following way:
(plt1 + plt2) / (plt5 + plt6)
Which I think looks awesome.
Objective
The objective is to build a classification algorithm to distinguish between the two classes and then compute the decision boundaries in order to better see how the models made such predictions.
In order to create the decision boundary plots for each variable combination we need the different combinations of variables in the data.
###################################################################################
###################################################################################

var_combos <- expand.grid(colnames(df[, 1:4]), colnames(df[, 1:4])) %>%
  filter(!Var1 == Var2)

###################################################################################
###################################################################################

var_combos %>%
  head() %>%
  kable(caption = "Variable Combinations", escape = F, align = "c", digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                font_size = 9, fixed_thead = T, full_width = F) %>%
  scroll_box(width = "100%", height = "200px")
| Var1         | Var2         |
|--------------|--------------|
| Sepal.Width  | Sepal.Length |
| Petal.Length | Sepal.Length |
| Petal.Width  | Sepal.Length |
| Sepal.Length | Sepal.Width  |
| Petal.Length | Sepal.Width  |
| Petal.Width  | Sepal.Width  |
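Only the first six rows are shown above; since there are four variables and the cases where Var1 equals Var2 are dropped, there are \(4 \times 4 - 4 = 12\) combinations in total. A quick check (my addition):

nrow(var_combos)

## [1] 12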
Next, I want to take the variable combinations above and create lists (one for each combination), populating them with synthetic data spanning the minimum to maximum value of each variable pair. This will act as our synthetically created test data on which we make predictions and build the decision boundary.
It’s important to note that the plots will eventually be two-dimensional for illustrative purposes, so we only train the machine learning models on two variables at a time; for each combination, these are the first two variables in the boundary_lists data frames.
boundary_lists <- map2(
  .x = var_combos$Var1,
  .y = var_combos$Var2,
  ~select(df, .x, .y) %>%
    summarise(
      minX = min(.[[1]], na.rm = TRUE),
      maxX = max(.[[1]], na.rm = TRUE),
      minY = min(.[[2]], na.rm = TRUE),
      maxY = max(.[[2]], na.rm = TRUE)
    )
) %>%
  map(.,
      ~tibble(
        x = seq(.x$minX, .x$maxX, length.out = 200),
        y = seq(.x$minY, .x$maxY, length.out = 200)
      )
  ) %>%
  map(.,
      ~tibble(
        xx = rep(.x$x, each = 200),
        yy = rep(.x$y, times = 200)
      )
  ) %>%
  map2(., asplit(var_combos, 1), ~ .x %>% set_names(.y))

###################################################################################
###################################################################################
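Each list element is therefore a 200 x 200 grid, i.e. 40,000 synthetic test points per variable pair. A quick check (my addition, assuming the chunk above ran as written):

map_int(boundary_lists, nrow) %>% unique()

## [1] 40000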
We can see what the first 4 observations of the first and last two lists look like:
boundary_lists %>%
  map(., ~head(., 4)) %>%
  head(2)

## [[1]]
## # A tibble: 4 x 2
##   Sepal.Width Sepal.Length
##         <dbl>        <dbl>
## 1           2         4.3
## 2           2         4.31
## 3           2         4.33
## 4           2         4.34
##
## [[2]]
## # A tibble: 4 x 2
##   Petal.Length Sepal.Length
##          <dbl>        <dbl>
## 1            1         4.3
## 2            1         4.31
## 3            1         4.33
## 4            1         4.34

boundary_lists %>%
  map(., ~head(., 4)) %>%
  tail(2)

## [[1]]
## # A tibble: 4 x 2
##   Sepal.Width Petal.Width
##         <dbl>       <dbl>
## 1           2       0.1
## 2           2       0.109
## 3           2       0.117
## 4           2       0.126
##
## [[2]]
## # A tibble: 4 x 2
##   Petal.Length Petal.Width
##          <dbl>       <dbl>
## 1            1       0.1
## 2            1       0.109
## 3            1       0.117
## 4            1       0.126
Training time:
Now that the synthetically created test data is set up, I want to train the models on the actual observations; I use each data point in the plots above as my training data. I apply the following models:
- Logistic Model
- Support Vector Machine with a linear kernel
- Support Vector Machine with a polynomial kernel
- Support Vector Machine with a radial kernel
- Support Vector Machine with a sigmoid kernel
- Random Forest
- Extreme Gradient Boosting (XGBoost) model with default parameters
- Single layer Keras Neural Network (with linear components)
- Deeper layer Keras Neural Network (with linear components)
- Deeper’er layer Keras Neural Network (with linear components)
- Light Gradient Boosting model (LightGBM) with default parameters
Side note: I am not an expert in deep learning/Keras/TensorFlow, so I am sure better models would yield better decision boundaries, but it was a fun task getting the different machine learning models to fit inside a purrr map call.
###################################################################################
###################################################################################

# params_lightGBM <- list(
#   objective = "binary",
#   metric = "auc",
#   min_data = 1
# )

# To install LightGBM try the following in your RStudio terminal:
# git clone --recursive https://github.com/microsoft/LightGBM
# cd LightGBM
# Rscript build_r.R

models_list <- var_combos %>%
  mutate(modeln = str_c('mod', row_number())) %>%
  pmap(~ {
    xname = ..1
    yname = ..2
    modelname = ..3
    df %>%
      select(Species, xname, yname) %>%
      group_by(grp = 'grp') %>%
      nest() %>%
      mutate(models = map(data, ~{
        list(
          # Logistic Model
          Model_GLM = {
            glm(Species ~ ., data = .x, family = binomial(link = 'logit'))
          },
          # Support Vector Machine (linear)
          Model_SVM_Linear = {
            e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'linear')
          },
          # Support Vector Machine (polynomial)
          Model_SVM_Polynomial = {
            e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'polynomial')
          },
          # Support Vector Machine (sigmoid)
          Model_SVM_radial = {
            e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'sigmoid')
          },
          # Support Vector Machine (radial)
          Model_SVM_radial_Sigmoid = {
            e1071::svm(Species ~ ., data = .x, type = 'C-classification', kernel = 'radial')
          },
          # Random Forest
          Model_RF = {
            randomForest::randomForest(formula = as.factor(Species) ~ ., data = .)
          },
          # Extreme Gradient Boosting
          Model_XGB = {
            xgboost(
              objective = 'binary:logistic',
              eval_metric = 'auc',
              data = as.matrix(.x[, 2:3]),
              label = as.matrix(.x$Species),   # binary variable
              nrounds = 10)
          },
          # Keras Neural Network
          Model_Keras = {
            mod <- keras_model_sequential() %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'sigmoid')
            mod %>% compile(
              loss = 'binary_crossentropy',
              optimizer_sgd(lr = 0.01, momentum = 0.9),
              metrics = c('accuracy')
            )
            fit(mod,
                x = as.matrix(.x[, 2:3]),
                y = to_categorical(.x$Species, 2),
                epochs = 5,
                batch_size = 5,
                validation_split = 0
            )
            print(modelname)
            assign(modelname, mod)
          },
          # Keras Neural Network (deeper)
          Model_Keras_2 = {
            mod <- keras_model_sequential() %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'linear', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'sigmoid')
            mod %>% compile(
              loss = 'binary_crossentropy',
              optimizer_sgd(lr = 0.01, momentum = 0.9),
              metrics = c('accuracy')
            )
            fit(mod,
                x = as.matrix(.x[, 2:3]),
                y = to_categorical(.x$Species, 2),
                epochs = 5,
                batch_size = 5,
                validation_split = 0
            )
            print(modelname)
            assign(modelname, mod)
          },
          # Keras Neural Network (deeper again)
          Model_Keras_3 = {
            mod <- keras_model_sequential() %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'relu', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'linear', input_shape = 2) %>%
              layer_dense(units = 2, activation = 'sigmoid')
            mod %>% compile(
              loss = 'binary_crossentropy',
              optimizer_sgd(lr = 0.01, momentum = 0.9),
              metrics = c('accuracy')
            )
            fit(mod,
                x = as.matrix(.x[, 2:3]),
                y = to_categorical(.x$Species, 2),
                epochs = 5,
                batch_size = 5,
                validation_split = 0
            )
            print(modelname)
            assign(modelname, mod)
          },
          # LightGBM model
          Model_LightGBM = {
            lgb.train(
              data = lgb.Dataset(data = as.matrix(.x[, 2:3]), label = .x$Species),
              objective = 'binary',
              metric = 'auc',
              min_data = 1
              #params = params_lightGBM,
              #learning_rate = 0.1
            )
          }
        )
      }
      ))
  }) %>%
  map(
    .,
    ~unlist(., recursive = FALSE)
  )
Testing time:
Now that the models have been trained, we can begin making predictions on the synthetic data we created in the boundary_lists data.
models_predict <- map2(models_list, boundary_lists, ~{
  mods <- purrr::pluck(.x, "models")
  dat <- .y
  map(mods, function(x)
    tryCatch({
      if(attr(x, "class")[1] == "glm"){
        # predict the logistic model
        tibble(
          modelname = attr(x, "class")[1],
          prediction = predict(x, newdata = dat)
        ) %>%
          mutate(
            prediction = logit2prob(prediction),
            prediction = case_when(
              prediction > 0.5 ~ 1,
              TRUE ~ 0
            )
          )
      } else if(attr(x, "class")[1] == "svm.formula"){
        # predict the SVM model
        tibble(
          modelname = attr(x, "class")[1],
          prediction = as.numeric(as.character(predict(x, newdata = dat)))
        )
      } else if(attr(x, "class")[1] == "randomForest.formula"){
        # predict the RF model
        tibble(
          modelname = attr(x, "class")[1],
          prediction = as.numeric(as.character(predict(x, newdata = dat)))
        )
      } else if(attr(x, "class")[1] == "xgb.Booster"){
        # predict the XGBoost model
        tibble(
          modelname = attr(x, "class")[1],
          prediction = predict(x, newdata = as.matrix(dat), type = 'prob')
        ) %>%
          mutate(
            prediction = case_when(
              prediction > 0.5 ~ 1,
              TRUE ~ 0
            )
          )
      } else if(attr(x, "class")[1] == "keras.engine.sequential.Sequential"){
        # Keras single layer Neural Network
        tibble(
          modelname = attr(x, "class")[1],
          prediction = predict_classes(object = x, x = as.matrix(dat))
        )
      } else if(attr(x, "class")[2] == "keras.engine.training.Model"){
        # NOTE: this is a very crude hack to have multiple Keras NN models.
        # Needs fixing such that the models are named better -
        # (not like) - (..., "class")[2], (..., "class")[3]... and so on.
        tibble(
          modelname = attr(x, "class")[2],   # needs changing also
          prediction = predict_classes(object = x, x = as.matrix(dat))
        )
      } else if(attr(x, "class")[1] == "lgb.Booster"){
        # LightGBM model
        tibble(
          modelname = attr(x, "class")[1],
          prediction = predict(object = x, data = as.matrix(dat), rawscore = FALSE)
        ) %>%
          mutate(
            prediction = case_when(
              prediction > 0.5 ~ 1,
              TRUE ~ 0
            )
          )
      }
    },
    error = function(e){
      print('skipping\n')
    }
    )
  )
}
) %>%
  map(.,
      ~setNames(., map(., ~c(.x$modelname[1])))
  ) %>%
  map(.,
      ~map(., ~setNames(., c(
        paste0(.x$modelname[1], "_Model"),
        paste0(.x$modelname[1], "_Prediction")
      )))
  )
Calibrating the data
Now that we have our trained models, along with their predictions, we can map these predictions into data which we can plot using ggplot and then arrange using patchwork.
plot_data <- map2(
  .x = boundary_lists,
  .y = map(
    models_predict,
    ~map(., ~tibble(.))
  ),
  ~bind_cols(.x, .y)
)

names(plot_data) <- map_chr(
  plot_data,
  ~paste(colnames(.)[1], "and", colnames(.)[2], sep = "_")
)
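Each element of plot_data is now named after its variable pair, which makes it easy to pull out a single combination later. A quick look (my addition; the exact ordering follows var_combos above):

names(plot_data) %>% head(3)

## [1] "Sepal.Width_and_Sepal.Length"  "Petal.Length_and_Sepal.Length"
## [3] "Petal.Width_and_Sepal.Length"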
Now that we have our predictions we can create the ggplot objects.
ggplot_lists <- plot_data %>%
  map(
    .,
    ~select(., -contains("Model")) %>%
      pivot_longer(cols = contains("Prediction"),
                   names_to = "Model",
                   values_to = "Prediction")
  ) %>%
  map(
    .x = .,
    ~ggplot() +
      geom_point(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2]),
        color = factor(!!rlang::sym(colnames(.x)[4]))
      ), data = .x) +
      geom_contour(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2]),
        z = !!rlang::sym(colnames(.x)[4])
      ), data = .x) +
      geom_point(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2]),
        color = factor(!!rlang::sym(colnames(df)[5]))   # this is the status variable
      ), size = 8, data = df) +
      geom_point(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2])
      ), size = 8, shape = 1, data = df) +
      facet_wrap(~Model) +
      theme_bw(base_size = 25) +
      theme(legend.position = "none")
  )
Plot all the different combinations of the decision boundaries. Note: the above code will work better in your console; when I ran the code to compile the blog post the plots were too small. Therefore, I provide individual plots for a sample of the models and variable combinations.
I first needed to select the first two columns, which are the variables of interest (Petal.Width, Petal.Length, Sepal.Width and Sepal.Length), and then take a random sample of the remaining columns (which are the different machine learning model predictions).
plot_data_sampled <- plot_data %>%
  map(
    .,
    ~select(., -contains("Model")) %>%
      select(., c(1:2), sample(colnames(.), 2)) %>%
      pivot_longer(cols = contains("Prediction"),
                   names_to = "Model",
                   values_to = "Prediction")
  )
Next I can make the plots by taking a random sample of the lists.
plot_data_sampled %>%
  rlist::list.sample(1) %>%
  map(
    .x = .,
    ~ggplot() +
      geom_point(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2]),
        color = factor(!!rlang::sym(colnames(.x)[4]))
      ), data = .x) +
      geom_contour(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2]),
        z = !!rlang::sym(colnames(.x)[4])
      ), data = .x) +
      geom_point(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2]),
        color = factor(!!rlang::sym(colnames(df)[5]))   # this is the status variable
      ), size = 3, data = df) +
      geom_point(aes(
        x = !!rlang::sym(colnames(.x)[1]),
        y = !!rlang::sym(colnames(.x)[2])
      ), size = 3, shape = 1, data = df) +
      facet_wrap(~Model) +
      #coord_flip() +
      theme_tq(base_family = "serif") +
      theme(
        #aspect.ratio = 1,
        axis.line.y = element_blank(),
        axis.ticks.y = element_blank(),
        legend.position = "bottom",
        #legend.title = element_text(size = 20),
        #legend.text = element_text(size = 10),
        axis.title = element_text(size = 20),
        axis.text = element_text(size = "15"),
        strip.text.x = element_text(size = 15),
        plot.title = element_text(size = 30, hjust = 0.5),
        strip.background = element_rect(fill = 'darkred'),
        panel.background = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        #axis.text.x = element_text(angle = 90),
        axis.text.y = element_text(angle = 90, hjust = 0.5),
        #axis.title.x = element_blank()
        legend.title = element_blank(),
        legend.text = element_text(size = 20)
      )
  )

## $Sepal.Width_and_Petal.Length
## Warning: Row indexes must be between 0 and the number of rows (0). Use `NA` as
## row index to obtain a row full of `NA` values.
## This warning is displayed once per session.
Some other random models:
## $Sepal.Width_and_Sepal.Length

## $Sepal.Width_and_Sepal.Length

## $Petal.Length_and_Sepal.Length

## $Petal.Width_and_Petal.Length

## $Petal.Length_and_Petal.Width
## Warning in grDevices::contourLines(x = sort(unique(data$x)), y =
## sort(unique(data$y)), : all z values are equal
## Warning: Not possible to generate contour data
Naturally, the linear models made a linear decision boundary. It looks like the random forest model overfit the data a little, whereas the XGBoost and LightGBM models were able to make better, more generalisable decision boundaries. The Keras neural networks performed poorly because they would need to be trained better.
For reference, the facet labels in the plots map to the models as follows:

- glm = Logistic Model
- keras.engine.sequential.Sequential Prediction...18 = Single layer Neural Network
- keras.engine.sequential.Sequential Prediction...18 = Deeper layer Neural Network
- keras.engine.sequential.Sequential Prediction...22 = Deeper'er layer Neural Network
- lgb.Booster Prediction = Light Gradient Boosted Model
- randomForest.formula Prediction = Random Forest Model
- svm.formula Prediction...10 = Support Vector Machine with a Sigmoid Kernel
- svm.formula Prediction...12 = Support Vector Machine with a Radial Kernel
- svm.formula Prediction...6 = Support Vector Machine with a Linear Kernel
- svm.formula Prediction...8 = Support Vector Machine with a Polynomial Kernel
- xgb.Booster Prediction = Extreme Gradient Boosting Model
In many of the combinations the Keras neural network models simply predicted all observations to be of a specific class (again, down to my poor tuning of the models and the fact that the neural networks only had 100 observations to learn from and 40,000 observations to predict on). That is, they coloured the whole background blue or red and made many mis-classifications. In some of the plots the neural networks managed to make perfect classifications, in others they made strange decision boundaries. Neural networks are fun.
As some brief analysis of the plots, it looks like the simple logistic model made near-perfect classifications, which isn't surprising given that each of the variable relationships is linearly separable. However, I have a preference for the XGBoost and LightGBM models since they can handle non-linear relationships and incorporate regularisation in their objective functions, which allows them to make more robust decision boundaries. The Random Forest models lack this regularisation, which is why their decision boundaries appear to do a good job but are also slightly erratic and sharp.
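To make the linear case concrete: with two predictors, the logistic model classifies an observation as class 1 whenever the fitted probability \(\frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2)}}\) exceeds \(0.5\), which happens exactly when \(\beta_0 + \beta_1 x_1 + \beta_2 x_2 > 0\); the decision boundary is therefore the straight line \(\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0\). The tree-based models, by contrast, are built from axis-aligned splits, which is why their boundaries tend to look piecewise rectangular rather than smooth.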
Of course it goes without saying that these decision boundaries can become significantly more complex and non-linear with the inclusion of more variables and higher dimensions.
for(i in 1:length(plot_data)){
  print(ggplot_lists[[i]])
}
End note:
I wrote this code on an Amazon Ubuntu EC2 instance; however, when I went to compile the blog post in R on my Windows system I ran into some problems, mostly down to installing the lightgbm package and package versions. The code ran without error using the following (most up-to-date at the time) package versions:
sessionInfo()

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
## [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
## [5] LC_TIME=Spanish_Spain.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
##  [1] tidyquant_0.5.7            quantmod_0.4-15
##  [3] TTR_0.23-6                 PerformanceAnalytics_1.5.3
##  [5] xts_0.11-2                 zoo_1.8-6
##  [7] lubridate_1.7.4            keras_2.2.5.0
##  [9] lightgbm_2.3.2             R6_2.4.1
## [11] xgboost_0.90.0.1           tidyr_1.0.0
## [13] stringr_1.4.0              purrr_0.3.2
## [15] kableExtra_1.1.0.9000      knitr_1.25.4
## [17] ggplot2_3.2.1              patchwork_1.0.0
## [19] dplyr_0.8.99.9000
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.3           lattice_0.20-38      class_7.3-15
##  [4] utf8_1.1.4           assertthat_0.2.1     zeallot_0.1.0
##  [7] digest_0.6.24        e1071_1.7-2          evaluate_0.14
## [10] httr_1.4.1           blogdown_0.15        pillar_1.4.3.9000
## [13] tfruns_1.4           rlang_0.4.4          lazyeval_0.2.2
## [16] curl_4.0             rstudioapi_0.10      data.table_1.12.8
## [19] whisker_0.3-2        Matrix_1.2-17        reticulate_1.14-9001
## [22] rmarkdown_1.14       lobstr_1.1.1         labeling_0.3
## [25] webshot_0.5.1        readr_1.3.1          munsell_0.5.0
## [28] compiler_3.6.1       xfun_0.8             pkgconfig_2.0.3
## [31] base64enc_0.1-3      tensorflow_2.0.0     htmltools_0.3.6
## [34] tidyselect_1.0.0     tibble_2.99.99.9014  bookdown_0.13
## [37] quadprog_1.5-7       randomForest_4.6-14  fansi_0.4.1
## [40] viridisLite_0.3.0    crayon_1.3.4         withr_2.1.2
## [43] rappdirs_0.3.1       grid_3.6.1           Quandl_2.10.0
## [46] jsonlite_1.6.1       gtable_0.3.0         lifecycle_0.1.0
## [49] magrittr_1.5         scales_1.0.0         rlist_0.4.6.1
## [52] cli_2.0.1            stringi_1.4.3        xml2_1.2.2
## [55] ellipsis_0.3.0       generics_0.0.2       vctrs_0.2.99.9005
## [58] tools_3.6.1          glue_1.3.1           hms_0.5.1
## [61] yaml_2.2.0           colorspace_1.4-1     rvest_0.3.4