Kannada MNIST Prediction Classification using H2O AutoML in R

Posted on October 2, 2019 by AbdulMajedRaja RS in R bloggers

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers.]

The Kannada MNIST dataset is another MNIST-type digits dataset, this time for the Kannada (Indian) language. All details of the dataset curation have been captured in the paper titled “Kannada-MNIST: A new handwritten digits dataset for the Kannada language” by Vinay Uday Prabhu. The GitHub repo of the author can be found here.

The objective of this post is to demonstrate how to use h2o.ai’s automl function to quickly get a (better) baseline. This also makes the point that AutoML tools help democratize the machine learning model-building process.

Loading required libraries

h2o – for Machine Learning
tidyverse – for Data Manipulation

library(h2o)
library(tidyverse)

Initializing H2O Cluster

h2o::h2o.init()

Reading Input Files (Data)

train <- read_csv("~/Documents/R Codes/Kannada-MNIST/train.csv")
test <- read_csv("~/Documents/R Codes/Kannada-MNIST/test.csv")
valid <- read_csv("~/Documents/R Codes/Kannada-MNIST/Dig-MNIST.csv")
submission <- read_csv("~/Documents/R Codes/Kannada-MNIST/sample_submission.csv")

Checking the shape / dimension of the dataframe

dim(train)

784 pixel values + 1 label denoting which digit it is.
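
The 784 pixel columns are one flattened 28 x 28 image. Here is a minimal sketch of that relationship, using a made-up vector rather than the real data. It assumes the pixels are stored one image row at a time (the usual MNIST convention, but not something confirmed in this post):

```r
# Hypothetical flat image: 784 values standing in for one row of pixel columns
flat_pixels <- seq_len(784)

# 28 * 28 = 784, so the vector folds exactly into a square grid
side <- sqrt(length(flat_pixels))

# byrow = TRUE assumes the pixels were written out one image row at a time
img <- matrix(flat_pixels, nrow = side, ncol = side, byrow = TRUE)

dim(img)      # 28 28
img[1, 1:3]   # first three pixels of the top row: 1 2 3
img[2, 1]     # first pixel of the second row: 29
```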

Label Count

train %>% count(label)
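
The label count matters because a plain accuracy score is easiest to interpret when every digit appears about equally often. A small base R sketch of the same check, on hypothetical labels rather than the real train data, might look like this:

```r
# Hypothetical labels standing in for train$label
label <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)

# Count of rows per class, and each class's share of the total
counts <- table(label)
shares <- prop.table(counts)

# A balanced dataset has (nearly) equal shares, so this gap is close to zero
balance_gap <- max(shares) - min(shares)
balance_gap
```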

Visualizing the Kannada MNIST Digits

# Visualize the first 36 digits in a 6 x 6 grid
par(mfcol = c(6, 6))
par(mar = c(0, 0, 3, 0), xaxs = 'i', yaxs = 'i')

for (idx in 1:36) {
  # Drop the label column and reshape the 784 pixel values into a 28 x 28 matrix
  im <- matrix(as.numeric(unlist(train[idx, -1])), nrow = 28, ncol = 28)

  # Grayscale plot, titled with the digit's label
  image(1:28, 1:28, im, col = gray((0:255) / 255), main = paste(train$label[idx]))
}
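
One thing to watch with image(): it draws the first matrix index along the x axis and the second along the y axis, with y increasing upwards. If the pixels are stored row by row (an assumption, as noted for MNIST-style data in general), the digits can come out flipped. The helper below, to_image_matrix, is a hypothetical name and only a sketch of one way to reorient a flat pixel vector before plotting:

```r
# Hypothetical helper: turn a flat, row-by-row pixel vector into a matrix
# that image() will draw the right way up
to_image_matrix <- function(flat, side = 28) {
  # m[r, c] is the pixel in image row r, column c (row 1 is the top row)
  m <- matrix(as.numeric(flat), nrow = side, ncol = side, byrow = TRUE)
  # Transpose so the first index runs along x, then reverse the second
  # index so that image row 1 ends up at the top of the plot
  t(m)[, side:1]
}

# Tiny 2 x 2 demo image:  1 2
#                         3 4
z <- to_image_matrix(c(1, 2, 3, 4), side = 2)
z[1, 2]   # left column, top of the plot: 1
z[2, 1]   # right column, bottom of the plot: 4
```

With the real data, the call inside the loop would become something like image(1:28, 1:28, to_image_matrix(unlist(train[idx, -1])), ...).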

Converting R dataframe to H2O object which is required by H2O functions

train_h <- as.h2o(train)
test_h <- as.h2o(test)
valid_h <- as.h2o(valid)

Converting our numeric target variable into a factor for the algorithm to perform Classification

train_h$label <- as.factor(train_h$label)
valid_h$label <- as.factor(valid_h$label)
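
H2O chooses between regression and classification based on the type of the response column, which is why the conversion is needed. The base R sketch below, on hypothetical labels, shows what the conversion changes: numbers carry an order and a distance, while factor levels are just distinct classes.

```r
# Hypothetical labels, standing in for the label column
label <- c(3, 0, 7, 3, 9, 0)

# As numbers, the labels have an order and a distance (7 is "more" than 3)
is.numeric(label)

# As a factor, each distinct value becomes its own class
label_f <- as.factor(label)
levels(label_f)    # "0" "3" "7" "9"
nlevels(label_f)   # 4
```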

Explanatory and Response Variables

x <- names(train)[-1]
y <- 'label'
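
Dropping the first column by position works here because label is the first column. A slightly more defensive sketch selects the predictors by name instead, so it keeps working if the column order changes. The column names below are hypothetical stand-ins:

```r
# Hypothetical column names: a label column plus a few pixel columns
cols <- c("label", "pixel0", "pixel1", "pixel2")

y <- "label"

# Select predictors by name, so the result does not depend on column order
x <- setdiff(cols, y)
x
```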

AutoML in Action

aml <- h2o::h2o.automl(x = x, 
                       y = y,
                       training_frame = train_h,
                       nfolds = 3,
                       leaderboard_frame = valid_h,
                       max_runtime_secs = 1000)

nfolds sets the number of folds used for cross-validation, and max_runtime_secs sets the maximum time, in seconds, that the AutoML run is allowed to take.
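
To make nfolds = 3 a bit more concrete, here is a rough base R sketch of the idea behind 3-fold cross-validation on a hypothetical 10-row dataset. This only illustrates the concept; how H2O assigns folds internally may well differ.

```r
set.seed(42)   # arbitrary seed, only to make the sketch repeatable

n_rows <- 10   # hypothetical tiny dataset
nfolds <- 3

# Give every row a fold id, spread as evenly as possible, in random order
fold_id <- sample(rep(seq_len(nfolds), length.out = n_rows))

# Each fold takes one turn as the held-out set; the rest is used for fitting
for (k in seq_len(nfolds)) {
  holdout <- which(fold_id == k)
  fit_on  <- which(fold_id != k)
  cat("fold", k, ": fit on", length(fit_on), "rows, score on", length(holdout), "rows\n")
}
```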

AutoML Leaderboard

The leaderboard is where AutoML lists the top-performing models.

aml@leaderboard

Prediction and Submission

pred <- h2o.predict(aml, test_h)  

submission$label <- as.vector(pred$predict)

#write_csv(submission, "submission_automl.csv")
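
Before writing the file, a few sanity checks can catch a malformed submission early. The sketch below uses hypothetical stand-ins for submission and the predictions. The id and label column names are assumptions, and predicted labels may come back as text depending on how they are extracted:

```r
# Hypothetical stand-ins for the submission frame and the predicted labels
submission <- data.frame(id = 0:4, label = rep(0, 5))
predicted  <- c("3", "0", "7", "7", "9")   # predictions may arrive as text

# Convert to integers before assigning, so the column holds plain digits
submission$label <- as.integer(predicted)

stopifnot(
  nrow(submission) == length(predicted),   # one prediction per test row
  !anyNA(submission$label),                # nothing failed to convert
  all(submission$label %in% 0:9)           # only valid digit classes
)
```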

Submission (for Kaggle)

write_csv(submission, "submission_automl.csv")

This is currently a Playground competition on Kaggle, so this submission file can be submitted to it. With the parameters above, the submission scored 0.90720 on the public leaderboard. A score of 0.90 on an MNIST-style classification task is nothing remarkable, but I hope this code snippet can serve as a quick starter template for anyone getting started with AutoML.
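
Assuming the leaderboard score is plain classification accuracy (the post does not state the metric), 0.90720 would mean that about 90.7% of the scored test images were labelled correctly. On hypothetical labels, that calculation is a one-liner:

```r
# Hypothetical true and predicted labels for five images
truth     <- c(3, 0, 7, 1, 9)
predicted <- c(3, 0, 7, 7, 9)

# Accuracy is the share of exact matches
accuracy <- mean(predicted == truth)
accuracy   # 0.8
```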
