Disclaimer: While working on this project on my local machine, I noticed that the code was making my computer heat up. To avoid the risk of overheating it, I opted to use a Kaggle notebook. As a bonus, I got to use some GPU computing, which made training this model much faster than it would have been on my machine! Feel free to run the code on your own machine or fork the notebook!
Introduction
In my previous blog I explored sentiment prediction using LSTM networks and their implementation with keras and R. In this blog I am going to share how to predict the rotation of Rubik's cubes with convolutional neural networks (CNNs). For this challenge, I had the opportunity to do some basic image preprocessing and construct a CNN that predicts a continuous output, as opposed to the categorical output that is more common among CNN application examples. Since the goal is to predict a continuous value, the aim is to make the margin of error between the predicted and the true value as small as possible.
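To make that concrete, here is a minimal sketch of the two error measures used later on, mean squared error and mean absolute error. The numbers are made up for illustration and are not model output:

# Hypothetical true and predicted rotation values
true_vals      <- c(336.8, 148.5, 244.8)
predicted_vals <- c(330.1, 150.2, 240.0)

mse <- mean((predicted_vals - true_vals)^2)   # penalises large misses heavily
mae <- mean(abs(predicted_vals - true_vals))  # average size of a miss
mse
mae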
The Data
The data consists of two folders. The training folder has a .csv file that lists the file name of each Rubik's cube image among the training images and its respective rotation. The images consist of 5000 512×512 pixel color (RGB) images of rotated Rubik's cubes. The other folder has testing images whose angles of rotation are not given. Since the test data does not have any labels, it is not going to be very helpful for our purposes, so in this blog we are going to work with just the training data and its labels.
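Before reading the labels, a quick sanity check on the image folder is useful. The path below matches the Kaggle dataset layout used later in the post; adjust it for your own setup:

# Count the training images on disk; this should report 5000 .jpg files
image_dir <- "../input/rubix-cube/training/training/images"
length(list.files(image_dir, pattern = "\\.jpg$"))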
The labels for the training data look like this:
training_labels <- readr::read_csv("../input/rubix-cube/training/training/labels.csv")
head(training_labels)
filename | xRot
---|---
&lt;chr&gt; | &lt;dbl&gt;
000000.jpg | 336.8389
000001.jpg | 148.4844
000002.jpg | 244.8217
000003.jpg | 222.7006
000004.jpg | 172.3581
000005.jpg | 205.6921
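A quick look at the label column itself is also worthwhile. The rotations appear to be angles, presumably in degrees; that is an assumption based on the values rather than something stated in the dataset:

nrow(training_labels)          # one row per training image
summary(training_labels$xRot)  # should fall roughly in the 0-360 range if these are degrees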
Thanks to the imager package it is possible to convert the images into matrix form, which can then be converted into an array that keras likes. One of the questions I encountered was how to convert a list of 3D arrays into a 4D array, which I was able to figure out thanks to this stackoverflow question. While the solution is not elegant, it works.
Due to the size of the dataset, after preprocessing the data needed to be converted and used in groups, and the model needed to be trained in even smaller batches. An example of processing the first 3 images in the dataset would be:
library(tidyverse)
library(imager)

# Read the first three images and drop the singleton depth dimension,
# leaving 512 x 512 x 3 arrays
images <- lapply(
  training_labels[["filename"]][1:3],
  function(x) paste0("../input/rubix-cube/training/training/images/", x) %>%
    load.image() %>%
    as.cimg()
) %>%
  lapply(function(x) x[,,,])

# Stack the list of 3D arrays into a single 4D array
# Source: https://stackoverflow.com/questions/62060430/image-list-of-3d-arrays-to-one-4d-array
images_array <- array(NA, dim = c(length(images), 512, 512, 3))
for (j in 1:length(images)) {
  images_array[j,,,] <- images[[j]]
}
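As an aside, the same list-to-array step can also be written more compactly with the abind package. This is just an alternative sketch and assumes abind is installed; it is not used anywhere else in this post:

# abind with along = 0 stacks the 3D arrays along a new first dimension,
# giving the same (n, 512, 512, 3) array as the loop above
library(abind)
images_array_alt <- do.call(abind, c(images, list(along = 0)))
dim(images_array_alt)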
The images can also be plotted with purrr:
par(mfrow = c(1, 3))
images_array[1:3,,,] %>%
  purrr::array_tree(1) %>%
  purrr::set_names(training_labels[["xRot"]][1:3]) %>%
  purrr::map(as.raster) %>%
  purrr::iwalk(~ {plot(.x); title(.y)})
With this, the images are preprocessed and ready to be used for training our model.
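One more thing worth checking before training is the scale of the pixel values. In my experience imager typically loads JPEGs with values already scaled to [0, 1], so no extra rescaling is done in this post; a quick check like the one below (a suggestion on my part, not part of the original workflow) confirms it:

# Sanity check: pixel values should already be in [0, 1];
# if they come out as 0-255, rescale with images_array / 255 before fitting
range(images_array)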
The Model
As far as modelling is concerned, I created a convolutional neural network where the first layer's input shape matches the dimensions of the images. The subsequent layers pretty much follow the code used in the CNN example on RStudio's website. For the loss function I opted for mean squared error, and I track both it and mean absolute error as metrics.
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 512, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(512, 512, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1, activation = "relu")

# Compile the model
model %>% compile(
  optimizer = "adam",
  loss = "mean_squared_error",
  metrics = c("mean_squared_error", "mean_absolute_error")
)
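It can be helpful to print the model before training as a sanity check of the layer output shapes and parameter counts; with 512×512 inputs, the dense layer right after the flatten holds the vast majority of the weights:

# Print layer output shapes and parameter counts
summary(model)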
Due to the size of the data, it cannot be preprocessed and trained on in a single step. In lieu of this, the data is processed in groups of 100 images, and the model is trained with a batch size of 2.
set.seed(1234)

# Using chunks
history <- list()
for (i in 0:49) {
  start_index <- i * 100 + 1
  end_index <- (i + 1) * 100

  images <- lapply(
    training_labels[["filename"]][start_index:end_index],
    function(x) paste0("../input/rubix-cube/training/training/images/", x) %>%
      load.image() %>%
      as.cimg()
  ) %>%
    lapply(function(x) x[,,,])

  # Source: https://stackoverflow.com/questions/62060430/image-list-of-3d-arrays-to-one-4d-array
  images_array <- array(NA, dim = c(length(images), 512, 512, 3))
  for (j in 1:length(images)) {
    images_array[j,,,] <- images[[j]]
  }

  # Split data into train-test groups
  labels <- training_labels[["xRot"]][start_index:end_index]
  smp_size <- floor(0.75 * length(images))
  train_ind <- sample(seq_len(length(images)), size = smp_size)
  train_x <- images_array[train_ind,,,]
  test_x <- images_array[-train_ind,,,]
  train_y <- labels[train_ind]
  test_y <- labels[-train_ind]

  # Training the model
  history[[i + 1]] <- model %>% fit(
    x = train_x, y = train_y,
    epochs = 10, batch_size = 2,
    verbose = getOption("keras.fit_verbose", default = 1),
    validation_split = 0.25,
    validation_data = list(test_x, test_y)
  )

  # Free up unused RAM
  gc()
}

history[[50]]

Final epoch (plot to see history):
loss: 2.339
mean_squared_error: 2.339
mean_absolute_error: 1.303
val_loss: 6.561
val_mean_squared_error: 6.561
val_mean_absolute_error: 1.865

plot(history[[50]])
From the final iteration of the model history, the validation MSE comes out around 6.56, with a validation MAE of about 1.87, which isn't bad. But if we want to make something more production worthy, a better model is definitely required.
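Since each chunk keeps its own history object, you can also pull the final validation metric out of every chunk to see how training progressed across the whole run. A small sketch, assuming each element of history carries the usual keras $metrics list with the metric names shown in the output above:

# Last validation MAE recorded in each of the 50 chunks
val_mae <- sapply(history, function(h) tail(h$metrics$val_mean_absolute_error, 1))
plot(val_mae, type = "b", xlab = "Chunk", ylab = "Final validation MAE")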
If you know how to make this model better, or know of a better approach, please let me know! I would love to learn how to get better at making machine learning models!
Conclusion
There we have it! It was really interesting to preprocess images, work around processing limitations, and still manage to train the model. I will definitely keep this blog handy for my next image-based deep learning project.
Thank you for reading!