In this tutorial, I will discuss how to use the keras package with TensorFlow as the backend to build an anomaly detection model using autoencoders. An autoencoder is an unsupervised learning technique in which the input data is encoded into a lower-dimensional representation and then decoded (reconstructed) back. For each observation, we compare the original data with its reconstruction and use the reconstruction error as an anomaly score.
About Dataset
The data contains 66 features extracted from a vibration signal along the x, y & z axes. For the experiment, a 3-axis vibration sensor was hooked up to a table press drill. There are a total of 4 failure modes within the data set. The data also has numeric and categorical labels.
Load the libraries
# load the libraries
library(keras)
library(dplyr)
Load data set to R
The data is loaded and the labels are removed. The data set is then split into train and test: train holds the calibration data and test holds the remaining observations. The data is converted to a matrix, as required by the keras package.
# load the data set
data = read.csv("features.csv", header = T) %>%
  select(-c(yLabel, Y)) %>%
  as.data.frame()

# convert all columns to numeric
data = sapply(data, as.numeric)

# split the data into train and test
train = data[1:50, ] %>% as.matrix()
test = data[51:357, ] %>% as.matrix()
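One optional tweak, not used in the rest of this tutorial: since the hidden layers below use sigmoid activations, min-max scaling each feature to [0, 1] (using train statistics only) can make training more stable. A minimal sketch:

# optional: min-max scale each feature to [0, 1] using train statistics only
mins = apply(train, 2, min)
maxs = apply(train, 2, max)
rng  = pmax(maxs - mins, 1e-8)  # guard against constant columns
train = sweep(train, 2, mins) %>% sweep(2, rng, "/")
test  = sweep(test, 2, mins) %>% sweep(2, rng, "/")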
Set parameters for auto encoder model
We create a set of parameters below. This is optional, but it makes hyperparameter tuning easier.
dropOut = 0.2
atvn = "sigmoid"
batch = 10
Auto encoder model
The autoencoder model used here is a symmetric model.
Input layer: takes the shape of the training data, i.e., a total of 66 features.
Encoder: we tie the input layer to 4 dense layers with batch normalization and dropout.
Decoder: it is a mirror image of the encoder.
# create auto encoder architecture
input_layer = layer_input(shape = c(66))

encoder = input_layer %>%
  layer_dense(units = 512, activation = atvn) %>%
  layer_batch_normalization() %>%
  layer_dropout(rate = dropOut) %>%
  layer_dense(units = 128, activation = atvn) %>%
  layer_dropout(rate = dropOut) %>%
  layer_dense(units = 64, activation = atvn) %>%
  layer_dense(units = 32)

decoder = encoder %>%
  layer_dense(units = 64, activation = atvn) %>%
  layer_dropout(rate = dropOut) %>%
  layer_dense(units = 128, activation = atvn) %>%
  layer_dropout(rate = dropOut) %>%
  layer_dense(units = 512, activation = atvn) %>%
  layer_dense(units = 66)
Training
Next, we combine our input layer and decoder to form an autoencoder model. We then compile the model with an optimizer (adam) and a loss function (mean squared error). Finally, we fit the model and plot the training history.
# combine input layer and decoder into an autoencoder model
autoencoder_model = keras_model(inputs = input_layer, outputs = decoder)

# compile the model
autoencoder_model %>% compile(
  loss = 'mean_squared_error',
  optimizer = 'adam'
)

# look at the summary of the model
summary(autoencoder_model)

# fit the model
history = autoencoder_model %>% keras::fit(
  train, train,
  epochs = 100,
  shuffle = TRUE,
  batch_size = batch,
  validation_data = list(test, test)
)

# view the training history
plot(history)
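If you plan to score new sensor data later, you can also save the trained model to disk and reload it in a fresh session. A quick sketch using keras' HDF5 helpers (the file name is just an example):

# save the trained model to disk and load it back later
save_model_hdf5(autoencoder_model, "autoencoder_model.h5")
autoencoder_model = load_model_hdf5("autoencoder_model.h5")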
Reconstruction error and anomaly limits
We define a function to calculate the reconstruction error, and we use the 0.85 quantile of the train scores as the anomaly limit. We finally combine the train and test scores to plot the results. Below, green points are healthy observations and red points are abnormal conditions.
# function to calculate reconstruction error for row i of `data`
reconstMSE = function(i) {
  reconstructed_points = autoencoder_model %>%
    predict(x = data[i, ] %>% matrix(nrow = 1, ncol = 66))
  return(mean((data[i, ] - reconstructed_points)^2))
}

# initial data is train
data = train

# calculate train reconstruction error
trainRecon = data.frame(data = train,
                        score = do.call(rbind, lapply(1:50, FUN = reconstMSE)))

# calculate anomaly limit from the train scores
anomalyLimit = quantile(trainRecon$score, p = 0.85)

# next, test data
data = test

# calculate test reconstruction error
testRecon = data.frame(data = test,
                       score = do.call(rbind, lapply(1:nrow(data), FUN = reconstMSE)))

# combine train and test errors
Recondata = rbind(trainRecon, testRecon)

# plot the results
plot(Recondata$score,
     col = ifelse(Recondata$score > anomalyLimit, "red", "green"),
     pch = 19, xlab = "observations", ylab = "score")
abline(h = anomalyLimit, col = "red", lwd = 1)
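As a side note, the row-by-row loop above can be replaced with a single predict() call over the whole matrix, which is usually much faster. A vectorized sketch that should produce the same scores:

# vectorized alternative: reconstruct all rows at once
allData = rbind(train, test)
reconstructed = autoencoder_model %>% predict(allData)
scores = rowMeans((allData - reconstructed)^2)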
From the above result, we observe a few false positives. But we could tune the hyperparameters and retrain the model to achieve higher accuracy.
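One simple way to do that tuning is to wrap the architecture in a function and loop over candidate values, comparing the validation loss. A rough sketch over the dropout rate; the grid values and the simplified architecture below are just for illustration, not from the model above:

# hypothetical grid search over the dropout rate
buildModel = function(rate) {
  input = layer_input(shape = c(66))
  out = input %>%
    layer_dense(units = 512, activation = atvn) %>%
    layer_dropout(rate = rate) %>%
    layer_dense(units = 32) %>%
    layer_dense(units = 512, activation = atvn) %>%
    layer_dense(units = 66)
  m = keras_model(inputs = input, outputs = out)
  m %>% compile(loss = 'mean_squared_error', optimizer = 'adam')
  m
}

for (rate in c(0.1, 0.2, 0.3)) {
  m = buildModel(rate)
  h = m %>% keras::fit(train, train, epochs = 50, batch_size = batch,
                       validation_data = list(test, test), verbose = 0)
  # report the final validation loss for this dropout rate
  cat("dropout", rate, "val loss:", tail(h$metrics$val_loss, 1), "\n")
}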
You can find the notebook version of this tutorial on my github page. You can also find machine learning versions here.
I will write up another post on how to do fault classification in the next few days. Stay tuned!
Hope you all enjoyed this tutorial. Please let me know in the comments what you think of it.