LSTM network in R: in this tutorial, we are going to discuss recurrent neural networks. Recurrent neural networks are very useful for solving problems that involve sequences of numbers.
The major applications involving sequences include text classification, time series prediction, frames in videos, DNA sequences, speech recognition problems, and more.
LSTM networks are a special type of recurrent neural network, and they are very popular and handy.
What is meant by LSTM?
LSTM stands for long short-term memory.
LSTM network helps to overcome gradient problems and makes it possible to capture long-term dependencies in the sequence of words or integers.
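In symbols, a standard LSTM cell (the usual textbook formulation, not something specific to this tutorial) uses three sigmoid gates to decide what to forget, what to store, and what to output:

```latex
\begin{aligned}
f_t &= \sigma(W_f\,[h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i\,[h_{t-1}, x_t] + b_i) && \text{input gate} \\
o_t &= \sigma(W_o\,[h_{t-1}, x_t] + b_o) && \text{output gate} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c\,[h_{t-1}, x_t] + b_c) \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

The additive update of the cell state c_t is what lets gradients flow across many time steps without vanishing.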
In this tutorial, we are using the Internet Movie Database (IMDB) dataset, which contains 25,000 positive and 25,000 negative movie reviews labeled by sentiment.
Load library
library(keras)
library(tensorflow)
use_condaenv("keras-tf", required = TRUE)
Getting Data
imdb <- dataset_imdb(num_words = 500)
These datasets are already preprocessed, so there is no need to clean them.
c(c(train_x, train_y), c(test_x, test_y)) %<-% imdb
length(train_x); length(test_x)
train_x and test_x contain integer sequences, while train_y and test_y contain labels (0 and 1): 0 represents a negative review and 1 represents a positive review.
table(train_y)
train_y
    0     1
12500 12500

table(test_y)
test_y
    0     1
12500 12500
This indicates that our dataset is balanced.
Each word in a movie review is represented by a unique integer, and each integer is assigned by the word's overall frequency in the dataset. A single customer review can be extracted with the command below.
train_x[[10]]
  [1]   1  14  20  47 111 439   2  19  12  15 166  12 216 125  40   6 364 352   2   2  39 294  11  22 396  13  28   8 202  12   2  23  94
 [34]   2 151 111 211 469   4  20  13 258   2   2   2  12  16  38  78  33 211  15  12  16   2  63  93  12   6 253 106  10  10  48 335 267
 [67]  18   6 364   2   2  20  19   6   2   7   2 189   5   6   2   7   2   2  95   2   6   2   7   2   2  49 369 120   5  28  49 253  10
[100]  10  13   2  19  85   2  15   4 481   9  55  78   2   9 375   8   2   8   2  76   7   4  58   5   4   2   9 243   7  43  50
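If you want to see the actual words behind these integers, you can invert the word index (a hedged sketch of my own, not part of the original post; keras reserves codes 0-2 for padding, start-of-sequence, and out-of-vocabulary, so real words start at code 4):

```r
# build a reverse lookup from integer code to word
word_index <- dataset_imdb_word_index()                 # named list: word -> rank
rev_index  <- setNames(names(word_index), unlist(word_index))

decode_review <- function(codes) {
  words <- sapply(codes, function(i) {
    # subtract the offset of 3; anything below that is a reserved code
    if (i > 3) rev_index[[as.character(i - 3)]] else "?"
  })
  paste(words, collapse = " ")
}

decode_review(train_x[[10]][1:10])
```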
Before doing any further analysis we need to ensure that all movie reviews have the same length. The reviews currently vary in length; this is overcome with a padding process.
Padding sequences
train_x <- pad_sequences(train_x, maxlen = 90)
# num [1:25000, 1:90] 14 2 360 2 13 0 26 11 6 13 ...
test_x <- pad_sequences(test_x, maxlen = 90)
# num [1:25000, 1:90] 0 2 30 8 10 20 2 2 2 50 ...
Now every sequence in train_x and test_x is exactly 90 integers long: padding truncated the extra integers from the longer reviews.
Now you can examine the customer review train_x[10, ] again:
 [1]  13 258   2   2   2  12  16  38  78  33 211  15  12  16   2  63  93  12   6 253 106  10  10  48 335 267  18   6 364   2   2  20  19   6
[35]   2   7   2 189   5   6   2   7   2   2  95   2   6   2   7   2   2  49 369 120   5  28  49 253  10  10  13   2  19  85   2  15   4 481
[69]   9  55  78   2   9 375   8   2   8   2  76   7   4  58   5   4   2   9 243   7  43  50
If a review contains fewer integers, say 60, the remaining 30 positions are filled with 0 automatically.
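To make this concrete, here is a minimal base-R sketch (my own illustration, not the keras source) of what pad_sequences does with its defaults (padding = "pre", truncating = "pre", value = 0):

```r
# pad or truncate a single integer sequence to length maxlen,
# mimicking pad_sequences() defaults: zeros go in front, and
# overly long sequences keep their last maxlen values
pad_one <- function(x, maxlen) {
  if (length(x) >= maxlen) {
    tail(x, maxlen)                      # truncate from the front
  } else {
    c(rep(0, maxlen - length(x)), x)     # prepend zeros
  }
}

pad_one(c(5, 8), 4)           # 0 0 5 8
pad_one(c(1, 2, 3, 4, 9), 4)  # 2 3 4 9
```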
Model
Initiate the model with the keras function keras_model_sequential() and add the recurrent neural network layers.
model <- keras_model_sequential()
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
For the output activation we used the sigmoid function, whose output can be read as a probability, which is very useful for interpretation purposes.
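For a quick intuition (my own illustration), the sigmoid squashes any real number into (0, 1), so the single output unit can be read as the probability that a review is positive:

```r
sigmoid <- function(x) 1 / (1 + exp(-x))

sigmoid(0)      # 0.5, i.e. completely uncertain
sigmoid(2.2)    # ~0.90, strongly positive
# threshold at 0.5 to turn probabilities into class labels
as.integer(sigmoid(c(-1.3, 2.2)) > 0.5)   # 0 1
```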
Compile Model
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
Fit model
history <- model %>% fit(
  train_x, train_y,
  epochs = 25,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)
validation_split indicates that 20% of the training data is held out for validation purposes.
In the plot, the top panel shows loss and the bottom one shows accuracy. From a certain epoch onwards the validation loss increases and the validation accuracy decreases; this is a sign of overfitting.
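One common remedy (a sketch of a standard keras technique, not something the original post does) is to stop training automatically once validation loss stops improving, via an early-stopping callback:

```r
# refit with early stopping: training halts once val_loss has not
# improved for `patience` epochs, and the best weights seen during
# training are restored
history <- model %>% fit(
  train_x, train_y,
  epochs = 25,
  batch_size = 128,
  validation_split = 0.2,
  callbacks = list(
    callback_early_stopping(
      monitor = "val_loss",
      patience = 3,
      restore_best_weights = TRUE
    )
  )
)
```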
Model Prediction
model %>% evaluate(train_x, train_y)
     loss       acc
0.3644736 0.8765600

pred <- model %>% predict_classes(train_x)
table(Predicted = pred, Actual = imdb$train$y)
         Actual
Predicted     0     1
        0 11503  2089
        1   997 10411

model %>% evaluate(test_x, test_y)
    loss      acc
1.032544 0.687720

pred1 <- model %>% predict_classes(test_x)
table(Predicted = pred1, Actual = imdb$test$y)
         Actual
Predicted    0    1
        0 9203 4510
        1 3297 7990
On the training dataset we got 87% accuracy, but it falls to 68% on the test dataset, so the model needs improvement for better prediction. You can make some changes to the model.
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32, return_sequences = TRUE, activation = "relu") %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
In the model above we used 3 recurrent layers instead of 1, set return_sequences = TRUE on the intermediate layers, and used the relu activation function. Another change we can make is to the padding length: instead of the 90 used so far, we can find the average customer review length and use that for padding.
# run this on the raw integer sequences, before padding
z <- NULL
for (i in 1:25000) { z[i] <- length(train_x[[i]]) }
summary(z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   11.0   130.0   178.0   238.7   291.0  2494.0
The median is 178 and the mean is about 239, so we use a number in between, such as 200, for padding.
Padding sequences
train_x <- pad_sequences(train_x, maxlen = 200)
test_x  <- pad_sequences(test_x, maxlen = 200)
Rerun the model and check the accuracy again.
model %>% evaluate(train_x, train_y)
     loss       acc
0.3733827 0.8421200
The training dataset accuracy is now 84%; earlier it was 87%.
model %>% evaluate(test_x, test_y)
     loss       acc
0.4351899 0.8114400
Test dataset accuracy improved significantly, from 68% to 81%.
Now you can try a simple LSTM model for better prediction.
LSTM Network in R
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
When using an LSTM model, try the "adam" optimizer for better prediction.
Compile
model %>% compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
Bidirectional LSTM Model
model %>%
  layer_embedding(input_dim = 500, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%
  bidirectional(layer_lstm(units = 32)) %>%
  layer_dense(units = 1, activation = "sigmoid")
Conclusion
The model accuracy improved at each step we experimented with. Instead of a simple LSTM model, you can also try a bidirectional model for better prediction.
The post LSTM Network in R appeared first on finnstats.