Sentiment analysis in R

Posted on May 16, 2021 by finnstats in R bloggers

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers.]

In this article, we discuss sentiment analysis in R. We use the syuzhet package to analyze the data and obtain sentiment scores for the words present in the dataset.

The goal is to build a sentiment analysis model that identifies whether words are positive or negative, and how strongly.

The code in this article is divided into the following steps: loading the data, building a corpus, cleaning the text, creating a term-document matrix, visualization, and sentiment analysis.
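The term-document matrix step listed above can be illustrated with a minimal base R sketch on three made-up documents. The documents and variable names are hypothetical; the tm package builds the equivalent object with TermDocumentMatrix(), and this version only shows the idea of rows as terms, columns as documents, and cells as counts.

```r
# Hypothetical toy documents standing in for cleaned tweets
docs <- c("apple beat revenue", "iphone delay ugly", "apple iphone apple")

# Split each document into words (terms)
tokens <- strsplit(docs, " ", fixed = TRUE)

# Vocabulary: every distinct term, sorted
terms <- sort(unique(unlist(tokens)))

# Term-document matrix: rows are terms, columns are documents,
# and each cell counts how often a term occurs in a document
tdm <- sapply(tokens, function(words) table(factor(words, levels = terms)))
colnames(tdm) <- paste0("doc", seq_along(docs))
print(tdm)

# Row sums give overall term frequencies, the usual input for bar plots and word clouds
freq <- sort(rowSums(tdm), decreasing = TRUE)
print(freq)
```

Using factor() with a fixed set of levels makes every column the same length, so sapply() can combine the per-document counts into a single matrix.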

The following main packages are used in this article:

tm for text mining operations such as removing numbers, special characters, punctuation, and stop words (stop words are the most commonly occurring words in a language, such as "the" or "and"; they carry very little value for NLP and should be filtered out)
wordcloud for generating the word cloud plot
syuzhet for sentiment scores and emotion classification
ggplot2 for plotting graphs
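The cleaning operations listed for tm can be illustrated without any packages. The following is a minimal base R sketch, assuming a hand-written stop-word list (my_stopwords is hypothetical and far shorter than the English list tm provides) and an extra, hypothetical step that strips @mentions; in the article itself these operations are performed with tm.

```r
# Hypothetical tweet text modelled on the dataset used in this article
tweets <- c("RT @SylvaCap: Things might get ugly for $AAPL with the iPhone delay!",
            "Let's see this break all timers. $AAPL 156.89")

# Hypothetical, deliberately tiny stop-word list (lower case, punctuation removed)
my_stopwords <- c("rt", "for", "with", "the", "this", "all", "lets", "see", "might", "get")

clean_text <- function(x, stop) {
  x <- tolower(x)                          # convert to lower case
  x <- gsub("@[[:alnum:]_]+", "", x)       # drop @mentions (hypothetical extra step)
  x <- gsub("[[:digit:]]+", "", x)         # remove numbers
  x <- gsub("[[:punct:]]+", "", x)         # remove punctuation and special characters
  words <- strsplit(x, "[[:space:]]+")     # split each tweet on whitespace
  vapply(words, function(w) {
    w <- w[w != "" & !(w %in% stop)]       # drop empty tokens and stop words
    paste(w, collapse = " ")
  }, character(1))
}

cleaned <- clean_text(tweets, my_stopwords)
print(cleaned)
```

Removing punctuation before matching stop words means "Let's" becomes "lets", which is why the stop-word list is written without apostrophes.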

What is Sentiment Analysis?

Sentiment analysis is the process of extracting opinions from text and scoring them as positive, negative, or neutral.

With sentiment analysis, you can determine the nature of the opinions or sentences in a text.

Sentiment analysis is a type of classification in which the data are assigned to classes such as positive and negative, or to emotions such as happy, sad, and angry.
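To make the idea of scoring and classification concrete, here is a minimal base R sketch of lexicon-based scoring. The lexicon, its weights, and the helpers score_sentence() and classify() are hypothetical; syuzhet relies on much larger published lexicons, so this only illustrates the principle: each known word contributes a signed score whose sign gives the polarity and whose size gives the magnitude, and the sum decides the label.

```r
# Hypothetical mini lexicon: word -> score (sign = polarity, size = magnitude)
lexicon <- c(good = 1, great = 2, beat = 1, ugly = -2, delay = -1, bad = -1)

# Sum the scores of all words found in the lexicon; unknown words count as 0
score_sentence <- function(sentence, lex) {
  words <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  sum(lex[words], na.rm = TRUE)
}

# Map a numeric score to a class label
classify <- function(score) {
  if (score > 0) "positive" else if (score < 0) "negative" else "neutral"
}

sentences <- c("AAPL beat on both eps and revenues",
               "Things might get ugly with the iPhone delay",
               "Let's see this break all timers")

scores <- vapply(sentences, score_sentence, numeric(1), lex = lexicon, USE.NAMES = FALSE)
labels <- vapply(scores, classify, character(1))
print(data.frame(sentence = sentences, score = scores, label = labels))
```

Indexing the named lexicon with words it does not contain returns NA, and na.rm = TRUE turns those into a zero contribution, so a sentence with no known words is labelled neutral.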

Getting data

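# Read the tweets (the path points to the author's local copy of the data)
apple <- read.csv("D:/RStudio/SentimentAnalysis/Data1.csv", header = TRUE, stringsAsFactors = FALSE)
str(apple)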

This dataset contains 1,000 observations of 16 variables, but we are interested in only one column, text.

'data.frame': 1000 obs. of  16 variables:
 $ text         : chr  "RT @option_snipper: $AAPL beat on both eps and revenues. SEES 4Q REV. $49B-$52B, EST. $49.1B https://t.co/hfHXqj0IOB" "RT @option_snipper: $AAPL beat on both eps and revenues. SEES 4Q REV. $49B-$52B, EST. $49.1B https://t.co/hfHXqj0IOB" "Let's see this break all timers. $AAPL 156.89" "RT @SylvaCap: Things might get ugly for $aapl with the iphone delay. With $aapl down that means almost all of t"| __truncated__ ...
 $ favorited    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ favoriteCount: int  0 0 0 0 0 0 0 0 0 0 ...
 $ replyToSN    : chr  NA NA NA NA ...
 $ created      : chr  "2017-08-01 20:31:56" "2017-08-01 20:31:55" "2017-08-01 20:31:55" "2017-08-01 20:31:55" ...
 $ truncated    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
 $ replyToSID   : num  NA NA NA NA NA NA NA NA NA NA ...
 $ id           : num  8.92e+17 8.92e+17 8.92e+17 8.92e+17 8.92e+17 ...
 $ replyToUID   : num  NA NA NA NA NA NA NA NA NA NA ...
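Because the CSV path above points to the author's own machine, the following base R sketch reads a small hypothetical inline CSV instead, so the loading step and the selection of the text column can be tried anywhere. The rows and the names csv_lines and tweets_df are illustrative only and mimic just three of the 16 columns.

```r
# Hypothetical CSV content mimicking a few columns of the tweet file
csv_lines <- c(
  "text,favorited,favoriteCount",
  "\"Let's see this break all timers. $AAPL 156.89\",FALSE,0",
  "\"Things might get ugly for $aapl with the iphone delay\",FALSE,0"
)

# read.csv() accepts literal CSV content through the text argument
tweets_df <- read.csv(text = paste(csv_lines, collapse = "\n"),
                      header = TRUE, stringsAsFactors = FALSE)
str(tweets_df)

# Keep only the column needed for sentiment analysis
text <- tweets_df$text
print(text)
```

Since read.csv() only treats double quotes as quote characters, the apostrophe in "Let's" is read as ordinary text.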

Build corpus

Once the dataset is loaded in R, the next step is to convert the text vector into a corpus, which we can do with the tm package.

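library(tm)
# Convert the tweet text to UTF-8; sub = "" drops bytes that cannot be converted
corpus <- iconv(apple$text, to = "UTF-8", sub = "")
# Build a corpus with one document per tweet
corpus <- Corpus(VectorSource(corpus))
# Print the first five documents
inspect(corpus[1:5])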