Sentiment Analysis in R with Custom Lexicon Dictionary using tidytext

[This article was first published on r-bloggers on Programming with R, and kindly contributed to R-bloggers.]

In this sentiment analysis tutorial, you’ll learn how to use a custom lexicon (for any language other than English) or a keyword dictionary to perform simple (slightly naive) sentiment analysis with R’s tidytext package. Note: this won’t give you the same accuracy as a language model, but it is the fastest route to a working solution, at some cost in accuracy. The example here is a Turkish sentiment analysis script. This tutorial doesn’t cover text pre-processing steps, but those are very important for any text analytics / NLP project.

Steps

  • Read the Input Text as a Dataframe
  • Load the lexicon / new language dictionary
  • Select the appropriate columns – in this case, word and polarity
  • Join the tokenized words from the text dataframe with the lexicon dataframe
  • Roll up the result dataframe by the grouping variable (linenumber, created with row_number()) to get a sentence-level aggregated sentiment score
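
The steps above can be tried end to end on a few rows before touching any files. The sketch below is only an illustration: the three sentences and the +1 / -1 polarity values are made-up toy data (not taken from the Turkish lexicon file), and the column names tweettext, word and value are assumed to match the ones used in the full script.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy input standing in for text.csv (one sentence per row)
sent <- tibble(tweettext = c("bu film harika",
                             "yemek berbat ve otel berbat",
                             "bu bir kitap"))

# Hypothetical toy lexicon standing in for the real dictionary;
# the polarity values here are assumptions for illustration only
lexicon2 <- tibble(word  = c("harika", "berbat"),
                   value = c(1, -1))

scores <- sent %>%
  mutate(linenumber = row_number()) %>%   # one id per sentence
  unnest_tokens(word, tweettext) %>%      # one row per lowercased word
  inner_join(lexicon2, by = "word") %>%   # keep only words found in the lexicon
  group_by(linenumber) %>%
  summarise(sentiment = sum(value))       # sentence score = sum of word polarities

print(scores)
```

Note that the third sentence has no word in the toy lexicon, so the inner join removes it and it does not appear in scores at all.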

Code

library(tidyverse)

#install.packages("tidytext")
library(tidytext)

sent <- read.csv('text.csv', stringsAsFactors = FALSE) # keep the text as character, not factor (matters on R < 4.0)

lexicon <- read.table("turkish_lexicon.csv",
                      header = TRUE,
                      sep = ';',
                      stringsAsFactors = FALSE)

# keep only the word and polarity columns, renamed to match the tokenized text
lexicon2 <- lexicon %>% 
  select(WORD, POLARITY) %>% 
  rename(word = WORD, value = POLARITY)


sent %>%
  mutate(linenumber = row_number()) %>% # line number for later sentence grouping
  unnest_tokens(word, tweettext) %>% # tokenization - sentence to words
  inner_join(lexicon2, by = "word") %>% # inner join with our lexicon to get the polarity score
  group_by(linenumber) %>% # group by for sentence polarity
  summarise(sentiment = sum(value)) %>% # final sentence polarity from words
  left_join(
    sent %>%
      mutate(linenumber = row_number()), # get the actual text next to the sentiment value
    by = "linenumber"
  ) %>%
  write.csv("sentiment_output.csv", row.names = FALSE)
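
One side effect of the pipeline above: because the scores come out of an inner join, any sentence with no word in the lexicon is missing from the output file. If you would rather keep every sentence, one possible variation (not part of the original script) is to start the final join from the text and treat a missing score as 0. The data below is hypothetical toy data with assumed polarity values.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data; replace with your own text and lexicon
sent <- tibble(tweettext = c("bu film harika",
                             "yemek berbat ve otel berbat",
                             "bu bir kitap"))
lexicon2 <- tibble(word = c("harika", "berbat"), value = c(1, -1))

numbered <- sent %>%
  mutate(linenumber = row_number())

scores <- numbered %>%
  unnest_tokens(word, tweettext) %>%
  inner_join(lexicon2, by = "word") %>%
  group_by(linenumber) %>%
  summarise(sentiment = sum(value))

# Start from the text so that no sentence is lost,
# then fill in 0 where no lexicon word matched
all_rows <- numbered %>%
  left_join(scores, by = "linenumber") %>%
  mutate(sentiment = coalesce(sentiment, 0))

print(all_rows)
```

Whether 0 is the right score for "no lexicon match" is a judgment call; keeping it as NA instead makes unscored sentences easier to tell apart from genuinely neutral ones.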
