Site icon R-bloggers

Sentiment analysis and Parts of Speech tagging in Dutch/French/English/German/Spanish/Italian

[This article was first published on BNOSAC - Belgium Network of Open Source Analytical Consultants, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As part of our continuing effort to digitise poetry and to automate new forms of poetry, we released an R package called pattern.nlp, which is available at https://github.com/bnosac/pattern.nlp . It allows R users to do sentiment analysis and Parts of Speech tagging for text written in Dutch, French, English, German, Spanish or Italian. Of course this can also be used for other purposes like data preparation as part of a topic modelling flow.

If you are interested in text mining, feel free to register for the text mining courses listed at our last blog post.

If you just want to do sentiment analysis and POS tagging in these 5 European languages, go ahead as follows. Sentiment analysis is available for Dutch, French & English.

library(pattern.nlp)

## Sentiment analysis
x <- pattern_sentiment("i really really hate iphones", language = "english")
y <- pattern_sentiment("de wereld is een mooie plaats, nietwaar sherlock", language = "dutch")
z <- pattern_sentiment("j'aime Paris, c'est super", language = "french")
rbind(x, y, z)

polarity subjectivity                                               id
   -0.80         0.90                     i really really hate iphones
    0.70         1.00 de wereld is een mooie plaats, nietwaar sherlock
    0.65         0.75                        j'aime Paris, c'est super

Parts of Speech tagging is available for Dutch, French, English, Spanish & Italian.

library(pattern.nlp)

x <- "Il pleure dans mon coeur comme il pleut sur la ville. Quelle est cette langueur qui penetre mon coeur?"
pattern_pos(x = x, language = 'french')

x <- "Avevamo vegliato tutta la notte - i miei amici ed io sotto lampade 
  di moschea dalle cupole di ottone traforato, stellate come le nostre anime, 
  perché come queste irradiate dal chiuso fulgòre di un cuore elettrico."
pattern_pos(x = x, language = 'italian')

 

We are also working on a Dutch wordnet – which will be fully released in due date. More information at https://github.com/weRbelgium/wordnet.dutch.
Hope you use the package for spreading new languages!

To leave a comment for the author, please follow the link and comment on their blog: BNOSAC - Belgium Network of Open Source Analytical Consultants.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.