Analysing comments to “Star Wars: The Last Jedi” – part 2
As already mentioned in my first post, I also analysed the user comments on a post at www.starwars-union.de word by word. The figure shows the ‘wordcloud’ of all comments (1728 so far).
To create this wordcloud, I used the following code. The first part was already explained in my first post.
library(tidyverse)
library(rvest)
# Page offsets of the comment pages, in steps of 30
site <- seq(0, 1710, 30)
# One URL per comment page
url <- paste0("https://www.starwars-union.de/nachrichten/18973/SWU-Kritiken-Unsere-Gedanken-zu-Star-Wars-Die-letzten-Jedi/k/",site,"/#kommentare")
First I load all necessary packages and create the URLs of all comment pages; the offsets in site step through the comments 30 at a time.
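The offsets above are hard-coded for the current number of comments. Since the comment count keeps growing, they could instead be derived from it. This is a minimal sketch, assuming 30 comments per page (inferred from the step size in seq(0, 1710, 30)); the helper name comment_page_offsets() is hypothetical:

```r
# Assumption: 30 comments per page, with the page offset placed after ".../k/"
comment_page_offsets <- function(n_comments, per_page = 30) {
  # Number of pages needed to cover all comments
  n_pages <- ceiling(n_comments / per_page)
  # Offsets 0, per_page, 2 * per_page, ... (one per page)
  seq(0, by = per_page, length.out = n_pages)
}

base_url <- "https://www.starwars-union.de/nachrichten/18973/SWU-Kritiken-Unsere-Gedanken-zu-Star-Wars-Die-letzten-Jedi/k/"
site <- comment_page_offsets(1728)
url <- paste0(base_url, site, "/#kommentare")
```

For 1728 comments this gives 58 pages with the same offsets 0, 30, ..., 1710 as the hard-coded version.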
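comments <- lapply(seq_along(url), function(x) {
  # Download and parse one comment page
  data <- read_html(url[x]) %>%
    # Container holding all comments on the page
    html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
    # Individual comment blocks and their paragraphs
    html_nodes("#kommentar") %>%
    html_nodes("p") %>%
    html_text()
  # Keep every second paragraph: the comment text itself
  data[seq(2, length(data), 2)]
})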
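This is the main scraping step: for each page I select the element with id="kommentargesamt", collect the paragraphs (<p>) of all #kommentar blocks inside it and keep every second paragraph, which contains the comment text. The result is a list with one character vector per page, stored in the variable comments.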
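To see what the selectors do without requesting the website, the same chain can be run on a small HTML string. This is only a sketch: the markup below is hypothetical and merely assumes that each #kommentar block holds a header paragraph followed by the comment text, and extract_comments() is an illustrative helper name. It also guards against pages with fewer than two paragraphs, where seq(2, length(data), 2) would stop with a "wrong sign in 'by' argument" error.

```r
library(rvest)

# Hypothetical, simplified markup; the real page structure may differ
sample_page <- '<div id="kommentargesamt">
  <div id="kommentar"><p>User A, 14.12.2017</p><p>Great film!</p></div>
  <div id="kommentar"><p>User B, 15.12.2017</p><p>Too long.</p></div>
</div>'

extract_comments <- function(page) {
  paragraphs <- page %>%
    html_nodes(xpath = '//*[@id="kommentargesamt"]') %>%
    html_nodes("#kommentar") %>%
    html_nodes("p") %>%
    html_text()
  # Fewer than two paragraphs: no comment text to return
  if (length(paragraphs) < 2) return(character(0))
  # Keep paragraphs 2, 4, 6, ... (the assumed comment text)
  paragraphs[seq(2, length(paragraphs), 2)]
}

extract_comments(read_html(sample_page))
```

read_html() accepts a literal HTML string as well as a URL, so the helper can be tested offline before running it over all pages.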
Now everything is ready for creating the wordcloud. There are many examples of creating a wordcloud with R; I decided to use the following snippet, which I found on the internet some time ago:
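library(stringr)
library(tm)
library(SnowballC)
library(wordcloud)
library(RColorBrewer)
# Flatten the per-page list, then split every comment into single words
words <- unlist(str_split(unlist(comments), pattern = " "))
# Lowercase, strip punctuation and drop German stop words plus a few extra terms
# (presumably from "zuletzt geändert am ... Uhr" edit notes)
Corpus <- Corpus(VectorSource(words)) %>%
  tm_map(content_transformer(tolower)) %>%
  tm_map(removePunctuation) %>%
  tm_map(removeWords, c("dass", "zuletzt", "geändert", "am", "uhr",
                        stopwords('german')))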
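When given a corpus, wordcloud() computes the word frequencies itself. To look at the numbers behind the cloud, a term-document matrix gives them directly. A minimal sketch on a hypothetical toy vector of words, using the same cleaning steps as above; word_frequencies() is an illustrative name, not part of tm:

```r
library(tm)
library(magrittr)  # provides %>%, which tm does not export

# Hypothetical toy input standing in for the split comment words
words <- c("Jedi!", "jedi", "Luke", "der", "Film,", "Film", "Jedi")

word_frequencies <- function(words, extra_stopwords = character(0)) {
  corpus <- Corpus(VectorSource(words)) %>%
    tm_map(content_transformer(tolower)) %>%
    tm_map(removePunctuation) %>%
    tm_map(removeWords, c(extra_stopwords, stopwords("german")))
  # Rows are terms, columns are documents; row sums are total counts
  tdm <- TermDocumentMatrix(corpus)
  sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
}

freq <- word_frequencies(words)
freq
```

Here "der" disappears as a German stop word, and "Jedi!" and "jedi" are counted as the same term. tm may print a harmless "transformation drops documents" warning for this kind of corpus. The named vector can also be passed to wordcloud(names(freq), freq), which makes it easy to check or edit the counts before plotting.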
For a nice graphical output I recommend saving the wordcloud directly to a file rather than exporting it from the RStudio viewer or similar.
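# Write the plot straight to a 500 x 500 px PNG file
png(
  filename = "SWU_comments_wordcloud.png",
  width = 500,
  height = 500)
wordcloud(Corpus,
          scale = c(8,.2),        # range of font sizes
          min.freq = 2,           # skip words that occur only once
          max.words = 50,         # plot at most 50 words
          random.order = FALSE,   # decreasing frequency, most frequent in the centre
          rot.per = .15,          # share of words rotated by 90 degrees
          colors = brewer.pal(8,"Dark2"))
# Close the device so the file is written
dev.off()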
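A 500 x 500 pixel PNG is fine for the web but can look coarse in print. A sketch, assuming a higher-resolution file is wanted: png() accepts units = "in" together with a res (dots per inch) argument, and wordcloud() also accepts a vector of words plus their frequencies instead of a corpus. The frequencies and the helper name save_wordcloud_hires() are hypothetical.

```r
library(wordcloud)
library(RColorBrewer)

# Hypothetical toy frequencies standing in for the real comment counts
freq <- c(jedi = 12, film = 9, luke = 7, rey = 5, kylo = 4, snoke = 2)

save_wordcloud_hires <- function(freq, filename, inches = 5, dpi = 300) {
  # 5 inches at 300 dpi gives a 1500 x 1500 px bitmap
  png(filename = filename, width = inches, height = inches,
      units = "in", res = dpi)
  # Close the device even if wordcloud() fails
  on.exit(dev.off())
  # Word placement is random; fix the seed for a reproducible layout
  set.seed(42)
  wordcloud(words = names(freq), freq = freq,
            scale = c(4, 0.5), min.freq = 2, max.words = 50,
            random.order = FALSE, rot.per = 0.15,
            colors = brewer.pal(8, "Dark2"))
  invisible(filename)
}

out <- save_wordcloud_hires(freq, file.path(tempdir(), "wordcloud_hires.png"))
```

Because the font sizes in scale are given in points, they scale with res, so the layout looks similar to a screen version while the file has far more pixels.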
And that’s it! I think most of the words are understandable for non-German readers, too 😉