Recently I wanted to create a wordcloud of my tweets and do some further analysis. In this post I will show you how to connect to Twitter from R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account.
First steps in R
Install the required packages twitteR and wordcloud, then load them.

    install.packages(c("wordcloud", "twitteR"))
    library(twitteR)
    library(wordcloud)
Create a Twitter app
To authenticate your API requests with the R package twitteR, you first need to register an application with Twitter. To do so, create a Twitter App at https://apps.twitter.com/. Click “Create New App” and fill in the required fields with your values.
- Name: choose a name for your app. Unfortunately it has to be unique, and most combinations of R and Twitter I could think of were already taken, so I just took veRenaTweeteR.
- Description: Some description.
- Website: They want you to provide a website URL, e.g. where your app can be downloaded. Since I don’t plan to “publish” my app in any way, I just put my blog address.
- Callback URL: You have to put http://127.0.0.1:1410 to be redirected after authentication.
Once you have successfully created your app, go to Keys and Access Tokens. There you will find the consumer key and consumer secret that you need to authenticate in R.
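If you would rather not paste the keys directly into a script, one option is to read them from environment variables. This is only a minimal sketch: the helper read_credential and the variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are made up for this example and are not part of twitteR.

```r
# Hypothetical helper: read a credential from an environment variable
# and fail early with a clear message if it is not set.
read_credential <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set", call. = FALSE)
  }
  value
}

# For demonstration only: normally these would be set outside the script
# (for example in ~/.Renviron). The variable names are assumptions.
Sys.setenv(TWITTER_CONSUMER_KEY = "your_twitter_key",
           TWITTER_CONSUMER_SECRET = "your_twitter_secret")

twitter_key <- read_credential("TWITTER_CONSUMER_KEY")
twitter_secret <- read_credential("TWITTER_CONSUMER_SECRET")
```

The two resulting variables can then be used wherever the keys are needed, without the actual values appearing in your code.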
Authenticating and first steps with twitteR
Save the keys from your Twitter App.
    twitter_key <- "your_twitter_key"
    twitter_secret <- "your_twitter_secret"

    oauth <- setup_twitter_oauth(twitter_key, twitter_secret)
After this, a browser window will open and ask you to log in with your Twitter account (unless you are already logged in) and to grant permissions to yourAppName. If you set the callback URL correctly, the browser will then show a short confirmation that authentication is complete and that you can return to R.
With the following command we get the 100 newest tweets of user “ExpectAPatronum” (which is me), but you can do it for other users as well. The second line will display the structure of the newest tweet.
    myTweets <- userTimeline("ExpectAPatronum", n=100)
    str(myTweets[[1]])
A tweet contains lots of information (from statusSource we can even tell I sent it using the iPhone app!).
    Reference class 'status' [package "twitteR"] with 17 fields
     $ text         : chr "Don't agree with everything but still funny! https://t.co/2bMYBDkfGY"
     $ favorited    : logi FALSE
     $ favoriteCount: num 0
     $ replyToSN    : chr(0)
     $ created      : POSIXct[1:1], format: "2016-01-18 07:21:31"
     $ truncated    : logi FALSE
     $ replyToSID   : chr(0)
     $ id           : chr "688984546289790976"
     $ replyToUID   : chr(0)
     $ statusSource : chr "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
     $ screenName   : chr "ExpectAPatronum"
     $ retweetCount : num 0
     $ isRetweet    : logi FALSE
     $ retweeted    : logi FALSE
     $ longitude    : chr(0)
     $ latitude     : chr(0)
     $ urls         :'data.frame': 1 obs. of 5 variables:
      ..$ url         : chr "https://t.co/2bMYBDkfGY"
      ..$ expanded_url: chr "https://twitter.com/jennybryan/status/688866722980364289"
      ..$ display_url : chr "twitter.com/jennybryan/sta…""| __truncated__
      ..$ start_index : num 45
      ..$ stop_index  : num 68
     and 53 methods, of which 39 are possibly relevant:
       getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet, getLatitude,
       getLongitude, getReplyToSID, getReplyToSN, getReplyToUID, getRetweetCount,
       getRetweeted, getRetweeters, getRetweets, getScreenName, getStatusSource, getText,
       getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited, setId,
       setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID,
       setRetweetCount, setRetweeted, setScreenName, setStatusSource, setText, setTruncated,
       setUrls, toDataFrame, toDataFrame#twitterObj
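The statusSource field is stored as an HTML link, so the client name has to be extracted before it can be used for any analysis. A small sketch in base R, using the value from the str() output above. The regular expression simply strips anything that looks like an HTML tag, which is an assumption that works for this value but is not a general HTML parser.

```r
# statusSource value copied from the str() output above
status_source <- "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"

# Remove everything that looks like an HTML tag (<...>)
client <- gsub("<[^>]+>", "", status_source)
print(client)
# [1] "Twitter for iPhone"
```

Applied to the statusSource of every tweet, the same gsub call could be combined with table() to see which clients were used most often.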
Creating the wordcloud
With the following code I created the first wordcloud:

    set.seed(1234)  # to always get the same wordcloud and for better reproducibility
    # extract only the text of each status object
    tweetTexts <- unlist(lapply(myTweets, function(t) { t$text }))
    words <- unlist(strsplit(tweetTexts, " "))
    words <- tolower(words)
    # remove urls, usernames, hashtags and umlauts (the latter can not be displayed by all fonts);
    # grepl() is used instead of -grep() so that no words are lost when nothing matches
    clean_words <- words[!grepl("http|@|#|ü|ä|ö", words)]
    wordcloud(clean_words, min.freq=2)
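Two details of the cleaning step are easy to miss. Negative indexing with grep returns an empty vector when no word matches the pattern, and min.freq only keeps words that occur often enough. The following sketch illustrates both on a made-up word vector (not real tweets), using only base R.

```r
# Pitfall: if grep() finds nothing, -grep() selects nothing at all
no_match <- c("a", "b")
dropped_everything <- no_match[-grep("http", no_match)]  # character(0)
kept_everything <- no_match[!grepl("http", no_match)]    # "a" "b"

# Hypothetical sample instead of real tweets
words <- c("r", "is", "fun", "r", "rocks", "#rstats", "@someone",
           "http://t.co/abc", "fun", "r")

# Keep only words that do not look like urls, usernames or hashtags
clean_words <- words[!grepl("http|@|#", words)]

# Count how often each word occurs, most frequent first
freq <- sort(table(clean_words), decreasing = TRUE)
print(freq)

# Words that would survive min.freq=2
frequent <- names(freq)[as.vector(freq) >= 2]
print(frequent)
```

In this sample only "r" (three occurrences) and "fun" (two occurrences) reach the threshold of 2.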
Making it look nicer
Since I didn’t like the default font, nor the ones suggested in the example section of the package, I started to look for other possible fonts. From the help I found out that wordcloud passes any additional parameters on to the method text {graphics}, so everything that text accepts can be used, including the parameter vfont. Through vfont, text accepts Hershey fonts (8 families with different faces like bold, italic, …).
Playing around with that a little I generated a few more wordclouds.
    wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
    wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
    wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
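To preview the Hershey families before using them in a wordcloud, you can draw a label with text directly. This is a rough sketch using base graphics only. The list of family names is taken from the Hershey help page (?Hershey), and writing to a temporary PDF file is just an assumption to keep the example independent of any open plot window.

```r
# The eight Hershey typefaces; each of them supports the "plain" face
hershey_families <- c("serif", "sans serif", "script",
                      "gothic english", "gothic german", "gothic italian",
                      "serif symbol", "sans serif symbol")

out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out_file)
plot.new()
plot.window(xlim = c(0, 1), ylim = c(0, length(hershey_families) + 1))
# Draw each family name in its own typeface, one per row
for (i in seq_along(hershey_families)) {
  text(0.5, i, labels = hershey_families[i],
       vfont = c(hershey_families[i], "plain"))
}
dev.off()
```

The two symbol families render the label with symbol glyphs rather than Latin letters, so they are probably less useful for a wordcloud of tweets.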
Another important ingredient for a nice wordcloud is color. wordcloud uses the package RColorBrewer for that (which is automatically installed with wordcloud).
The package RColorBrewer provides several palettes of colors that look nice together. I chose the palette “Pastel1” with 7 colors (minimum is 3, maximum depends on the palette). Of course you can use par to change other settings of the plot.
    pal <- brewer.pal(7, "Pastel1")
    par(bg="darkgray")
    wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
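Since the maximum number of colors depends on the palette, it can help to look it up instead of guessing. A small sketch, assuming RColorBrewer is installed (the code is skipped otherwise); it uses the brewer.pal.info table that ships with the package.

```r
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Look up how many colors the "Pastel1" palette offers
  max_colors <- RColorBrewer::brewer.pal.info["Pastel1", "maxcolors"]
  # Request the largest version of the palette
  pal <- RColorBrewer::brewer.pal(max_colors, "Pastel1")
  print(pal)
}
```

The resulting pal is a character vector of hex color codes and can be passed to the colors parameter in the same way as the 7-color palette.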
Other settings
As already seen, you can change the font (vfont) and the color (colors) of the wordcloud. There are a lot more settings in wordcloud:
- words
- freq
- scale (=c(4,.5)): range of the size of the words
- min.freq (=3): the minimum frequency of a word to be included. I always set it to at least 2.
- max.words (=Inf): maximum number of words in the wordcloud
- random.order (=TRUE): plot words in random order; otherwise words are plotted in decreasing frequency
- random.color (=FALSE)
- rot.per (=.1): proportion of words that are rotated by 90 degrees
- colors (= “black”)
- ordered.colors (= FALSE)
- use.r.layout (=FALSE)
- fixed.asp (=TRUE)
- …: any parameter that can be passed to text (e.g. vfont)
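Several of these settings can be combined in one call. The following is only an illustrative sketch: the word counts are made up, the chosen values are arbitrary, and the plot is written to a temporary PDF file. It passes words and freq explicitly instead of a raw vector of words, and it is skipped if wordcloud is not installed.

```r
# Hypothetical word counts instead of real tweets
freq <- c(r = 12, twitter = 9, wordcloud = 7, data = 5, plot = 3, font = 2)

if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(1234)                          # same layout on every run
  out_file <- tempfile(fileext = ".pdf")  # hypothetical output location
  pdf(out_file)
  wordcloud::wordcloud(words = names(freq), freq = unname(freq),
                       scale = c(3, .5),      # size range of the words
                       min.freq = 2,          # drop words that occur only once
                       max.words = 50,        # upper limit on plotted words
                       random.order = FALSE,  # plot in decreasing frequency
                       rot.per = .25)         # proportion of rotated words
  dev.off()
}
```

With random.order = FALSE the layout is easier to compare between runs, because the most frequent words are always placed first.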
Source code
    library(wordcloud)
    library(twitteR)

    install.packages("extrafont")
    library(extrafont)
    font_import()

    twitter_key <- "your_key"
    twitter_secret <- "your_secret"
    oauth <- setup_twitter_oauth(twitter_key, twitter_secret)

    myTweets <- userTimeline("ExpectAPatronum", n=100)
    str(myTweets[[1]])

    tweetTexts <- unlist(lapply(myTweets, function(t) { t$text }))

    #### wordcloud
    set.seed(1234)
    words <- unlist(strsplit(tweetTexts, " "))
    words <- tolower(words)

    # how many words are urls, usernames or hashtags?
    length(grep("http", words))
    length(grep("@", words))
    length(grep("#", words))

    # grepl() instead of -grep(), so that no words are lost when nothing matches
    clean_words <- words[!grepl("http|@|#|ü|ä|ö", words)]

    wordcloud(clean_words, min.freq=2)

    #### playing with the settings
    wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
    wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
    wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))

    pal <- brewer.pal(7, "Pastel1")
    par(bg="darkgray")
    wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)

    #### feature image
    pal <- brewer.pal(7, "Dark2")
    par(bg="lightgray")
    wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
The post Use R to connect to twitter and create a wordcloud of your tweets appeared first on verenahaunschmid.