Use R to connect to twitter and create a wordcloud of your tweets

Recently I wanted to create a wordcloud of my tweets and analyze them further. In this post I will show you how to connect to Twitter from R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account.

First steps in R

Install the required packages, twitteR and wordcloud, and load them.

install.packages(c("wordcloud", "twitteR"))
library(twitteR)
library(wordcloud)
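
If you run the script more than once, you may not want to reinstall the packages every time. The following is a minimal sketch of one way to install only what is missing. The function name install_if_missing and its installer parameter are my own and not part of any package.

```r
# Install only the packages that are not yet available.
# `installer` is a parameter so the function can be tried out without
# touching the network; by default it is install.packages.
install_if_missing <- function(pkgs, installer = install.packages) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!installed]
  if (length(to_install) > 0) {
    installer(to_install)
  }
  invisible(to_install)
}

# Usage for this tutorial (commented out so nothing is installed by accident):
# install_if_missing(c("wordcloud", "twitteR"))
```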

Create a twitter app

To authenticate your API requests with the R package twitteR, you first need a Twitter App that serves as the endpoint for the authentication. Create one at https://apps.twitter.com/: click “Create New App” and fill in the required fields with your values.

Here you set everything for your app.

Once you have successfully created your app, go to Keys and Access Tokens. There you will find the consumer key and the consumer secret that you need to authenticate in R.

Here you get the consumer key and the consumer secret.

Authenticating and first steps with twitteR

Save the keys from your Twitter App.

twitter_key<-"your_twitter_key"
twitter_secret<-"your_twitter_secret"
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)

After this, a browser window will open and ask you to log in with your Twitter account (unless you are already logged in) and to give permissions to yourAppName. If you set the callback URL correctly, the following text will appear:

This message is shown in the browser after successful authentication.
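
If you plan to share your script, you may prefer not to write the key and secret into the file. A minimal sketch of one alternative is to read them from environment variables. The helper read_required_env and the variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET are assumptions of mine; twitteR does not require them.

```r
# Read a required setting from an environment variable and fail early,
# with a clear message, if it is not set.
read_required_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable ", name, " is not set")
  }
  value
}

# Hypothetical variable names; they could be defined in ~/.Renviron, for example.
# twitter_key    <- read_required_env("TWITTER_CONSUMER_KEY")
# twitter_secret <- read_required_env("TWITTER_CONSUMER_SECRET")
# oauth <- setup_twitter_oauth(twitter_key, twitter_secret)
```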

The following command fetches the 100 newest tweets of the user “ExpectAPatronum” (which is me), but it works for other users as well. The second line displays the structure of the newest tweet.

myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])

A tweet contains lots of information (from statusSource we can even tell I sent it using the iPhone app!).

Reference class 'status' [package "twitteR"] with 17 fields
 $ text         : chr "Don't agree with everything but still funny! https://t.co/2bMYBDkfGY"
 $ favorited    : logi FALSE
 $ favoriteCount: num 0
 $ replyToSN    : chr(0) 
 $ created      : POSIXct[1:1], format: "2016-01-18 07:21:31"
 $ truncated    : logi FALSE
 $ replyToSID   : chr(0) 
 $ id           : chr "688984546289790976"
 $ replyToUID   : chr(0) 
 $ statusSource : chr "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
 $ screenName   : chr "ExpectAPatronum"
 $ retweetCount : num 0
 $ isRetweet    : logi FALSE
 $ retweeted    : logi FALSE
 $ longitude    : chr(0) 
 $ latitude     : chr(0) 
 $ urls         :'data.frame':	1 obs. of  5 variables:
  ..$ url         : chr "https://t.co/2bMYBDkfGY"
  ..$ expanded_url: chr "https://twitter.com/jennybryan/status/688866722980364289"
  ..$ display_url : chr "twitter.com/jennybryan/sta…""| __truncated__
  ..$ start_index : num 45
  ..$ stop_index  : num 68
 and 53 methods, of which 39 are  possibly relevant:
   getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet, getLatitude,
   getLongitude, getReplyToSID, getReplyToSN, getReplyToUID, getRetweetCount,
   getRetweeted, getRetweeters, getRetweets, getScreenName, getStatusSource, getText,
   getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited, setId,
   setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID,
   setRetweetCount, setRetweeted, setScreenName, setStatusSource, setText, setTruncated,
   setUrls, toDataFrame, toDataFrame#twitterObj
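
The statusSource field is an HTML link, so the client name has to be extracted from it. As a small sketch, the tags can be stripped with a regular expression. The string below is copied from the output above, and the variable names are my own.

```r
# statusSource value as shown in the str() output of the tweet.
status_source <- "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"

# Remove everything that looks like an HTML tag and keep only the link text.
client_name <- gsub("<[^>]+>", "", status_source)
print(client_name)
# [1] "Twitter for iPhone"
```

Applied to the statusSource of every tweet in a timeline, the same gsub call could be combined with table() to count which clients were used.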

Creating the wordcloud

With the following code I created the first wordcloud:

set.seed(1234) # to always get the same wordcloud and for better reproducibility
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text})) # to extract only the text of each status object
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
# remove urls, usernames, hashtags and words with umlauts (the latter can not be displayed by all fonts)
# grepl is used instead of -grep because -grep returns an empty vector when nothing matches
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)]
wordcloud(clean_words, min.freq=2)
Without any specific settings.
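
One detail is worth knowing when filtering words by a pattern: negative indexing with grep() returns an empty vector when no word matches, whereas a logical mask built with grepl() keeps all words in that case. The sketch below illustrates this with made-up tokens; it leaves out the umlauts from the pattern for brevity.

```r
pattern <- "http|@|#"

# Made-up tokens, not taken from real tweets.
words <- c("hello", "@someone", "world", "#rstats", "https://t.co/xyz")

# Keep only the words that do not match the pattern.
clean_words <- words[!grepl(pattern, words)]
print(clean_words)
# [1] "hello" "world"

# When nothing matches, the two approaches behave differently.
plain_words <- c("just", "plain", "words")
with_negative_grep <- plain_words[-grep(pattern, plain_words)]  # grep() returns integer(0)
with_grepl_mask    <- plain_words[!grepl(pattern, plain_words)]

print(length(with_negative_grep))
# [1] 0
print(length(with_grepl_mask))
# [1] 3
```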

Making it look nicer

Since I didn’t like the default font, nor the ones suggested in the example section of the package, I started to look for other possible fonts. From the help I found out that a font can be passed as the parameter vfont, which is also accepted by the function text {graphics}, because wordcloud passes this parameter on to that function. It accepts Hershey fonts (which come in 8 families with different faces like bold, italic, …).

Playing around with that a little I generated a few more wordclouds.

wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
Font serif (plain).
Font script (plain).
Font gothic italian (plain).
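
To get a quick impression of the Hershey fonts without building a wordcloud each time, you can draw a sample of each family with text() from base graphics, which takes the typeface and the face in its vfont argument. This is only a sketch; the names hershey_typefaces and draw_hershey_samples are my own, and the list of typefaces is the one documented in ?Hershey.

```r
# The eight Hershey typefaces listed in ?Hershey; each one has a "plain" face.
hershey_typefaces <- c("serif", "sans serif", "script",
                       "gothic english", "gothic german", "gothic italian",
                       "serif symbol", "sans serif symbol")

# Draw the name of every typeface in that typeface, one per row.
# Note: the two symbol typefaces render Latin letters as symbols.
draw_hershey_samples <- function(typefaces = hershey_typefaces) {
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, length(typefaces) + 1))
  for (i in seq_along(typefaces)) {
    text(0.5, i, labels = typefaces[i],
         vfont = c(typefaces[i], "plain"), cex = 1.5)
  }
  invisible(typefaces)
}
```

Calling draw_hershey_samples() in an interactive session opens a plot with one line per typeface.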

Another important aspect of a nice wordcloud is color. wordcloud uses the package RColorBrewer for that (which is automatically installed with wordcloud).

The package RColorBrewer provides several palettes of colors that look nice together. I chose the palette “Pastel1” with 7 colors (minimum is 3, maximum depends on the palette). Of course you can use par to change other settings of the plot.

pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
Font script (plain), gray background and color palette Pastel1.
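
Since the maximum number of colors depends on the palette, it can help to look it up before choosing a number. As a small sketch, RColorBrewer provides the data frame brewer.pal.info for this. The guard with requireNamespace is my addition so that the snippet does nothing if the package is not installed.

```r
if (requireNamespace("RColorBrewer", quietly = TRUE)) {
  # Maximum number of colors the palette "Pastel1" offers.
  max_colors <- RColorBrewer::brewer.pal.info["Pastel1", "maxcolors"]
  print(max_colors)

  # The seven colors used for the wordcloud, as hex codes.
  pal <- RColorBrewer::brewer.pal(7, "Pastel1")
  print(pal)
}
```

RColorBrewer::display.brewer.all() plots all available palettes if you want to compare them visually.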

Other settings

As already seen, you can change the font (vfont) and the color (colors) of the wordcloud. wordcloud has many more settings; its help page (?wordcloud) lists them all.

Source code

library(wordcloud)
library(twitteR)
 
install.packages("extrafont")
library(extrafont)
font_import()
 
twitter_key<-"your_key"
twitter_secret<-"your_secret"
 
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])
 
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text}))
 
#### wordcloud
 
set.seed(1234)
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
 
length(grep("http", words))
length(grep("@", words))
length(grep("#", words))
 
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)] # grepl also works when nothing matches
wordcloud(clean_words, min.freq=2)
 
#### playing with the settings 
 
wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
 
 
pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
 
#### feature image
 
pal<-brewer.pal(7, "Dark2")
par(bg="lightgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)

The post Use R to connect to twitter and create a wordcloud of your tweets appeared first on verenahaunschmid.