Site icon R-bloggers

How to Get Twitter Data using R

[This article was first published on R – Predictive Hacks, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In a previous post, we showed how to get Twitter data using Python. In this tutorial, we will show you how to get Twitter data using R and more particularly with the rtweet library. As we have explained in the previous post, you will need to create a developer account and get your consumer and access keys respectively.

Install the rtweet Library and API Authorization

We can install the rtweet library either from CRAN or from GitHub as follows:

## install rtweet from CRAN
install.packages("rtweet")


## OR
## install remotes package if it's not already
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

## install dev version of rtweet from github
remotes::install_github("ropensci/rtweet")

Then we are ready to load the library:

## load rtweet package
library(rtweet)

We can authorize the API by entering our credentials:

## load rtweet
library(rtweet)

## load the tidyverse
library(tidyverse)

## store api keys (these are fake example values; replace with your own keys)
api_key <- "afYS4vbIlPAj096E60c4W1fiK"
api_secret_key <- "bI91kqnqFoNCrZFbsjAWHD4gJ91LQAhdCJXCj3yscfuULtNkuu"

## authenticate via web browser
token <- create_token(
  app = "rstatsjournalismresearch",
  consumer_key = api_key,
  consumer_secret = api_secret_key)

get_token()
 

Search Tweets

Now we are ready to get our first data from Twitter, starting with 1000 tweets containing the hashtag “#DataScience” by excluding the re-tweets. Note that the “search_tweets” returns data from the last 6-9 days

rt <- search_tweets("#DataScience", n = 1000, include_rts = FALSE)
View(rt)
 

Usually, we want to get the screen name, the text, the number of likes and the number of re-tweets.

View(rt%>%select(screen_name, text, favorite_count, retweet_count))
 

The rt object consists of 90 columns, so as you can understand it contains a lot of information. Note that there is a limit of 18000 tweets per 15 minutes. If we want to get more, we should do the following:

## search for 250,000 tweets containing the word datascience
rt <- search_tweets(
  "datascience", n = 250000, retryonratelimit = TRUE
)
 

We can also plot the tweets:

ts_plot(rt, "hour") 
 

Stream Tweets

Stream tweets return a random sample of approximately 1% of the live stream of all tweets, having the option to filter the data via a search-like query, or enter specific user ids, or define the geo-coordinates of the tweets.

## stream tweets containing the toke DataScience for 30 seconds
rt <- stream_tweets("DataScience", timeout = 30)
View(rt)
 

Note that the rt object consists of 90 columns.

Filter Tweets

When we search for tweets, we can also apply some filters. For example, let’s say that we want to specify the language or to exclude retweets and so on. Let’s search for tweets that contain the “StandWithUkraine” text by excluding retweets, quotes and replies.

ukraine <- search_tweets("StandWithUkraine-filter:retweets -filter:quote -filter:replies", n = 100)
View(ukraine)
 

Finally, let’s search for tweets in English that contain the word “Putin” and have more than 100 likes and retweets.

putin<-search_tweets("Putin min_retweets:100 AND min_faves:100", lang='en')
View(putin)
 

Get Timelines

The get_imeline() function returns up to 3,200 statuses posted to the timelines of each of one or more specified Twitter users. Let’s get the timeline of R-bloggers and Hadley Wickham accounts by taking into consideration the 500 most recent tweets of each one.

rt<-get_timeline(user=c("Rbloggers", "hadleywickham"), n=500)
 

Whose tweets are more popular? Who gets more likes and retweets? Let’s answer this question.

rt%>%group_by(screen_name)%>%
     summarise(AvgLikes = mean(favorite_count, na.rm=TRUE), AvgRetweets = mean(retweet_count, na.rm=TRUE))
 

As we can see, the guru Hadley Wickham gets much more likes and retweets compared to R-bloggers.

# A tibble: 2 x 3
  screen_name   AvgLikes AvgRetweets
1 hadleywickham     31.6     110.  
2 Rbloggers         17.8     6.17

Get the Likes made by a User

We can get the n most recently liked tweets by a user. Let’s see what Hadley likes!

hadley <- get_favorites("hadleywickham", n = 1000)
 

Which are his favorite accounts?

hadley%>%group_by(screen_name)%>%count()%>%arrange(desc(n))%>%head(5)
 

And we get:

# A tibble: 5 x 2
# Groups:   screen_name [5]
  screen_name       n
1 djnavarro        17
2 ijeamaka_a       13
3 mjskay           12
4 sharlagelfand    12
5 vboykis          12

Which are his favorite hashtags?

hadley%>%unnest(hashtags)%>%group_by(tolower(hashtags))%>%count()%>%
      arrange(desc(n))%>%na.omit()%>%head(5)
 

And we get:

# A tibble: 5 x 2
# Groups:   tolower(hashtags) [5]
  `tolower(hashtags)`     n
1 rstats                 78
2 rtistry                22
3 genuary2022             9
4 genuary                 7
5 rayrender               6

Search Users

We can search for users with a specific hashtag. For example:

## search for up to 1000 users using the keyword rstats
rstats <- search_users(q = "rstats", n = 1000)

## data frame where each observation (row) is a different user
rstats

## tweets data also retrieved. can access it via tweets_data()
tweets_data(rstats)
 

Get Friends and Followers

We can get the user ids of the friends and followers of a specific account. Let’s get the friends and followers of the Predictive Hacks account.

get_followers("predictivehacks")
get_friends("predictivehacks")
 

We can extract more info on the user ids using the function:

lookup_users(users, parse = TRUE, token = NULL)
 

where users is a user id or screen name of the target user.

Get Trends

We can get the trends in a particular city or country. For example:

## Retrieve available trends
trends <- trends_available()
trends

## Store WOEID for Worldwide trends
worldwide <- trends$woeid[grep("world", trends$name, ignore.case = TRUE)[1]]

## Retrieve worldwide trends datadata
ww_trends <- get_trends(worldwide)

## Preview trends data
ww_trends

## Retrieve trends data using latitude, longitude near New York City
nyc_trends <- get_trends_closest(lat = 40.7, lng = -74.0)

## should be same result if lat/long supplied as first argument
nyc_trends <- get_trends_closest(c(40.7, -74.0))

## Preview trends data
nyc_trends

## Provide a city or location name using a regular expression string to
## have the function internals do the WOEID lookup/matching for you
(luk <- get_trends("london"))
 

Let’s see what is trending in Russia and in Ukraine, respectively.

get_trends("Russia")
get_trends("Ukraine")
 

Final Thoughts

Every Data Science analysis starts with the data. In this tutorial, we showed how to get Twitter data in R. In the next posts, we will show you how to analyze this data by applying sentiment analysis, topic modelling, network analysis and so on. Stay tuned!

To leave a comment for the author, please follow the link and comment on their blog: R – Predictive Hacks.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.