
Pulling Twitter Engagements Using the v2 API as Well as rtweet

[This article was first published on rstats on Bryan Shalloway's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is a follow-up to a short post I wrote on R Access to Twitter’s v2 API. In this post I’ll walk through a few more examples of pulling data from Twitter using a mix of Twitter’s v2 API and the {rtweet} package[1].

I’ll pull all Twitter users that I ([@brshallo](https://twitter.com/brshallo)) have recently been engaged by or engaged with, using a mix of {rtweet} and {httr}. For each type of engagement I’ll lean towards using {rtweet}[2], and I’ll use {httr} in cases where it’s more convenient to use Twitter’s v2 API[3].

In this post I’m not really worried about optimizing my queries or minimizing API hits. For example, when using {rtweet} I should authenticate through my project app, which has higher rate limits (see Authentication options), but instead I just use the default {rtweet} user authentication. Note also that the default {rtweet} authentication only works when running scripts interactively[4].

See the prior post for links on authentication mechanisms. I’m assuming you have “TWITTER_BEARER” set in your .Renviron file (for the sections of this post where I use {httr}) as well as the default {rtweet} authentication set up.

library(rjson)
library(httr)
library(jsonlite)
library(dplyr)
library(purrr)
library(lubridate)
library(rtweet)
library(tidyr)

# bearer_token only used when using httr and twitter v2 API
bearer_token <- Sys.getenv("TWITTER_BEARER")
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
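If the environment variable is missing, `Sys.getenv()` quietly returns an empty string and the requests later fail with an authorization error. A minimal sketch of a guard, where `bearer_headers()` is a hypothetical helper name and not part of {httr} or {rtweet}:

```r
# Sketch: build the Authorization header, failing early if the token is missing.
# `bearer_headers()` is a hypothetical helper, not part of {httr} or {rtweet}.
bearer_headers <- function(env_var = "TWITTER_BEARER") {
  token <- Sys.getenv(env_var)
  if (!nzchar(token)) {
    stop(sprintf("Environment variable '%s' is not set; add it to .Renviron.", env_var),
         call. = FALSE)
  }
  c(Authorization = sprintf("Bearer %s", token))
}
```

Calling `headers <- bearer_headers()` would then give the same named vector as the code above, or stop with a readable message.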

GETting all engagements

In each sub-section I’ll pull a different kind of engagement.

  1. GET favorited users
  2. GET all tweets from user – starting point for most of the following sections
  3. GET references in those tweets (from the initial query)
  4. Filter to only tweets with likes, GET favoriters
  5. Filter to only tweets with quotes, search URL’s to GET quoters
  6. Filter to only tweets with retweets, GET retweeters
  7. GET repliers and mentions

I’ll finish by putting them together into a function. Note that not all queries are perfect at pulling every engagement[5].

GET favorited users

It’s often easiest to just let {rtweet} do the work.

# Twitter id for brshallo
user_id <- "307012324"

favorites <- rtweet::get_favorites(user = user_id)

GET all tweets from user

This pulls up to 100 of the most recent tweets from a user[6].

url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/tweets?max_results=100", user_id = user_id)

params <- list(tweet.fields = "public_metrics,created_at,in_reply_to_user_id,referenced_tweets")

response <- httr::GET(url = url_handle,
                     httr::add_headers(.headers = headers),
                     query = params)

obj <- httr::content(response, as = "text")

json_data <- jsonlite::fromJSON(obj, flatten = TRUE)$data %>% 
  as_tibble()
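To go beyond 100 tweets the request would need to be repeated page by page. As far as I understand the v2 timeline endpoint, a response includes `meta$next_token` when more results exist, and that value is sent back as the `pagination_token` query parameter. The sketch below shows only the loop; `collect_pages()` and `fetch_page` are hypothetical names, and the API is replaced by an offline stand-in so the logic can be run without credentials.

```r
# Sketch: keep requesting pages until no `next_token` comes back.
# `fetch_page` is a hypothetical function(token) that returns a parsed
# response: a list with `data` (a data frame) and `meta` (a list).
collect_pages <- function(fetch_page, max_pages = 5) {
  pages <- list()
  token <- NULL
  for (i in seq_len(max_pages)) {
    res <- fetch_page(token)
    pages[[i]] <- res$data
    token <- res$meta$next_token
    if (is.null(token)) break
  }
  dplyr::bind_rows(pages)
}

# Offline stand-in for the API: two pages, the second has no next_token.
fake_fetch <- function(token) {
  if (is.null(token)) {
    list(data = data.frame(id = c("1", "2"), stringsAsFactors = FALSE),
         meta = list(next_token = "abc"))
  } else {
    list(data = data.frame(id = "3", stringsAsFactors = FALSE),
         meta = list())
  }
}

all_tweets <- collect_pages(fake_fetch)
```

With the live API, `fetch_page` would wrap the `httr::GET()` call and pass the token along in `query`. The `max_pages` cap is there so a loop like this cannot run through the rate limit unnoticed.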

GET references

statuses_referenced <- bind_rows(json_data$referenced_tweets) %>% 
  rename(status_id = id)

users_referenced <- rtweet::lookup_tweets(statuses_referenced$status_id)
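The `bind_rows()` step relies on `referenced_tweets` being a list column of small data frames. The toy example below illustrates that shape with made-up ids; it assumes that tweets with no references show up as `NULL` entries, which `bind_rows()` drops.

```r
library(dplyr)

# Toy stand-in for `json_data`: `referenced_tweets` is a list column where
# each element is a data frame (or NULL when the tweet references nothing).
toy_tweets <- tibble::tibble(
  id = c("10", "11", "12"),
  referenced_tweets = list(
    data.frame(type = "replied_to", id = "5", stringsAsFactors = FALSE),
    NULL,
    data.frame(type = c("quoted", "replied_to"), id = c("6", "7"),
               stringsAsFactors = FALSE)
  )
)

# bind_rows() skips the NULL entry and stacks the remaining data frames.
toy_referenced <- bind_rows(toy_tweets$referenced_tweets) %>%
  rename(status_id = id)

# How many references of each type were found
count(toy_referenced, type)
```

Note that the id of the referencing tweet is lost in this step; only the referenced ids remain.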

GET favoriters

Filter initial query of tweets to only those with more than 0 likes.

liked_tweets <- json_data %>% 
  filter(public_metrics.like_count > 0)

Functionalize the approach to getting favoriters described in the prior post, R Access to Twitter’s v2 API, and map the tweet ids through it.

tweet_ids <- liked_tweets$id

# Return the users who liked a given tweet (one row per user)
get_favoriters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/{status_id}/liking_users", status_id = tweet_id)
  
  response <- httr::GET(url = url_handle,
                       httr::add_headers(.headers = headers))
  
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  
  # x$data is a list of users; stack them into a single tibble
  x$data %>% 
    map_dfr(as_tibble)
}

tweet_favoriters <-
  map_dfr(tweet_ids, ~ bind_cols(tibble(liked_status_id = .x),
                                get_favoriters(.x))) %>%
  rename(user_id = id)
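When mapping over many tweet ids, a single failed request (for example after hitting a rate limit) aborts the whole `map_dfr()` call. One option is to wrap the function in `purrr::possibly()`. The sketch below uses `fake_get_favoriters()`, a made-up stand-in for `get_favoriters()`, so that it runs offline.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for get_favoriters(): fails for one id to mimic a
# rate-limit or network error.
fake_get_favoriters <- function(tweet_id) {
  if (tweet_id == "bad") stop("request failed")
  tibble::tibble(id = paste0("user_of_", tweet_id))
}

# possibly() returns the `otherwise` value instead of raising the error.
safe_get_favoriters <- possibly(
  fake_get_favoriters,
  otherwise = tibble::tibble(id = character())
)

# Failed ids contribute zero rows; the other ids are kept.
toy_favoriters <- map_dfr(c("1", "bad", "2"), function(tweet_id) {
  safe_get_favoriters(tweet_id) %>%
    mutate(liked_status_id = tweet_id, .before = 1)
})
```

The trade-off is that failures become silent, so it may be worth comparing the ids in the result against the ids that were requested.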

GET quoters

Filter to only posts with quotes.

tweet_ids_quoters <- json_data %>% 
  filter(public_metrics.quote_count > 0) %>%
  pull(id)

Again, if you can use {rtweet} it’s generally easier to do so. However, I am not positive the approach below actually picks up all quotes[7]. I also reviewed some other approaches[^other approaches].

search_tweets_urls <- function(tweet_id){
  rtweet::search_tweets(
    glue::glue("url:{tweet_id}", 
               tweet_id = tweet_id)
    )
} 

quoters <- map_dfr(tweet_ids_quoters, search_tweets_urls) %>% 
  filter(is_quote) %>% 
  as_tibble()

[^other approaches]: This also seems to be a way to see quoters: https://twittercommunity.com/t/how-we-can-get-list-of-replies-on-a-tweet-or-reply-to-a-tweet-in-twitter-api/144958/7


```r
get_quoters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=url:{status_id}", status_id = tweet_id)
  
  response <- httr::GET(url = url_handle,
                       httr::add_headers(.headers = headers))
  
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  
  x$data %>% 
    map_dfr(as_tibble)
}

quoters <- map(tweet_ids_quoters, get_quoters)
```
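The search query above is glued straight into the URL, so characters such as `:` are sent unescaped. A sketch of an alternative is to let {httr} build and escape the query string. `build_search_url()` is a hypothetical helper; the same named list could also be passed to the `query` argument of `httr::GET()`.

```r
library(httr)

# Sketch: let {httr} build and escape the query string rather than gluing it
# into the URL by hand. `build_search_url()` is a hypothetical helper.
build_search_url <- function(tweet_id) {
  httr::modify_url(
    "https://api.twitter.com/2/tweets/search/recent",
    query = list(
      query = paste0("url:", tweet_id),
      tweet.fields = "author_id"
    )
  )
}

search_url <- build_search_url("1512293864517750790")

# Round-trip check: parsing the URL recovers the unescaped query.
httr::parse_url(search_url)$query$query
```

This only changes how the request is assembled; whether searching on `url:` finds every quote is a separate question.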

GET retweeters

Filter to only posts that were retweeted.

tweet_ids_rt <- json_data %>% 
  filter(public_metrics.retweet_count > 0) %>%
  select(status_id = id)

I use a slightly different approach in this section than in other similar sections[8].

retweeters <- tweet_ids_rt %>% 
  mutate(retweeters = map(status_id, get_retweeters)) %>% 
  unnest(retweeters)
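The `mutate()`, `map()`, `unnest()` pattern used here could be shared across the engagement types. A minimal sketch, where `map_engagers()` is a hypothetical helper and `fake_engagers()` is an offline stand-in for a function such as `rtweet::get_retweeters()`:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical helper: apply `f` to each status id and unnest the results,
# keeping the id of the tweet that was engaged with alongside each row.
map_engagers <- function(status_ids, f) {
  tibble::tibble(status_id = status_ids) %>%
    mutate(engagers = map(status_id, f)) %>%
    unnest(engagers)
}

# Offline stand-in that returns two made-up users per tweet.
fake_engagers <- function(status_id) {
  tibble::tibble(user_id = paste0(status_id, c("_a", "_b")))
}

toy_engagers <- map_engagers(c("100", "200"), fake_engagers)
```

Keeping `status_id` as a column means each engaging user stays linked to the tweet they engaged with, without a separate `bind_cols()` step.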

GET repliers and mentions

Alternatively you might just use rtweet::get_mentions(), but this only pulls mentions of the currently authenticated user. I also tried some other approaches here[^reply-approaches].

get_mentions_v2 <- function(user_id){
  url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/mentions", user_id = user_id)
  
  response <- httr::GET(url = url_handle,
                        httr::add_headers(.headers = headers))
  
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  
  x$data %>% 
    map_dfr(as_tibble)
}

tweets_mentions <- get_mentions_v2(user_id)

repliers_mentions <- lookup_tweets(tweets_mentions$id)
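The parsing step shared by these functions (`rjson::fromJSON()` followed by `map_dfr(as_tibble)`) can be tried offline. The payloads below are made up in the general shape of a v2 response, and `parse_v2_data()` is a hypothetical helper. The second payload has no `data` field, which I assume is what comes back when there are no results.

```r
library(purrr)
library(tibble)

# Hypothetical helper wrapping the parsing step used in the functions above.
parse_v2_data <- function(json_text) {
  x <- rjson::fromJSON(json_text)
  # x$data is a list of records; each record becomes one row
  map_dfr(x$data, as_tibble)
}

# Made-up payloads: one with two records, one with no `data` field at all.
with_data <- '{"data":[{"id":"1","text":"hi @brshallo"},{"id":"2","text":"@brshallo thanks"}],"meta":{"result_count":2}}'
no_data   <- '{"meta":{"result_count":0}}'

parsed_mentions <- parse_v2_data(with_data)
parsed_empty    <- parse_v2_data(no_data)
```

Because `map_dfr()` over `NULL` gives an empty tibble, the missing `data` field does not raise an error, though any later step that expects an `id` column would still need to handle that case.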

[^reply-approaches]: Another simple approach would be to just try `rtweet::search_tweets("@brshallo")`. I tried the approach below, but it really didn't seem to work quite as expected...

```r
tweet_ids_repliers <- json_data %>% 
  filter(public_metrics.reply_count > 0) %>%
  pull(id)

# pulled from here: https://twittercommunity.com/t/how-to-fetch-retweets-and-quote-tweets-from-the-twitter-v2-search-api/156573 but didn't really work as expected...
get_replies <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=conversation_id:{status_id}", status_id = tweet_id)
  
  response <- httr::GET(url = url_handle,
                       httr::add_headers(.headers = headers))
  
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  
  x$data %>% 
    map_dfr(as_tibble)
}

repliers <- map(tweet_ids_repliers, get_replies)

repliers <- bind_rows(repliers)
```

Putting them together into a function

The function at this gist returns the output from each of the above sections as a list.

# Twitter id for brshallo
user_id <- "307012324"

# load function get_engagements()
source("https://gist.githubusercontent.com/brshallo/119d6a1f858e0e5c20d77212dee8891a/raw/751d022c7bc2e2148292bb78a5178737d9914024/get-engagements.R")

brshallo_engagements <- get_engagements(user_id)

brshallo_engagements
## $favorites
## # A tibble: 10 x 91
##    user_id             status_id  created_at          screen_name text    source
##  * <chr>               <chr>      <dttm>              <chr>       <chr>   <chr> 
##  1 248350998           151302361~ 2022-04-10 05:18:34 BuildABarr  "Drop ~ Twitt~
##  2 368551889           151263551~ 2022-04-09 03:36:23 IsabellaGh~ "@elli~ Twitt~
##  3 1469531055736590337 151242047~ 2022-04-08 13:21:54 emkayco     "Have ~ Twitt~
##  4 35794978            151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
##  5 29916355            151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
##  6 29916355            151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
##  7 29916355            151189984~ 2022-04-07 02:53:06 jimjam_slam "@mdne~ Twitt~
##  8 3089027769          151189179~ 2022-04-07 02:21:09 gyp_casino  "@mdne~ Twitt~
##  9 15772978            151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 144592995           151129000~ 2022-04-05 10:29:49 Rbloggers   "R Acc~ r-blo~
## # ... with 85 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
## 
## $favoriters
## # A tibble: 90 x 4
##    liked_status_id     user_id            name           username       
##    <chr>               <chr>              <chr>          <chr>          
##  1 1512295676004093955 117241741          Brett J. Gall  brettjgall     
##  2 1512295676004093955 2724597409         Peter Ellis    ellis2013nz    
##  3 1512294950905409543 274123666          Kristen Downs  KristenDDowns  
##  4 1512293864517750790 3656879234         张亮           psychelzh      
##  5 1512293864517750790 703843771419484160 Ayush Patel    ayushbipinpatel
##  6 1512293864517750790 419185498          Kevin Gilds    Kevin_Gilds    
##  7 1512293864517750790 127357236          Juan LB        Juan_FLB       
##  8 1512293864517750790 49451947           Luis Remiro    LuisMRemiro    
##  9 1512293864517750790 253175044          Nicholas Viau  nicholasviau   
## 10 1512293864517750790 2202983986         Stefania Klayn Ettti_20       
## # ... with 80 more rows
## 
## $references
## # A tibble: 12 x 90
##    user_id            status_id  created_at          screen_name  text    source
##    <chr>              <chr>      <dttm>              <chr>        <chr>   <chr> 
##  1 307012324          151115943~ 2022-04-05 01:50:59 brshallo     "As an~ Twitt~
##  2 307012324          151229344~ 2022-04-08 04:57:09 brshallo     "@mdne~ Twitt~
##  3 307012324          150969487~ 2022-04-01 00:51:20 brshallo     "It al~ Twitt~
##  4 307012324          151229386~ 2022-04-08 04:58:49 brshallo     "@mdne~ Twitt~
##  5 307012324          147233714~ 2021-12-18 22:45:04 brshallo     "First~ Twitt~
##  6 29916355           151189984~ 2022-04-07 02:53:06 jimjam_slam  "@mdne~ Twitt~
##  7 29916355           151194957~ 2022-04-07 06:10:44 jimjam_slam  "@brsh~ Twitt~
##  8 144592995          151129000~ 2022-04-05 10:29:49 Rbloggers    "R Acc~ r-blo~
##  9 248350998          151302361~ 2022-04-10 05:18:34 BuildABarr   "Drop ~ Twitt~
## 10 3146735425         151226195~ 2022-04-08 02:52:00 mdneuzerling "Lovel~ Twitt~
## 11 983470194982088704 151182189~ 2022-04-06 21:43:22 R4DScommuni~ "The n~ Zapie~
## 12 2724597409         151226515~ 2022-04-08 03:04:44 ellis2013nz  "@mdne~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
## 
## $quoters
## NULL
## 
## $retweeters
## # A tibble: 11 x 2
##    status_id           user_id            
##    <chr>               <chr>              
##  1 1512293864517750790 296222670          
##  2 1512293864517750790 307012324          
##  3 1511869112401596423 4034079677         
##  4 1511869112401596423 1306626901432324097
##  5 1511869112401596423 1011817655957893120
##  6 1511469730892156928 1011817655957893120
##  7 1511469730892156928 1306626901432324097
##  8 1511159434717761539 1448348827979747333
##  9 1511159434717761539 15772978           
## 10 1511159434717761539 1011817655957893120
## 11 1511159434717761539 1306626901432324097
## 
## $referencers
## # A tibble: 10 x 90
##    user_id             status_id  created_at          screen_name text    source
##    <chr>               <chr>      <dttm>              <chr>       <chr>   <chr> 
##  1 61542689            150992063~ 2022-04-01 15:48:26 twelvespot  "@brsh~ Twitt~
##  2 61542689            150994022~ 2022-04-01 17:06:17 twelvespot  "@brsh~ Twitt~
##  3 18433005            151007180~ 2022-04-02 01:49:09 rcrdleitao  "@brsh~ Twitt~
##  4 35794978            151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
##  5 1346474633520824320 150985661~ 2022-04-01 11:34:03 markjrieke  "@brsh~ Twitt~
##  6 29916355            151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
##  7 29916355            151195162~ 2022-04-07 06:18:51 jimjam_slam "@brsh~ Twitt~
##  8 29916355            151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
##  9 15772978            151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 15772978            151117782~ 2022-04-05 03:04:04 jessicagar~ "@brsh~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## #   reply_to_status_id <chr>, reply_to_user_id <chr>,
## #   reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## #   favorite_count <int>, retweet_count <int>, quote_count <int>,
## #   reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## #   urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## #   media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
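One way to get from a list like this to "who engaged, and how" is to stack the `user_id` columns and count. The sketch below uses a toy list with made-up ids rather than the real output; it assumes each non-NULL element has a `user_id` column, as in the output shown above.

```r
library(dplyr)
library(purrr)

# Toy stand-in for the list returned by get_engagements(); real elements
# have many more columns, and some (like `quoters`) may be NULL.
toy_engagements <- list(
  favoriters = tibble::tibble(user_id = c("a", "b", "a")),
  quoters    = NULL,
  retweeters = tibble::tibble(user_id = c("b", "c"))
)

engagement_counts <- toy_engagements %>%
  compact() %>%                                     # drop NULL elements
  map_dfr(~ select(.x, user_id), .id = "type") %>%  # stack, tagging the source
  count(user_id, type, name = "n_engagements") %>%
  arrange(desc(n_engagements), user_id, type)
```

Here `type` records which list element each row came from, so the same user can appear once per kind of engagement.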

  1. Which as of this writing uses the 1.1 API.↩︎

  2. As it takes less code.↩︎

  3. Which is not yet supported by {rtweet}. This is actively being worked on so this post may have a short shelf-life.↩︎

  4. You’ll need to authenticate with the keys from a Twitter developer portal app if you want to run those sections automatically. You’ll notice that in creating this script I actually don’t evaluate most of the sections and instead use some hidden code chunks to return output.↩︎

  5. This seemed to particularly be the case when it came to seeing all quotes and mentions.↩︎

  6. The reason I’m using {httr} and v2 instead of {rtweet} for this is that the 1.1 API (which {rtweet} currently uses) doesn’t pull quote count unless you have a premium or enterprise account (see rtweet#640).↩︎

  7. Thread here seemed to suggest that just searching the url was the way to go.↩︎

  8. rtweet::get_retweeters() returns far fewer columns than rtweet::search_tweets(), which is why I use select() above and wrap the call in mutate(). In the sections before and after this one I instead use pull() and pass the ids directly to purrr::map*() statements; either method would have worked just as well. The structure of the manipulation is nearly the same, so I probably should have stayed consistent and written a function to make the shared pattern clear. C’est la vie.↩︎

To leave a comment for the author, please follow the link and comment on their blog: rstats on Bryan Shalloway's Blog.
