Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is a follow-up to a short post I wrote on R Access to Twitter’s v2 API. In this post I’ll walk through a few more examples of pulling data from Twitter using a mix of Twitter’s v2 API as well as the {rtweet} package[^1].

I’ll pull all Twitter users that I ([@brshallo](https://twitter.com/brshallo)) have recently been engaged by or engaged with. I’ll use a mix of {rtweet} and {httr} to collect recent engagements. For each type of engagement I’ll lean towards using {rtweet}[^2], and I’ll use {httr} in cases where it’s more convenient to use Twitter’s v2 API[^3].
In this post I’m not really worried about optimizing my queries, minimizing API hits, etc. E.g., when using {rtweet} I should authenticate through my project app, which has higher rate limits (see Authentication options), but instead I just use the default {rtweet} user authentication. Note also that the default {rtweet} authentication only works when running scripts interactively[^4].
See the prior post for links on authentication mechanisms. I’m assuming you have “TWITTER_BEARER” in your .Renviron file (for the sections where I use {httr} in this post) as well as the default {rtweet} authentication set up.
```r
library(rjson)
library(httr)
library(jsonlite)
library(dplyr)
library(purrr)
library(lubridate)
library(rtweet)
library(tidyr)

# bearer_token only used when using httr and twitter v2 API
bearer_token <- Sys.getenv("TWITTER_BEARER")
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
```
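One thing to watch: `Sys.getenv()` returns an empty string rather than an error when a variable is missing, so a typo in .Renviron would silently send `Bearer ` with no token. A minimal sketch of a stricter alternative, assuming the same TWITTER_BEARER variable; `bearer_headers()` is a hypothetical helper, not part of {httr} or {rtweet}.

```r
# Hypothetical helper (not part of httr or rtweet): build the v2 API
# Authorization header, failing early if the environment variable is unset.
bearer_headers <- function(var = "TWITTER_BEARER") {
  token <- Sys.getenv(var)  # returns "" when the variable is missing
  if (!nzchar(token)) {
    stop(sprintf("Environment variable '%s' is not set; check your .Renviron", var))
  }
  c(Authorization = sprintf("Bearer %s", token))
}

# Possible usage: headers <- bearer_headers()
```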
GETting all engagements
In each sub-section I’ll pull a different kind of engagement.
- GET favorited users
- GET all tweets from user: the starting point for most of the following sections
- From the initial query, GET references in those tweets
- Filter to only tweets with likes, GET favoriters
- Filter to only tweets with quotes, search URLs to GET quoters
- Filter to only tweets with retweets, GET retweeters
- GET repliers and mentions
I’ll finish by putting them together into a function. Note that not all queries are perfect at pulling every engagement[^5].
GET favorited users
It’s often easiest to just let {rtweet} do the work.
```r
# Twitter id for brshallo
user_id <- "307012324"
favorites <- rtweet::get_favorites(user = user_id)
```
GET all tweets from user
Pull up to 100 of the most recent tweets from a user[^6].
```r
url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/tweets?max_results=100", user_id = user_id)
params <- list(tweet.fields = "public_metrics,created_at,in_reply_to_user_id,referenced_tweets")

response <- httr::GET(url = url_handle, httr::add_headers(.headers = headers), query = params)
obj <- httr::content(response, as = "text")

# flatten = TRUE turns the nested public_metrics object into public_metrics.* columns
json_data <- jsonlite::fromJSON(obj, flatten = TRUE)$data %>%
  as_tibble()
```
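The v2 user-timeline endpoint returns at most 100 tweets per request; when more are available, the response’s `meta` object carries a `next_token` that can be sent back as the `pagination_token` query parameter. A minimal sketch of that loop, not something used in this post: `collect_pages()` and its `fetch_page` argument are hypothetical names, and `fetch_page` stands in for an `httr::GET()` + `jsonlite::fromJSON()` call like the one above.

```r
# Sketch: follow Twitter v2 pagination tokens until no `next_token` remains.
# `fetch_page(token)` is a hypothetical function you supply; it should return a
# parsed response list with `data` (a data frame) and `meta$next_token`
# (NULL on the last page). `max_pages` caps the number of requests.
collect_pages <- function(fetch_page, max_pages = 10) {
  pages <- list()
  token <- NULL
  for (i in seq_len(max_pages)) {
    resp <- fetch_page(token)
    pages[[length(pages) + 1]] <- resp$data
    token <- resp$meta$next_token
    if (is.null(token)) break
  }
  do.call(rbind, pages)
}

# Possible usage with the objects defined above (not run here):
# fetch_page <- function(token) {
#   query <- params
#   query$pagination_token <- token  # assigning NULL leaves it out on page 1
#   res <- httr::GET(paste0("https://api.twitter.com/2/users/", user_id, "/tweets"),
#                    httr::add_headers(.headers = headers),
#                    query = c(query, max_results = 100))
#   jsonlite::fromJSON(httr::content(res, as = "text"), flatten = TRUE)
# }
# all_tweets <- collect_pages(fetch_page)
```

Base `rbind()` keeps the sketch dependency-free, but it errors if pages come back with different columns; with real responses, swapping in `dplyr::bind_rows(pages)` is likely more forgiving.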
GET users referenced
```r
statuses_referenced <- bind_rows(json_data$referenced_tweets) %>%
  rename(status_id = id)

users_referenced <- rtweet::lookup_tweets(statuses_referenced$status_id)
```
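`bind_rows(json_data$referenced_tweets)` stacks the `type` and `id` of every referenced tweet but drops which of my tweets did the referencing. If you want to keep that link (e.g. to separate replies from quote tweets), a base-R sketch like the one below could work. It assumes each element of the list-column is either `NULL` or a data frame with `type` and `id` columns, as in the flattened JSON above; `flatten_references()` is a hypothetical name.

```r
# Hypothetical helper: flatten a `referenced_tweets` list-column while keeping
# the id of the referencing tweet. Assumes each element is NULL or a data frame
# with `type` (e.g. "replied_to", "quoted", "retweeted") and `id` columns.
flatten_references <- function(tweet_ids, ref_list) {
  rows <- Map(function(tid, refs) {
    if (is.null(refs) || nrow(refs) == 0) return(NULL)
    data.frame(tweet_id = tid, type = refs$type, status_id = refs$id,
               stringsAsFactors = FALSE)
  }, tweet_ids, ref_list)
  out <- do.call(rbind, unname(rows))  # NULL entries are skipped by rbind
  if (is.null(out)) {
    out <- data.frame(tweet_id = character(), type = character(),
                      status_id = character(), stringsAsFactors = FALSE)
  }
  out
}

# Possible usage:
# refs <- flatten_references(json_data$id, json_data$referenced_tweets)
# table(refs$type)
```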
GET favoriters
Filter initial query of tweets to only those with more than 0 likes.
```r
liked_tweets <- json_data %>%
  filter(public_metrics.like_count > 0)
```
Functionalize the approach for getting favoriters described in the prior post, R Access to Twitter’s v2 API, and map the tweet ids through it.
```r
tweet_ids <- liked_tweets$id

# GET the users who liked a single tweet via the v2 liking_users endpoint
get_favoriters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/{status_id}/liking_users", status_id = tweet_id)
  response <- httr::GET(url = url_handle, httr::add_headers(.headers = headers))
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  x$data %>% map_dfr(as_tibble)
}

# map each liked tweet id through get_favoriters(), keeping the tweet id on every row
tweet_favoriters <- map_dfr(tweet_ids, ~ bind_cols(tibble(liked_status_id = .x), get_favoriters(.x))) %>%
  rename(user_id = id)
```
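Mapping `get_favoriters()` over many tweet ids fires one request per tweet, which can run into the endpoint’s rate limit. If that becomes a problem, one option is to space the calls out. A minimal sketch; `throttle()` is a hypothetical helper, and the interval you actually need depends on the rate limit Twitter documents for the endpoint (no specific value is assumed here).

```r
# Hypothetical helper: wrap a function so successive calls are at least
# `min_interval` seconds apart. Create the wrapper once and reuse it, since the
# time of the last call is stored in the wrapper's enclosing environment.
throttle <- function(f, min_interval) {
  last_call <- -Inf
  function(...) {
    wait <- last_call + min_interval - as.numeric(Sys.time())
    if (wait > 0) Sys.sleep(wait)
    last_call <<- as.numeric(Sys.time())
    f(...)
  }
}

# Possible usage (the interval is illustrative only):
# get_favoriters_slow <- throttle(get_favoriters, min_interval = 15)
# tweet_favoriters <- map_dfr(tweet_ids, ~ bind_cols(tibble(liked_status_id = .x),
#                                                   get_favoriters_slow(.x)))
```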
GET quoters
Filter to only posts with quotes.
```r
tweet_ids_quoters <- json_data %>%
  filter(public_metrics.quote_count > 0) %>%
  pull(id)
```
Again, if you can use {rtweet}, it’s generally easier to do so. However, I’m not positive the approach below actually picks up all quotes[^7]. I also reviewed some other approaches[^other-approaches].

```r
# search for tweets containing the tweet's URL, then keep only quote tweets
search_tweets_urls <- function(tweet_id){
  rtweet::search_tweets(glue::glue("url:{tweet_id}", tweet_id = tweet_id))
}

quoters <- map_dfr(tweet_ids_quoters, search_tweets_urls) %>%
  filter(is_quote) %>%
  as_tibble()
```

[^other-approaches]: This also seems to be a way to see quoters: https://twittercommunity.com/t/how-we-can-get-list-of-replies-on-a-tweet-or-reply-to-a-tweet-in-twitter-api/144958/7

```r
get_quoters <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=url:{status_id}", status_id = tweet_id)
  response <- httr::GET(url = url_handle, httr::add_headers(.headers = headers))
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  x$data %>% map_dfr(as_tibble)
}

quoters <- map(tweet_ids_quoters, get_quoters)
```
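The v2 search helpers in this post (`get_quoters()` here and `get_replies()` in the mentions section) glue the search operator straight into the URL string. A sketch of building the same recent-search URL with the query percent-encoded via base R’s `utils::URLencode()`; `build_search_url()` is a hypothetical helper, and whether Twitter strictly requires the encoding for these operators is not something verified here. Passing the search as `query = list(query = ...)` to `httr::GET()` would let {httr} do the escaping instead.

```r
# Hypothetical helper: build a Twitter v2 recent-search URL with the search
# query and tweet.fields values percent-encoded (reserved = TRUE encodes ':' and ',').
build_search_url <- function(query, tweet_fields = "author_id") {
  paste0(
    "https://api.twitter.com/2/tweets/search/recent",
    "?tweet.fields=", utils::URLencode(tweet_fields, reserved = TRUE),
    "&query=", utils::URLencode(query, reserved = TRUE)
  )
}

build_search_url("url:1512293864517750790")
# "https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=url%3A1512293864517750790"
```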
GET retweeters
Filter to only posts that were retweeted.
```r
tweet_ids_rt <- json_data %>%
  filter(public_metrics.retweet_count > 0) %>%
  select(status_id = id)
```
I use a slightly different approach in this section than in other similar sections[^8].
```r
retweeters <- tweet_ids_rt %>%
  mutate(retweeters = map(status_id, rtweet::get_retweeters)) %>%
  unnest(retweeters)
```
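As the footnote to this section notes, the favoriters and retweeters code follow the same pattern: call an API once per tweet id, then stack the results with the tweet id attached. A base-R sketch of one helper for that pattern; `tag_and_stack()` is a hypothetical name, and `fetch` is any function (such as `get_favoriters()` or `rtweet::get_retweeters()`) assumed to return a data frame for a single id.

```r
# Hypothetical helper: call `fetch(id)` for each id and row-bind the results,
# prepending a column (named by `id_col`) that records which id each row came
# from. Ids whose fetch returns NULL or zero rows are dropped.
tag_and_stack <- function(ids, fetch, id_col = "status_id") {
  pieces <- lapply(ids, function(id) {
    res <- fetch(id)
    if (is.null(res) || nrow(res) == 0) return(NULL)
    res <- as.data.frame(res, stringsAsFactors = FALSE)
    res[[id_col]] <- id                           # recycled to every row
    res[c(id_col, setdiff(names(res), id_col))]   # put the id column first
  })
  do.call(rbind, pieces)                          # NULL if nothing was found
}

# Possible usage:
# tweet_favoriters <- tag_and_stack(liked_tweets$id, get_favoriters, "liked_status_id")
# retweeters <- tag_and_stack(tweet_ids_rt$status_id, rtweet::get_retweeters)
```

Base `rbind()` keeps the sketch self-contained; with tibbles that have list columns, `dplyr::bind_rows()` may be the safer choice.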
GET repliers and mentions
Alternatively, you might just use `rtweet::get_mentions()`, but this only pulls mentions of the currently authenticated user. I also tried some other approaches here[^reply-approaches].

```r
# GET recent tweets mentioning a user via the v2 mentions endpoint
get_mentions_v2 <- function(user_id){
  url_handle <- glue::glue("https://api.twitter.com/2/users/{user_id}/mentions", user_id = user_id)
  response <- httr::GET(url = url_handle, httr::add_headers(.headers = headers))
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  x$data %>% map_dfr(as_tibble)
}

tweets_mentions <- get_mentions_v2(user_id)
# look up the full tweets (including who wrote them) with rtweet
repliers_mentions <- rtweet::lookup_tweets(tweets_mentions$id)
```

[^reply-approaches]: Another simple approach would be to just try `rtweet::search_tweets("@brshallo")`. I tried the approach below, but it really didn’t seem to work quite as expected...

```r
tweet_ids_repliers <- json_data %>%
  filter(public_metrics.reply_count > 0) %>%
  pull(id)

# pulled from here: https://twittercommunity.com/t/how-to-fetch-retweets-and-quote-tweets-from-the-twitter-v2-search-api/156573
# but didn't really work as expected...
get_replies <- function(tweet_id){
  url_handle <- glue::glue("https://api.twitter.com/2/tweets/search/recent?tweet.fields=author_id&query=conversation_id:{status_id}", status_id = tweet_id)
  response <- httr::GET(url = url_handle, httr::add_headers(.headers = headers))
  obj <- httr::content(response, as = "text")
  x <- rjson::fromJSON(obj)
  x$data %>% map_dfr(as_tibble)
}

repliers <- map(tweet_ids_repliers, get_replies)
repliers <- bind_rows(repliers)
```
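Each of the {httr} helpers in this post (`get_favoriters()`, `get_quoters()`, `get_mentions_v2()`, `get_replies()`) ends with `x$data %>% map_dfr(as_tibble)`: `rjson::fromJSON()` returns `data` as a list of per-tweet (or per-user) lists, and that line turns each one into a one-row tibble and stacks them. A base-R sketch of the same conversion, to show what’s happening; `records_to_df()` is a hypothetical name, and it assumes every field is a single scalar value (as with `id`, `text`, `name`, `username`), so nested or array-valued fields would need extra handling.

```r
# Hypothetical helper: convert a list of flat JSON records (each a named list
# of scalars) into a data frame, filling fields missing from a record with NA.
records_to_df <- function(records) {
  if (length(records) == 0) return(data.frame())
  cols <- unique(unlist(lapply(records, names)))
  columns <- lapply(cols, function(col) {
    vapply(records, function(r) {
      value <- r[[col]]
      if (is.null(value)) NA_character_ else as.character(value)
    }, character(1))
  })
  names(columns) <- cols
  as.data.frame(columns, stringsAsFactors = FALSE)
}

# Possible usage: records_to_df(rjson::fromJSON(obj)$data)
```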
Putting them together into a function
The function at this gist returns the output from each of the above sections as a list.
```r
# Twitter id for brshallo
user_id <- "307012324"

# load function get_engagements()
source("https://gist.githubusercontent.com/brshallo/119d6a1f858e0e5c20d77212dee8891a/raw/751d022c7bc2e2148292bb78a5178737d9914024/get-engagements.R")

brshallo_engagements <- get_engagements(user_id)
brshallo_engagements
## $favorites
## # A tibble: 10 x 91
## user_id status_id created_at screen_name text source
## * <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 248350998 151302361~ 2022-04-10 05:18:34 BuildABarr "Drop ~ Twitt~
## 2 368551889 151263551~ 2022-04-09 03:36:23 IsabellaGh~ "@elli~ Twitt~
## 3 1469531055736590337 151242047~ 2022-04-08 13:21:54 emkayco "Have ~ Twitt~
## 4 35794978 151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
## 5 29916355 151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
## 6 29916355 151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
## 7 29916355 151189984~ 2022-04-07 02:53:06 jimjam_slam "@mdne~ Twitt~
## 8 3089027769 151189179~ 2022-04-07 02:21:09 gyp_casino "@mdne~ Twitt~
## 9 15772978 151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 144592995 151129000~ 2022-04-05 10:29:49 Rbloggers "R Acc~ r-blo~
## # ... with 85 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
##
## $favoriters
## # A tibble: 90 x 4
## liked_status_id user_id name username
## <chr> <chr> <chr> <chr>
## 1 1512295676004093955 117241741 Brett J. Gall brettjgall
## 2 1512295676004093955 2724597409 Peter Ellis ellis2013nz
## 3 1512294950905409543 274123666 Kristen Downs KristenDDowns
## 4 1512293864517750790 3656879234 张亮 psychelzh
## 5 1512293864517750790 703843771419484160 Ayush Patel ayushbipinpatel
## 6 1512293864517750790 419185498 Kevin Gilds Kevin_Gilds
## 7 1512293864517750790 127357236 Juan LB Juan_FLB
## 8 1512293864517750790 49451947 Luis Remiro LuisMRemiro
## 9 1512293864517750790 253175044 Nicholas Viau nicholasviau
## 10 1512293864517750790 2202983986 Stefania Klayn Ettti_20
## # ... with 80 more rows
##
## $references
## # A tibble: 12 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 307012324 151115943~ 2022-04-05 01:50:59 brshallo "As an~ Twitt~
## 2 307012324 151229344~ 2022-04-08 04:57:09 brshallo "@mdne~ Twitt~
## 3 307012324 150969487~ 2022-04-01 00:51:20 brshallo "It al~ Twitt~
## 4 307012324 151229386~ 2022-04-08 04:58:49 brshallo "@mdne~ Twitt~
## 5 307012324 147233714~ 2021-12-18 22:45:04 brshallo "First~ Twitt~
## 6 29916355 151189984~ 2022-04-07 02:53:06 jimjam_slam "@mdne~ Twitt~
## 7 29916355 151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
## 8 144592995 151129000~ 2022-04-05 10:29:49 Rbloggers "R Acc~ r-blo~
## 9 248350998 151302361~ 2022-04-10 05:18:34 BuildABarr "Drop ~ Twitt~
## 10 3146735425 151226195~ 2022-04-08 02:52:00 mdneuzerling "Lovel~ Twitt~
## 11 983470194982088704 151182189~ 2022-04-06 21:43:22 R4DScommuni~ "The n~ Zapie~
## 12 2724597409 151226515~ 2022-04-08 03:04:44 ellis2013nz "@mdne~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
##
## $quoters
## NULL
##
## $retweeters
## # A tibble: 11 x 2
## status_id user_id
## <chr> <chr>
## 1 1512293864517750790 296222670
## 2 1512293864517750790 307012324
## 3 1511869112401596423 4034079677
## 4 1511869112401596423 1306626901432324097
## 5 1511869112401596423 1011817655957893120
## 6 1511469730892156928 1011817655957893120
## 7 1511469730892156928 1306626901432324097
## 8 1511159434717761539 1448348827979747333
## 9 1511159434717761539 15772978
## 10 1511159434717761539 1011817655957893120
## 11 1511159434717761539 1306626901432324097
##
## $referencers
## # A tibble: 10 x 90
## user_id status_id created_at screen_name text source
## <chr> <chr> <dttm> <chr> <chr> <chr>
## 1 61542689 150992063~ 2022-04-01 15:48:26 twelvespot "@brsh~ Twitt~
## 2 61542689 150994022~ 2022-04-01 17:06:17 twelvespot "@brsh~ Twitt~
## 3 18433005 151007180~ 2022-04-02 01:49:09 rcrdleitao "@brsh~ Twitt~
## 4 35794978 151196918~ 2022-04-07 07:28:38 _lionelhen~ "@brsh~ Twitt~
## 5 1346474633520824320 150985661~ 2022-04-01 11:34:03 markjrieke "@brsh~ Twitt~
## 6 29916355 151195192~ 2022-04-07 06:20:03 jimjam_slam "@brsh~ Twitt~
## 7 29916355 151195162~ 2022-04-07 06:18:51 jimjam_slam "@brsh~ Twitt~
## 8 29916355 151194957~ 2022-04-07 06:10:44 jimjam_slam "@brsh~ Twitt~
## 9 15772978 151132777~ 2022-04-05 12:59:55 jessicagar~ "@brsh~ Twitt~
## 10 15772978 151117782~ 2022-04-05 03:04:04 jessicagar~ "@brsh~ Twitt~
## # ... with 84 more variables: display_text_width <dbl>,
## # reply_to_status_id <chr>, reply_to_user_id <chr>,
## # reply_to_screen_name <chr>, is_quote <lgl>, is_retweet <lgl>,
## # favorite_count <int>, retweet_count <int>, quote_count <int>,
## # reply_count <int>, hashtags <list>, symbols <list>, urls_url <list>,
## # urls_t.co <list>, urls_expanded_url <list>, media_url <list>,
## # media_t.co <list>, media_expanded_url <list>, media_type <list>, ...
```
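`get_engagements()` returns a named list with one table per engagement type, each with a `user_id` column (or `NULL`, as with `$quoters` above). To get back to the original goal of a single set of users, one option is to stack those columns and count how often each user appears. A base-R sketch; `combine_engagements()` is a hypothetical helper, and it drops a `self_id` because, as the output shows, my own account appears in `$references` and `$retweeters`.

```r
# Hypothetical helper: stack the `user_id` column of every engagement table,
# drop `self_id`, and count appearances per user (most frequent first).
combine_engagements <- function(engagements, self_id) {
  long <- do.call(rbind, lapply(names(engagements), function(type) {
    df <- engagements[[type]]
    if (is.null(df) || nrow(df) == 0) return(NULL)
    data.frame(user_id = as.character(df$user_id), stringsAsFactors = FALSE)
  }))
  if (is.null(long)) return(data.frame(user_id = character(), n = integer()))
  long <- long[long$user_id != self_id, , drop = FALSE]
  counts <- as.data.frame(table(user_id = long$user_id),
                          responseName = "n", stringsAsFactors = FALSE)
  counts <- counts[order(-counts$n, counts$user_id), , drop = FALSE]
  rownames(counts) <- NULL
  counts
}

# Possible usage:
# combine_engagements(brshallo_engagements, self_id = "307012324")
```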
[^1]: Which as of this writing uses the 1.1 API.

[^2]: As it takes less code.

[^3]: Which is not yet supported by {rtweet}. This is actively being worked on, so this post may have a short shelf-life.

[^4]: You’ll need to authenticate through Twitter developer portal app keys if you want to run those sections automatically. You’ll notice that in creating this script I actually don’t evaluate most of the sections and then use some hidden code chunks to return output.

[^5]: This seemed to particularly be the case when it came to seeing all quotes and mentions.

[^6]: The reason I’m using {httr} and v2 instead of {rtweet} for this is that the 1.1 API (which {rtweet} currently uses) doesn’t pull quote count unless you have a premium or enterprise account (rtweet#640).

[^7]: The thread here seemed to suggest that just searching the URL was the way to go.

[^8]: `rtweet::get_retweeters()` returns far fewer columns than `rtweet::search_tweets()`, which is why I use `select()` above, a different method than in the sections before and after this one, where I instead use `pull()` and then pass the ids directly to `purrr::map*()` statements rather than wrapping them in a `mutate()` verb (which would have worked just as well). The structures of the manipulation are nearly the same… maybe I should have stayed consistent here and written a function to make clear that the pattern is the same, c’est la vie.