rtoot: Collecting and Analyzing Mastodon Data
It has been a wild few days on Twitter after Elon Musk took over. The future of the platform is unclear and many users are looking for alternatives, a popular one being Mastodon. I also decided to give it a try and signed up. I quite quickly became interested in its API and realized that there is only a seemingly unmaintained R package on GitHub. So I decided to write a new one. Fast forward a week(!!!!) and the package rtoot was accepted by CRAN. In this post I will introduce some of the functionality of the package and a roadmap for the future. (The name of the package derives from “toot”, the equivalent of a “tweet”.)
# developer version
remotes::install_github("schochastics/rtoot")
# CRAN version
install.packages("rtoot")
library(rtoot)
Authenticate
Before doing anything you should set up credentials. Once set up, you will not need to bother with that anymore (hopefully). There is a vignette in the package (vignette("auth")) which explains the process. In brief, Mastodon has three types of API calls: anonymous, public, and user based. For anonymous calls you do not need any token. A public token can be obtained without an account and gives a few more API call options. A user-based token grants access to all endpoints but requires an account.
Running the function auth_setup() will guide you through the process of setting up a token.
auth_setup()
Instances
In contrast to Twitter, Mastodon is not a single instance, but a federation of different servers.
You sign up at a specific server (say “mastodon.social”) but can still communicate with others from other servers (say “fosstodon.org”). The existence of different instances makes API calls more complex.
For example, some calls can only be made within your own instance (e.g. get_timeline_home()), while others can access all instances but require you to specify the instance as a parameter (e.g. get_timeline_public()).
A list of active instances can be obtained with get_fedi_instances(). The results are sorted by number of users.
General information about an instance can be obtained with get_instance_general()
str(get_instance_general(instance = "mastodon.social"))
## List of 16
##  $ uri              : chr "mastodon.social"
##  $ title            : chr "Mastodon"
##  $ short_description: chr "The original server operated by the Mastodon gGmbH non-profit"
##  $ description      : chr ""
##  $ email            : chr "[email protected]"
##  $ version          : chr "4.0.0rc1"
##  $ urls             :List of 1
##   ..$ streaming_api: chr "wss://mastodon.social"
##  $ stats            :List of 3
##   ..$ user_count  : int 831723
##   ..$ status_count: int 41091494
##   ..$ domain_count: int 30169
##  $ thumbnail        : chr "https://files.mastodon.social/site_uploads/files/000/000/001/@1x/57c12f441d083cde.png"
##  $ languages        :List of 1
##   ..$ : chr "en"
##  $ registrations    : logi FALSE
##  $ approval_required: logi FALSE
##  $ invites_enabled  : logi TRUE
##  $ configuration    :List of 4
##   ..$ accounts         :List of 1
##   .. ..$ max_featured_tags: int 10
##   ..$ statuses         :List of 3
##   .. ..$ max_characters             : int 500
##   .. ..$ max_media_attachments      : int 4
##   .. ..$ characters_reserved_per_url: int 23
##   ..$ media_attachments:List of 6
##   .. ..$ supported_mime_types  :List of 28
##   .. .. ..$ : chr "image/jpeg"
##   .. .. ..$ : chr "image/png"
##   .. .. ..$ : chr "image/gif"
##   .. .. ..$ : chr "image/heic"
##   .. .. ..$ : chr "image/heif"
##   .. .. ..$ : chr "image/webp"
##   .. .. ..$ : chr "image/avif"
##   .. .. ..$ : chr "video/webm"
##   .. .. ..$ : chr "video/mp4"
##   .. .. ..$ : chr "video/quicktime"
##   .. .. ..$ : chr "video/ogg"
##   .. .. ..$ : chr "audio/wave"
##   .. .. ..$ : chr "audio/wav"
##   .. .. ..$ : chr "audio/x-wav"
##   .. .. ..$ : chr "audio/x-pn-wave"
##   .. .. ..$ : chr "audio/vnd.wave"
##   .. .. ..$ : chr "audio/ogg"
##   .. .. ..$ : chr "audio/vorbis"
##   .. .. ..$ : chr "audio/mpeg"
##   .. .. ..$ : chr "audio/mp3"
##   .. .. ..$ : chr "audio/webm"
##   .. .. ..$ : chr "audio/flac"
##   .. .. ..$ : chr "audio/aac"
##   .. .. ..$ : chr "audio/m4a"
##   .. .. ..$ : chr "audio/x-m4a"
##   .. .. ..$ : chr "audio/mp4"
##   .. .. ..$ : chr "audio/3gpp"
##   .. .. ..$ : chr "video/x-ms-asf"
##   .. ..$ image_size_limit      : int 10485760
##   .. ..$ image_matrix_limit    : int 16777216
##   .. ..$ video_size_limit      : int 41943040
##   .. ..$ video_frame_rate_limit: int 60
##   .. ..$ video_matrix_limit    : int 2304000
##   ..$ polls            :List of 4
##   .. ..$ max_options              : int 4
##   .. ..$ max_characters_per_option: int 50
##   .. ..$ min_expiration           : int 300
##   .. ..$ max_expiration           : int 2629746
##  $ contact_account  :List of 22
##   ..$ id             : chr "1"
##   ..$ username       : chr "Gargron"
##   ..$ acct           : chr "Gargron"
##   ..$ display_name   : chr "Eugen 💀"
##   ..$ locked         : logi FALSE
##   ..$ bot            : logi FALSE
##   ..$ discoverable   : logi TRUE
##   ..$ group          : logi FALSE
##   ..$ created_at     : chr "2016-03-16T00:00:00.000Z"
##   ..$ note           : chr "<p>Founder, CEO and lead developer <span class=\"h-card\"><a href=\"https://mastodon.social/@Mastodon\" class=\"| __truncated__
##   ..$ url            : chr "https://mastodon.social/@Gargron"
##   ..$ avatar         : chr "https://files.mastodon.social/accounts/avatars/000/000/001/original/dc4286ceb8fab734.jpg"
##   ..$ avatar_static  : chr "https://files.mastodon.social/accounts/avatars/000/000/001/original/dc4286ceb8fab734.jpg"
##   ..$ header         : chr "https://files.mastodon.social/accounts/headers/000/000/001/original/3b91c9965d00888b.jpeg"
##   ..$ header_static  : chr "https://files.mastodon.social/accounts/headers/000/000/001/original/3b91c9965d00888b.jpeg"
##   ..$ followers_count: int 195985
##   ..$ following_count: int 317
##   ..$ statuses_count : int 72663
##   ..$ last_status_at : chr "2022-11-11"
##   ..$ noindex        : logi FALSE
##   ..$ emojis         : list()
##   ..$ fields         :List of 1
##   .. ..$ :List of 3
##   .. .. ..$ name       : chr "Patreon"
##   .. .. ..$ value      : chr "<a href=\"https://www.patreon.com/mastodon\" target=\"_blank\" rel=\"nofollow noopener noreferrer me\"><span cl"| __truncated__
##   .. .. ..$ verified_at: NULL
##  $ rules            :List of 6
##   ..$ :List of 2
##   .. ..$ id  : chr "1"
##   .. ..$ text: chr "Sexually explicit or violent media must be marked as sensitive when posting"
##   ..$ :List of 2
##   .. ..$ id  : chr "2"
##   .. ..$ text: chr "No racism, sexism, homophobia, transphobia, xenophobia, or casteism"
##   ..$ :List of 2
##   .. ..$ id  : chr "3"
##   .. ..$ text: chr "No incitement of violence or promotion of violent ideologies"
##   ..$ :List of 2
##   .. ..$ id  : chr "4"
##   .. ..$ text: chr "No harassment, dogpiling or doxxing of other users"
##   ..$ :List of 2
##   .. ..$ id  : chr "5"
##   .. ..$ text: chr "No content illegal in Germany"
##   ..$ :List of 2
##   .. ..$ id  : chr "7"
##   .. ..$ text: chr "Do not share intentionally false or misleading information"
##  - attr(*, "headers")= tibble [1 × 3] (S3: tbl_df/tbl/data.frame)
##   ..$ rate_limit    : chr "300"
##   ..$ rate_remaining: chr "299"
##   ..$ rate_reset    : POSIXlt[1:1], format: "2022-11-11 16:20:00"
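The output above ends with a "headers" attribute that carries rate-limit information. A minimal sketch of reading it is shown here on a mock object rather than a live API result; the helper name rate_remaining() is hypothetical and not part of rtoot, and the mock only mirrors the attribute layout printed above.

```r
# Mock object standing in for the result of get_instance_general();
# the "headers" attribute mimics the layout shown in the str() output.
res <- structure(
  list(uri = "mastodon.social"),
  headers = data.frame(
    rate_limit = "300",
    rate_remaining = "299",
    stringsAsFactors = FALSE
  )
)

# Hypothetical helper (not an rtoot function): number of API calls left
# in the current rate-limit window, or NA if no headers are attached.
rate_remaining <- function(x) {
  hdr <- attr(x, "headers", exact = TRUE)
  if (is.null(hdr)) {
    return(NA_real_)
  }
  # The values are stored as character strings, so convert them.
  as.numeric(hdr$rate_remaining)
}

remaining <- rate_remaining(res)
```

Checking this value before a long series of calls is one way to avoid running into the limit.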
get_instance_activity() shows the activity for the last three months and get_instance_trends() the trending hashtags of the week.
get_instance_activity(instance = "fosstodon.org")
## # A tibble: 12 × 4
##    week                statuses logins registrations
##    <dttm>                 <int>  <int>         <int>
##  1 2022-11-10 21:47:00    13647   7623           691
##  2 2022-11-03 21:47:00    23227  11913          3401
##  3 2022-10-27 21:47:00        0      0             0
##  4 2022-10-20 21:47:00        0      0             0
##  5 2022-10-13 21:47:00        0      0             0
##  6 2022-10-06 21:47:00        0      0             0
##  7 2022-09-29 21:47:00        0      0             0
##  8 2022-09-22 21:47:00        0      0             0
##  9 2022-09-15 21:47:00        0      0             0
## 10 2022-09-08 21:47:00        0      0             0
## 11 2022-09-01 21:47:00        0      0             0
## 12 2022-08-25 21:47:00        0      0             0

get_instance_trends(instance = "fosstodon.org")
## # A tibble: 70 × 5
##    name             url                                 day        accou…¹  uses
##    <chr>            <chr>                               <date>       <int> <int>
##  1 followbackfriday https://fosstodon.org/tags/followb… 2022-11-11     175   246
##  2 followbackfriday https://fosstodon.org/tags/followb… 2022-11-10       3     3
##  3 followbackfriday https://fosstodon.org/tags/followb… 2022-11-09       2     2
##  4 followbackfriday https://fosstodon.org/tags/followb… 2022-11-08       1     1
##  5 followbackfriday https://fosstodon.org/tags/followb… 2022-11-07       0     0
##  6 followbackfriday https://fosstodon.org/tags/followb… 2022-11-06       0     0
##  7 followbackfriday https://fosstodon.org/tags/followb… 2022-11-05       0     0
##  8 followfriday     https://fosstodon.org/tags/followf… 2022-11-11     246   352
##  9 followfriday     https://fosstodon.org/tags/followf… 2022-11-10      26    30
## 10 followfriday     https://fosstodon.org/tags/followf… 2022-11-09      12    31
## # … with 60 more rows, and abbreviated variable name ¹accounts
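The trends table has one row per hashtag and day, so a weekly ranking needs an aggregation step. The following is a rough sketch with base R on a small mock data frame; the mock reuses the column names and a few values from the output above and is not a live API result.

```r
# Mock data with the columns shown in the get_instance_trends() output.
trends <- data.frame(
  name = c("followbackfriday", "followbackfriday",
           "followfriday", "followfriday"),
  day = as.Date(c("2022-11-11", "2022-11-10",
                  "2022-11-11", "2022-11-10")),
  accounts = c(175L, 3L, 246L, 26L),
  uses = c(246L, 3L, 352L, 30L),
  stringsAsFactors = FALSE
)

# Sum the daily uses per hashtag ...
weekly <- aggregate(uses ~ name, data = trends, FUN = sum)
# ... and sort so that the most used hashtag comes first.
weekly <- weekly[order(-weekly$uses), ]
```

With a real result, the same two lines should work unchanged as long as the columns are still called name and uses.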
Get toots
To get the most recent toots of a specific instance use get_timeline_public()
get_timeline_public(instance = "mastodon.social")
##    id        uri   created_at          content visib…¹ sensi…² spoil…³ reblo…⁴ favou…⁵ repli…⁶
##    <chr>     <chr> <dttm>              <chr>   <chr>   <lgl>   <chr>     <int>   <int>   <int>
##  1 10931614… http… 2022-11-09 22:12:13 "<p>Vi… public  FALSE   ""            0       0       0
##  2 10931614… http… 2022-11-09 22:04:24 "<p>I … public  FALSE   ""            0       0       0
##  3 10931614… http… 2022-11-09 21:46:36 "<p>Ha… public  FALSE   ""            0       0       0
##  4 10931614… http… 2022-11-09 22:12:11 "<p>To… public  FALSE   ""            0       0       0
##  5 10931614… http… 2022-11-09 22:12:05 "<p>:s… public  FALSE   ""            0       0       0
##  6 10931614… http… 2022-11-09 22:12:05 "<p>We… public  FALSE   ""            0       0       0
##  7 10931614… http… 2022-11-09 22:12:09 "<p>He… public  FALSE   ""            0       0       0
##  8 10931614… http… 2022-11-09 22:12:09 "<p>Et… public  FALSE   ""            0       0       0
##  9 10931614… http… 2022-11-09 22:12:08 "<p>Af… public  FALSE   ""            0       0       0
## 10 10931614… http… 2022-11-09 22:04:19 "<p>I'… public  FALSE   ""            0       0       0
## 11 10931614… http… 2022-11-09 22:12:05 "<p>\"… public  FALSE   ""            0       0       0
## 12 10931614… http… 2022-11-09 22:12:06 "<p>Wh… public  FALSE   ""            0       0       0
## 13 10931614… http… 2022-11-09 22:12:05 "<p>Ev… public  FALSE   ""            0       0       0
## 14 10931614… http… 2022-11-09 22:12:04 "<p>\"… public  FALSE   ""            0       0       0
## 15 10931614… http… 2022-11-09 22:12:00 "<p>Wh… public  FALSE   ""            0       0       0
## 16 10931614… http… 2022-11-09 22:11:13 "<p>Lo… public  FALSE   ""            0       0       0
## 17 10931614… http… 2022-11-09 22:12:04 "<p>Ne… public  FALSE   ""            0       0       0
## 18 10931614… http… 2022-11-09 22:12:02 "<p>Th… public  FALSE   ""            0       0       0
## 19 10931614… http… 2022-11-09 22:11:50 "<p>So… public  FALSE   ""            0       0       0
## 20 10931614… http… 2022-11-09 22:12:01 "<p>Th… public  FALSE   ""            0       0       0
## # … with 19 more variables: url <chr>, in_reply_to_id <chr>, in_reply_to_account_id <chr>,
## #   language <chr>, text <lgl>, application <I<list>>, poll <I<list>>, card <I<list>>,
## #   account <list>, reblog <I<list>>, media_attachments <I<list>>, mentions <I<list>>,
## #   tags <I<list>>, emojis <I<list>>, favourited <lgl>, reblogged <lgl>, muted <lgl>,
## #   bookmarked <lgl>, pinned <lgl>, and abbreviated variable names ¹visibility, ²sensitive,
## #   ³spoiler_text, ⁴reblogs_count, ⁵favourites_count, ⁶replies_count
## # ℹ Use `colnames()` to see all variable names
To get the most recent toots containing a specific hashtag use get_timeline_hashtag()
get_timeline_hashtag(hashtag = "rstats", instance = "fosstodon.org")
## # A tibble: 20 × 29
##    id          uri   created_at          content visib…¹ sensi…² spoil…³ reblo…⁴
##    <chr>       <chr> <dttm>              <chr>   <chr>   <lgl>   <chr>     <int>
##  1 1093260576… http… 2022-11-11 16:12:55 "<p>Re… public  FALSE   ""            1
##  2 1093260140… http… 2022-11-11 16:02:20 "<p>Lo… public  FALSE   ""            0
##  3 1093260050… http… 2022-11-11 15:59:56 "<p><a… public  FALSE   ""            0
##  4 1093259862… http… 2022-11-11 15:56:03 "<p>I … public  FALSE   ""            3
##  5 1093259083… http… 2022-11-11 15:35:34 "<p>Pe… public  FALSE   ""            0
##  6 1093259018… http… 2022-11-11 15:34:06 "<p>I'… public  FALSE   ""            1
##  7 1093258952… http… 2022-11-11 15:32:55 "<p>Wh… public  FALSE   ""            0
##  8 1093258902… http… 2022-11-11 15:31:37 "<p>Cu… public  FALSE   ""            4
##  9 1093258386… http… 2022-11-11 15:18:31 "<p>Is… public  FALSE   ""            0
## 10 1093258337… http… 2022-11-11 15:17:16 "<p><a… public  FALSE   ""            0
## 11 1093258243… http… 2022-11-11 15:14:52 "<p>Th… public  FALSE   ""            4
## 12 1093258124… http… 2022-11-11 15:11:51 "<p>It… public  TRUE    ""            0
## 13 1093257660… http… 2022-11-11 15:00:02 "<p>If… public  FALSE   ""            1
## 14 1093257302… http… 2022-11-11 14:50:48 "<p>Cr… public  FALSE   ""            0
## 15 1093257130… http… 2022-11-11 14:46:34 "<p>2/… public  FALSE   ""            4
## 16 1093257094… http… 2022-11-11 14:45:39 "<p>1/… public  FALSE   ""           25
## 17 1093257067… http… 2022-11-11 14:20:41 "<p>Fo… public  TRUE    "Decis…       0
## 18 1093256660… http… 2022-11-11 14:34:34 "<p>Tr… public  FALSE   ""            2
## 19 1093256557… http… 2022-11-11 14:31:59 "<p>He… public  FALSE   ""            1
## 20 1093256340… http… 2022-11-11 14:26:28 "<p>I … public  FALSE   ""            0
## # … with 21 more variables: favourites_count <int>, replies_count <int>,
## #   url <chr>, in_reply_to_id <chr>, in_reply_to_account_id <chr>,
## #   language <chr>, text <lgl>, application <I<list>>, poll <I<list>>,
## #   card <I<list>>, account <list>, reblog <I<list>>,
## #   media_attachments <I<list>>, mentions <I<list>>, tags <list>,
## #   emojis <I<list>>, favourited <lgl>, reblogged <lgl>, muted <lgl>,
## #   bookmarked <lgl>, pinned <lgl>, and abbreviated variable names …
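As the output shows, the content column holds HTML rather than plain text. For a quick look at the text, a crude conversion with base R regular expressions may be enough. The helper strip_html() below is a hypothetical sketch, not part of rtoot and not a full HTML parser, and the example strings are made up.

```r
# Hypothetical helper: rough HTML-to-text conversion for toot content.
strip_html <- function(x) {
  # Turn line breaks and paragraph boundaries into newlines.
  x <- gsub("<br\\s*/?>|</p>\\s*<p>", "\n", x, perl = TRUE)
  # Drop all remaining tags.
  x <- gsub("<[^>]+>", "", x)
  # Unescape a few common entities; "&amp;" goes last to avoid
  # unescaping the same text twice.
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("&quot;", "\"", x, fixed = TRUE)
  x <- gsub("&#39;", "'", x, fixed = TRUE)
  x <- gsub("&amp;", "&", x, fixed = TRUE)
  trimws(x)
}

# Made-up example strings in the style of the content column.
content <- c(
  "<p>Hello <a href=\"https://example.com\">#rstats</a></p>",
  "<p>one</p><p>two</p>",
  "<p>R &amp; Python &lt;3</p>"
)
plain <- strip_html(content)
```

For anything beyond a quick inspection, a proper HTML parser is the safer choice.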
The function get_timeline_home() allows you to get the most recent toots from your own timeline.
get_timeline_home()
Get accounts
rtoot exposes several account level endpoints. Most require the account id instead of the username as an input. There is, to our knowledge, no straightforward way of obtaining the account id. With the package you can get the id via search_accounts().
search_accounts("schochastics")
## # A tibble: 2 × 21
##   id        usern…¹ acct  displ…² locked bot   disco…³ group created_at
##   <chr>     <chr>   <chr> <chr>   <lgl>  <lgl> <lgl>   <lgl> <dttm>
## 1 10930243… schoch… scho… David … FALSE  FALSE FALSE   FALSE 2022-11-07 00:00:00
## 2 10926171… schoch… scho… David … FALSE  FALSE FALSE   FALSE 2022-10-30 00:00:00
## # … with 12 more variables: note <chr>, url <chr>, avatar <chr>,
## #   avatar_static <chr>, header <chr>, header_static <chr>,
## #   followers_count <int>, following_count <int>, statuses_count <int>,
## #   last_status_at <dttm>, fields <list>, emojis <I<list>>, and abbreviated
## #   variable names ¹username, ²display_name, ³discoverable
(Future versions will allow the username and the user id to be used interchangeably.)
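A search can return more than one account, as in the two rows above, so it helps to pick the id by an exact match on the acct column. The sketch below uses a mock data frame with invented ids and account names; account_id() is a hypothetical helper, not an rtoot function.

```r
# Mock search result; ids and account names are invented placeholders.
# On Mastodon, `acct` is the bare username for local accounts and
# "username@domain" for accounts on other instances.
accounts <- data.frame(
  id = c("111", "222"),
  acct = c("alice", "alice@example.social"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the id of the single account whose `acct`
# matches exactly, and fail loudly otherwise.
account_id <- function(res, acct) {
  hit <- res$id[res$acct == acct]
  if (length(hit) != 1L) {
    stop("expected exactly one match for ", acct)
  }
  hit
}

id <- account_id(accounts, "alice@example.social")
```

Keeping the id as a character string matters, because real ids are too long to be stored exactly as numbers.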
Using the id, you can get the followers and following users with get_account_followers() and get_account_following() and statuses with get_account_statuses().
id <- "109302436954721982"
get_account_followers(id)
## # A tibble: 40 × 21
##    id       usern…¹ acct  displ…² locked bot   disco…³ group created_at
##    <chr>    <chr>   <chr> <chr>   <lgl>  <lgl> <lgl>   <lgl> <dttm>
##  1 1093231… christ… chri… "Chris… FALSE  FALSE FALSE   FALSE 2022-11-11 00:00:00
##  2 1093024… psanker psan… "Patri… FALSE  FALSE TRUE    FALSE 2022-11-07 00:00:00
##  3 1093161… JLattm… JLat… "Johan… FALSE  FALSE TRUE    FALSE 2022-11-07 00:00:00
##  4 1093058… matthi… matt… "Matt … FALSE  FALSE TRUE    FALSE 2022-10-22 00:00:00
##  5 1092438… l_biber l_bi… "Loren… FALSE  FALSE FALSE   FALSE 2022-10-28 00:00:00
##  6 1092560… gianlu… gian… "Gianl… FALSE  FALSE TRUE    FALSE 2022-10-28 00:00:00
##  7 1092876… ReeCee  ReeC… ""      FALSE  FALSE FALSE   FALSE 2022-11-04 00:00:00
##  8 1093136… abitter abit… "André… FALSE  FALSE TRUE    FALSE 2022-11-07 00:00:00
##  9 1092763… Andi    Andi… "Andi … TRUE   FALSE TRUE    FALSE 2022-11-02 00:00:00
## 10 1092657… MattCr… Matt… "Matt … FALSE  FALSE TRUE    FALSE 2022-11-01 00:00:00
## # … with 30 more rows, 12 more variables: note <chr>, url <chr>, avatar <chr>,
## #   avatar_static <chr>, header <chr>, header_static <chr>,
## #   followers_count <int>, following_count <int>, statuses_count <int>,
## #   last_status_at <dttm>, fields <I<list>>, emojis <I<list>>, and abbreviated
## #   variable names ¹username, ²display_name, ³discoverable

get_account_following(id)
## # A tibble: 40 × 21
##    id       usern…¹ acct  displ…² locked bot   disco…³ group created_at
##    <chr>    <chr>   <chr> <chr>   <lgl>  <lgl> <lgl>   <lgl> <dttm>
##  1 1092657… MattCr… Matt… Matt C… FALSE  FALSE TRUE    FALSE 2022-11-01 00:00:00
##  2 1092630… ramikr… rami… Rami K… FALSE  FALSE FALSE   FALSE 2022-10-31 00:00:00
##  3 1093241… Luk_O   Luk_… Lukas … FALSE  FALSE FALSE   FALSE 2022-11-10 00:00:00
##  4 1093238… cosima… cosi… Cosima… FALSE  FALSE FALSE   FALSE 2022-11-11 00:00:00
##  5 1092094… alexpg… alex… alex h… FALSE  FALSE TRUE    FALSE 2022-10-21 00:00:00
##  6 1093183… Johann… Joha… Johann… FALSE  FALSE TRUE    FALSE 2022-11-09 00:00:00
##  7 1092535… ropens… rope… rOpenS… FALSE  FALSE TRUE    FALSE 2022-10-29 00:00:00
##  8 1093134… crimep… crim… Emma B… FALSE  FALSE TRUE    FALSE 2022-11-05 00:00:00
##  9 1093134… gaborc… gabo… Gabor … FALSE  FALSE TRUE    FALSE 2022-11-09 00:00:00
## 10 1093111… sachae… sach… Sacha … FALSE  FALSE TRUE    FALSE 2022-11-08 00:00:00
## # … with 30 more rows, 12 more variables: note <chr>, url <chr>, avatar <chr>,
## #   avatar_static <chr>, header <chr>, header_static <chr>,
## #   followers_count <int>, following_count <int>, statuses_count <int>,
## #   last_status_at <dttm>, fields <I<list>>, emojis <I<list>>, and abbreviated
## #   variable names ¹username, ²display_name, ³discoverable

get_account_statuses(id)
## # A tibble: 8 × 29
##   id           uri   created_at          content visib…¹ sensi…² spoil…³ reblo…⁴
##   <chr>        <chr> <dttm>              <chr>   <chr>   <lgl>   <chr>     <int>
## 1 10932547240… http… 2022-11-11 13:45:22 "<p><s… public  FALSE   ""            1
## 2 10932521809… http… 2022-11-11 12:40:42 "<p><s… public  FALSE   ""            0
## 3 10932424625… http… 2022-11-11 08:33:33 "<p><s… public  FALSE   ""            0
## 4 10931062119… http… 2022-11-08 22:48:31 "<p><s… public  FALSE   ""            0
## 5 10930365326… http… 2022-11-07 17:16:28 "<p><s… public  FALSE   ""            0
## 6 10930261553… http… 2022-11-07 12:52:34 "<p>He… public  FALSE   ""            0
## 7 10930256528… http… 2022-11-07 12:39:47 "<p><s… public  FALSE   ""            0
## 8 10930253167… http… 2022-11-07 12:31:15 "<p>Hi… public  FALSE   ""           14
## # … with 21 more variables: favourites_count <int>, replies_count <int>,
## #   url <chr>, in_reply_to_id <chr>, in_reply_to_account_id <chr>,
## #   language <chr>, text <lgl>, application <I<list>>, poll <I<list>>,
## #   card <I<list>>, account <list>, reblog <I<list>>,
## #   media_attachments <I<list>>, mentions <I<list>>, tags <I<list>>,
## #   emojis <I<list>>, favourited <lgl>, reblogged <lgl>, muted <lgl>,
## #   bookmarked <lgl>, pinned <lgl>, and abbreviated variable names …
Posting statuses
You can post toots with:
post_toot(status = "my first rtoot #rstats")
A toot can also include media together with an alt text:
post_toot(status = "my first rtoot #rstats", media="path/to/media", alt_text = "description of media")
You can mark the toot as sensitive by setting sensitive = TRUE and add a spoiler text with spoiler_text.
Pagination
Most functions only return up to 40 results. The current version of rtoot does not support pagination out of the box (but it is planned for later). There is a workaround, which can be found in the wiki.
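The general idea behind such a workaround can be sketched without any network access. The Mastodon API returns statuses older than a given max_id, so a loop can request page after page, each time passing the oldest id seen so far. The code below is only an illustration of that idea and not the wiki's code: collect_pages() and mock_fetch() are hypothetical names, and the mock serves 100 fake statuses in pages of 40.

```r
# Hypothetical paginator: `fetch_page` is any function that takes `max_id`
# (NULL for the first page) and returns a data frame with an `id` column,
# newest status first.
collect_pages <- function(fetch_page, n_pages = 3L) {
  pages <- list()
  max_id <- NULL
  for (i in seq_len(n_pages)) {
    page <- fetch_page(max_id)
    if (nrow(page) == 0L) {
      break  # nothing older is left
    }
    pages[[length(pages) + 1L]] <- page
    max_id <- page$id[nrow(page)]  # oldest id on this page
  }
  do.call(rbind, pages)
}

# Mock fetcher over 100 fake statuses with ids "100" down to "1".
# The small ids can be compared as numbers here; real ids are far
# larger and should stay character strings.
all_ids <- as.character(100:1)
mock_fetch <- function(max_id) {
  ids <- all_ids
  if (!is.null(max_id)) {
    ids <- ids[as.numeric(ids) < as.numeric(max_id)]
  }
  data.frame(id = head(ids, 40), stringsAsFactors = FALSE)
}

toots <- collect_pages(mock_fetch, n_pages = 5L)
```

With rtoot, the fetcher would wrap a real call, for example a timeline function that is passed max_id, assuming the function in question accepts such a parameter. Pausing between requests keeps the loop within the rate limit.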
Thanks!
This package wouldn’t have been possible without my coauthor @chainsawriot who contributed a huge chunk of code, especially all unit tests! Also thanks to @JBGruber, who contributed to the authentication routines, and @urswilke for some fixes.