Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Letโs celebrate #WorldEmojiDay with a quick exploration of my own twitter account.
The ?
Weโll need:
From Github
{emo}
remote::install_github("hadley/emo")
From CRAN
{dplyr}
{tidyr}
{rtweet}
{tidytext}
Note: This page has been created at:
Sys.time() ## [1] "2018-07-17 17:22:29 CEST"
The ?
Letโs get my last 3200 tweets:
library(emo) library(rtweet) library(dplyr) ## ## Attaching package: 'dplyr' ## The following objects are masked from 'package:stats': ## ## filter, lag ## The following objects are masked from 'package:base': ## ## intersect, setdiff, setequal, union res <- get_timeline( "_ColinFay", n = 3200 ) names(res) ## [1] "user_id" "status_id" ## [3] "created_at" "screen_name" ## [5] "text" "source" ## [7] "display_text_width" "reply_to_status_id" ## [9] "reply_to_user_id" "reply_to_screen_name" ## [11] "is_quote" "is_retweet" ## [13] "favorite_count" "retweet_count" ## [15] "hashtags" "symbols" ## [17] "urls_url" "urls_t.co" ## [19] "urls_expanded_url" "media_url" ## [21] "media_t.co" "media_expanded_url" ## [23] "media_type" "ext_media_url" ## [25] "ext_media_t.co" "ext_media_expanded_url" ## [27] "ext_media_type" "mentions_user_id" ## [29] "mentions_screen_name" "lang" ## [31] "quoted_status_id" "quoted_text" ## [33] "quoted_created_at" "quoted_source" ## [35] "quoted_favorite_count" "quoted_retweet_count" ## [37] "quoted_user_id" "quoted_screen_name" ## [39] "quoted_name" "quoted_followers_count" ## [41] "quoted_friends_count" "quoted_statuses_count" ## [43] "quoted_location" "quoted_description" ## [45] "quoted_verified" "retweet_status_id" ## [47] "retweet_text" "retweet_created_at" ## [49] "retweet_source" "retweet_favorite_count" ## [51] "retweet_retweet_count" "retweet_user_id" ## [53] "retweet_screen_name" "retweet_name" ## [55] "retweet_followers_count" "retweet_friends_count" ## [57] "retweet_statuses_count" "retweet_location" ## [59] "retweet_description" "retweet_verified" ## [61] "place_url" "place_name" ## [63] "place_full_name" "place_type" ## [65] "country" "country_code" ## [67] "geo_coords" "coords_coords" ## [69] "bbox_coords" "status_url" ## [71] "name" "location" ## [73] "description" "url" ## [75] "protected" "followers_count" ## [77] "friends_count" "listed_count" ## [79] "statuses_count" "favourites_count" ## [81] "account_created_at" "verified" ## [83] "profile_url" "profile_expanded_url" ## [85] "account_lang" "profile_banner_url" ## [87] "profile_background_url" "profile_image_url"
Here is what the text
column looks like:
res %>% pull(text) %>% .[1:5] ## [1] "@GoldbergData It adds a little label at the top left with the text you provide. \nCan be useful if you want to add some legends in a markdown / shiny app, for example" ## [2] "#RStats \nCool new feature in ggplot2 v3 โ tagging plots : https://t.co/jFUqX2Tj5T" ## [3] "#RStats โ A perfect introduction to \U0001f5fa with the {sf} \U0001f4e6 & Co by @statnmap : \nhttps://t.co/IrmcSBDMDy https://t.co/m3TyUjrxYF" ## [4] "@vsbuffalo Amen to that" ## [5] "#RStats โ \U0001f680 Setting up RStudio Server, Shiny Server and PostgreSQL :\nhttps://t.co/J1Y7edNAj0"
As you can see, the emojis are not printed in the console, but converted
to weird characters like \U0001f4e6
and such. These are unicode
characters: translations of the emojis into a language your machine can
understand. I wonโt go deeper into this, here are two resources you can
read if you want to know more about encoding:
The ?
Letโs use the {emo}
package to extract the emojis from the text.
Inspired by {stringr}
, this package has a ji_extract_all
function
that is designed to extract all the emojis from a character vector.
Weโll use it on out text column, then extract the date and emo column.
We then pass the result to tidyr::unnest
in order to remove the empty
emo rows (i.e, the tweets without an emoji).
library(tidyr) emos <- res %>% mutate( emo = ji_extract_all(text) ) %>% select(created_at,emo) %>% unnest(emo) emos ## # A tibble: 887 x 2 ## created_at emo ## <dttm> <chr> ## 1 2018-07-17 10:00:47 ? ## 2 2018-07-17 08:35:05 ? ## 3 2018-07-16 18:47:25 ? ## 4 2018-07-16 14:51:30 ? ## 5 2018-07-16 14:51:16 ? ## 6 2018-07-16 13:28:08 ? ## 7 2018-07-16 13:27:00 ? ## 8 2018-07-16 13:27:00 ? ## 9 2018-07-16 13:27:00 ? ## 10 2018-07-16 13:25:01 ? ## # ... with 877 more rows emos %>% count(emo, sort = TRUE) ## # A tibble: 187 x 2 ## emo n ## <chr> <int> ## 1 ? 84 ## 2 ? 56 ## 3 ? 51 ## 4 ? 50 ## 5 ? 50 ## 6 ? 42 ## 7 ? 36 ## 8 ? 35 ## 9 ? 33 ## 10 ? 28 ## # ... with 177 more rows
So apparently, I use a lot of ?. But also talk about ?, which sounds more appropriate ๐
As you can see, {tibble}
converts elements to emojis when printing.
When using a data.frame, you have a simple unicode translation:
emos %>% as.data.frame() %>% head() ## created_at emo ## 1 2018-07-17 10:00:47 \U0001f4e6 ## 2 2018-07-17 08:35:05 \U0001f680 ## 3 2018-07-16 18:47:25 \U0001f62e ## 4 2018-07-16 14:51:30 \U0001f601 ## 5 2018-07-16 14:51:16 \U0001f631 ## 6 2018-07-16 13:28:08 \U0001f352
The ?
Letโs flag all the emojis with their names:
emos %>% left_join( data.frame( emo = ji_name, name = names(ji_name) ) ) %>% count(emo, name, sort = TRUE) ## Joining, by = "emo" ## Warning: Column `emo` joining character vector and factor, coercing into ## character vector ## # A tibble: 295 x 3 ## emo name n ## <chr> <fct> <int> ## 1 ? thinking 84 ## 2 ? thinking_face 84 ## 3 ? package 56 ## 4 ? grimacing 51 ## 5 ? grimacing_face 51 ## 6 ? party_popper 50 ## 7 ? tada 50 ## 8 ? face_screaming_in_fear 50 ## 9 ? scream 50 ## 10 ? innocent 42 ## # ... with 285 more rows
The ?
And finally, letโs see what are the most associated words with the emojis we just saw:
library(tidytext) emos_with_id <- res %>% mutate( emo = ji_extract_all(text) ) %>% select(status_id, text, emo) %>% tidyr::unnest(emo) emos_with_id %>% unnest_tokens(word,text) %>% anti_join(stop_words) %>% anti_join(proustr::stop_words) %>% anti_join( data.frame( word = c("https", "t.co", "https", "gt") ) ) %>% count(emo, word, sort = TRUE) ## Joining, by = "word" ## Joining, by = "word" ## Joining, by = "word" ## Warning: Column `word` joining character vector and factor, coercing into ## character vector ## # A tibble: 5,660 x 3 ## emo word n ## <chr> <chr> <int> ## 1 ? rstats 37 ## 2 ? rstats 27 ## 3 ? macbook 26 ## 4 ? package 20 ## 5 ? trans 18 ## 6 โ pm 15 ## 7 ? pro 15 ## 8 ? marche 10 ## 9 ? ma_salmon 10 ## 10 ? ma_salmon 10 ## # ... with 5,650 more rows
And what are the most used emojis with โrstatsโ?
emos_with_id %>% unnest_tokens(word, text) %>% anti_join(stop_words) %>% anti_join(proustr::stop_words) %>% anti_join( data.frame( word = c("https", "t.co", "https", "gt") ) ) %>% count(emo, word, sort = TRUE) %>% filter( word == "rstats" ) ## Joining, by = "word" ## Joining, by = "word" ## Joining, by = "word" ## Warning: Column `word` joining character vector and factor, coercing into ## character vector ## # A tibble: 81 x 3 ## emo word n ## <chr> <chr> <int> ## 1 ? rstats 37 ## 2 ? rstats 27 ## 3 ? rstats 5 ## 4 ? rstats 4 ## 5 ? rstats 4 ## 6 ? rstats 4 ## 7 โ๏ธ rstats 3 ## 8 ? rstats 3 ## 9 ? rstats 3 ## 10 โก rstats 2 ## # ... with 71 more rows
Other cool functions
I recently discovered the ji_glue()
function which allows you to
insert an emoji easily into a character vector :
ji_glue("I love to code :package:") ## I love to code ? ji_glue("Sometimes they make me :scream:") ## Sometimes they make me ? ji_glue("Sometimes they make me :cry:") ## Sometimes they make me ? ji_glue("Sometimes they make me :fear:") ## Sometimes they make me ? ji_glue("But in the end I'm always :tada:") ## But in the end I'm always ?
The ji()
function can also be used inside your markdown, so you can
write:
โI hate backtick r emo::ji(โbugโ) backtickโ, and it will come as: โI hate ?โ.
(of course, replace backtick by actuwith backticks ๐ ).
Thatโs all folks ?
Thatโs all for today! Now have a nice emoji day ?
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.