Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’m sure many of my fellow Mexicans will remember the historically ill-advised (to say the least) decision by our President to invite Donald Trump for a meeting.
Talking to some colleagues, we couldn’t help but notice that maybe in another era this decision would have been good policy. The problem, some concluded, was the influence of social media today. In fact, the Trump debacle did cause an outcry among leading political voices online.
I wanted to investigate this further, and thankfully for me, I’ve been using R to collect tweets from a catalog of leading political personalities in Mexico for a personal business project.
Here is a short descriptive look at what the 65 Twitter accounts I’m following tweeted between August 27th and September 5th (the Donald announced his visit on August the 30th). I’m sorry I can’t share the dataset, but you get the idea with the code…
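Since the real data stays private, here is a minimal toy stand-in for anyone who wants to run the snippets. The column names are taken from the code in this post; every value (account names, affiliations, texts, follower counts) is invented for illustration, and the name `d_toy` is just a placeholder. Results will of course differ from the ones reported here.

```r
library(dplyr)

# Toy stand-in for the real data frame `d`.
# Column names follow the post; all values are made up.
d_toy <- tibble(
  NOMBRE        = c("Account A", "Account A", "Account B", "Account C"),
  AFILIACION    = c("PAN", "PAN", "PRI", "Sin partido"),
  TXT           = c("Trump viene a Mexico", "Buenos dias",
                    "Reunion con TRUMP", "Sobre trump y Trump"),
  T_CREATED     = as.POSIXct(c("2016-08-30 21:15:00", "2016-08-31 08:02:00",
                               "2016-08-31 14:40:00", "2016-08-31 14:55:00"),
                             tz = "UTC"),
  T_ISRT        = c(FALSE, FALSE, FALSE, TRUE),
  U_FOLLOWERS   = c(1000L, 1000L, 250L, 40L),
  MENTION_CHAIN = c("realDonaldTrump", "ND", "EPN_|.|_realDonaldTrump", "ND")
)

# Same kind of check as the first snippet in the post:
# how many distinct accounts tweeted?
d_toy %>% summarise(n = n_distinct(NOMBRE))
```

Assigning `d <- d_toy` should be enough to try most of the data-wrangling steps, although the charts need the `eem` theme package or a substitute theme.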
library(dplyr)
library(stringr)

# 42 of the 65 accounts tweeted between those dates.
d %>% summarise("n" = n_distinct(NOMBRE))
#    n
#   42
We can see how mentions of Trump spike right around the time the visit was announced…
library(lubridate)  # provides month(), day(), hour()

byhour <- d %>%
  mutate("MONTH" = as.numeric(month(T_CREATED)),
         "DAY" = as.numeric(day(T_CREATED)),
         "HOUR" = as.numeric(hour(T_CREATED)),
         # number of times Trump is named in each tweet
         "TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  group_by(MONTH, DAY, HOUR) %>%
  summarise("N" = n(),
            "TRUMP_MENTIONS" = sum(TRUMP_MENTION)) %>%
  mutate("PCT_MENTIONS" = TRUMP_MENTIONS/N*100) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  # rebuild a timestamp for the x axis
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-", MONTH, "-", DAY, " ", HOUR, ":00")))

library(ggplot2)
library(eem)

ggplot(byhour, aes(x = CHART_DATE, y = PCT_MENTIONS)) +
  geom_line(colour = eem_colors[1]) +
  theme_eem() +
  labs(x = "Time", y = "Trump mentions \n (% of Tweets)")
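A possibly simpler way to build the hourly bins, sketched here on made-up data: `lubridate::floor_date()` truncates each timestamp to the hour, so there is no need to rebuild the date from month, day and hour, and `regex(ignore_case = TRUE)` catches any capitalisation. Note that this is a swapped-in variant, not the code used for the chart: it uses `str_detect()`, so it counts tweets that mention Trump rather than the number of mentions. The toy data frame and its values are assumptions; only the column names follow the post.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Made-up tweets; only the column names (TXT, T_CREATED) follow the post.
toy <- tibble(
  TXT = c("Trump visits", "nothing to see", "TRUMP and trump", "weather report"),
  T_CREATED = ymd_hms(c("2016-08-31 14:05:00", "2016-08-31 14:50:00",
                        "2016-08-31 15:10:00", "2016-08-31 15:30:00"))
)

byhour_toy <- toy %>%
  mutate(
    # truncate each timestamp to the start of its hour
    CHART_DATE = floor_date(T_CREATED, unit = "hour"),
    # TRUE if the tweet mentions Trump at least once, in any capitalisation
    TRUMP_MENTION = str_detect(TXT, regex("trump", ignore_case = TRUE))
  ) %>%
  group_by(CHART_DATE) %>%
  summarise(
    N = n(),                               # tweets in that hour
    TRUMP_MENTIONS = sum(TRUMP_MENTION),   # tweets mentioning Trump
    PCT_MENTIONS = TRUMP_MENTIONS / N * 100
  )

byhour_toy
```

Counting tweets instead of mentions keeps the percentage between 0 and 100, which may or may not be what you want.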
The peak of mentions (as a percentage of tweets) came on September 1st at 6 am (75%). But in terms of the number of tweets, it is much more obvious that the outcry followed the announcement and the candidate’s subsequent visit:
ggplot(byhour, aes(x = CHART_DATE, y = TRUMP_MENTIONS)) +
  geom_line(colour = eem_colors[1]) +
  theme_eem() +
  labs(x = "Time", y = "Trump mentions \n (# of Tweets)")
We can also (sort of) gauge the reach of these influencers’ tweets. I’m going to add up the followers, who are potential viewers, of each tweet mentioning Trump, by hour.
byaudience <- d %>%
  mutate("MONTH" = as.numeric(month(T_CREATED)),
         "DAY" = as.numeric(day(T_CREATED)),
         "HOUR" = as.numeric(hour(T_CREATED)),
         "TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # only the tweets that mention Trump
  filter(TRUMP_MENTION > 0) %>%
  group_by(MONTH, DAY, HOUR) %>%
  summarise("TWEETS" = n(),
            "AUDIENCE" = sum(U_FOLLOWERS)) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-", MONTH, "-", DAY, " ", HOUR, ":00")))

ggplot(byaudience, aes(x = CHART_DATE, y = AUDIENCE)) +
  geom_line(colour = eem_colors[1]) +
  theme_eem() +
  labs(x = "Time", y = "Potential audience \n (# of followers)")
So clearly, I’m stating the obvious. People were talking. But how did the conversation develop? Let’s first look at the type of tweets (RTs vs. original tweets):
library(knitr)  # provides kable()

bytype <- d %>%
  mutate("TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # only the tweets that mention Trump
  filter(TRUMP_MENTION > 0) %>%
  group_by(T_ISRT) %>%
  summarise("count" = n())

kable(bytype)
T_ISRT | count |
---|---|
FALSE | 313 |
TRUE | 164 |
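As a quick check on the table, the RT share can be computed directly and set against the overall share. This is a small sketch in base R under two assumptions of mine: that the overall figures quoted just below (1389 out of 3833) are retweets out of all tweets, and that those totals include the 477 Trump tweets. The variable names are placeholders.

```r
# Counts reported in the post
rt_trump  <- 164          # retweets among tweets mentioning Trump
all_trump <- 313 + 164    # all tweets mentioning Trump
rt_all    <- 1389         # retweets among all tweets (assumed reading)
all_all   <- 3833         # all tweets

share_trump <- rt_trump / all_trump
share_all   <- rt_all / all_all

# RT share in percent, Trump tweets vs. all tweets
round(100 * c(trump = share_trump, overall = share_all), 1)

# Rough test of equal proportions on disjoint groups:
# Trump tweets vs. the remaining tweets (assumes the totals include Trump tweets)
prop.test(x = c(rt_trump, rt_all - rt_trump),
          n = c(all_trump, all_all - all_trump))
```

With only 65 hand-picked accounts, the test is at best a sanity check on whether the two shares are distinguishable.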
About 1 in 3 was an RT. Compared to the overall tweets (1389 out of 3833), this is not much of a difference, so it wasn’t necessarily an influencer pushing the discourse. In terms of who was mentioned most in the tweets, it was our President in the spotlight:
bymentionchain <- d %>%
  mutate("TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # count tweets by number of Trump mentions and by mention chain
  # (there is no filter here, so tweets without a Trump mention are counted too)
  group_by(TRUMP_MENTION, MENTION_CHAIN) %>%
  summarise("count" = n()) %>%
  ungroup() %>%
  # collapse every chain containing EPN, then every chain containing realDonaldTrump
  mutate("GROUPED_CHAIN" = ifelse(grepl(pattern = "EPN", x = MENTION_CHAIN),
                                  "EPN", MENTION_CHAIN)) %>%
  mutate("GROUPED_CHAIN" = ifelse(grepl(pattern = "realDonaldTrump", x = MENTION_CHAIN),
                                  "realDonaldTrump", GROUPED_CHAIN))

ggplot(order_axis(bymentionchain %>% filter(count > 10 & GROUPED_CHAIN != "ND"),
                  axis = GROUPED_CHAIN, column = count),
       aes(x = GROUPED_CHAIN_o, y = count)) +
  geom_bar(stat = "identity") +
  theme_eem() +
  labs(x = "Mention chain \n (separated by _|.|_ )", y = "Tweets")
How about the actual people who tweeted? It seems news anchor Joaquin Lopez-Doriga and security analyst Alejandro Hope were the most vocal about the visit (out of the influencers I’m following).
bytweetstar <- d %>%
  # 1 if the tweet mentions Trump at least once, 0 otherwise
  mutate("TRUMP_MENTION" = ifelse(str_count(TXT, pattern = "Trump|TRUMP|trump") < 1, 0, 1)) %>%
  group_by(TRUMP_MENTION, NOMBRE) %>%
  summarise("count" = n_distinct(TXT))

## plot with ggplot2
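The plotting step is left out above. One way it could be done, sketched with plain ggplot2 (no `eem` theme) on invented counts that have the same shape as `bytweetstar`; the account names and numbers are placeholders, not results from my data.

```r
library(dplyr)
library(ggplot2)

# Invented counts in the same shape as `bytweetstar`
# (TRUMP_MENTION, NOMBRE, count); names and numbers are placeholders.
bytweetstar_toy <- tibble(
  TRUMP_MENTION = c(1, 1, 1, 0, 0),
  NOMBRE = c("Account A", "Account B", "Account C", "Account A", "Account B"),
  count = c(12, 30, 7, 40, 22)
)

top_trump <- bytweetstar_toy %>%
  filter(TRUMP_MENTION == 1) %>%   # keep only the tweets that mention Trump
  arrange(desc(count))             # most vocal accounts first

p <- ggplot(top_trump, aes(x = reorder(NOMBRE, count), y = count)) +
  geom_col() +                     # bar height = number of distinct tweets
  coord_flip() +                   # horizontal bars keep account names readable
  labs(x = "Account", y = "Tweets mentioning Trump")

p
```

`reorder()` sorts the bars by count, so the most active account ends up at the top after the flip.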
I also grouped each person by political affiliation, and the result confirms the notion that the conversation on the eve of the visit, at least among this very small subset of Twitter accounts, was driven by those with no party affiliation or in the “PAN” (opposition party).
byafiliation <- d %>%
  mutate("MONTH" = as.numeric(month(T_CREATED)),
         "DAY" = as.numeric(day(T_CREATED)),
         "HOUR" = as.numeric(hour(T_CREATED)),
         # 1 if the tweet mentions Trump at least once, 0 otherwise
         "TRUMP_MENTION" = ifelse(str_count(TXT, pattern = "Trump|TRUMP|trump") > 0, 1, 0)) %>%
  group_by(MONTH, DAY, HOUR, TRUMP_MENTION, AFILIACION) %>%
  summarise("TWEETS" = n()) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-", MONTH, "-", DAY, " ", HOUR, ":00")))

ggplot(byafiliation, aes(x = CHART_DATE, y = TWEETS, group = AFILIACION, fill = AFILIACION)) +
  geom_bar(stat = "identity") +
  theme_eem() +
  scale_fill_eem(20) +
  facet_grid(TRUMP_MENTION ~ .) +
  labs(x = "Time", y = "Tweets \n (By mention of Trump)")
However, it’s interesting to note a small spike from the accounts affiliated with the PRI (party in power) on the day after his visit (Sept. 1st). Maybe they were trying to steer the conversation elsewhere?