a global pandemic on Twitter
Brief introduction
So, lots of linguistic variation happening in real time: `coronavirus`, `covid19`, `pandemic`, and more recently (the?) `coronavirus pandemic`. For sure, these expressions are not proper synonyms; each refers to a different “aspect” of the virus: `coronavirus` ~ virus, `covid19` ~ disease, `pandemic` ~ social/epi. Here, we take a super quick look at how this variation in reference is materializing on Twitter among the 535 voting members of the United States Congress since January 2020.
Twitter details
First things first, we obtain Twitter handles and some relevant biographical details (here, political affiliation) for the 100 US Senators and the 435 members of the House of Representatives from the unitedstates project.

    library(tidyverse)

    leg_dets <- 'https://theunitedstates.io/congress-legislators/legislators-current.csv'

    twitters <- read.csv((url(leg_dets)), stringsAsFactors = FALSE) %>%
      #filter(type == 'rep') %>% # & twitter!=''
      rename (state_abbrev = state, district_code = district)

Then we scrape the last 1,000 tweets for each of the 535 members of Congress using the `rtweet` package. Here, we are just trying to get all tweets from 2020, so 1,000 is overkill. We exclude retweets and quote tweets. The scraping process takes roughly an hour.

    congress_tweets <- rtweet::get_timeline(
      twitters$twitter, n = 1000, check = FALSE) %>%
      mutate(created_at = as.Date(gsub(' .*$', '', created_at))) %>%
      filter(is_quote == 'FALSE' & is_retweet == 'FALSE' &
               created_at >= '2020-01-01' & display_text_width > 0)

    # setwd("/home/jtimm/jt_work/GitHub/data_sets")
    # saveRDS(congress_tweets, 'cong2020_tweets_tif.rds')

Then we join the two data sets and calculate total daily tweets generated by members of Congress, by party affiliation, in 2020.

    congress_tweets1 <- congress_tweets %>%
      mutate(twitter = toupper(screen_name)) %>%
      select(status_id, created_at, twitter, text) %>%
      inner_join(twitters %>% mutate(twitter = toupper(twitter)))

    all_tweets <- congress_tweets1 %>%
      group_by(created_at, party) %>%
      summarise(ts = n()) %>%
      rename(date = created_at)

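The `toupper()` calls matter because the casing of a handle in the legislators file need not match the `screen_name` returned by Twitter. A minimal sketch with made-up handles (the rows below are hypothetical and come from neither data set) shows what the join keeps with and without that normalization:

```r
library(dplyr)

# Hypothetical rows standing in for the scraped timelines and the legislators file
toy_timeline <- tibble(
  screen_name = c('SenExample', 'repsample'),
  text = c('tweet one', 'tweet two')
)
toy_legislators <- tibble(
  twitter = c('senexample', 'RepSample'),
  party = c('Democrat', 'Republican')
)

# Joining on the raw handles finds no matches because the casing differs
raw_join <- toy_timeline %>%
  inner_join(toy_legislators, by = c('screen_name' = 'twitter'))

# Upper-casing both keys first recovers both rows
norm_join <- toy_timeline %>%
  mutate(twitter = toupper(screen_name)) %>%
  inner_join(toy_legislators %>% mutate(twitter = toupper(twitter)),
             by = 'twitter')

nrow(raw_join)
nrow(norm_join)
```

Naming the key with `by = 'twitter'` is optional here, but it makes the join column explicit instead of relying on dplyr to pick the shared column name.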
The figure below summarizes total tweets by party affiliation since the first of the year. Donald Trump presented his State of the Union address on the evening of February 4th, hence the spike in activity around that date. There seems to be a slight upward trend in total tweets, perhaps more pronounced among Democrats, presumably in response to the coronavirus.
Also, Democrats do tweet more, but they also have greater numbers in Congress at present. And it seems that members of Congress put their phones down a bit on the weekends.

    all_tweets %>%
      filter(party != 'Independent') %>% # Justin Amash & Bernie Sanders & Angus King
      ggplot() +
      geom_line(aes(x = date, y = ts, color = party), size = 1.25) +
      theme_minimal() +
      ggthemes::scale_color_stata() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      scale_x_date(date_breaks = '1 week', date_labels = "%b %d") +
      theme(legend.position = 'bottom') +
      labs(title = 'Total congressional tweets by party affiliation')

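Because the two parties differ in size, raw daily counts favor the larger caucus. One way to put them on a common footing is tweets per tweeting member per day. The sketch below uses a small made-up table in place of `congress_tweets1` (handles and counts are hypothetical), and the choice of denominator, members who tweeted that day, is an assumption; total seats per party would be an equally reasonable alternative.

```r
library(dplyr)

# Hypothetical stand-in for congress_tweets1: one row per tweet
toy_tweets <- tibble(
  created_at = as.Date(rep('2020-03-02', 6)),
  twitter = c('A', 'A', 'B', 'C', 'C', 'C'),
  party = c('Democrat', 'Democrat', 'Democrat',
            'Republican', 'Republican', 'Republican')
)

per_member <- toy_tweets %>%
  group_by(date = created_at, party) %>%
  summarise(ts = n(),                        # total tweets that day
            members = n_distinct(twitter),   # members who tweeted that day
            .groups = 'drop') %>%
  mutate(ts_per_member = ts / members)

per_member
```

In this toy table both parties post three tweets, but the per-member rate differs (1.5 versus 3) because two Democrats and one Republican did the tweeting.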
Patterns of variation over time
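The code in this section works on a `covid_tweets` data frame whose construction is not shown in this post. What follows is only a sketch of how such a table could be built, not the original method: the regular expression, the `covid19` spelling normalization, and the toy rows are all assumptions, while the column names (`date`, `covid_gram`, `twitter`, `party`) are taken from the code that uses them.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical stand-in for congress_tweets1 (made-up handles and text)
toy_tweets <- tibble(
  status_id = c('1', '2', '3'),
  created_at = as.Date(c('2020-03-01', '2020-03-02', '2020-03-02')),
  twitter = c('MEMBER_A', 'MEMBER_B', 'MEMBER_C'),
  party = c('Democrat', 'Republican', 'Democrat'),
  text = c('Monitoring the coronavirus outbreak closely.',
           'The coronavirus pandemic demands action on COVID-19 testing.',
           'Happy Friday!')
)

# Assumed pattern: the longest alternative comes first so that
# "coronavirus pandemic" is matched as a unit, not as two separate terms
covid_pattern <- 'coronavirus pandemic|covid[- ]?19|coronavirus|pandemic'

covid_tweets_sketch <- toy_tweets %>%
  mutate(covid_gram = str_extract_all(tolower(text), covid_pattern)) %>%
  unnest(covid_gram) %>%   # one row per matched term; tweets with no match drop out
  mutate(covid_gram = str_replace(covid_gram, 'covid[- ]?19', 'covid19')) %>%
  rename(date = created_at)

covid_tweets_sketch %>% select(date, twitter, covid_gram)
```

A tweet that uses two different forms contributes two rows, so counts over this table are counts of references, not of tweets.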
The table below details the first attestation of each referring expression in our 2020 Congressional Twitter corpus. `coronavirus` hit the scene on 1-17, followed by `pandemic` on 1-22, `coronavirus pandemic` on 2-11, and `covid19` on 2-12, one day after the World Health Organization announced the name for the disease on 2-11.

    covid_tweets %>%
      group_by(covid_gram) %>%
      filter(date == min(date)) %>%
      arrange(date) %>%
      select(covid_gram, date, twitter) %>%
      knitr::kable()

covid_gram | date | twitter |
---|---|---|
coronavirus | 2020-01-17 | SENFEINSTEIN |
pandemic | 2020-01-22 | MICHAELCBURGESS |
pandemic | 2020-01-22 | SENTOMCOTTON |
coronavirus pandemic | 2020-02-11 | SENATORHASSAN |
covid19 | 2020-02-12 | REPELIOTENGEL |
Probability distributions
Lastly, we consider a proportional perspective on reference to 2019 NOVEL CORONAVIRUS. Instead of total tweets, the denominator here becomes overall references to 2019 NOVEL CORONAVIRUS on Twitter among members of Congress.
The figure below, then, illustrates daily probability distributions for forms used to reference 2019 NOVEL CORONAVIRUS. `covid19` has slowly become the majority form on Twitter, while `coronavirus` has become less and less prevalent. One explanation is that the effects of the virus in the US, i.e., the disease, have become more prevalent, making `covid19` the more apt referring expression. Another explanation is that `covid19` is shorter orthographically and, in the character-counting world of Twitter, a more efficient way to express the notion 2019 NOVEL CORONAVIRUS. An empirical question for sure.

    x1 <- covid_tweets %>%
      filter(date > '2020-2-25') %>%
      group_by(date, covid_gram) %>% #,party,
      summarize(n = n()) %>%
      mutate(per = n/sum(n))

    x2 <- x1 %>%
      ggplot(aes(x = date, y = per, fill = covid_gram)) +
      geom_bar(alpha = 0.65, stat = 'identity', width = .9) +
      # theme_minimal() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      theme(legend.position = "none") +
      ggthemes::scale_fill_economist() +
      scale_x_date(date_breaks = '1 day', date_labels = "%b %d") +
      labs(title = 'Referring to 2019 NOVEL CORONAVIRUS',
           subtitle = 'Among US Senators & House Representatives')

    x2 +
      annotate(geom = "text",
               x = c(rep(as.Date('2020-3-22'), 4)),
               y = c(.05, .35, .6, .8),
               label = c('pandemic', 'covid19', 'coronavirus pandemic', 'coronavirus'),
               size = 4, color = 'black')

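The `per` column relies on a detail of dplyr grouping: after `summarize()`, the last grouping variable (`covid_gram`) is dropped, so `sum(n)` in the `mutate()` that follows is taken within each date. A small sketch on made-up references (the rows are hypothetical) spells that behavior out with `.groups = 'drop_last'` and checks that each day's shares sum to one:

```r
library(dplyr)

# Hypothetical references: one row per mention of a form
toy_refs <- tibble(
  date = as.Date(c('2020-03-01', '2020-03-01', '2020-03-01',
                   '2020-03-02', '2020-03-02')),
  covid_gram = c('coronavirus', 'coronavirus', 'covid19',
                 'covid19', 'pandemic')
)

shares <- toy_refs %>%
  group_by(date, covid_gram) %>%
  summarize(n = n(), .groups = 'drop_last') %>%  # still grouped by date
  mutate(per = n / sum(n)) %>%                   # share of that day's references
  ungroup()

# Each date's proportions should total 1
daily_totals <- shares %>%
  group_by(date) %>%
  summarize(total = sum(per))

shares
daily_totals
```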
Summary
So, a weekend & social distancing. Caveats galore, but for folks interested in language change & innovation & the establishment of convention in a community of speakers, something to keep an eye on.
Perhaps more interesting is how regular folks are referencing 2019 NOVEL CORONAVIRUS on Twitter. Everyone stay home & healthy.