#TwitterBan and speaking into existence: Evaluating migration intentions in Nigeria
Introduction
In mid-2021, the Federal Government of Nigeria suspended Twitter activities in the country after the microblogging platform removed a tweet in which the president threatened to treat members of a secessionist group in “the language they understand”.
Subsequently, several internet service providers in the country blocked access to the platform, and netizens could only reach Twitter through a virtual private network (VPN), which encrypts internet traffic and lets users send and receive data across shared or public networks as if they were directly connected to a private network.
Many VPNs allow users to connect to any of their servers in a limited number of countries, including the United Kingdom, United States, Canada, and others. This implies that Twitter users in Nigeria can access Twitter “from the UK, US or Canada”.
Where are you tweeting from?#TwitterBan#TwitterbaninNigeria
— Punch Newspapers (@MobilePunch) June 5, 2021
Press Release: Nigerians are back to Twitter tweeting from different countries. pic.twitter.com/5jEy40Fmir
— NewsTrivia.com (@newstrivia_) June 5, 2021
Soon, Nigerians started disclosing where they were tweeting from in response to an initial tweet. While this only indicated the country of the server they were connected to, many saw it as an opportunity to speak their desired country of residence into existence. For others, it was a show of solidarity with the government’s position on the ban.
Tweeting from London into existence. https://t.co/mSiIf86xwc
— Mykee (@Mykee_9) June 5, 2021
Multiple studies have explored the potential use of digital traces as viable complements to traditional data sources. Here, I analysed tweets about the location/country from which netizens were tweeting and compared the observed trends with the United Nations (UN) migration stock data, which include estimates of in-and-out migration by country of origin and country of destination.
Libraries & formatting
I used several libraries in this project, and the use of each is explained in the code chunk below. To reproduce this analysis, you may first need to install any packages that are not already installed using install.packages("packageName").
library(tidyverse)  # Data wrangling:
                    #   dplyr::select(), filter(), mutate(), arrange(),
                    #   dplyr::ungroup(), left_join()
                    #   stringr::str_replace_all(), str_remove()
                    #   tidyr::pivot_longer()
library(readxl)     # Read Excel files
library(lubridate)  # Manipulate and parse dates
library(rtweet)     # Collect and analyze Twitter data
library(tidytext)   # Tidy text-data analysis
library(showtext)   # Importing fonts: showtext_auto()
library(ggpubr)     # Merging graphs
library(gganimate)  # Animating images: transition_manual(), animate()
library(gifski)
I also love to use custom fonts like BarlowCondensed and AlegreyaSans to style the graphs. The fonts can be imported and installed in R using the showtext package.
These fonts and others are freely available online, but I have also uploaded them on my GitHub repository. You will need to download and install each of the fonts manually before attempting to load them in R.
In the lines of code below, each font is nested in an if-else statement so that an alternative font (ARIALN.ttf, ARIALNI.ttf or Candara.ttf) is used if R is unable to find the custom font on your local machine. The code looks for fonts in the Windows font folder, so you may need to edit the command to point to the relevant directory if you are a macOS user.
showtext_auto()

## Axis text
if ("BarlowCondensed-Light.ttf" %in% list.files("C:\\Windows\\Fonts")) {
  font_add("BarlowCondensed-Light", "BarlowCondensed-Light.ttf")
  axis_text <- "BarlowCondensed-Light"
} else {
  font_add("ARIALN", "ARIALN.ttf")
  axis_text <- "ARIALN"
}

## Graphics title
if ("BarlowCondensed-Medium.ttf" %in% list.files("C:\\Windows\\Fonts")) {
  font_add("BarlowCondensed-Medium", "BarlowCondensed-Medium.ttf")
  title_text <- "BarlowCondensed-Medium"
} else {
  font_add("ARIALNI", "ARIALNI.ttf")
  title_text <- "ARIALNI"
}

## Caption text
## (check for the same file that font_add() loads)
if ("AlegreyaSans-Italic.ttf" %in% list.files("C:\\Windows\\Fonts")) {
  font_add("AlegreyaSans", "AlegreyaSans-Italic.ttf")
  caption_text <- "AlegreyaSans"
} else {
  font_add("Candara", "Candara.ttf")
  caption_text <- "Candara"
}
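The fallback logic above repeats the same existence check for every font. As a minimal sketch, the check could be wrapped in a small helper; note that first_available_font() and its arguments are hypothetical names introduced here for illustration (they are not part of showtext), and the demonstration uses a temporary folder standing in for a real font directory.

```r
# Hypothetical helper: return the first font file from `candidates`
# that exists in `font_dir`, or NA if none of them is found.
first_available_font <- function(candidates, font_dir) {
  found <- candidates[file.exists(file.path(font_dir, candidates))]
  if (length(found) == 0) NA_character_ else found[1]
}

# Demonstration with a temporary folder standing in for the font directory
demo_dir <- file.path(tempdir(), "demo_fonts")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "ARIALN.ttf"))

# The custom font is absent from the demo folder, so the fallback is chosen
chosen <- first_available_font(c("BarlowCondensed-Light.ttf", "ARIALN.ttf"),
                               demo_dir)
print(chosen)
```

On macOS, font_dir would typically be "/Library/Fonts" or "~/Library/Fonts" rather than the Windows folder, and the file returned by the helper could then be passed to font_add().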
Data sources & retrieval
I retrieved tweets that included the keyword “tweeting from” directly from Twitter on June 7 using the script below via the Twitter API. The search_tweets() function from the rtweet package retrieves tweets that match the submitted query and were posted within the last 6–9 days.
tweet_NG <- search_tweets(q = "\"tweeting from\"",
                          n = 50000,
                          geocode = "-2.791166,12.094754,2258mi",
                          include_rts = FALSE,
                          type = "recent",
                          retryonratelimit = TRUE)
Given this time limit, it may be impossible to retrieve exactly the same tweets analysed here. As a result, I have saved a pseudonymised version of the retrieved data on GitHub. The data can be imported into R using the code chunk below.
tweet_NG <- read.csv("https://raw.githubusercontent.com/eolamijuwon/datasets/main/MigTwitter.csv")
I also retrieved data on migration stock from the United Nations website to further compare the patterns observed in the tweets. The data can be downloaded directly using the download.file() function and assigned to an R object using the read_xlsx() function from the readxl package.
download.file(url = "https://www.un.org/en/development/desa/population/migration/data/estimates2/data/UN_MigrantStockByOriginAndDestination_2019.xlsx",
              method = "curl",
              destfile = "UN Migration.xlsx")

migration_UN <- read_xlsx("UN Migration.xlsx", skip = 15, sheet = 2)
Data wrangling – Twitter
I used several approaches to clean the data.
- I restricted the data to tweets posted between June 5 and June 7 because this was the period when most people posted about tweeting from a location/country using a VPN.
- Each tweet is a combination of multiple words. As a result, I sub-divided the tweets into trigrams (3-word phrases) and created a separate column for each word in the trigram.
- Furthermore, I replaced words unrelated to the analysis with NA. These include words that contain numbers, as these were unlikely to be the name of a country or a location. I also excluded other words such as https, opera, keepiton, etc.
- I also removed stop words from the data. Stop words are commonly used words (such as the, are, is etc.) that carry very little useful information.
updt_tweet <- tweet_NG %>%
  # Keep tweets posted from June 5 to June 7 (inclusive)
  filter(as.Date(created_at) >= as.Date("2021-06-05") &
           as.Date(created_at) <= as.Date("2021-06-07")) %>%
  select(status_id, created_at, text, screen_name) %>%
  # Split each tweet into trigrams and keep those that start with "from"
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  filter(word1 == "from") %>%
  # Drop words with runs of digits, a character repeated four or more
  # times, or any remaining single-letter words
  mutate(word2 = replace(word2,
                         which(str_detect(word2, pattern = "[[:digit:]]+[[:digit:]]+") |
                                 str_detect(word2, pattern = "(.)\\1{3,}") |
                                 str_detect(word2, pattern = "\\b(.)\\b")),
                         NA),
         word3 = replace(word3,
                         which(str_detect(word3, pattern = "[[:digit:]]+[[:digit:]]+") |
                                 str_detect(word3, pattern = "(.)\\1{3,}") |
                                 str_detect(word3, pattern = "\\b(.)\\b")),
                         NA)) %>%
  # Drop stop words
  mutate(word2 = replace(word2, which(word2 %in% stop_words$word), NA),
         word3 = replace(word3, which(word3 %in% stop_words$word), NA)) %>%
  # Drop other words that are unrelated to a location
  mutate(word2 = replace(word2,
                         which(str_detect(word2, pattern = "https|opera|abeg|abi|keepiton|zoo|accessing|biko|jack|lol|country|countries") |
                                 str_detect(word2, pattern = "twitterban|echoke|guy|vpn|twitter|tweeting|tundeeddnut") |
                                 str_detect(word2, pattern = "trolls_queen|thankgodforvpn|thunder|timothymutuake|sm") |
                                 str_detect(word2, pattern = "someplace|sophie|sos|sir|sire|9mobilengcare|9mobileng|3rd")),
                         NA),
         word3 = replace(word3,
                         which(str_detect(word3, pattern = "https|opera|abeg|abi|keepiton|zoo|accessing|biko|jack|lol|country|countries") |
                                 str_detect(word3, pattern = "twitterban|echoke|guy|vpn|twitter|tweeting|tundeeddnut") |
                                 str_detect(word3, pattern = "trolls_queen|thankgodforvpn|thunder|timothymutuake|sm") |
                                 str_detect(word3, pattern = "someplace|sophie|sos|sir|sire|9mobilengcare|9mobileng|3rd")),
                         NA)) %>%
  # When word2 already holds a location, drop a trailing "nigeria" in word3
  mutate(word3 = replace(word3,
                         which(!is.na(word2) & str_detect(word3, "nigeria")),
                         NA)) %>%
  # Keep rows with at least one candidate location word
  filter(!is.na(word2) | !is.na(word3))
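To make the tokenisation step easier to follow, here is a minimal sketch that applies the trigram split and stop-word removal to two made-up tweets. The toy_tweets object and its contents are invented for illustration and are not taken from the dataset.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Two made-up tweets (not from the dataset)
toy_tweets <- tibble(
  status_id = c("a", "b"),
  text = c("Tweeting from London into existence",
           "Currently tweeting from United Kingdom")
)

toy_words <- toy_tweets %>%
  # unnest_tokens() lowercases the text and returns one trigram per row
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  separate(trigram, c("word1", "word2", "word3"), sep = " ") %>%
  # Keep only the trigram that starts with "from"
  filter(word1 == "from") %>%
  # Replace stop words with NA
  mutate(word2 = replace(word2, which(word2 %in% stop_words$word), NA),
         word3 = replace(word3, which(word3 %in% stop_words$word), NA))

print(toy_words)
```

In the first tweet only word2 (london) should survive, because the word that follows it (into) is a stop word. In the second tweet both word2 and word3 are kept, which is why two-word names need the merging step that follows.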
I then created a country column. Where both word2 and word3 are present, the two are merged so that two-word country names (such as the United Kingdom) are captured. Where one of them is missing, the column takes whichever of word2 or word3 is available.
tweet_GEO <- updt_tweet %>%
  mutate(country = ifelse((!is.na(word2) & !is.na(word3)),
                          paste0(word2, " ", word3),
                          coalesce(word2, word3))) %>%
  mutate(country = str_to_sentence(country))
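The merging rule can be checked on a small made-up table covering the three possible cases: both words present, only word2 present, and only word3 present. The toy_words object below is invented for illustration.

```r
library(dplyr)
library(stringr)

# Made-up rows covering the three cases
toy_words <- tibble(word2 = c("united", "london", NA),
                    word3 = c("kingdom", NA, "canada"))

toy_geo <- toy_words %>%
  mutate(country = ifelse(!is.na(word2) & !is.na(word3),
                          paste0(word2, " ", word3),   # both present: merge
                          coalesce(word2, word3)),     # otherwise: first non-missing
         country = str_to_sentence(country))

print(toy_geo$country)
```

Because str_to_sentence() capitalises only the first letter, a two-word name comes out as "United kingdom" rather than "United Kingdom", so any later pattern matching has to use that spelling.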
During data exploration, I observed that some users reported city names (such as London) as their tweeting location. Several locations were also misspelt (such as Landon, Nethaland, etc.). As a result, I created a comprehensive list of countries and their popular cities, and regrouped all locations to the national level.
clean_tweet <- tweet_GEO %>%
  # United States: states, cities and common misspellings
  mutate(country = replace(country,
                           which(str_detect(country, "Alabama|Alaska|Arizona|Arkansas|California|Colorado|Connecticut|Delaware|Florida|Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|Mississippi|Missouri|Montana|Nebraska|Nevada|Hampshire|Jersey|New Mexico|York|Unitedstates|Vegas|Carolina|Dakota|Ohio|Oklahoma|San|Oregon|Pennsylvania|Rhode Island|South Carolina|South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|West Virginia|Wisconsin|Wyoming|Seattle|Miami|Boston|Chicago|Nyc|Los|Las|Houston|Losangeles|Atlanta|America|Dellas|Dallas|Usa|Brooklyn|usa|carolina|mississippi|Twittingfromatlanta|florida|Minneapolis|Newyork|Manhattan|Maimi")),
                           "USA")) %>%
  # Netherlands
  mutate(country = replace(country,
                           which(str_detect(country, "Amsterdam|Netherland|Nederland|Nethaland|Neitherland|Nertherland|Meppel")),
                           "Netherlands")) %>%
  # United Kingdom
  mutate(country = replace(country,
                           which(str_detect(country, "Buckingham|London|Manchester|Custard|England|Uk|Southampton|Unitedkingdom|Landon|Glasgow|Scotland|Northampton|Birmingham|Westminster|United kingdom|Britain|london|United kingdon|United kindom")),
                           "United Kingdom")) %>%
  # Nigeria
  mutate(country = replace(country,
                           which(str_detect(country, "Uar|Abj|Abk|Aba|Asaba |Arewa|Ibadan|Aso|Asorock|Abeokuta|Abuja|Adamawa|Nigeria|Ikorodu|Ado|Agege|Akoka|Akure|Lagos|lagos|Egbeda|Ikeja|Akurehowfar|Akwaibomtwitter|Naija|Portharcourt|Osun|Osogbo|Ogun|Yenagoa|Ibarapa|Calabar|Ife|Edo|Ebonyi|Eastern|Fmicnigeria|Oye|Katsina|Niegria|Nigeriagov|Lokoja|Lafiaji|Sokoto|Nigerians|Oduduwa|Uyo|9ja|Shomolu|Unitedafricanrepublic|United african|Zaria|Stateofosun|Sagamu|Mowe ibafo|Ogbmosho|Ogbomosho|Ogbomoso|Oke ode|Olaiya strt|Oluyole town|Ondo|Onitsha anambra|Ota|Makurdi|Lekki|Kaduna|Kano|Jos|Ekiti|Enugu|Akwa ibom|United africa|Port harcourt|Maiduguri|United aafrican|United adamugaba|United arewa|United ede|United epe")),
                           "Nigeria")) %>%
  # Canada
  mutate(country = replace(country,
                           which(str_detect(country, "Alberta|British Columbia|Manitoba|ontario|Montreal|Ontario|Quebec|Saskatchewan|Toronto|Vancouver|Canada|canada")),
                           "Canada")) %>%
  # France
  mutate(country = replace(country,
                           which(str_detect(country, "Paris|paris|France")),
                           "France")) %>%
  # Other countries
  mutate(country = replace(country,
                           which(str_detect(country, "Instabul|Istanbul")),
                           "Turkey"),
         country = replace(country,
                           which(str_detect(country, "Germany|germany|Frankfurt|9jagermany|Berlin|Stutggart")),
                           "Germany"),
         country = replace(country,
                           which(str_detect(country, "switzerland|Zurich|Switzerlandjust")),
                           "Switzerland"),
         country = replace(country,
                           which(str_detect(country, "Sydney|Australia|australia")),
                           "Australia"),
         country = replace(country,
                           which(str_detect(country, "Spain|Barcelona")),
                           "Spain"),
         country = replace(country,
                           which(str_detect(country, "Soweto|Southy|South africa|Johannesburg")),
                           "South Africa")) %>%
  # A lone "United" is assigned to the USA
  mutate(country = replace(country, which(country == "United"), "USA"))
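The same regrouping idea can be expressed more compactly with dplyr::case_when(). The sketch below is an alternative formulation on made-up values, not the code used for the analysis, and its patterns are a tiny illustrative subset of the lists above.

```r
library(dplyr)
library(stringr)

# Made-up locations, including a misspelling and an unmatched country
toy_geo <- tibble(country = c("London", "Landon", "Toronto",
                              "United kingdom", "Ghana"))

toy_clean <- toy_geo %>%
  mutate(country = case_when(
    str_detect(country, "London|Landon|United kingdom") ~ "United Kingdom",
    str_detect(country, "Toronto|Canada") ~ "Canada",
    TRUE ~ country   # leave unmatched locations unchanged
  ))

print(toy_clean$country)
```

One difference to keep in mind: case_when() uses the first matching rule, whereas a chain of replace() calls lets a later rule overwrite an earlier one, so the order of the rules matters in both versions but in opposite ways.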
Subsequently, I created an additional column, hour, with information about the date and hour that each tweet was posted to better understand the relative size of each country at every hour. The dataset was also re-arranged by the date posted.
tweet_clean <- clean_tweet %>%
  # Build a label such as "June 5: 14Hrs" from each tweet's own timestamp
  mutate(hour = paste0(months(as.Date(created_at)), " ",
                       day(created_at), ": ",
                       hour(as_datetime(created_at)), "Hrs")) %>%
  arrange(as_datetime(created_at))

levels <- as.character(unique(tweet_clean$hour))
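As a quick illustration of how the hour label is assembled, the sketch below builds it for two made-up timestamps. The toy_time values are invented, and the month name printed by months() depends on the locale of your R session.

```r
library(lubridate)

# Two made-up timestamps (not from the dataset)
toy_time <- as_datetime(c("2021-06-05 14:23:10", "2021-06-06 09:05:00"))

# Month name, day of the month and hour of the day pasted into one label
toy_label <- paste0(months(as.Date(toy_time)), " ",
                    day(toy_time), ": ",
                    hour(toy_time), "Hrs")

print(toy_label)
```

Since hour() returns an integer, single-digit hours are not zero-padded, so labels sort correctly only when the data are arranged by the original timestamp, as in the chunk above.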
We can create a bar graph to show the distribution using the code chunk below. First, the distribution is arranged in descending order so that we can select the top 20 countries. We also create an additional column (pos) for the position of the data labels on the graph.
migr1_plot <- tweet_clean %>%
  select(country) %>%
  count(country, sort = TRUE) %>%
  slice(1:20) %>%
  mutate(pos = ifelse((n > 200), n - 200, n + 100)) %>%
  ggplot() +
  geom_col(aes(y = reorder(country, n), x = n),
           fill = c("#003f5c", "#bc5090", rep("#003f5c", 18))) +
  geom_text(aes(y = country, x = pos, label = n),
            color = c(rep("#ffffff", 7), rep("#000000", 13)),
            family = axis_text, fontface = "bold",
            vjust = 0.5, hjust = 0, size = 12) +
  labs(x = "Number of Tweets", y = "",
       subtitle = "Top 20 desired countries of \ndestination by Nigerians on Twitter") +
  theme_minimal(base_size = 38, base_family = axis_text) +
  theme(legend.position = "none",
        plot.subtitle = element_text(lineheight = unit(0.4, "pt"), size = 45),
        panel.grid = element_line(colour = NULL),
        panel.grid.major.y = element_line(colour = "#D2D2D2", linetype = "dashed", size = 0.3),
        panel.grid.major.x = element_line(colour = "#D2D2D2", linetype = "dashed", size = 0.3),
        # panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank()) +
  scale_x_continuous(labels = scales::comma)

migr1_plot

ggsave(file = "top_country_tweet.png", dpi = 350, height = 7, width = 7)
The figure below shows the top 20 countries from which Twitter users from Nigeria were reportedly tweeting. The figure shows that the United States of America, the United Kingdom, and Canada are Nigerians’ most preferred international destinations on Twitter.
South Africa and Ghana were the only African countries in the top 20 preferred destinations. It also emerged that a significant number of users reportedly tweeted from Nigeria – an act that could be interpreted as being in solidarity with the government.
Data wrangling – UN migration stock
As mentioned previously, we can compare the trends observed on Twitter with those from traditional data sources such as the UN migration stock. Among other variables in the dataset, I retained and renamed the year, country of destination and counts of moves from Nigeria.
# International migrant stock 2019
migration <- migration_UN %>%
  select(year = ...1, country = ...3, code = ...5, Nigeria) %>%
  filter(Nigeria != ".." & Nigeria != "" & !is.na(Nigeria)) %>%
  mutate(Nigeria = as.numeric(Nigeria))
I then extracted the total number of migrants from Nigeria in 2019 using the country code and reference year columns, and assigned the regional groupings of countries to a vector.
mt <- migration$Nigeria[migration$code == "900" & migration$year == "2019"] %>%
  as.numeric()
mt[1]
## [1] 1438331

regions <- migration$country[1:19]
Lastly, I filtered the data based on a set of conditions:
- Excluded moves to regions such as “Europe”, “More developed regions”, “Africa” and others to focus more on moves to other countries.
- Excluded rows that include any regional classification such as “Western Africa” or different capitalisation formats for the regions such as Europe or EUROPE.
- Retained the most recent (2019) counts of migration.
Subsequently, I calculated the percentage of moves from Nigeria to each country and ranked these to obtain the top 20 countries of destination for Nigerians.
migration_updt <- migration %>%
  filter(!country %in% regions) %>%
  filter(!str_detect(country, "Western Africa|Southern Africa|Middle Africa|Northern Africa")) %>%
  filter(!str_detect(country, "Europe|EUROPE|Asia|ASIA|OCEANIA|NORTHERN AMERICA")) %>%
  filter(year == 2019) %>%
  mutate(perc = round(Nigeria / mt[1], 3)) %>%
  arrange(desc(Nigeria)) %>%
  slice(1:20) %>%
  mutate(country = replace(country, which(country == "United States of America"), "USA"))
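The filter, share and rank steps can be traced on a small made-up table. All figures and names below (toy_migration, toy_total, toy_regions and the countries) are invented for illustration and are not UN estimates.

```r
library(dplyr)

# Made-up figures for illustration only (not UN estimates)
toy_migration <- tibble(
  country = c("WORLD", "Europe", "Country A", "Country B", "Country C"),
  Nigeria = c(1000, 400, 500, 300, 200)
)

# Total stock (the "WORLD" row) and the aggregate rows to exclude
toy_total <- toy_migration$Nigeria[toy_migration$country == "WORLD"]
toy_regions <- c("WORLD", "Europe")

toy_top <- toy_migration %>%
  filter(!country %in% toy_regions) %>%           # drop aggregates
  mutate(perc = round(Nigeria / toy_total, 3)) %>% # share of the total
  arrange(desc(Nigeria)) %>%                       # rank by size
  slice(1:2)                                       # keep the top rows

print(toy_top)
```

Dropping the aggregate rows before ranking matters: otherwise regional totals such as "Europe" would crowd individual countries out of the top of the list.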
We can also create a bar graph to show the distribution of the top 20 countries that Nigerians migrated to in 2019 using the code chunk below.
migr2_plot <- migration_updt %>%
  mutate(percentage = paste0((perc * 100), "%")) %>%
  ggplot() +
  geom_col(aes(y = reorder(country, perc), x = perc),
           fill = "#003f5c") +
  geom_text(aes(y = country, x = perc + 0.01, label = percentage),
            color = "#000000",
            family = axis_text, fontface = "bold",
            vjust = 0.5, hjust = 0, size = 12) +
  # The x-axis shows shares of the migrant stock, not tweet counts
  labs(x = "Share of Nigerian migrants", y = "",
       subtitle = "Top 20 destination countries \nfor Nigerian migrants in 2019") +
  theme_minimal(base_size = 38, base_family = axis_text) +
  theme(legend.position = "none",
        plot.subtitle = element_text(lineheight = unit(0.4, "pt"), size = 45),
        panel.grid = element_line(colour = NULL),
        panel.grid.major.y = element_line(colour = "#D2D2D2", linetype = "dashed", size = 0.3),
        panel.grid.major.x = element_line(colour = "#D2D2D2", linetype = "dashed", size = 0.3),
        # panel.grid.major.x = element_blank(),
        panel.grid.minor = element_blank()) +
  scale_x_continuous(labels = scales::percent, limits = c(0, 0.3))

migr2_plot

ggsave(file = "top_countryTweet_UN.png", dpi = 350, height = 7, width = 7)
The figure below shows the top 20 countries that Nigerians migrated to in 2019 based on UN estimates. The figure shows that the United States of America, the United Kingdom, Italy, Germany and Canada are the top non-African destination countries for Nigerian migrants. The Republic of Niger, Benin and Ghana were the top African destinations for Nigerian migrants. Surprisingly, South Africa was not among the top 10 destination countries.
How do these compare with existing sources of data on migration?
For ease of comparison, I used the ggarrange() function from ggpubr to combine the two graphs into one.
ggarrange(migr1_plot, migr2_plot, nrow = 1)

ggsave(file = "top_countryMig.png", dpi = 350, height = 6, width = 9)
As shown in the figure below, the patterns observed in the tweets closely match the migration patterns observed in 2019: Nigerians mostly want to migrate, and do migrate, to the USA and UK. Germany and Canada were also among the top five preferred destination countries for Nigerian migrants.
Although the two data sets were collected or estimated at different times, the analysis reveals some discrepancies between migration intentions and actual migrant movements. For example, regional migration (within Africa) was not prominent in the tweets. Compared with the actual migration from Nigeria to Cameroon, Niger, Benin, and Ghana in 2019, only a few users expressed a desire to move to South Africa and Ghana.
Contact
If you have any suggestions for improving the tutorial or experience any difficulty with the codes in the tutorial, please use the Contact Form to send me an email, reach me via Twitter: @eOlamijuwon or leave a comment.
The post #TwitterBan and speaking into existence: Evaluating migration intentions in Nigeria appeared first on Emmanuel Ọlámíjùwọ́n | Digital Demographer | Health Researcher | Data Analyst.