[This article was first published on Rcrastinate, and kindly contributed to R-bloggers].
A few days ago, I collected 30 minutes of tweets from all around the world. I used the twitteR and streamR packages for this. The nice thing about these tweets is that they have geo-information associated with them. Not all of them, of course, but more than enough.
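Since only some tweets carry coordinates, it can be useful to check how many of them are actually geotagged before plotting. The snippet below is a minimal sketch, assuming the parsed tweets sit in a data frame with `lat` and `lon` columns (as returned by `parseTweets()` in the code further down) and that missing coordinates show up as `NA`. The toy data frame is made up for illustration only.

```r
# Hypothetical toy data standing in for the output of parseTweets();
# only the lat/lon columns matter here, and NA means "no coordinates"
tweets <- data.frame(
  text = c("a", "b", "c", "d"),
  lat  = c(51.5, NA, -23.5, 40.4),
  lon  = c(-0.1, NA, -46.6, -3.7)
)

# Keep only tweets that carry both a latitude and a longitude
has_geo <- !is.na(tweets$lat) & !is.na(tweets$lon)
geo_tweets <- tweets[has_geo, ]

# Share of tweets with coordinates
geo_share <- mean(has_geo)
cat(sprintf("%d of %d tweets (%.0f%%) are geotagged\n",
            sum(has_geo), nrow(tweets), 100 * geo_share))
```

Plotting `geo_tweets` instead of all rows gives the same picture, because `plot()` silently drops points with missing coordinates anyway.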
Let’s see what happens if we plot all the tweets at their respective positions.
[Figure: all geotagged tweets plotted at their positions (large PNG file)]
Isn’t this awesome?! It’s a map of the world, without using any map packages or stuff like that. Only half an hour of tweets. Of course, some parts of the world, like Africa and Australia, are seriously underrepresented.
Where can we go from here? We have lots of information available. For example, we can use the time stamp to watch the world map “emerge” over time. So here’s a GIF that also shows the running second since I began collecting the tweets. Please note that the steps get larger toward the end: one second at a time up to second 400, then 5 seconds, then 10.
[Figure: animated GIF (about 5 MB) of the map emerging over time, labelled with the running second]
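The script below derives the running second by cutting the minute and second out of `created_at` with `substr()`, which only works while the whole collection stays inside one clock hour. As an alternative sketch (not what I used for the GIF), the timestamps can be parsed into real date-times so the running second is correct across hour or day boundaries. It assumes Twitter’s `created_at` layout (`"Wed Aug 27 13:08:45 +0000 2008"`); the three timestamps are invented for illustration.

```r
# Hypothetical timestamps in Twitter's created_at format (not real data);
# note that the last one lies in the next hour
created_at <- c("Wed Jan 01 17:14:03 +0000 2014",
                "Wed Jan 01 17:20:30 +0000 2014",
                "Wed Jan 01 18:01:00 +0000 2014")

# Day and month names are English, so parse them in the "C" locale
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
created <- as.POSIXct(created_at, format = "%a %b %d %H:%M:%S %z %Y",
                      tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old_locale))

# Running second since the earliest tweet, as a plain number
running_sec <- as.numeric(difftime(created, min(created), units = "secs"))
print(running_sec)
```

For these inputs the running seconds are 0, 387 and 2817; the `substr()` approach would get the last one wrong because its minute (01) is smaller than the starting minute (14).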
There is another piece of information associated with each tweet: its language, according to Twitter’s classification algorithm. Let’s see how it looks if we assign a color to each of the TOP10 languages within these 30 minutes.
[Figure: tweets colored by the TOP10 languages]
And a high-resolution version of the same plot (for zooming and stuff).
[Figure: high-resolution version of the language map]
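In the script, the ten languages and their colors are typed in by hand. If you want to determine the TOP10 from the data itself, a sketch like the following works, assuming a character vector of language codes like the `lang` column. It swaps in `rainbow()` for the hand-picked palette, so the colors will differ from the plots above; the codes are made up, and `top_n` is set to 3 only to keep the toy example small.

```r
# Hypothetical language codes standing in for the 'lang' column (not real data)
lang <- c("en", "en", "es", "pt", "en", "ja", "es", "und", "pt", "en", "es")

# The post uses the TOP10; 3 keeps this toy example readable
top_n <- 3

# Count tweets per language code and keep the top_n most frequent ones
lang_counts <- sort(table(lang), decreasing = TRUE)
top_langs <- names(lang_counts)[seq_len(min(top_n, length(lang_counts)))]

# One automatically chosen color per top language, black for all others
lang_palette <- setNames(rainbow(length(top_langs)), top_langs)
plot_col <- ifelse(lang %in% top_langs, lang_palette[lang], "black")
```

The resulting `plot_col` vector can be passed straight to `plot(..., col = plot_col)`, and `names(lang_palette)` plus `unname(lang_palette)` give the labels and colors for `legend()`.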
Of course, we can have a look at different parts of the world.
Europe.
The UK (please refer to the legends in the other plots to see which color is which language).
North America.
South America.
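The regional maps only zoom in: `plot()` with `xlim`/`ylim` draws all points and clips those outside the window. If you also want to know how many tweets fall into each region, a small bounding-box filter does it. This is a sketch; the boxes reuse the `xlim` (longitude) and `ylim` (latitude) values from the regional plots in the code below, and the five city coordinates are illustrative, not collected tweets.

```r
# Bounding boxes (longitude range, latitude range) taken from the
# xlim/ylim values of the regional plots
regions <- list(
  europe        = list(lon = c(-24, 48),   lat = c(36, 72)),
  uk            = list(lon = c(-12, 3.4),  lat = c(49.4, 59.3)),
  north_america = list(lon = c(-163, -52), lat = c(12, 70)),
  south_america = list(lon = c(-103, -27), lat = c(-56, 17.5))
)

# TRUE for every point inside the box; missing coordinates count as outside
in_box <- function(lon, lat, box) {
  !is.na(lon) & !is.na(lat) &
    lon >= box$lon[1] & lon <= box$lon[2] &
    lat >= box$lat[1] & lat <= box$lat[2]
}

# Illustrative coordinates: London, Madrid, New York, Sao Paulo, Tokyo
lon <- c(-0.13, -3.70, -74.01, -46.63, 139.69)
lat <- c(51.51, 40.42, 40.71, -23.55, 35.68)

# Number of points per region (regions may overlap, e.g. UK within Europe)
region_counts <- sapply(regions, function(box) sum(in_box(lon, lat, box)))
print(region_counts)
```

With real data, `tw[in_box(tw$lon, tw$lat, regions$uk), ]` would give the UK tweets, for example to tabulate their languages.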
[EDIT: Shiny App not online anymore]
Finally, the nice folks over at shinyapps.io let you host your own Shiny application. In my app, you could choose one or more of the TOP10 languages, and the app created a map for your selection.
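The app’s code is not part of this post, so the following is only a hypothetical sketch of the core step such an app needs: keep the tweets in the selected languages and draw them like the static maps. The helper names `select_languages()` and `plot_languages()` are made up, and the toy data frame is invented; in a real app, the selection would come from a Shiny input.

```r
# Hypothetical helper: keep geotagged tweets in the chosen languages
select_languages <- function(tw, langs) {
  tw[tw$lang %in% langs & !is.na(tw$lon) & !is.na(tw$lat), ]
}

# Hypothetical helper: draw the selection in the style of the static maps;
# 'cols' is a named vector mapping language codes to colors
plot_languages <- function(tw, langs, cols) {
  sel <- select_languages(tw, langs)
  par(mar = rep(0, 4))
  # keep the full-data axis ranges so the map does not jump between selections
  plot(sel$lon, sel$lat, col = cols[sel$lang], pch = ".",
       xaxt = "n", yaxt = "n", bty = "n",
       xlim = range(tw$lon, na.rm = TRUE), ylim = range(tw$lat, na.rm = TRUE))
  invisible(sel)
}

# In a Shiny server function this might (hypothetically) be wired up as
#   output$map <- renderPlot(plot_languages(tw, input$langs, my_cols))

# Usage with made-up data
toy <- data.frame(lang = c("en", "es", "pt", "es", "fr"),
                  lon  = c(-0.1, -3.7, -46.6, -58.4, 2.35),
                  lat  = c(51.5, 40.4, -23.5, -34.6, 48.9))
picked <- select_languages(toy, c("es", "pt"))
nrow(picked)  # 3
```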
Here is the R code for this post. If you are interested in the code of the Shiny App, please let me know.
# Loading packages
library(streamR)
library(twitteR)
# Twitter authentication; I saved my OAuth credentials (object 'cred') in twitter_auth.Rdata
options(RCurlOptions = list(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
load("twitter_auth.Rdata")
# registerTwitterOAuth() is from older twitteR releases; newer ones use setup_twitter_oauth()
registerTwitterOAuth(cred)
# Getting tweets by accessing Twitter's stream
# Parameter 'locations' is set to the whole world
# (longitude/latitude of the southwest corner, then of the northeast corner)
# Parameter 'language' is set to all languages
# Parameter 'timeout' is 1800 seconds, i.e. 30 minutes
# Tweets are written to the file 'tweets.txt'
# My file with collected tweets is approx. 250M
filterStream("tweets.txt", locations = c(-180, -90, 180, 90),
             timeout = 1800, oauth = cred, verbose = FALSE, language = "")
# Parse tweets and select only relevant columns
parsed.tw <- parseTweets("tweets.txt")
tw <- parsed.tw[, c("text", "lang", "lat", "lon", "created_at")]
# Extract hour, minute and second the tweet was created
# (created_at looks like "Wed Aug 27 13:08:45 +0000 2008")
tw$hour <- substr(tw$created_at, 12, 13)
tw$minute <- substr(tw$created_at, 15, 16)
tw$second <- substr(tw$created_at, 18, 19)
# Running second (starting, in my case, from 17:14); this only works
# because the whole collection stays within the same hour
tw$second.counting <- ((as.numeric(tw$minute) - 14) * 60) + as.numeric(tw$second)
# Writing single image files for GIF
# Creating vector with seconds to output,
# starting with steps of size 1, then 5, then 10
secs <- c(min(tw$second.counting, na.rm = TRUE):400,
          seq(401, 1000, 5),
          seq(1001, max(tw$second.counting, na.rm = TRUE), 10))
# Outputting single images
for (sec.i in secs) {
  if (sec.i %% 400 == 0) cat(sec.i, "\n")  # progress indicator
  second.tweets <- tw[which(tw$second.counting <= sec.i), ]
  png(paste0("tw.sec", sprintf("%04d", sec.i), ".png"), width = 600, height = 400)
  # par() settings belong to the open device, so set the margins after png()
  par(mar = c(0, 0, 0, 0))
  plot(second.tweets$lon, second.tweets$lat,
       pch = 19, bty = "n", xlab = "", ylab = "", xaxt = "n", yaxt = "n",
       cex = .2, ylim = range(tw$lat, na.rm = TRUE), xlim = range(tw$lon, na.rm = TRUE))
  mtext(sec.i, side = 3, line = -1, cex = .9)
  dev.off()
}
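# Creating GIF with ImageMagick's convert (must be installed);
# only the tw.sec*.png frames go into the animation
system("convert -delay 10 tw.sec*.png tweets3.gif")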
# Assigning colors to the TOP10 languages; all other languages are black
# ("in" is a reserved word in R, so that name has to be quoted)
lang.cols <- c(en = "blue", es = "red", pt = "green", tr = "orange",
               und = "grey", "in" = "hotpink", ar = "brown",
               fr = "peachpuff", ru = "yellow", ja = "gold")
tw$plot.col <- ifelse(tw$lang %in% names(lang.cols),
                      lang.cols[tw$lang], "black")
# Plotting all tweets, 1 dot per tweet, different colors for languages
# Below, there are all plots for the different regions. They are simply created
# by using different xlim and ylim values: xlim holds the longitude range,
# ylim the latitude range.
png("world.of.tweets.highres.png", height = 5800, width = 10000, res = 600)
par(mar = rep(0, 4))
plot(tw$lon, tw$lat, col = tw$plot.col, pch = ".", xaxt = "n", yaxt = "n", bty = "n")
legend("bottomleft", legend = c("English", "Spanish", "Portuguese", "Turkish",
                                "Undefined?", "Indonesian", "Arabic", "French", "Russian",
                                "Japanese", "Other"),
       col = c("blue", "red", "green", "orange",
               "grey", "hotpink", "brown", "peachpuff",
               "yellow", "gold", "black"),
       pch = 19, bty = "n", inset = 0.1)
dev.off()
png("world.of.tweets.png", height = 2000, width = 3500, res = 300)
par(mar = rep(0, 4))
plot(tw$lon, tw$lat, col = tw$plot.col, pch = ".", xaxt = "n", yaxt = "n", bty = "n")
legend("bottomleft", legend = c("English", "Spanish", "Portuguese", "Turkish",
                                "Undefined?", "Indonesian", "Arabic", "French", "Russian",
                                "Japanese", "Other"),
       col = c("blue", "red", "green", "orange",
               "grey", "hotpink", "brown", "peachpuff",
               "yellow", "gold", "black"),
       pch = 19, bty = "n", inset = 0.1)
dev.off()
# Europe
png("europe.of.tweets.png", height = 2000, width = 3500, res = 300)
par(mar = rep(0, 4))
plot(tw$lon, tw$lat, col = tw$plot.col, pch = ".", xaxt = "n", yaxt = "n", bty = "n",
     xlim = c(-24, 48), ylim = c(36, 72))
legend("bottomleft", legend = c("English", "Spanish", "Portuguese", "Turkish",
                                "Undefined?", "Indonesian", "Arabic", "French", "Russian",
                                "Japanese", "Other"),
       col = c("blue", "red", "green", "orange",
               "grey", "hotpink", "brown", "peachpuff",
               "yellow", "gold", "black"),
       pch = 19, bty = "n", inset = 0.01)
dev.off()
# UK
png("UK.of.tweets.png", height = 2000, width = 1700, res = 300)
par(mar = rep(0, 4))
plot(tw$lon, tw$lat, col = tw$plot.col, pch = ".", xaxt = "n", yaxt = "n", bty = "n",
     xlim = c(-12, 3.4), ylim = c(49.4, 59.3))
dev.off()
# South America
png("southamerica.of.tweets.png", height = 2000, width = 1700, res = 300)
par(mar = rep(0, 4))
plot(tw$lon, tw$lat, col = tw$plot.col, pch = ".", xaxt = "n", yaxt = "n", bty = "n",
     xlim = c(-103, -27), ylim = c(-56, 17.5))
dev.off()
# North America
png("northamerica.of.tweets.png", height = 2000, width = 2500, res = 300)
par(mar = rep(0, 4))
plot(tw$lon, tw$lat, col = tw$plot.col, pch = ".", xaxt = "n", yaxt = "n", bty = "n",
     xlim = c(-163, -52), ylim = c(12, 70))
dev.off()
To leave a comment for the author, please follow the link and comment on their blog: Rcrastinate.