The echo of a tragedy in Social Media – “The Making of”
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this post I’m going to describe how I put together the visualization of The echo of a tragedy in Social Media.
I used 3 different technologies to pull it together: PHP for data gathering, R for data analysis and data output generation and D3.js for data visualization.
The day after the incidents at Charlie Hebdo took place, I set up a Twitter monitor using the Search API based on the hashtag #jesuischarlie for 5 different languages: German, Spanish, English, Italian and of course French. I made sure I had at least 10K tweets gathered for each language. I made the data available as CSV (jesuischarlie.csv). The PHP part is easy, so I’m not going to focus on it.
The analysis part is no rocket science and is accomplished in the following steps:
- Counting the occurrences of all available hashtags per language
- Taking the top 50 hashtags per language
- Computing the occurrences of hashtag pairs within the selected hashtags
- Generating the JSON according to the force-directed D3.js layout
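The first three steps can be sketched on a handful of made-up tweets. This is only an illustration in base R, not the code used for the visualization: the toy data, the helper names `top_tags` and `has_tag`, and the choice of a top 2 instead of a top 50 are all assumptions. The `tags` column is assumed to hold a comma-terminated list of hashtags per tweet.

```r
# Toy data (made up): one row per tweet, hashtags as a comma-terminated string.
tweets <- data.frame(
  lang = c("en", "en", "en", "fr", "fr", "fr"),
  tags = c("jesuischarlie,paris,", "jesuischarlie,freedom,", "paris,jesuischarlie,",
           "jesuischarlie,liberte,", "jesuischarlie,liberte,", "jesuischarlie,paris,"),
  stringsAsFactors = FALSE
)

# Steps 1 and 2: count every hashtag in a set of tweets and keep the n most frequent.
top_tags <- function(df, n) {
  tags <- tolower(unlist(strsplit(df$tags, ",", fixed = TRUE)))
  tags <- tags[tags != ""]                       # drop empties left by ",,"
  counts <- sort(table(tags), decreasing = TRUE)
  head(data.frame(tag = names(counts), count = as.integer(counts),
                  stringsAsFactors = FALSE), n)
}

# One top list per language.
per_lang <- lapply(split(tweets, tweets$lang), top_tags, n = 2)

# Step 3: count tweets in which two of the selected hashtags appear together.
selected <- unique(unlist(lapply(per_lang, function(d) d$tag), use.names = FALSE))

# TRUE for each tweet whose tag list contains exactly this tag.
has_tag <- function(tag_strings, tag) {
  vapply(strsplit(tolower(tag_strings), ",", fixed = TRUE),
         function(parts) tag %in% parts, logical(1))
}

pairs <- t(combn(selected, 2))                   # every unordered pair once
pair_counts <- data.frame(
  tag1 = pairs[, 1],
  tag2 = pairs[, 2],
  Freq = apply(pairs, 1, function(p) {
    sum(has_tag(tweets$tags, p[1]) & has_tag(tweets$tags, p[2]))
  }),
  stringsAsFactors = FALSE
)
pair_counts <- pair_counts[pair_counts$Freq > 0, ]

print(per_lang)
print(pair_counts)
```

Splitting each tag string and comparing whole tags avoids partial matches between hashtags that share a prefix, at the cost of being slower than a single regular expression on large data sets.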
The code taking care of the visualization is written in D3.js, embedded and readable in the visualization page.
A few comments on that:
- There are 2 functions to make both text and opacity dependent on the number of occurrences
- The nodes accept a click to highlight the connections to other nodes
- The color of the highlighted connections changes according to a round robin approach to allow for multiple highlighting
- The values for gravity, distance and charge have been manually adjusted… worth spending some time playing around with them
- The size of the images for the nodes is determined by a function to allow bigger sizes for the flags
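The actual D3.js code lives in the visualization page and is not reproduced here. Purely to illustrate the logic behind the first and third comments, here is a small sketch in R. The linear scaling, the value ranges, the palette and all function names are assumptions made for this example.

```r
# Hypothetical helper: map a count linearly onto the range [lo, hi].
rescale_count <- function(count, max_count, lo, hi) {
  lo + (hi - lo) * (count / max_count)
}

# Two functions driven by the number of occurrences (ranges are made up).
font_size <- function(count, max_count) rescale_count(count, max_count, lo = 8, hi = 24)
opacity   <- function(count, max_count) rescale_count(count, max_count, lo = 0.3, hi = 1)

# Round robin colour picker: every call returns the next palette entry
# and wraps around at the end, so several highlights stay distinguishable.
make_color_picker <- function(palette) {
  i <- 0
  function() {
    color <- palette[(i %% length(palette)) + 1]
    i <<- i + 1
    color
  }
}

next_color <- make_color_picker(c("red", "blue", "green"))
clicks <- vapply(1:4, function(k) next_color(), character(1))

print(font_size(50, 100))
print(opacity(100, 100))
print(clicks)
```

Keeping the counter inside a closure means each click only has to ask for the next color, without tracking which ones are already in use.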
Below you can find the R code… Again, it might not be 100% optimized, but it does the job… Give it a go!
library(RMySQL)
require(data.table)
library(stringr)

con <- dbConnect(MySQL(), user="XXXXXX", password="XXXXXX", dbname="XXXXXXX", host="XXXXXXXX")
res=dbSendQuery(conn=con,"SELECT * FROM domaintweets where tag='#JeSuisCharlie'")
assets<-fetch(res,-1)
dbDisconnect(con)

# I've done it with a database... you can do it with the data set I provided above with a read.csv statement
# assets<-read.csv("~/jesuischarlie/jesuischarlie.csv")

# languages we extracted tweets for
languages<-c("es","de","en","fr","it")

# counts the occurrences of every hashtag in the first 'max' tweets, most frequent first
countPerLanguage<-function (assets, max) {
  assets$ntags<-str_count(assets$tags, ",")
  # filtering out tweets with way too many hashtags
  assets<-assets[assets$ntags<5,]
  assets<-head(assets,max)
  tags<-assets$tags
  alltags<-paste(tags, collapse = ',')
  alltags<-gsub(x=alltags, pattern = ',,',replacement = ',')
  all.tags<-strsplit(x = alltags,split = ',')
  df.all.tags<- as.data.frame(all.tags)
  colnames(df.all.tags)<-c("tag")
  df.all.tags$tag<-as.character(df.all.tags$tag)
  df.all.tags$tag<-tolower(df.all.tags$tag)
  df.all.tags$count<-1
  agg.df.all.tags<-aggregate(count~tag, df.all.tags, length)
  agg.df.all.tags<-agg.df.all.tags[order(agg.df.all.tags$count, decreasing = T),]
  return (agg.df.all.tags)
}

# taking the top50 hashtags per languages
agg.languages<-NULL
for (i in 1:length(languages)) {
  df.lang<-subset(assets, lang==languages[i])
  top50 <- countPerLanguage(df.lang, 10000)
  lang<-rep(x = languages[i],times = nrow(top50))
  top50$lang<-lang
  top50<-head(top50,50)
  agg.languages<-rbind(agg.languages,top50)
}

# hashtag 2 hashtag connection
unique.tags<-unique(agg.languages$tag)
pairs.df<-NULL
# the loop starts at 2, so the first entry of unique.tags is never paired
for (i in 2:length(unique.tags)) {
  tag1 <- unique.tags[i]
  # matches the tag at the start of the list or between two commas
  # (case-sensitive, while unique.tags was lowercased above)
  assets.tag <- subset(assets, grepl(pattern = paste0("^",tag1,",", "|",",", tag1,",") ,x = assets$tags) )
  for (j in i:length(unique.tags)) {
    if (i!=j) {
      tag2<-unique.tags[j]
      print(paste(tag1, "-", tag2))
      pairs <- subset(assets.tag, grepl(pattern = paste0("^",tag2,",", "|",",", tag2,",") ,x = assets.tag$tags) )
      count.pairs<-nrow(pairs)
      if (count.pairs!= 0) {
        l<-list(unique.tags[i],unique.tags[j],count.pairs)
        df<-as.data.frame(l)
        colnames(df)<-c('tag1','tag2','Freq')
        pairs.df<-rbind(pairs.df,df)
      }
    }
  }
}
pairs.df$tag1<-as.character(pairs.df$tag1)
pairs.df$tag2<-as.character(pairs.df$tag2)

# Generating JSON compatible with the force-directed graph http://bl.ocks.org/mbostock/4062045
languages<-c("de","en","it","fr","es")
languages.img<-c("https://cdn3.iconfinder.com/data/icons/finalflags/32/Germany-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/United-Kingdom-flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/Italy-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/France-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/Spain-Flag.png")
img.url<-"https://cdn4.iconfinder.com/data/icons/miu/22/editor_pencil_pen_edit_write_-16.png"
listNodes<-NULL
nodeNr<-0

# Generating the Nodes
strJson<-'{ "nodes":['
for(j in 1:length(languages)) {
  lang<-agg.languages[agg.languages$lang==languages[j],]
  df<-data.frame(tag=languages[j],noderNr=nodeNr)
  listNodes<-rbind(df,listNodes)
  # entry for the language
  dt<-paste0('{"name":"', languages[j],'","group":',j,', "image":"',languages.img[j] ,'", "count":', sum(lang$count) ,', "number":', nodeNr ,' },' )
  nodeNr<-nodeNr+1
  strJson<-paste0(strJson,dt)
  for(i in 1:nrow(lang)) {
    # one node per hashtag, skipping tags already added and tags equal to a language code
    if (nrow(listNodes[listNodes$tag==lang[i,]$tag,])==0 && !(lang[i,]$tag %in% languages)) {
      dt<-paste0('{"name":"', lang[i,]$tag,'","group":',j,', "image":"',img.url ,'", "count":', lang[i,]$count ,', "number":', nodeNr ,' },' )
      strJson<-paste0(strJson,dt)
      df<-data.frame(tag=lang[i,]$tag,noderNr=nodeNr)
      listNodes<-rbind(df,listNodes)
      nodeNr<-nodeNr+1
    }
  }
}
# dropping the trailing comma
strJson<-substr(strJson,1,nchar(strJson)-1)

# Generating the Links
strJson<-paste0(strJson,'], "links": [')
# links between each language node and its hashtags
for (l in 1:nrow(agg.languages)) {
  matches1<-subset(listNodes, tag==agg.languages[l,]$lang)
  matches2<-subset(listNodes, tag==agg.languages[l,]$tag)
  for (k in 1:nrow(matches1)) {
    for (m in 1:nrow(matches2)) {
      dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},')
      strJson<-paste0(strJson,dt)
    }
  }
}
# links between hashtags that occur together
for (l in 1:nrow(pairs.df)) {
  matches1<-subset(listNodes, tag==pairs.df[l,]$tag1)
  matches2<-subset(listNodes, tag==pairs.df[l,]$tag2)
  for (k in 1:nrow(matches1)) {
    for (m in 1:nrow(matches2)) {
      dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},')
      strJson<-paste0(strJson,dt)
    }
  }
}
# dropping the trailing comma
strJson<-substr(strJson,1,nchar(strJson)-1)
strJson<-paste0(strJson,']}')

# Writing it to a file
write(x = strJson, file = '~/jesuischarlie/jesuischarlie2.json')
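If you want to experiment with the JSON step on its own, the sketch below builds the same kind of nodes/links structure from two small data frames. It is a simplified variant, not the code above: the toy `nodes` and `links` tables and the names `node_index`, `node_json`, `link_json` and `graph_json` are made up for the example, and the `image` and `number` fields are left out. Links refer to nodes by their zero-based position in the `nodes` array, as in the force-directed example linked in the code.

```r
# Toy inputs (made up): a few nodes and the links between them, by name.
nodes <- data.frame(
  name  = c("fr", "jesuischarlie", "paris"),
  group = c(1L, 1L, 1L),
  count = c(5L, 3L, 2L),
  stringsAsFactors = FALSE
)
links <- data.frame(
  source = c("fr", "fr", "jesuischarlie"),
  target = c("jesuischarlie", "paris", "paris"),
  stringsAsFactors = FALSE
)

# Lookup from node name to zero-based position in the "nodes" array.
node_index <- setNames(seq_len(nrow(nodes)) - 1L, nodes$name)

# sprintf is vectorised, so each call yields one JSON object per row.
node_json <- sprintf('{"name":"%s","group":%d,"count":%d}',
                     nodes$name, nodes$group, nodes$count)
link_json <- sprintf('{"source":%d,"target":%d,"value":0}',
                     unname(node_index[links$source]),
                     unname(node_index[links$target]))

# Joining with collapse = "," leaves no trailing comma to trim afterwards.
graph_json <- paste0('{"nodes":[', paste(node_json, collapse = ","),
                     '],"links":[', paste(link_json, collapse = ","), ']}')

cat(graph_json, "\n")
```

This assumes the names contain no quotes or backslashes. For real data, a JSON library such as jsonlite would also take care of escaping.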