What does it say about R?
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the last post I showed a way to plot a gexf file in R using the rgexf package and the sigma.js library. Now we need some data to use that piece of code, so I decided to collect tweets about R. For this I used the twitteR package to search for “#rstats”, then cleaned the texts and extracted all the hashtags. Next, I found the associations between hashtags using a simple rule: if a tweet says “#rstats and #data are my drugs”, these two hashtags are related. Finally, I added some graphic attributes, such as a node size according to the number of mentions, and random colors to make the graph more attractive.
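To make the co-occurrence rule concrete, here is a minimal base-R sketch that skips the Twitter search and works on a few made-up tweets. The `toy_tweets` vector and its contents are hypothetical, not results from the search. The sketch removes the search hashtags, extracts the hashtags of each tweet, counts mentions, applies the same log scaling for node size, and lists every pair of hashtags that share a tweet.

```r
# Hypothetical toy tweets standing in for the #rstats search results
toy_tweets <- tolower(c(
  "#rstats and #data are my drugs",
  "Learning #Python and #rstats for #data science",
  "#rstats, #datamining and #data",
  "#rstats"
))

# Remove the search hashtags; "\\b" keeps tags such as "#rcpp" intact
for (term in c("#rstats", "#r")) {
  toy_tweets <- gsub(paste0(term, "\\b"), "", toy_tweets, perl = TRUE)
}

# Hashtags of each tweet (each tag counted once per tweet)
tags_per_tweet <- lapply(regmatches(toy_tweets, gregexpr("#\\w+", toy_tweets)), unique)

# Mentions per hashtag and the log-scaled node size (1 + log base 3)
mentions <- table(unlist(tags_per_tweet))
sizes <- 1 + log(as.vector(mentions), base = 3)

# Co-occurrence rule: every pair of hashtags in the same tweet is an edge
pairs <- lapply(tags_per_tweet, function(tags) {
  if (length(tags) < 2) return(NULL)
  t(combn(sort(tags), 2))  # one row per pair, smaller tag first
})
edges <- unique(do.call(rbind, pairs))

print(mentions)
print(edges)
```

With these four tweets, #data is mentioned three times and gets size 2, and the edges are #data/#python and #data/#datamining. The first tweet contributes no edge: once #rstats is removed, only #data is left in it.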
Then just run the code and see the result 😉.
There are many tweets about #python among the #rstats tweets! Unsurprisingly, there are also many tweets about data (#data, #bi, #datamining, #bigdata, etc.). On the other hand, there are conversations about #sas, #matlab, #sastip, and so on.
```r
library(twitteR)
library(stringr)
library(plyr)
library(sna)
library(rgexf)  # write.gexf()

# searchTwitter() needs an authenticated session first, e.g.
# setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Some tweets about R
tweets <- tolower(twListToDF(searchTwitter(searchString = "#rstats", n = 1500))$text)
head(tweets)

# Cleaning the tweets: drop the search hashtags themselves.
# "\\b" stops "#r" from also stripping the start of tags like "#rcpp".
hashtags_remove <- c("#rstats", "#r")
for (term in hashtags_remove) {
  tweets <- str_replace_all(tweets, paste0(term, "\\b"), "")
}

# Extract the hashtags
hashtags <- unique(unlist(str_extract_all(tweets, "#\\w+")))
hashtags <- setdiff(hashtags, hashtags_remove)

# Node size according to the number of tweets mentioning each hashtag
# ("\\b" so that "#data" does not also count "#datamining")
nodesizes <- laply(hashtags, function(hashtag) {
  sum(str_detect(tweets, paste0(hashtag, "\\b")))
})
nodesizes <- 1 + log(nodesizes, base = 3)  # scaling sizes

nodes <- data.frame(id = seq_along(hashtags), label = hashtags,
                    stringsAsFactors = FALSE)

# Two hashtags are related if they appear in the same tweet
relations <- ldply(hashtags, function(hashtag) {
  in_tweets <- tweets[str_detect(tweets, paste0(hashtag, "\\b"))]
  hashtag_related <- setdiff(unlist(str_extract_all(in_tweets, "#\\w+")), hashtag)
  if (length(hashtag_related) == 0) {
    return(data.frame())
  }
  data.frame(source = which(hashtags == hashtag),
             target = which(hashtags %in% hashtag_related))
})

# It is an undirected graph, so keep each pair once (smaller id first)
relations <- unique(data.frame(source = pmin(relations$source, relations$target),
                               target = pmax(relations$source, relations$target)))

# Some random colors (RGBA)
nodecolors <- data.frame(r = sample(1:249, size = nrow(nodes), replace = TRUE),
                         g = sample(1:249, size = nrow(nodes), replace = TRUE),
                         b = sample(1:249, size = nrow(nodes), replace = TRUE),
                         a = 1)

# Symmetric adjacency matrix for the Kamada-Kawai layout
links <- matrix(0, nrow = length(hashtags), ncol = length(hashtags))
links[cbind(relations$source, relations$target)] <- 1
links[cbind(relations$target, relations$source)] <- 1

positions <- gplot.layout.kamadakawai(links, layout.par = list())
positions <- cbind(positions, 0)  # needs a z axis

graph <- write.gexf(nodes = nodes, edges = relations,
                    nodesVizAtt = list(color = nodecolors,
                                       size = nodesizes,
                                       position = positions))
plot.gexf(graph)  # sigma.js plot, as in the previous post
```
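The per-hashtag loop, the duplicate-removal step and the adjacency matrix above can also be derived in one pass from a tweet-by-hashtag incidence matrix, with `crossprod()` counting how often two hashtags share a tweet. This is a swapped-in technique, not the code used in the post; here is a minimal base-R sketch on hypothetical toy data (the `tags_per_tweet` list is made up):

```r
# Hypothetical hashtags found in each of four tweets (after cleaning)
tags_per_tweet <- list("#data", c("#python", "#data"),
                       c("#datamining", "#data"), character(0))
hashtags <- sort(unique(unlist(tags_per_tweet)))

# Incidence matrix: one row per tweet, 1 if the tweet contains the hashtag
incidence <- t(vapply(tags_per_tweet,
                      function(tags) as.numeric(hashtags %in% tags),
                      numeric(length(hashtags))))
colnames(incidence) <- hashtags

# t(incidence) %*% incidence: off-diagonal = shared tweets, diagonal = mentions
cooc <- crossprod(incidence)
mentions <- diag(cooc)

# Symmetric 0/1 adjacency matrix (what the layout step needs)
links <- (cooc > 0) * 1
diag(links) <- 0

# Undirected edge list: each pair once, taken from the upper triangle
idx <- which(upper.tri(links) & links == 1, arr.ind = TRUE)
relations <- data.frame(source = unname(idx[, "row"]),
                        target = unname(idx[, "col"]))
print(relations)
```

The diagonal of `cooc` holds the mention counts (3, 1, 1 for #data, #datamining, #python), so the same matrix could feed the node sizes, and since `links` is symmetric by construction no separate de-duplication pass is needed.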
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.