SNA: Visualising an email box with R
Are statistics sexy? Visualising social networks certainly is! I wrote a little function that makes it extremely easy to produce beautiful plots of a mailbox with R. I find visualisations of ‘social graphs’ particularly appealing. They look like flowers.
I had to use a few Python functions, which can be executed within R through the rJython package. The function connects to an IMAP server and reads the “To:” and “From:” header fields of the stored emails. It should not be difficult to adapt this script to work with POP3 too. I am really impressed by what R can do (with a little bit of help from Python). Can anyone suggest a more elegant way to do the same thing without executing Python?
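The address-extraction step needs no Python and can be tried on its own. The following is a minimal sketch in base R, assuming a raw header line is already available as a string; `extract_addresses` is a hypothetical helper name and the addresses are made up.

```r
# Hypothetical helper (not part of the original function):
# pull e-mail addresses out of one raw "To:" or "From:" header line.
extract_addresses <- function(header) {
  # split on angle brackets, line breaks, commas, spaces and double quotes
  tokens <- unlist(strsplit(header, "[<>\r\n, \"]"))
  # keep only the tokens that contain "@" and normalise their case
  tolower(grep("@", tokens, value = TRUE))
}

# Made-up header, for illustration only
hdr <- "To: \"Ann Lee\" <Ann@example.com>, bob@example.org\r\n"
print(extract_addresses(hdr))
```

Keeping every token that contains "@" is a crude filter rather than a full address parser, but it is enough to build a graph of who writes to whom.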
Because rJython depends on rJava, I had to install the Java Development Kit before I could run it.
Warning: this function worked very well for me and did no harm to my mailbox. However, I am not an expert in IMAP, so if you run it, you do so at your own risk.
Here is the function:
    mailSoc <- function(login, pass,
                        serv = "imap.gmail.com", # IMAP server
                        ntore = 50,              # ignore emails addressed to more recipients than this
                        todow = -1,              # how many emails to download (-1 = all)
                        begin = -1) {            # message number to start from (-1 = newest)

      # load rJython and the Python imaplib module
      require(rJython)
      rJython <- rJython(modules = "imaplib")
      rJython$exec("import imaplib")

      # connect to the server
      rJython$exec(paste("mymail = imaplib.IMAP4_SSL('", serv, "')", sep = ""))
      rJython$exec(paste("mymail.login('", login, "','", pass, "')", sep = ""))

      # get the number of available messages
      rJython$exec("sel = mymail.select()")
      rJython$exec("number = sel[1]")
      nofmsg <- .jstrVal(rJython$get("number"))
      nofmsg <- as.numeric(unlist(strsplit(nofmsg, "'"))[2])

      # if 'begin' is not specified, begin from the newest message
      if (begin == -1) {
        begin <- nofmsg
      }

      # if 'todow' is not specified, download everything down to message 1
      if (todow == -1) {
        end <- 1
      } else {
        end <- max(1, begin - todow + 1)
      }

      # give a little bit of information
      todownload <- begin - end + 1
      print(paste("Found", nofmsg, "emails"))
      print(paste("I will download", todownload, "messages."))
      print("It can take a while")

      data <- data.frame()

      # fetch the emails, newest first
      # note: fetching BODY[...] marks a message as read on the server;
      # BODY.PEEK[...] would not
      for (i in begin:end) {
        nr <- as.character(i)

        # get the sender
        rJython$exec(paste("typ, fro = mymail.fetch('", nr,
                           "', '(BODY[HEADER.FIELDS (from)])')", sep = ""))
        rJython$exec("fro = fro[0][1]")
        from <- .jstrVal(rJython$get("fro"))
        from <- unlist(strsplit(from, "[<>\r\n, \"]"))
        from <- sub("from: ", "", from, ignore.case = TRUE)
        from <- grep("@", from, value = TRUE)

        # get the addressees
        rJython$exec(paste("typ, to = mymail.fetch('", nr,
                           "', '(BODY[HEADER.FIELDS (to)])')", sep = ""))
        rJython$exec("to = to[0][1]")
        to <- .jstrVal(rJython$get("to"))
        to <- unlist(strsplit(to, "[<>\r\n, \"]"))
        to <- sub("to: ", "", to, ignore.case = TRUE)
        to <- grep("@", to, value = TRUE)

        # add to the data frame only if a sender was found and the
        # number of addressees is reasonable
        if (length(from) >= 1 && length(to) >= 1 && length(to) <= ntore) {
          data <- rbind(data, data.frame(from = from[1], to = to,
                                         stringsAsFactors = FALSE))
        }

        # report progress every 100 messages
        if ((begin - i) %% 100 == 0) {
          print(paste(begin - i, "/", todownload, " Downloading...", sep = ""))
        }
      }

      data$from <- tolower(data$from)
      data$to <- tolower(data$to)

      # log out, which also closes the connection
      rJython$exec("mymail.logout()")

      return(data)
    }
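The data frame that mailSoc() returns is an edge list: one row for every sender and recipient pair, with mass mailings left out by the `ntore` limit. The sketch below isolates that step so it can be run without a mailbox. `make_edges` is a hypothetical helper, not part of the function above, and the addresses are invented.

```r
# Hypothetical helper, not part of mailSoc():
# turn one email into rows of (sender, recipient) pairs.
make_edges <- function(from, to, ntore = 50) {
  empty <- data.frame(from = character(0), to = character(0),
                      stringsAsFactors = FALSE)
  # skip emails without a usable sender or recipient, and mass mailings
  if (length(from) < 1 || length(to) < 1 || length(to) > ntore) {
    return(empty)
  }
  data.frame(from = rep(tolower(from[1]), length(to)),
             to = tolower(to),
             stringsAsFactors = FALSE)
}

# Made-up emails, for illustration only
edges <- rbind(
  make_edges("ann@example.com", c("bob@example.org", "cat@example.net")),
  make_edges("bob@example.org", "ann@example.com"),
  # 60 recipients exceeds the default limit of 50, so this one is dropped
  make_edges("news@example.com", paste0("user", 1:60, "@example.com"))
)
print(edges)
```

An email with several recipients simply contributes several rows, which is the shape that both network and igraph accept as input.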
Now we can run, for example:

    # download the 200 most recent emails from a Gmail account
    maild <- mailSoc("login", "password", serv = "imap.gmail.com",
                     ntore = 40, todow = 200)
To make a plot, load the network package, build a network object from the data frame and plot that object:

    library(network)
    mailnet <- network(maild)
    plot(mailnet)
This is the result:
R provides many other social network analysis tools, such as the igraph package. For instance, it can be used to make an interactive ‘plot’:

    library(igraph)
    # newer igraph releases name these functions graph_from_data_frame()
    # and layout_with_fr()
    h <- graph.data.frame(maild, directed = FALSE)
    tkplot(h, vertex.label = V(h)$name, layout = layout.fruchterman.reingold)
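With `directed = FALSE`, a message from A to B and a reply from B to A describe the same tie, and repeated correspondence produces repeated edges. One way to inspect this before plotting is to collapse the edge list into unordered pairs with a count. This is only a sketch in base R on an invented data frame (`toy`) with the same `from` and `to` columns as `maild`.

```r
# Made-up edge list in the same shape as maild (columns 'from' and 'to')
toy <- data.frame(
  from = c("ann@example.com", "bob@example.org", "ann@example.com"),
  to   = c("bob@example.org", "ann@example.com", "cat@example.net"),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so that a -> b and b -> a become the same tie
pairs <- data.frame(a = pmin(toy$from, toy$to),
                    b = pmax(toy$from, toy$to),
                    n = 1,
                    stringsAsFactors = FALSE)

# Sum the counts per unordered pair
ties <- aggregate(n ~ a + b, data = pairs, FUN = sum)
print(ties)
```

The count column `n` could then serve as an edge weight, for example to draw thicker lines between people who write to each other often.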
I would like to learn more about SNA, and I would also like to try out Gephi, which can produce visualisations that are even more attractive than those made in R. I expect to write about my first impressions soon.