[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The first problem I ran into is that installing twitteR on Ubuntu is not that simple! RCurl has to be installed properly first, and before installing that package in R, it is necessary to run the following line in a terminal
$ sudo apt-get install libcurl4-gnutls-dev
then, launch R
$ R
and then you can run the standard
> install.packages("RCurl")
and finally install the package of interest,
> install.packages("twitteR")
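If you run these steps more than once, it may help to check which packages are still missing before installing anything. Here is a minimal sketch; `missing_packages` is a hypothetical helper name, not a function from RCurl or twitteR.

```r
# Hypothetical helper: return the packages from `pkgs` that cannot be loaded,
# i.e. the ones that still have to be installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("RCurl", "twitteR"))
if (length(to_install) > 0) {
  message("Still to install: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment once libcurl is installed on the system
}
```

The call to install.packages() is left commented out so that the check itself has no side effect.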
Then, the second problem I had was that twitteR has been updated recently because of Twitter’s new API. Now, you have to register on Twitter’s developers webpage, get an ID and a password, then use them in the following function (I changed both of them below, so if you try to run the following code, you will probably get an error message),
> library(twitteR)
> cred <- getTwitterOAuth("ikzCtYif9Rwoood45w","rsCCifp99kw5sJfKfOUhhwyVmPl9A")
> registerTwitterOAuth(cred)
[1] TRUE
> T <- userTimeline('freakonometrics',n=5000)
You will also be asked to visit a webpage, and to type back into R the PIN displayed there.
To enable the connection, please direct your web browser to:
http://api.twitter.com/oauth/authorize?oauth_token=cQaDmxGe...
When complete, record the PIN given to you and provide it here:
It is a pain in the ass, trust me. Anyway, I have been able to run it. I can now get the list of all my (recent) tweets
> T <- userTimeline('freakonometrics',n=5000)
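Since the ID and the password should not end up in a script you share, one option is to read them from environment variables. This is only a sketch: the helper name `read_twitter_credentials` and the variable names TWITTER_KEY and TWITTER_SECRET are assumptions, not something twitteR defines.

```r
# Hypothetical helper: read the ID and the password from two environment
# variables (the names TWITTER_KEY and TWITTER_SECRET are assumptions).
read_twitter_credentials <- function() {
  key <- Sys.getenv("TWITTER_KEY")
  secret <- Sys.getenv("TWITTER_SECRET")
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set TWITTER_KEY and TWITTER_SECRET before calling this function")
  }
  list(key = key, secret = secret)
}

# Usage with the same twitteR calls as above (not run here):
# creds <- read_twitter_credentials()
# cred <- getTwitterOAuth(creds$key, creds$secret)
# registerTwitterOAuth(cred)
```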
Now, my (third) problem was to extract from my tweets the URLs of the references. The second tweet in the list was
- [textmining] “How a Computer Program Helped Reveal J. K. Rowling as Author of A Cuckoo’s Calling” scientificamerican.com/article.cfm?id… by @garethideas
But when you look at the text, you see
> T[[2]]
[1] "freakonometrics: [textmining] \"How a Computer Program Helped Reveal J. K. Rowling as Author of A Cuckoos Calling\" http://t.co/wdmBGL8cmj by @garethideas"
So what I get is not the URL used in my tweet, but a shortened version of it, from http://t.co/. Fortunately, @3wen (as always) has been able to help me with the following functions,
> # extraire: return the first captured group of the pattern motif in entree, or NA
> extraire <- function(entree,motif){
+   res <- regexec(motif,entree)
+   if(length(res[[1]])==2){
+     debut <- (res[[1]])[2]
+     fin <- debut+(attr(res[[1]],"match.length"))[2]-1
+     return(substr(entree,debut,fin))
+   }else return(NA)}
> # unshorten: fetch only the headers, without following the redirection,
> # and read the target address from the "location" line
> unshorten <- function(url){
+   uri <- getURL(url, header=TRUE, nobody=TRUE, followlocation=FALSE,
+     cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
+   res <- try(extraire(uri,"\r\nlocation: (.*?)\r\nserver"))
+   return(res)}
Now, if we use those functions, we can get the true url,
> url <- "http://t.co/wdmBGL8cmj"
> unshorten(url)
[1] http://www.scientificamerican.com/article.cfm?id=how-a-computer-program-helped-show..
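The pattern used in unshorten() expects a lowercase "location" line immediately followed by a "server" line, so it may miss redirections whose headers are written or ordered differently. Here is a sketch of a more tolerant extraction, run on a toy header so that no network access is needed; `extract_location` is a hypothetical name and the header text is made up for illustration.

```r
# Toy HTTP response header (made up for illustration, not a real reply)
header <- paste0(
  "HTTP/1.1 301 Moved Permanently\r\n",
  "Location: http://www.example.com/some/article\r\n",
  "Server: tfe\r\n\r\n"
)

# Hypothetical variant of the extraction step: capture the Location value
# up to the end of its line, ignoring the case of the header name.
extract_location <- function(header) {
  m <- regmatches(header,
                  regexec("location: ([^\r\n]*)", header, ignore.case = TRUE))[[1]]
  # m holds the full match and the captured group when the pattern is found
  if (length(m) == 2) m[2] else NA_character_
}

extract_location(header)
```

On the toy header this returns the address on the Location line, and NA when there is no such line.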
Now I can play with my list, to extract the URLs and the address of each website,
> exturl <- function(i){
+   text_tw <- T_text[i]
+   locunshort2 <- NULL
+   indtext <- which(substr(unlist(strsplit(text_tw, " ")),1,4)=="http")
+   if(length(indtext)>0){
+     loc <- unlist(strsplit(text_tw, " "))[indtext]
+     locunshort=unshorten(loc)
+     if(is.na(locunshort)==FALSE){
+       locunshort2 <- unlist(strsplit(locunshort, "/"))[3]}}
+   return(locunshort2)}
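The function above reads from T_text, which is not defined in the code shown here; presumably it is a character vector with the text of each tweet. The two text-processing steps it relies on (finding the words that start with "http", and taking the third piece of a URL split on "/") can be tried offline. Here is a sketch, where `urls_in` and `host_of` are hypothetical helper names.

```r
# Hypothetical helper: all space-separated words of `text` that start with "http".
urls_in <- function(text) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  words[substr(words, 1, 4) == "http"]
}

# Hypothetical helper: host name of a URL, or NA if there is none.
host_of <- function(url) {
  parts <- strsplit(url, "/", fixed = TRUE)[[1]]
  # "http://host/path" splits into "http:", "", "host", "path"
  if (length(parts) >= 3) parts[3] else NA_character_
}

tweet <- "[textmining] some title http://t.co/wdmBGL8cmj by @garethideas"
urls_in(tweet)
host_of("http://www.scientificamerican.com/article.cfm")
```

urls_in() returns every URL found in the tweet, so a tweet with several links can be handled one link at a time.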
Using apply with this function on my list, and counting with a simple table() function, I can see that my top four reference websites (over more than 900 tweets) are the following:
       www.nytimes.com     www.guardian.co.uk
                    19                     21
www.washingtonpost.com         www.lemonde.fr
                    21                     22
Nice, isn’t it?
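For completeness, the counting step can be sketched as follows. The host names below are toy values made up for illustration, not my data; with the real tweets, the vector of hosts might be built with something like unlist(lapply(seq_along(T_text), exturl)), which is an assumption about how the function is applied.

```r
# Toy host names, made up for illustration (not the data behind the table above)
hosts <- c("a.example.com", "b.example.com", "a.example.com",
           "c.example.com", "a.example.com", "b.example.com")

# Count each host, then sort the counts in decreasing order
counts <- table(hosts)
top <- sort(counts, decreasing = TRUE)

# Keep the most frequent ones (two here; four in the post)
head(top, 2)
```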
To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.