[This article was first published on Recology - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
So I want to mine some #altmetrics data for some research I’m thinking about doing. The steps would be:
- Get journal titles for ecology and evolution journals.
- Get DOI’s for all papers in all the above journal titles.
- Get altmetrics data on each DOI.
- Do some fancy analyses.
- Make som pretty figs.
- Write up results.
It’s early days, so jus working on the first step. However, getting a list of journals in ecology and evolution is frustratingly hard. This turns out to not be that easy if you are (1) trying to avoid Thomson Reuters, and (2) want a machine interface way to do it (read: API).
Unfortunately, Mendeley’s API does not have methods for getting a list of journals by field, or at least I don’t know how to do it using their API. No worries though – Crossref comes to save the day. Here’s my attempt at this using the Crossref OAI-PMH.
I wrote a little while loop to get journal titles from the Crossref OAI-PMH. This takes a while to run, but at least it works on my machine – hopefully yours too!
library(XML) library(RCurl) token <- "characters" # define a iterator, also used for gettingn the resumptionToken nameslist <- list() # define empty list to put joural titles in to while (is.character(token) == TRUE) { baseurl <- "http://oai.crossref.org/OAIHandler?verb=ListSets" if (token == "characters") { tok2 <- NULL } else { tok2 <- paste("&resumptionToken=", token, sep = "") } query <- paste(baseurl, tok2, sep = "") crsets <- xmlToList(xmlParse(getURL(query))) names <- as.character(sapply(crsets[[4]], function(x) x[["setName"]])) nameslist[[token]] <- names if (class(try(crsets[[2]]$.attrs[["resumptionToken"]])) == "try-error") { stop("no more data") } else token <- crsets[[2]]$.attrs[["resumptionToken"]] }
Yay! Hopefully it worked if you tried it. Let’s see how long the list of journal titles is.
sapply(nameslist, length) # length of each list
characters c65ebc3f-b540-4672-9c00-f3135bf849e3 10001 10001 6f61b343-a8f4-48f1-8297-c6f6909ca7f7 6864
allnames <- do.call(c, nameslist) # combine to list length(allnames)
[1] 26866
Now, let’s use some regex
to pull out the journal titles that are likely ecology and evolutionary biology journals. The ^
symbol says “the string must start here”. The \\s
means whitespace. The []
lets you specify a set of letters you are looking for, e.g., [Ee]
means capital E
OR lowercase e
. I threw in titles that had the words systematic and natrualist too. Tried to trim any whitespace as well using the stringr
package.
library(stringr) ecotitles <- as.character(allnames[str_detect(allnames, "^[Ee]cology|\\s[Ee]cology")]) evotitles <- as.character(allnames[str_detect(allnames, "^[Ee]volution|\\s[Ee]volution")]) systtitles <- as.character(allnames[str_detect(allnames, "^[Ss]ystematic|\\s[Ss]systematic")]) naturalist <- as.character(allnames[str_detect(allnames, "[Nn]aturalist")]) ecoevotitles <- unique(c(ecotitles, evotitles, systtitles, naturalist)) # combine to list ecoevotitles <- str_trim(ecoevotitles, side = "both") # trim whitespace, if any length(ecoevotitles)
[1] 188
# Just the first ten titles ecoevotitles[1:10]
[1] "Microbial Ecology in Health and Disease" [2] "Population Ecology" [3] "Researches on Population Ecology" [4] "Behavioral Ecology and Sociobiology" [5] "Microbial Ecology" [6] "Biochemical Systematics and Ecology" [7] "FEMS Microbiology Ecology" [8] "Journal of Experimental Marine Biology and Ecology" [9] "Applied Soil Ecology" [10] "Forest Ecology and Management"
Get the .Rmd file used to create this post at my github account.
Written in Markdown, with help from knitr, and nice knitr highlighting/etc. in in RStudio.
To leave a comment for the author, please follow the link and comment on their blog: Recology - R.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.