The new science journalism and open science

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

by Joseph Rickert

The New York Times is quietly changing the practice of science journalism. The Tuesday, April 21, 2015 article, Ebola Lying in Wait, reports on "A growing body of scientific clues – some ambiguous, others substantive" that the Ebola virus may have lain dormant in the West African rain forest for years before igniting last year's outbreak. The 6th paragraph of the online edition mentions "a detailed prediction of other likely Ebola danger zones" made by a team of scientists. The words "detailed prediction" are unobtrusively hyperlinked. What I think is extraordinary is that this link points to the scientific paper itself: Mapping the zoonotic niche of Ebola virus disease in Africa by David M Pigott et al., published on the open science publishing platform eLife. This is the real science, including the measured language of a scientific paper, the lengthy descriptions of the data sets, the innumerable references, and even the reviewers' comments and the authors' responses. I don't think that there is a better way to cultivate a scientific outlook than to make relevant science open and accessible.

The following figure from the paper illustrates one of the low-level tools of open science: the digital object identifier (DOI). A DOI is a character string that uniquely identifies a document or other digital object, and it is meant to persist for the lifetime of that object.

 

 

The paper by Pigott et al. is replete with DOIs pointing to subsections of the document, figures and other documents.
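To make the structure of a DOI concrete, here is a minimal base R sketch. It assumes the usual prefix/suffix layout, in which the prefix starts with "10." and identifies the registrant, and it assumes the public resolver at https://doi.org/. The helper names split_doi and doi_url are hypothetical and not part of any package. The example DOI is the PLOS article that appears in the search results at the end of this post.

```r
# Hypothetical helper: split a DOI into its registrant prefix and item suffix.
# Assumes the DOI has the form "10.<registrant>/<suffix>".
split_doi <- function(doi) {
  # Position of the first slash (-1 if there is none)
  slash <- as.integer(regexpr("/", doi, fixed = TRUE))
  if (slash < 0 || !startsWith(doi, "10.")) {
    stop("not a DOI of the form 10.<registrant>/<suffix>: ", doi)
  }
  list(
    prefix = substr(doi, 1, slash - 1),
    suffix = substr(doi, slash + 1, nchar(doi))
  )
}

# Hypothetical helper: build a resolver URL.
# Assumes the https://doi.org/ resolver; no URL-encoding is applied.
doi_url <- function(doi) {
  paste0("https://doi.org/", doi)
}

example_doi <- "10.1371/journal.pntd.0003706"
parts <- split_doi(example_doi)
parts$prefix          # "10.1371"
parts$suffix          # "journal.pntd.0003706"
doi_url(example_doi)  # "https://doi.org/10.1371/journal.pntd.0003706"
```

Because only the resolver knows where the object currently lives, a link built this way should keep working even if the publisher reorganizes its site, which is the persistence property described above.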

The next step for data science along these lines is to use DOIs and other tools to make it easy to search eLife, PLOS, Crossref, Entrez and other open science platforms. Towards this goal, the team at rOpenSci is well on its way. With limited resources, they have developed an impressive array of R packages for accessing public data as well as for searching the scientific literature. The code below shows some of my own early efforts to use rOpenSci functions to search the literature. rplos is a mature package available on CRAN. The fulltext package is under development. When finished, it will offer functions for working with multiple open science publishers.

To go further, have a look at the rOpenSci tutorials. However, interest in text mining aside, I think we should be grateful for the efforts of PLOS, eLife and the other open science publishers, rOpenSci and the New York Times.

# Get started with rOpenSci text searches
# Install the fulltext package and the packages it depends on from GitHub
devtools::install_github(c("ropensci/rplos", "ropensci/bmc", "ropensci/aRxiv", "emhart/biorxiv"))
devtools::install_github("ropensci/fulltext")
library("fulltext")
 
library(rplos)
 
DOI <- ft_search(query="ebola")
DOI
 
# Query:
#   [ebola] 
# Found:
#   [PLoS: 668; BMC: 0; Crossref: 0; Entrez: 0; arxiv: 0; biorxiv: 0] 
# Returned:
#   [PLoS: 10; BMC: 0; Crossref: 0; Entrez: 0; arxiv: 0; biorxiv: 0] 
 
str(DOI)
# List of 6
# $ plos    :List of 4
# ..$ found  : int 668
# ..$ data   :'data.frame':  10 obs. of  1 variable:
#   .. ..$ id: chr [1:10] "10.1371/journal.pcbi.1004087" "10.1371/journal.pone.0037106" "10.1371/journal.pmed.0010059" "10.1371/journal.pntd.0003706" ...
# ..$ opts   :List of 2
# .. ..$ q    : chr "ebola"
# .. ..$ limit: num 10
# ..$ license:List of 3
# .. ..$ type: chr "CC-BY"
# .. ..$ uri : chr "http://creativecommons.org/licenses/by/4.0/"
# .. ..$ text: chr "<authors> This is an open-access article distributed under n                              the terms of the Creative Commons At"| __truncated__
# ..- attr(*, "class")= chr "ft_ind"
# ..- attr(*, "query")= chr "ebola"
# $ bmc     :List of 3
# ..$ found: NULL
# ..$ data : NULL
# ..$ opts : list()
# ..- attr(*, "class")= chr "ft_ind"
# ..- attr(*, "query")= chr "ebola"
# $ crossref:List of 3
# ..$ found: NULL
# ..$ data : NULL
# ..$ opts : list()
# ..- attr(*, "class")= chr "ft_ind"
# ..- attr(*, "query")= chr "ebola"
# $ entrez  :List of 3
# ..$ found: NULL
# ..$ data : NULL
# ..$ opts : list()
# ..- attr(*, "class")= chr "ft_ind"
# ..- attr(*, "query")= chr "ebola"
# $ arxiv   :List of 3
# ..$ found: NULL
# ..$ data : NULL
# ..$ opts : list()
# ..- attr(*, "class")= chr "ft_ind"
# ..- attr(*, "query")= chr "ebola"
# $ biorxiv :List of 3
# ..$ found: NULL
# ..$ data : NULL
# ..$ opts : list()
# ..- attr(*, "class")= chr "ft_ind"
# ..- attr(*, "query")= chr "ebola"
# - attr(*, "class")= chr "ft"
# - attr(*, "query")= chr "ebola"
 
 
article <- DOI$plos$data$id[4]      # Select the DOI of the 4th PLOS article returned
# "10.1371/journal.pntd.0003706"
 
URL <- full_text_urls(doi=article)  # Build the URL of the article's full-text XML
URL
#[1] "http://www.plosntds.org/article/fetchObject.action?uri=info:doi/10.1371/journal.pntd.0003706&representation=XML"
 
text <- plos_fulltext(doi=article)  # Fetch the full text of the article as XML
text[[1]]
