[This article was first published on chem-bla-ics, and kindly contributed to R-bloggers].
We are using a Semantic MediaWiki (SMW) for the Gold Compound selection task by ToxBank in the SEURAT-1 cluster, funded by Colipa and the EC. I do stress that although Colipa co-funds this work, it has no control over my research. This Gold Compound wiki is hidden behind a cluster agreement-wall (at some point this data will become Open). The wall is implemented with HTTP Basic Auth on the front and LDAP authentication in the background. That actually combines nicely with (S)MW, and automatically logs people into their (linked) wiki accounts.
Now, the great thing about SMW is that it is machine readable. It basically gives you a custom DBpedia. I am using this to capture knowledge from the NanoQSAR literature, as blogged in "Importing Nanotoxicity Data with SPARQL into R for analysis". It turned out that the SMW wiki simply uses 'basic HTTP authentication' between web server and web client (thanks to chats with Nina), and LDAP between web server and authentication server. That meant that doing the authentication in Jena was trivial too: I could simply use QueryEngineHTTP.setBasicAuthentication().
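Basic authentication, as used by setBasicAuthentication(), is a simple scheme. The client sends an `Authorization: Basic <token>` header, where the token is the Base64 encoding of `user:password` (RFC 7617). Base64 is an encoding, not encryption, so this only protects the password when it is sent over HTTPS. The base-R sketch below builds that header value by hand, purely to illustrate the scheme. The function `basic_auth_header()` is hypothetical and is not part of rrdf or Jena, which construct this header internally.

```r
# Minimal sketch: build the value of an HTTP Basic "Authorization" header.
# basic_auth_header() is a hypothetical helper for illustration only.
basic_auth_header <- function(user, password) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")   # the 64 Base64 symbols
  bytes <- as.integer(charToRaw(paste0(user, ":", password)))
  pad <- (3 - length(bytes) %% 3) %% 3
  bytes <- c(bytes, rep(0L, pad))                  # zero-pad to a multiple of 3 bytes
  out <- character(0)
  for (i in seq(1, length(bytes), by = 3)) {
    # pack three bytes into one 24-bit integer
    n <- bytes[i] * 65536L + bytes[i + 1] * 256L + bytes[i + 2]
    # split it into four 6-bit indices into the Base64 alphabet
    idx <- c(n %/% 262144L, (n %/% 4096L) %% 64L, (n %/% 64L) %% 64L, n %% 64L)
    out <- c(out, alphabet[idx + 1])
  }
  # characters produced only from padding bytes become "="
  if (pad > 0) out[(length(out) - pad + 1):length(out)] <- "="
  paste0("Basic ", paste(out, collapse = ""))
}
```

For example, `basic_auth_header("user", "password")` returns `"Basic dXNlcjpwYXNzd29yZA=="`.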
I updated rrdf to version 1.5 to support this too (see this patch; and thanks to Kurt Hornik for taking care of the CRAN incoming/). This means that I can now extract Gold Compound data directly into my favorite statistics software, R. It would work equally well with other tools that support SPARQL, like Bioclipse. From there I can do all sorts of fun stuff with the data: validation, consistency checking, data mining, you name it. For example, I plotted pKa values (with rather uninformative segments :).
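A consistency check of the kind mentioned above could look like the sketch below. It assumes the query results arrive in R as a character matrix, here with hypothetical columns `compound` and `pKa`. The rows are placeholder values for illustration, not Gold Compound data, and the plausibility range is an arbitrary choice.

```r
# Minimal sketch of a consistency check on SPARQL query results.
# Assumption: results are a character matrix with hypothetical columns
# "compound" and "pKa"; the rows below are placeholders, not real data.
results <- matrix(
  c("compound_1", "4.2",
    "compound_2", "not measured",
    "compound_3", "9.8"),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("compound", "pKa"))
)

check_pka <- function(results) {
  # as.numeric() turns non-numeric text into NA (and warns); silence the warning
  pka <- suppressWarnings(as.numeric(results[, "pKa"]))
  data.frame(
    compound = results[, "compound"],
    pKa = pka,
    # flag values that are not numeric or fall outside an arbitrary loose range
    suspicious = is.na(pka) | pka < -10 | pka > 50,
    stringsAsFactors = FALSE
  )
}

checked <- check_pka(results)
print(checked[checked$suspicious, ])      # rows that need a closer look
# hist(checked$pKa, main = "pKa values")  # hist() drops the NA values
```

With these placeholder rows, only `compound_2` is flagged.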
The wiki uses the RDFIO extension for SMW written by my former M.Sc. student at Uppsala University, Samuel, who presented this module at SMWCon last week.
SEURAT-1 GCWG and ToxBank members can email me for details on how to link the Gold Compound wiki to R (or other software). It basically comes down to running a command like this:
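library(rrdf)
# someSemanticWiki must hold the wiki's SPARQL endpoint URL
someSemanticWiki = "https://wiki.example.org/sparql"  # hypothetical placeholder URL
# list every distinct predicate (property) used in the wiki's triples
predicates = sparql.remote(
  someSemanticWiki,
  "SELECT DISTINCT ?predicate WHERE { [] ?predicate [] }",
  user = "user", password = "password"
)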
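In practice you would probably query one property at a time and keep the password out of the script. The sketch below is one way to do that, under stated assumptions. The helper `property_query()`, the environment variable names, and the property IRI are all hypothetical; the real property IRIs depend on how the wiki exports its RDF. The only rrdf call is `sparql.remote()` with the `user` and `password` arguments used above, and it runs only when everything is configured.

```r
# Hypothetical helper (not part of rrdf): build a SPARQL query that lists
# every subject and value for a single property, given as a full IRI.
property_query <- function(property_iri) {
  sprintf("SELECT ?s ?value WHERE { ?s <%s> ?value }", property_iri)
}

# Read the endpoint and credentials from hypothetical environment variables
# rather than hard-coding a password in the script.
endpoint      <- Sys.getenv("GCWIKI_SPARQL")
wiki_user     <- Sys.getenv("GCWIKI_USER")
wiki_password <- Sys.getenv("GCWIKI_PASSWORD")

# Hypothetical property IRI; the real one depends on the wiki's RDF export.
query <- property_query("https://wiki.example.org/property/PKa")

# Only contact the wiki when everything is configured and rrdf is installed.
if (all(nzchar(c(endpoint, wiki_user, wiki_password))) &&
    requireNamespace("rrdf", quietly = TRUE)) {
  values <- rrdf::sparql.remote(endpoint, query,
                                user = wiki_user, password = wiki_password)
  print(head(values))
}
```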