Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I recently saw a tweet floating by which included a link to some recent statistics from PubMed Commons, the NCBI service for commenting on scientific articles in PubMed. Perhaps it was this post at their blog. So I thought now would be a good time to write some code to analyse PubMed Commons data.
The tl;dr version: here’s the GitHub repository and the RPubs report.
For further details and some charts, read on.
Currently, there is no access to PubMed Commons data via the NCBI Entrez API aside from a PubMed search filter to return articles that have comments. However, a Google search for “pubmed commons api” returns this useful Gist. It shows how to construct a URL which returns JSON-formatted PubMed Commons data for a given PMID. If Alf is reading this, I’d like to know how he discovered this information gem!
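The one thing Entrez does offer, the search filter for articles with comments, can be turned into a request URL along these lines. This is a minimal sketch rather than the code from the repository: the helper name `esearch_url` is made up for illustration, and the filter term `has_user_comments[sb]` is an assumption about what the filter is called, so check it against the PubMed help before relying on it.

```ruby
require "uri"

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical helper: builds an esearch URL for PubMed.
# The default term is an ASSUMED name for the "has comments" filter.
def esearch_url(term: "has_user_comments[sb]", retmax: 10_000)
  query = URI.encode_www_form(
    db: "pubmed",      # search the PubMed database
    term: term,        # the search filter (square brackets get percent-encoded)
    retmax: retmax,    # maximum number of PMIDs to return
    retmode: "json"    # ask for JSON rather than the default XML
  )
  "#{ESEARCH_BASE}?#{query}"
end

puts esearch_url
```

Keeping the term as a parameter means the same helper works for any other PubMed query.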
Armed with this I was able to write Ruby code to return all PMIDs with comments, fetch the comment data, parse it and output a summary to a CSV file. I used to be an XPath guy. This experience changed me into a CSS selector guy.
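The last two steps, parse and summarise to CSV, look roughly like the sketch below. It is not the code from the repository: the payload structure and field names (`pmid`, `comments`, `author`, `date`, `text`) are invented for illustration, the sample values are toy data, and the real PubMed Commons response is shaped differently.

```ruby
require "json"
require "csv"

# Toy payload with an INVENTED structure; the real PubMed Commons JSON differs.
SAMPLE_JSON = <<~PAYLOAD
  {"pmid": "00000001",
   "comments": [
     {"author": "Author A", "date": "2016-01-15", "text": "First comment."},
     {"author": "Author B", "date": "2016-02-03", "text": "A reply, with a comma."}
   ]}
PAYLOAD

# Hypothetical helper: one CSV row per comment, with the comment text
# reduced to its length in characters to keep the summary small.
def comments_to_csv(payload)
  CSV.generate do |csv|
    csv << %w[pmid author date chars]
    payload.fetch("comments", []).each do |c|
      csv << [payload["pmid"], c["author"], c["date"], c["text"].length]
    end
  end
end

puts comments_to_csv(JSON.parse(SAMPLE_JSON))
```

Using `CSV.generate` rather than joining strings by hand means any commas or quotes in a field are escaped correctly.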
Analysis and visualisation can then be performed using this RMarkdown file. Here are some of the highlights; the RPubs report contains the complete analysis.
According to the PubMed Commons blog, the service has over 10 500 members, so the active participation rate is about what we’d expect from other forums. The fraction of articles is obviously very small, given that there are now close to 27 000 000 PubMed articles.
By contrast, the oldest article with a comment currently comes from 1945.
Included in the RPubs report, but not here, are some density plots to show distributions of comments per article and comments per author. As you might expect, what’s observed most frequently is one comment (per article or author), followed by a “long tail”. You may be interested in the article with the most comments. Currently it’s an editorial titled “When Is Science Ultimately Unreliable?”: you can decide for yourself why it is creating debate. You may also be interested in the most prolific comment authors; I was not, so that’s left as an exercise for those interested (the CSV file is available).
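For anyone taking up that exercise, the counting behind such distributions is short. Here is a minimal sketch in Ruby, assuming a CSV with one row per comment and columns named `pmid` and `author`; the column names, the helper `count_distribution` and the sample rows are all made up for illustration and are not the layout of the actual file.

```ruby
require "csv"

# Toy CSV in an ASSUMED layout (one row per comment); not real data.
SAMPLE_CSV = <<~ROWS
  pmid,author
  1,Author A
  1,Author B
  1,Author A
  2,Author C
  3,Author A
ROWS

# Hypothetical helper. First count the rows sharing each value of `column`
# (comments per article, or per author), then count how many values reach
# each total. Result: { number_of_comments => number_of_articles_or_authors }.
def count_distribution(rows, column)
  per_key = rows.map { |r| r[column] }.tally
  per_key.values.tally.sort.to_h
end

rows = CSV.parse(SAMPLE_CSV, headers: true)
p count_distribution(rows, "pmid")
p count_distribution(rows, "author")
```

`Enumerable#tally` needs Ruby 2.7 or later. Replacing the second `tally` with `per_key.max_by { |_, n| n }` would instead give the single most-commented article or most prolific author.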
Summary
In my opinion, PubMed Commons is a valuable and reasonably successful service. It’s obviously something of a “niche” online forum and is never going to set the world alight. However, monthly activity has remained relatively consistent, with more activity in 2016 than in 2015. Users seem to find many of the comments valuable and community standards are high. It’s interesting that a lot of discussion is around articles as they are published. This is good, but I think we also need maintenance annotation of older articles to point out issues such as broken URLs.
All it needs now is more active users, more comments per user and a real API.
Filed under: publications, R, statistics Tagged: comments, ncbi, pubmed commons