
BMC favors source code plagiarism

[This article was first published on YGC » R, and kindly contributed to R-bloggers.]

I found a case of source code plagiarism a year ago and reported it to BMC Systems Biology.

In my email, I listed the source code of many functions that were copied verbatim from GOSemSim with only the function names changed. Details of the plagiarism can also be found in my post Proper use of GOSemSim.

I got a reply from the editor three days later (see screenshot).

MS: 1069569029107167
Research
ppiPre: predicting protein-protein interactions by combining heterogeneous features
BMC Systems Biology 2013, 7:S8

Dear Dr Yu,

Thank you for your message and bringing this to our attention. Firstly I would like to assure you that we do take cases of plagiarism very seriously. We are members of the Committee on Publication Ethics (http://publicationethics.org) follow their guidelines for circumstances such as this.

As a first step we will be investigating the overlap between this publication in BMC Systems Biology and your previously published work, to establish the extent of the duplication.

Could I just check, have you compared all the code in the ppiPRE article for similarity with your own code? That is, is there any possibility that more of your code has been duplicated without attribution? We will of course have to examine the code for unattributed use of other authors code in addition to your own

Please do let us know if there are more relevant details you are aware of. We will be in contact again in due course as our investigations proceed, but do not hesitate to contact me in the meantime if you would like an update

Best wishes,
Tim

The editor assured me that they ‘do take cases of plagiarism very seriously’ and would, as a first step, investigate the overlap. I waited for their update, but five months passed and I never received any reply, even though I sent several emails asking for updates. I have no idea how long their ‘due course’ is.

As I had received no reply after five months, I wrote a blog post (see Proper use of GOSemSim) to bring this to the attention of the R community, and CRAN removed the ppiPre package at that time. I told the editor that the package had been removed from CRAN and again asked for an update.

This time, he did reply (see screenshot):

MS: 1069569029107167
Research
ppiPre: predicting protein-protein interactions by combining heterogeneous features
BMC Systems Biology 2013, 7:S8

Dear Guangchuang,

Thank you for your message and my apologies for the delay in this investigation. We had been trying to identify if any other code had also been plagiarised in the publication, although we did not find any evidence of this. Thank you also for drawing our attention to the removal of ppiPre from CRAN. We will be in touch with the article authors and will let you know when there is a further update.

Regards,
Tim

The editor’s own words, ‘We will be in touch with the article authors’, show that five months after my report they had still not contacted the authors. Another two months passed without any reply. I was angry and emailed several BMC editors to tell them that they were ruining the reputation of BMC. Then, as expected, I got an email (see screenshot):

Dear Guangchuang,

We are continuing to pursue this case. As you will understand, as well as the clear evidence of plagiarism of your code, we are concerned about possible unacknowledged reuse of other code as well. We have been trying to contact the authors for a full explanation, as this will affect how we can address the issue. Until very recently we had not been successful in contacting them, although they have very recently got back to us  and have said that they are considering their response, so we expect an update soon. From our perspective it is highly preferable for the authors to come clean about the issues and for any correction or retraction notice to come from them rather than from the editors, and for this reason it is often a drawn out process.

Thank you for your patience, and I can again assure you that this will be addressed formally in due course.

Yours sincerely,
Tim

I was very happy to see this, as the editor told me they had contacted the authors and expected an update soon. I waited another three months (ten months after my report) and, as usual, they never emailed me. I asked for an update again, and this time they did give me one (see screenshot):

Dear Guangchuang,

Looking back through my emails, I did receive a response back from the authors previously. They in turn said that the email address I had originally been trying to contact them with earlier last year was no longer in use, so they only received my emails when I found an additional alternative email address for the authors in my more recent emails.

Their explanation for the use of your code was that as the similarity measures were not their main focus of their study they had intended to either implement existing methods themselves or as in the case of GoSemSim, import the packages. However they had some problems with this and instead utilised the source code directly. Their rationale was that as the code was GPL licenced that this was acceptable, without realising that it was also required to cite the original source.

They say they have now created an updated version which they say has the following changes:

(1) ppiPre imports GOSemSim.
(2) ppiPre calls function geneSim() exported by GOSemSim to calculate Wang’s measure, instead of deriving code from GOSemSim directly.
(3) Several internal functions of ppiPre (TCSSGetAncestors, GetOntology, GetGOParents, GetLatestCommonAncestor, TCSSCompute_ICA) are derived from GOSemSim since GOSemSim don’t export them. In source code (.R) and manuals (.RD), acknowledgement has been added including the information of author and publication of GOSemSim.
(4) The author of GOSemSim (Guangchuang Yu) has been added in the ‘Author’ filed as contributor, as required by CRAN Repository Policy.
(5) All the information content data in data directory of ppiPre have been removed. ppiPre directly obtained data from GOSemSim.

This version has been available in SourceForge (https://sourceforge.net/p/ppipre/). And we are submitting ppiPre to CRAN. We will notify you after ppiPre is available on CRAN.

They also apologise for failing to cite you correctly and have asked if they may submit a correction to their article to rectify this. This would be within our policy if this sounds acceptable to you. Please let me know if you would need more details or if you need to get in touch with the authors
 
Best wishes,
Tim

After the changes, ppiPre is back on CRAN. What we can now find on CRAN is a ppiPre that uses source code from GOSemSim properly, with acknowledgement. They removed all previous source tarballs so that the evidence of their source code plagiarism was gone. Fortunately, GitHub recorded all the changes. The editor delayed the process to give the authors enough time to remove their bad record. I think CRAN shouldn’t accept a resubmitted package if it was removed for plagiarism.

In the email, the editor not only conveyed the authors’ explanation but also expressed his own opinion: ‘They also apologise for failing to cite you correctly and have asked if they may submit a correction to their article to rectify this. This would be within our policy if this sounds acceptable to you’. This is why I think BMC does favor source code plagiarism. Their judgement is not fair!

I couldn’t accept their explanation and replied to the editor (see screenshot):

Dear Prof. Sands,

Thanks for your email. That makes it more clear.

I can't accept their explanation.

ppiPre is a dead package: they didn't update any source code or IC data after publication, even though the annotation packages used for calculating IC data have been updated several times. The recent changes were forced by my report of plagiarism.

The recent update follows the requirements of the GPL, which requires derivative works to be GPL licensed as well and to acknowledge the original work. That's good, but remember that it was forced to happen by my report of source code plagiarism.

In the version on which they published the paper, source code was indeed copied without any acknowledgement, and the proportion is very large (about 2/3 of ppiPre was copied from GOSemSim).

In their explanation, the authors claimed that similarity measures were not their main focus. Can I ask what their main focus is? SVM for protein-protein interaction prediction, as indicated in the title of the paper and in the package name? The SVM code in that package is less than 80 lines, https://github.com/cran/ppiPre/blob/master/R/SVMPredict.r, with more than half of it calling similarity measures, plus several empty lines, parameter checking, and preparation of input and output. They did nothing in SVM; they just call e1071::svm in line 28 and e1071::predict in line 16 with default parameters (users don't even have a chance to change the kernel function in their package). If that's their main focus, what is the contribution of their paper? In the Methods section, they use only one small paragraph, Prediction framework, which only mentions that they used SVM for prediction, but many paragraphs emphasizing that they have implemented many GO semantic similarity measures. They call the SVM implemented in the e1071 package to train on features calculated by source code from GOSemSim. How do they explain this? I have no idea what their so-called main focus is.
 
They also mentioned that they had problems importing GOSemSim. This is also unacceptable. As they copied most of the functions without any change, it would have been easy to call these functions using GOSemSim:::un-exported_function if a function is not exported. If they didn't know this feature and did not search Google for how to do it, they should have contacted me for help or asked for my permission to reuse my code. But they didn't. They may have believed they could use my source code freely because it is GPL licensed, but they should know that open source licenses also require acknowledgement. Even if that was due to their ignorance, why did they change only the function names to pretend these were their own functions? If these are not their main focus, why does more than half of the text in the paper explain the calculation of these algorithms and claim they were implemented in their package?

If they had not intended to do this, they would have put acknowledgements in their source code, cited my work in their paper, and not merely changed the names of functions that were otherwise copied unchanged. They intended to deceive the editor, reviewers and readers.

>> (2) ppiPre calls function geneSim() exported by GOSemSim to calculate Wang’s measure, instead of deriving code from GOSemSim directly.

Besides, most of the source code is not 'derived' but copied from GOSemSim with only the function names changed, which can still be detected in the updated version, https://github.com/cran/ppiPre/blob/master/R/GOKEGGSims.r.

Best Regards,
Guangchuang

BUT again, the editor just ignored my email! One year has passed, and the case is very clear: they copied my source code, pretended it was their original work, and used it to publish a paper. That’s definitely plagiarism. The editor replied very quickly at the beginning and assured me that BMC takes cases like this seriously. I don’t know what happened after they contacted the authors. The editor has ignored my emails unless I put some pressure on them; they delay the process and try to persuade me that this is due to the authors’ ignorance of open source licenses and that an apology from the authors is enough to resolve the issue.

If you see this post, you are welcome to comment, and please help share it on social media. This will help put some pressure on BMC, and I will contact COPE if they don’t address this.
