[This article was first published on chem-bla-ics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Machine learning is a field of science that focusses on mathematically describing patterns in data. Chemometrics does this for chemical data. Examples are (nano)QSAR where structural information is related to biological activity. I studied during my PhD studies the interaction between the statistics and machine learning with how you computationally (numerically) represent the question. The right combination is not obvious and it has become common to try various modelling methods, though something with support vector machines (SVM/SVR) and more recently neural networks (deep learning) have become popular. A simpler model, however, has its benefits too and frequently not significantly worse than more complex models. That said, exploring all machine learning methods manually takes a lot of time, as each comes with its own parameters which need varying.
Georgia Tsiliki (NTUA partner in eNanoMapper), Cristian Munteany (former postdoc in our group), and others developed RRegrs, an R package to explore the various models and automatically calculate a number of statistics to allow to compare them (doi:10.1186/s13321-015-0094-2). That said, following my thesis, you must never rely on performance statistics, but the output of RRegrs may help you explore the full set of models.
Tsiliki, G., Munteanu, C. R., Seoane, J. A., Fernandez-Lozano, C., Sarimveis, H., Willighagen, E. L., Sep. 2015. RRegrs: an r package for computer-aided model selection with multiple regression models. Journal of Cheminformatics 7 (1), 46.http://dx.doi.org/10.1186/s13321-015-0094-2
Machine learning is a field of science that focusses on mathematically describing patterns in data. Chemometrics does this for chemical data. Examples are (nano)QSAR where structural information is related to biological activity. I studied during my Ph...
I had to dig deep to find posts on QSAR modeling. There are quite a few on QSAR in Bioclipse, but that focuses on the descriptor calculation. In a quick scan, I could only spot two modeling posts:The CDK/Metabolomics/Chemometrics Unconference resultsWh...
I had some time to work some more on the QSAR functionality in Bioclipse. There is still much to do, but it is getting there. The calculation of a QSAR descriptor data matrixThis screenshot shows that multi-resource selection is now working, and that t...
July 26, 2007
Similar post
To leave a comment for the author, please follow the link and comment on their blog: chem-bla-ics.