
FOSS4Spectroscopy: R vs Python

[This article was first published on Chemometrics and Spectroscopy Using R, and kindly contributed to R-bloggers.]

If you aren’t familiar with it, the FOSS for Spectroscopy web site lists Free and Open Source Software for spectroscopic applications. The collection is of course never really complete, and your package suggestions are most welcome (how to contribute). My methods for finding packages are improving and at this point the major repositories have been searched reasonably well.

A few days ago I pushed a major update, and at this point Python packages outnumber R packages more than two to one. The update was made possible because I recently had time to figure out how to search the PyPi.org site automatically.

In a previous post I explained the methods I used to find packages related to spectroscopy. These have been updated considerably and the rest of this post will cover the updated methods.


Repos & Topics

There are four places I search for packages related to spectroscopy: CRAN, Github, PyPi.org and juliapackages.org.[1]

The topics I search are as follows:


Searching CRAN

I search CRAN using packagefinder[2]; the process is quite straightforward and won’t be covered here. However, it is not an automated process (I should probably work on that).


Searching Github

The broad approach used to search Github is the same as described in the original post. However, the scripts have been refined and updated, and now exist as functions in a new package I created called webu (for “webutilities”, but that name is taken on CRAN). The repo is here. webu is not on CRAN and I don’t currently intend to put it there, but you can install from the repo of course if you wish to try it out.

Searching Github is now carried out by a supervising script called /utilities/run_searches.R (in the FOSS4Spectroscopy repo). The script contains some notes about finicky details, but is pretty simple overall and should be easy enough to follow.
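
For readers who want a feel for what a Github search request looks like, here is a minimal sketch in Python (not the R code in webu, and not necessarily how run_searches.R does it). It assumes a topic-based query against the Github REST search endpoint; the function name `build_topic_search` and its parameters are hypothetical, and the token is whatever personal access token you supply.

```python
from urllib.parse import urlencode

# Github REST endpoint for repository searches
GITHUB_SEARCH_URL = "https://api.github.com/search/repositories"


def build_topic_search(topic, token=None, page=1, per_page=100):
    """Return (url, headers) for one page of a topic search.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    query = urlencode({"q": f"topic:{topic}", "per_page": per_page, "page": page})
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticating raises the rate limit compared to anonymous requests
        headers["Authorization"] = f"Bearer {token}"
    return f"{GITHUB_SEARCH_URL}?{query}", headers
```

The URL and headers can then be handed to any HTTP client; results come back a page at a time, so a real search would loop over `page` until no more items are returned.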


Searching PyPi.org

Unlike Github, PyPi.org does not require authentication to use its API, which makes things simpler than the Github case. The needed functions are in webu and include some deliberate delays so as not to overload the PyPi.org servers. As with Github, searches are supervised by /utilities/run_searches.R.
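
To illustrate the idea of unauthenticated requests with deliberate delays, here is a minimal Python sketch. It is not the webu code (which is in R); it assumes the public JSON endpoint at https://pypi.org/pypi/<name>/json, and the names `fetch_pypi_record` and `fetch_many` as well as the one-second default delay are illustrative choices only.

```python
import json
import time
import urllib.error
import urllib.request

# Public JSON endpoint; no token or login is needed
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_pypi_record(name, timeout=10.0):
    """Return the parsed JSON record for one package, or None if not found."""
    url = PYPI_JSON_URL.format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no package with this name
            return None
        raise


def fetch_many(names, fetch=fetch_pypi_record, delay=1.0, sleep=time.sleep):
    """Fetch records one at a time, pausing `delay` seconds between requests."""
    records = {}
    for i, name in enumerate(names):
        if i > 0:
            sleep(delay)  # deliberate pause so the server is not overloaded
        records[name] = fetch(name)
    return records
```

Passing `fetch` and `sleep` as arguments keeps the loop easy to try out offline with stand-in functions before pointing it at the real site.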

One thing I observed at PyPi.org is that authors do not always fill out all the fields that PyPi.org can accept, which means some fields are NULL and we have to trap for that possibility. Package information is accessed via a JSON record; for instance, the entry for nmrglue can be seen here. This package is pretty typical in that the author_email field is filled out but the maintainer_email field is not (they are presumably the same). If one considers these JSON files to be analogous to DESCRIPTION in R packages, it looks like there is less oversight on PyPi.org compared to CRAN.
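
Trapping for empty fields might look something like the following Python sketch. It assumes the record keeps its metadata under an `info` key, as the PyPi.org JSON records do; the function name `contact_fields`, the fallback from maintainer to author, and the sample record are all made up for illustration and are not taken from webu or from any real package entry.

```python
def contact_fields(record):
    """Pull contact details from a PyPi.org-style JSON record, tolerating gaps."""
    info = record.get("info") or {}

    def clean(key):
        value = info.get(key)
        # Missing keys, JSON null and blank strings are all treated as absent
        if value is None or not str(value).strip():
            return None
        return str(value).strip()

    author_email = clean("author_email")
    maintainer_email = clean("maintainer_email")
    return {
        "name": clean("name"),
        "author_email": author_email,
        "maintainer_email": maintainer_email,
        # Assumption: with no maintainer listed, the author is the contact
        "contact": maintainer_email or author_email,
    }


# Hypothetical record: author filled in, maintainer left empty
sample = {"info": {"name": "mypkg",
                   "author_email": "author@example.org",
                   "maintainer_email": None}}
fields = contact_fields(sample)
```

Here `fields["contact"]` falls back to the author's address because the maintainer field is empty.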


Searching Julia

Julia packages are readily searched manually at juliapackages.org.


Cleaning & Final Vetting

The raw results from the searches described above still need a lot of inspection and cleaning to be usable. The PyPi.org and Github results are saved in an Excel worksheet with the relevant URLs, and these links can be followed to determine the suitability of each package. In the /Utilities folder there are additional scripts to remove entries that are already in the main database (FOSS4Spec.xlsx) and to check the names of the packages. The name check is needed for two reasons: Python authors and/or policies seem to lead to cases where different packages have names differing only by case, and authors are sometimes sloppy when referring to their own packages, using mypkg in one place and myPkg in another for the same package.
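
The two name-related cleaning steps can be sketched in Python as follows. This is not the code in the FOSS4Spectroscopy repo; the function names are hypothetical. The normalization follows the rule in Python's packaging specification (PEP 503): lowercase the name and collapse runs of `-`, `_` and `.` into a single `-`. Whether two spellings really refer to the same project still needs a manual look.

```python
import re
from collections import defaultdict


def normalize(name):
    """PEP 503-style normalization of a package name."""
    return re.sub(r"[-_.]+", "-", name).lower()


def drop_known(candidates, known):
    """Keep only candidates whose normalized name is not already in the database."""
    known_keys = {normalize(k) for k in known}
    return [c for c in candidates if normalize(c) not in known_keys]


def spelling_variants(names):
    """Group names that normalize to the same key but are spelled differently."""
    groups = defaultdict(set)
    for n in names:
        groups[normalize(n)].add(n)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

`drop_known` corresponds to removing entries already in the main database, while `spelling_variants` flags cases such as mypkg and myPkg for inspection by hand.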


Footnotes

  1. Once in a while users submit their own package to the repo, and I also find interesting packages in my literature reading.

  2. packagefinder has recently been archived, but hopefully will be back soon.


Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:
@online{hanson2022,
  author = {Bryan Hanson},
  title = {FOSS4Spectroscopy: {R} Vs {Python}},
  date = {2022-07-06},
  url = {http://chemospec.org/posts/2022-07-06-F4S-Update/2022-07-06-F4S-Update.html},
  langid = {en}
}
For attribution, please cite this work as:
Bryan Hanson. 2022. “FOSS4Spectroscopy: R Vs Python.” July 6, 2022. http://chemospec.org/posts/2022-07-06-F4S-Update/2022-07-06-F4S-Update.html.