It has been 6 months since the launch of Diffify, our website for comparing package releases. We are delighted to announce that, in addition to CRAN’s 20,000 R packages, you can now track 1600 popular Python packages!
What’s included?
The current criteria for a Python package to be included in Diffify are:
- The package is listed in the top 2000 PyPI packages according to download statistics.
- The package has had at least one version release since 1st May 2020.
- The package wheel is downloadable from pypi.org.
If your favourite package is not currently accessible, don’t worry! We are actively working to expand the list to as many PyPI packages as possible, as we’ll explain below.
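As a rough illustration of how the three criteria combine, here is a minimal Python sketch. It is not Diffify's actual implementation: the `PackageInfo` record, its field names and the `is_eligible` helper are all hypothetical, and we assume "since 1st May 2020" means on or after that date.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PackageInfo:
    """Hypothetical record describing one PyPI package."""
    name: str
    download_rank: int    # 1 = most downloaded
    latest_release: date  # date of the most recent release
    has_wheel: bool       # True if a wheel can be downloaded from pypi.org


MAX_RANK = 2000                   # top 2000 packages by downloads
RELEASE_CUTOFF = date(2020, 5, 1)  # 1st May 2020


def is_eligible(pkg: PackageInfo) -> bool:
    """Apply the three inclusion criteria listed above."""
    return (
        pkg.download_rank <= MAX_RANK
        and pkg.latest_release >= RELEASE_CUTOFF
        and pkg.has_wheel
    )
```

A package has to pass all three checks, which would explain why fewer than 2000 packages are currently tracked.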
New content
The first change you’ll notice is to our homepage, where we now have buttons for both R and Python.
Clicking on the Python button will take you through to the package search bar. For this walkthrough, we will compare versions 3.3.0 and 3.5.0 of the Matplotlib package. Diffify provides a breakdown of the changes to the package dependencies, functions and classes.
Dependencies
We consider three kinds of dependencies:
- The Python version requirement.
- Required Python packages – these must be installed.
- Optional Python packages – installing these will enable extra package features.
In our example, we see that the Python version requirement has changed from `>=3.6` to `>=3.7`.
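To make the three kinds of dependency concrete, the sketch below sorts a package's metadata into them. It assumes the standard wheel metadata fields (`Requires-Python` and `Requires-Dist`), where optional dependencies carry an `extra == "..."` marker. The `classify_requirements` helper and the sample requirement strings are made up for illustration, and this is not necessarily how Diffify does it.

```python
import re


def classify_requirements(requires_python, requires_dist):
    """Split metadata into Python version, required and optional packages."""
    required, optional = [], []
    for req in requires_dist:
        # A requirement may carry an environment marker after a semicolon.
        spec, _, marker = req.partition(";")
        if re.search(r"\bextra\s*==", marker):
            optional.append(spec.strip())  # only installed with an extra
        else:
            required.append(spec.strip())  # always installed
    return {
        "python": requires_python,
        "required": required,
        "optional": optional,
    }


# Illustrative metadata only, not taken from a real release.
old = classify_requirements(">=3.6", ["numpy>=1.15"])
new = classify_requirements(
    ">=3.7", ["numpy>=1.17", "pytest; extra == 'test'"]
)

if old["python"] != new["python"]:
    print(f"Python requirement: {old['python']} to {new['python']}")
```

Comparing the two dictionaries field by field gives the kind of breakdown shown on the dependencies tab.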
Functions
Here we provide a list of functions that have been added, removed or changed between the two versions.
Clicking on the “Details” dropdown will bring up the function arguments, including the argument name and default value. If type annotations are included in the package source code, Diffify will also display the argument type and the function return type.
For the `pyplot.grid()` function, the name of the first positional argument has changed from `b` to `visible`.
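You can explore this kind of change yourself with Python's standard `inspect` module. The sketch below uses two simplified stand-ins for the old and new `grid()` signatures (`grid_old` and `grid_new` are our own names, and only the first argument is taken from the example above). It relies on runtime introspection, whereas Diffify analyses the source code, so treat it as an approximation of the idea rather than the real implementation.

```python
import inspect


# Simplified stand-ins for the two versions of the function.
def grid_old(b=None, which="major", axis="both", **kwargs): ...
def grid_new(visible=None, which="major", axis="both", **kwargs): ...


def describe(func):
    """List each parameter's name, default and annotation (if present)."""
    empty = inspect.Parameter.empty
    return [
        {
            "name": p.name,
            "has_default": p.default is not empty,
            "default": None if p.default is empty else p.default,
            "annotation": None if p.annotation is empty else p.annotation,
        }
        for p in inspect.signature(func).parameters.values()
    ]


def renamed_parameters(old, new):
    """Pair parameters by position and report those whose names differ."""
    old_names = [p["name"] for p in describe(old)]
    new_names = [p["name"] for p in describe(new)]
    return [(o, n) for o, n in zip(old_names, new_names) if o != n]


print(renamed_parameters(grid_old, grid_new))  # [('b', 'visible')]
```

Pairing by position is a deliberate simplification: it would misreport a change where an argument was inserted or removed rather than renamed.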
Classes
Here we provide a list of classes that have been added, removed or changed.
Clicking on the “Methods” button for a class will bring up a pop-up that lists the methods that belong to that class. The example below shows the methods `.__init__()` and `.from_dict()`, which belong to the `spines.Spines` class.
Similar to functions, you can access the method arguments by clicking on “Details”.
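Listing the methods of a class from source code, without importing the package, can be done with Python's built-in `ast` module. The following is a minimal sketch under that assumption. The `SOURCE` string is a cut-down stand-in rather than the real Matplotlib code, and `list_methods` is a hypothetical helper.

```python
import ast

# Cut-down stand-in for a module that defines a class.
SOURCE = '''
class Spines:
    def __init__(self, **kwargs):
        self._dict = kwargs

    @classmethod
    def from_dict(cls, d):
        return cls(**d)
'''


def list_methods(source):
    """Map each class name to the methods defined directly in its body."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            result[node.name] = [
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
    return result


print(list_methods(SOURCE))  # {'Spines': ['__init__', 'from_dict']}
```

Because the code is parsed rather than executed, this approach works even when the package's own dependencies are not installed. It does not pick up methods inherited from parent classes.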
Removing clutter
The functions and classes listed above have been detected by analysing the package source code. We have taken various steps to filter out code that is intended for internal use by the package developers, including:
- ignoring functions and scripts whose names start with an underscore
- ignoring functions whose names match `test*` and classes whose names match `Test*`
- leaving out scripts whose names match `test_*` or `*_test.py`
These criteria are intended to leave out internal code and unit tests.
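The rules above translate fairly directly into code. The sketch below is a literal reading of them using the standard `fnmatch` module. The three helper names are our own, and the real filters may differ in detail.

```python
from fnmatch import fnmatchcase


def is_public_function(name):
    """Keep functions that are neither private nor test functions."""
    return not name.startswith("_") and not fnmatchcase(name, "test*")


def is_public_class(name):
    """Keep classes unless they look like test classes."""
    return not fnmatchcase(name, "Test*")


def is_public_script(filename):
    """Keep scripts that are neither private nor test modules."""
    return not (
        filename.startswith("_")
        or fnmatchcase(filename, "test_*")
        or fnmatchcase(filename, "*_test.py")
    )


names = ["grid", "_helper", "test_grid"]
print([n for n in names if is_public_function(n)])  # ['grid']
```

We use `fnmatchcase` rather than `fnmatch` so that matching is case-sensitive on every platform. Name-based rules like these are only a heuristic: a public function that happens to be called `testing_mode` would also be dropped.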
Looking ahead
Python has been around for quite a while, and consequently it has many packages – 400,000 to be precise! Perhaps unsurprisingly, analysing so many packages for Diffify has proven to be a bit of a challenge…
This is why we have initially chosen to focus on the 2000 most popular PyPI packages. We will soon extend this to the top 5000, according to Top PyPI Packages. And we won’t be stopping there! It remains to be seen whether we will manage to add all 400,000, but we will certainly try our utmost.
Despite our best efforts to filter out clutter, you may still come across some functions and classes that are clearly intended for internal use or unit testing. We will continue to look at ways to improve our filters.
We hope you enjoy the new content! As always, if you spot any bugs or have any suggestions, please add an issue to our public GitHub repository.
Stay tuned for more updates…
For updates and revisions to this article, see the original post