Convert Your R Function to an S3 Generic: Benefits, Pitfalls & Design Considerations
To build a tight and well-integrated data pipeline, it may be desirable to rely on object orientation (OO) to automatically pass valuable information from one step to the other. OO and data classes can also act as a compatibility layer standardising outputs from various tools under a common structure.
But many packages and tools start as standalone projects, and don’t always stem from a careful consideration of the larger ecosystem. In this situation, developers often see little initial benefit in using an OO system in their project.
But as the project matures, and as the position of the tool in the wider ecosystem becomes clearer, they may want to start using OO to benefit from the better integration it may provide with other tools upstream and downstream in the data pipeline. However, by then, their tool likely has an established community of users, and it is important to tread carefully with breaking changes.
In this blog post, we show that it’s possible to start using an S3 OO system almost invisibly in your R package, with minimal disruption to your users. We detail some minor changes that will nonetheless occur, and which pitfalls you should be looking out for. Finally, we take a step back and reflect on how to be a good open-source citizen in this endeavour.
Benefits
Let’s reuse the example function from one of our previous posts:
#' @export
centroid <- function(coords, weights) {
  # ...
}
Since we wrote and released this function, someone may have designed a clever data class to store coordinates of a set of points and their weights. Let’s imagine they use the following class that they call pointset:
example_pointset <- structure(
  list(
    coords = list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)),
    weights = c(1, 1, 1)  # one weight per point
  ),
  class = "pointset"
)
They may also have developed nice utilities for this class, so there is a clear motivation for you to integrate with it: it means less work for you. Plus, you immediately become compatible with any package that uses the same class.
We will not spend too much time on the practical steps of this conversion since this is already covered in detail in the dedicated chapter of Advanced R, by Hadley Wickham, as well as this blog post from Nick Tierney [1]. But the final result would be:
#' Compute the centroid of a set of points
#'
#' @param coords Coordinates of the points as a list of vectors. Each element of
#'   the list is a point.
#' @param weights Vector of weights applied to each of the points
#'
#' @returns A vector of coordinates of the same length as each element of
#'   `coords`
#'
#' @examples
#' centroid(
#'   list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)),
#'   weights = c(1, 1, 1)
#' )
#'
#' @export
centroid <- function(coords, weights) {
  UseMethod("centroid")
}

#' @rdname centroid
#'
#' @export
centroid.default <- function(coords, weights) {
  # ...
}

#' @rdname centroid
#'
#' @export
centroid.pointset <- function(coords, weights = NULL) {
  centroid(coords$coords, coords$weights)
}
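To see the dispatch in action, here is a minimal runnable sketch of the result above. The body of centroid.default() is elided in the original, so we assume the weighted-mean implementation shown later in this post; the object names `pts` and `ps` are made up for the illustration.

```r
# Generic: dispatches on the class of `coords`
centroid <- function(coords, weights) {
  UseMethod("centroid")
}

# Default method (assumed body): `coords` is a plain list of numeric vectors
centroid.default <- function(coords, weights) {
  coords_mat <- do.call(rbind, coords)  # one row per point
  apply(coords_mat, 2, weighted.mean, w = weights)
}

# Method for the pointset class: unpack the object and re-dispatch
centroid.pointset <- function(coords, weights = NULL) {
  centroid(coords$coords, coords$weights)
}

pts <- list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7))
ps <- structure(list(coords = pts, weights = c(1, 1, 1)), class = "pointset")

res_list <- centroid(pts, weights = c(1, 1, 1))  # existing calls keep working
res_ps <- centroid(ps)                           # new: dispatch on pointset
```

Both calls end up in centroid.default(), so existing users who pass a plain list should see no change in behaviour.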
What subtle changes should you be looking out for?
You may already have noticed a couple of minor changes in the example above but some changes are even less evident and easy to forget, hence this blog post.
All methods must have the same arguments as the generic
You can see that the method for the pointset class, centroid.pointset(), has a weights argument, even though it is not used because weights are already contained in the coords object. This seems clunky and potentially confusing for users. But it is required: a method must have all the arguments of its generic, in the same order, and R CMD check warns about any mismatch.
Another option here could have been to remove weights from the generic, and add ... instead, which allows weights to be passed as an extra argument only in selected methods. This is more idiomatic in R, and in line with the recommendation from the official ‘Writing R Extensions’ document (“always keep generics simple”):
#' @export
centroid <- function(coords, ...) {
  UseMethod("centroid")
}

#' @rdname centroid
#'
#' @export
centroid.default <- function(coords, weights, ...) {
  coords_mat <- do.call(rbind, coords)
  return(apply(coords_mat, 2, weighted.mean, w = weights))
}
But this extra ... argument, which is documented as “ignored”, may be confusing as well.
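As a rough sketch of what the pointset method could look like under this `...`-based design (our assumption, not code from the original package), the method no longer needs an unused weights argument:

```r
centroid <- function(coords, ...) {
  UseMethod("centroid")
}

centroid.default <- function(coords, weights, ...) {
  coords_mat <- do.call(rbind, coords)
  apply(coords_mat, 2, weighted.mean, w = weights)
}

# No dummy `weights` argument: the weights travel inside the object
centroid.pointset <- function(coords, ...) {
  centroid(coords$coords, weights = coords$weights)
}

ps <- structure(
  list(
    coords = list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)),
    weights = c(1, 1, 1)
  ),
  class = "pointset"
)

res <- centroid(ps)

# Anything passed through `...` is silently absorbed by this method
res_ignored <- centroid(ps, weights = c(5, 1, 1))
```

The last call illustrates the downside mentioned above: the extra weights are accepted without complaint and then ignored, which is exactly the kind of behaviour that can surprise users.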
More complex documentation presentation
On the topic of arguments, another pitfall related to the conversion to an S3 generic is the change in the documentation. Below is a collage of before / after the change. This is quite minor and some users may not even notice it, but I remember finding it very confusing when I started using R and didn’t really know what S3 or OO was: “what do you mean, ‘Default S3 method’, which case applies to me?”
The answer is that “Default S3 method” lists the arguments for centroid.default(), i.e., the method which is used if no other method is defined for your class. Arguments for all methods are usually documented together but you should only focus on those present in the call after the comment stating “S3 method for class ‘XXX’” for the class you’re working with.
More complicated error traceback
Another situation where converting to an S3 generic adds an extra layer of complexity is when you are trying to follow the error traceback:
centroid(3)
In this example, we see one extra line that did not exist when centroid() was a regular function, rather than a generic:
centroid.default(3) at centroid.R#19
This line corresponds to the dispatch operation.
However, this slight difference in behaviour is likely not a big issue as we mostly expect experienced users to interact with the traceback. These users are likely to be familiar with S3 dispatch and understand the traceback in any case.
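If you want to inspect this extra frame programmatically, one possible approach is to record the call stack when the error is signalled. This is only a sketch: the input check in centroid.default() is hypothetical and exists purely to trigger an error, and `call_stack` and `frame_names` are names we made up.

```r
centroid <- function(coords, weights) {
  UseMethod("centroid")
}

# Hypothetical input check, only here to trigger an error
centroid.default <- function(coords, weights) {
  if (!is.list(coords)) {
    stop("`coords` must be a list of numeric vectors")
  }
  # ...
}

# Record the call stack at the moment the error is signalled
call_stack <- NULL
try(
  withCallingHandlers(
    centroid(3),
    error = function(e) call_stack <<- sys.calls()
  ),
  silent = TRUE
)

# Name of the function called in each frame of the stack
frame_names <- vapply(
  call_stack,
  function(cl) deparse(cl[[1]])[1],
  character(1)
)
```

Among the frames added by try() and the handler itself, `frame_names` contains both centroid and centroid.default, in that order: the second one is the extra frame introduced by the dispatch.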
Extra source of bugs during dispatch
On a related note, the extra step introduced by this conversion to generic is another potential source of bugs. This doesn’t really impact your users directly but it does mean that as a developer, you will be maintaining slightly more complex code and you will need to be more careful when making any changes. However, as always, a robust testing suite should help you catch any error before it makes it to production.
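As an illustration of what such tests could check, here is a small sketch using only base R's stopifnot(). It assumes you keep the pre-conversion implementation around as a reference (under the hypothetical name centroid_reference()) and that the function body is the weighted mean used elsewhere in this post; in a real package, you would likely write the same checks with your usual testing framework.

```r
# Pre-conversion implementation, kept as a reference (hypothetical name)
centroid_reference <- function(coords, weights) {
  coords_mat <- do.call(rbind, coords)
  apply(coords_mat, 2, weighted.mean, w = weights)
}

centroid <- function(coords, weights) {
  UseMethod("centroid")
}

centroid.default <- function(coords, weights) {
  centroid_reference(coords, weights)
}

centroid.pointset <- function(coords, weights = NULL) {
  centroid(coords$coords, coords$weights)
}

pts <- list(c(0, 1), c(4, 3), c(2, 8))
w <- c(1, 2, 1)
ps <- structure(list(coords = pts, weights = w), class = "pointset")

# 1. Existing calls give the same result as before the conversion
stopifnot(isTRUE(all.equal(centroid(pts, w), centroid_reference(pts, w))))

# 2. The new method agrees with the default method on equivalent input
stopifnot(isTRUE(all.equal(centroid(ps), centroid(pts, w))))
```

The first check guards against regressions for existing users, while the second one exercises the dispatch step itself.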
Where should the generic & methods live?
In the previous section, we mentioned that you may want to rely on existing, established S3 classes. How does it work in practice when you want to add a method for a class outside of your package? Do you need to import the package where the class is defined? On the other side of the fence, as a class developer, is it okay to provide methods for generics provided in other packages? If you have the choice, should the method live in the package defining the generic or the class?
Where should the generic live?
The generic should always live in the package implementing the actual computation in the function in the first place. For example, if you defined the original centroid() function in a package called geometryops, the S3 generic should also be defined in that package, not in the package defining the pointset class.
It is possible in theory to overwrite a function defined by another package with a generic (“overloading”). For example, we could overload the base R table() function with:
table <- function(...) {
  UseMethod("table")
}

# Forward all arguments to the base implementation
table.default <- function(...) {
  base::table(...)
}
But this is generally considered bad practice, and possibly rude [2]. As a rule of thumb, you should usually avoid:
- name collisions with functions from other packages (especially base or recommended packages);
- light wrappers around a function from another package as this may be seen as an attempt to steal citations and credit.
Where should the methods live?
For methods, there is more flexibility than for generics. They could live either in the package defining the class, or in the package defining the generic. Let’s present the practical setup in both cases, as well as each strategy’s pros & cons.
Method in the class package
This is the strategy used when you define a new class and provide it with a print(), a summary(), or a plot() method. The generics for these functions are defined in base R.
#' @export
plot.myclass <- function(x, y, ...) {
  # code for a beautiful plot for your custom class
}
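For a version you can run directly, here is a minimal sketch of a print() method for the pointset class used earlier in this post. The output format is our own invention for the sake of the example.

```r
# print() method for the pointset class; the print() generic lives in base R
print.pointset <- function(x, ...) {
  cat("<pointset> with", length(x$coords), "points\n")
  invisible(x)  # print methods conventionally return their input invisibly
}

ps <- structure(
  list(coords = list(c(0, 1), c(4, 3)), weights = c(1, 1)),
  class = "pointset"
)

print(ps)
```

Because the generic comes from base R, nothing needs to be added to DESCRIPTION in this particular case.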
If you opt for this strategy, you will need to depend on the package providing the generic, listing it in Imports. For example, a package defining a fit.myclass() method for the fit() generic defined in the generics package would have the following DESCRIPTION and NAMESPACE.
DESCRIPTION
Imports: generics
fit.myclass.R
#' @export
#' @importFrom generics fit
fit.myclass <- function(x, ...) {
  # your code here
}
NAMESPACE
# Generated by roxygen2: do not edit by hand

S3method(fit,myclass)
importFrom(generics,fit)
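But this can lead to a rapid increase in the number of dependencies if you provide methods for generics from various packages. Since R 3.6.0, you can instead list the package providing the generic in Suggests and use delayed registration, where the method is only registered once the namespace of that package is loaded: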
DESCRIPTION
Suggests: generics
fit.myclass.R
#' @exportS3Method generics::fit
fit.myclass <- function(x, ...) {
  # your code here
}
NAMESPACE
# Generated by roxygen2: do not edit by hand

S3method(generics::fit,myclass)
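In both setups, the S3method() directive results in the method being registered for the generic rather than simply exported by name. To get a feel for what registration means, the sketch below uses base R's registerS3method() outside of any package, with the base summary() generic so that nothing needs to be installed. The names `private` and `summary_pointset` are hypothetical, and inside a package you should prefer the roxygen2 tags shown above over calling this function yourself.

```r
# The method is stored in a private environment, so it cannot be found
# by looking up the name "summary.pointset" on the search path
private <- new.env()
private$summary_pointset <- function(object, ...) {
  c(n_points = length(object$coords), total_weight = sum(object$weights))
}

# Register it as the summary() method for class "pointset"
registerS3method("summary", "pointset", private$summary_pointset)

ps <- structure(
  list(coords = list(c(0, 1), c(4, 3), c(2, 8)), weights = c(1, 2, 1)),
  class = "pointset"
)

# Dispatch finds the registered method even though no function
# called summary.pointset is visible
res <- summary(ps)
```

This is why a registered method works even when users never attach the package that defines it.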
Method in the generic package
Alternatively, you can define the method in the package defining the generic. This is the approach taken in the report package, for example, which defines the report() generic and methods for various model outputs produced by different packages.
In theory, no Imports or Suggests is required here:
#' @export
mygeneric <- function(x, ...) {
  UseMethod("mygeneric")
}

#' @export
mygeneric.externalclass <- function(x, ...) {
  # your code here
}
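The reason no dependency is needed is that S3 dispatch only looks at the class attribute of the object, not at the package that created it. Here is a small runnable sketch of this, where the object `obj` is a stand-in we build by hand for something the other package would normally return, and the default method is our own addition:

```r
mygeneric <- function(x, ...) {
  UseMethod("mygeneric")
}

# Fallback with an informative error for unsupported classes
mygeneric.default <- function(x, ...) {
  stop("No mygeneric() method for class ", class(x)[1], call. = FALSE)
}

mygeneric.externalclass <- function(x, ...) {
  paste("externalclass object with", length(unclass(x)), "elements")
}

# Stand-in for an object created by the other package
obj <- structure(list(a = 1, b = 2), class = "externalclass")

res <- mygeneric(obj)
```

The flip side is that your method relies on the internal structure of a class you do not control, which is one more reason to stay in touch with its maintainers.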
However, if you end up providing many methods for a specific class, you could put the package defining it in the uncommon Enhances field. Enhances is defined in ‘Writing R Extensions’ as:
The ‘Enhances’ field lists packages “enhanced” by the package at hand, e.g., by providing methods for classes from these packages.
It may be a good idea to explicitly signal the strong relationship between both packages so that the package defining the method is checked as a reverse dependency, and its maintainer is informed of potential breaking changes as discussed below. You may see an example of this in the slam package, which provides methods for both base matrices and sparse matrices, as defined in the Matrix and the spam packages.
Coordination between maintainers
No matter the strategy you end up choosing, we strongly recommend you keep an open communication channel between the developers of the class package and of the generic package (provided they are not the same person), as breaking changes will impact both parties.
Conclusion
As we’ve seen here, there are clear benefits to converting your standard function to an S3 generic. This can be done almost transparently but we’ve highlighted some subtle changes you may want to consider before making the switch.
Footnotes
[1] Note that we focus here on the S3 framework but R has other object orientation frameworks, as discussed in the relevant section of the ‘Advanced R’ book by Hadley Wickham.
[2] Every rule has its exceptions though, such as the generics package, built by prominent members of the R developer community, which overloads base R functions such as as.factor() or as.difftime().