Convert Your R Function to an S3 Generic: Benefits, Pitfalls & Design Considerations

[This article was first published on Epiverse-TRACE developer space, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

To build a tight and well-integrated data pipeline, it may be desirable to rely on object orientation (OO) to automatically pass valuable information from one step to the other. OO and data classes can also act as a compatibility layer standardising outputs from various tools under a common structure.

But many packages and software start as standalone projects, and don’t always stem from a careful consideration of the larger ecosystem. In this situation, developers often see little benefit of using an OO system in their project initially.

But as the project matures, and as the position of the tool in the wider ecosystem becomes clearer, they may want to start using OO to benefit from the better integration it may provide with other tools upstream and downstream in the data pipeline. However, by then, their tool likely has an established community of users, and it is important to tread carefully with breaking changes.

In this blog post, we show that it’s possible to start using an S3 OO system almost invisibly in your R package, with minimal disruption to your users. We detail some minor changes that will nonetheless occur, and which pitfalls you should be looking out for. Finally, we take a step back and reflect how you should ensure you are a good open-source citizen in this endeavour.

Benefits

Let’s reuse the example function from one of our previous posts:

#' @export
centroid <- function(coords, weights) {

  # ...

}

Since we wrote and released this function, someone may have designed a clever data class to store coordinates of a set of points and their weights. Let’s imagine they use the following class that they call pointset:

example_pointset <- structure(
  list(
    coords = list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)),
    weights = c(1, 1, 1, 1)
  ),
  class = "pointset"
)

They may also have developed nice utilities for this class so there is a clear motivation for you to integrate with their class since it’s less work you’ll have to do. Plus, you immediately become compatible with any package that uses the same class.

We will not spend too much time on the practical steps to operate this conversion since this is already covered in details in the dedicated chapter of Advanced R, by Hadley Wickham, as well as this blog post from Nick Tierney 1. But the final result would be:

#' Compute the centroid of a set of points
#'
#' @param coords Coordinates of the points as a list of vectors. Each element of
#'   the list is a point.
#' @param weights Vector of weights applied to each of the points
#'
#' @returns A vector of coordinates of the same length of each element of 
#'   `coords`
#'   
#' @examples
#' centroid(
#'   list(c(0, 1, 5, 3), c(8, 6, 4, 3), c(10, 2, 3, 7)),
#'   weights = c(1, 1, 1)
#' )
#' 
#' @export
centroid <- function(coords, weights) {

  UseMethod("centroid") 

}

#' @rdname centroid
#' 
#' @export
centroid.default <- function(coords, weights) {

  # ...

}

#' @rdname centroid
#' 
#' @export
centroid.pointset <- function(coords, weights = NULL) {

  centroid(coords$coords, coords$weights)

}

What subtle changes should you be looking out for?

You may already have noticed a couple of minor changes in the example above but some changes are even less evident and easy to forget, hence this blog post.

All methods must have the same arguments as the generic

You can see that the method for pointset class, centroid.pointset() has a weights argument, even though it is not used because weights are already contained in the coords object. This seems clunky and potentially confusing for users. But this is mandatory because all methods must have the same arguments as the generic.

Another option here could have been to remove weights from the generic, and add ... instead, thus allowing to pass weights as an extra argument only in selected methods. This is more idiomatic in R, and in line with the recommendation from the official ‘Writing R Extensions’ document (“always keep generics simple”):

#' @export
centroid <- function(coords, ...) { 
  UseMethod("centroid") 
}

#' @rdname centroid
#' 
#' @export
centroid.default <- function(coords, weights, ...) {

  coords_mat <- do.call(rbind, coords)
  
  return(apply(coords_mat, 2, weighted.mean, w = weights))
  
}

But this extra ... argument, which is documented as “ignored”, may be confusing as well.

More complex documentation presentation

On the topic of arguments, another pitfall related to the conversion to an S3 generic is the change in the documentation. Below is a collage of before / after the change. This is quite minor and some users may not even notice it but I remember it was very confusing to me when I started using R and I didn’t really know what S3 or OO was: “what do you mean, ‘Default S3 method’, which case applies to me?”

Screenshot of the centroid() documentation before conversion to an S3 generic

Screenshot of the centroid() documentation after conversion to an S3 generic

The answer is that “Default S3 method” lists the arguments for centroid.default(), i.e., the method which is used if no other method is defined for your class. Arguments for all methods are usually documented together but you should only focus on those present in the call after the comment stating “S3 method for class ‘XXX’” for the class you’re working with.

More complicated error traceback

Another situation where converting to an S3 adds an extra layer of complexity is where you are trying to follow the error traceback:

centroid(3)

In this example, we see one extra line that did not exist when centroid() was a regular function, rather than a generic:

centroid.default(3) at centroid.R#19

This line corresponds to the dispatch operation.

However, this slight difference in behaviour is likely not a big issue as we mostly expect experienced users to interact with the traceback. These users are likely to be familiar with S3 dispatch and understand the traceback in any case.

Extra source of bugs during dispatch

On a related note, the extra step introduced by this conversion to generic is another potential source of bugs. This doesn’t really impact your users directly but it does mean that as a developer, you will maintaining slightly more complex code and you will need to be more careful when making any changes. However, as always, a robust testing suite should help you catch any error before it makes it to production.

Where should the generic & methods live?

In the previous section, we mentioned that you may want to rely on existing, established S3 classes. How does it work in practice when you want to add a method for a class outside of your package? Do you need to import the package where the class is defined? On the other side of the fence, as a class developer, is it okay to provide methods for generics provided in other packages? If you have the choice, should the method live in the package defining the generic or the class?

Where should the generic live?

The generic should always live in the package implementing the actual computation in the function in the first place. For example, if you defined the original centroid() function in a package called geometryops, the S3 generic should also be defined in that package, not in the package defining the pointset class.

It is possible in theory to overwrite a function defined by another package with a generic (“overloading”). For example, we could overload base R table() function with:

table <- function(...) { 
  UseMethod(...)
}

table.default <- function(
  ...,
  exclude = if (useNA == "no") c(NA, NaN),
  useNA = c("no", "ifany", "always"),
  dnn = list.names(...), deparse.level = 1
) {

 base::table(
  ...,
  exclude = exclude,
  useNA = useNA,
  dnn = dnn
 )

}

But this is generally considered bad practice, and possibly rude 2. As a rule of thumb, you should usually avoid:

  • name collisions with functions from other packages (especially base or recommended package);
  • light wrappers around a function from another package as this may be seen as an attempt to steal citations and credit.

Where should the methods live?

For methods, there is more flexibility than for generics. They could either in the package defining the class, or in the package defining the generic. Let’s present the practical setup in both cases, as well as each strategy pros & cons.

Method in the class package

This is the strategy used when you defined a new class and provide it with a print(), a summary(), or a plot() method. The generics for these functions are defined in R base.

#' @export
plot.myclass <- function(x, y, ...) {
  
  # code for a beautiful plot for your custom class
  
}

If you opt for this strategy, you will need to depend on the package providing the method, as Imports. For example, a package defining a fit.myclass() method for the fit() generic defined in the generics package would have the following DESCRIPTION and NAMESPACE.

DESCRIPTION
Imports:
  generics
fit.myclass.R
#' @export
#' @importFrom generics fit
fit.myclass <- function(x, ...) {
  # your code here
}
NAMESPACE
# Generated by roxygen2: do not edit by hand

S3method(fit,myclass)
importFrom(generics,fit)
Importing the generic

It’s worth insisting that you need to import the generic in your NAMESPACE for the method to be recognized and exported correctly by roxygen2. In this specific situation, simply explicitly prefixing the generic call (generic::fit()) is not enough.

But this can lead to a rapid increase in the number of dependencies if you provide methods for generics from various packages. Since R 3.6, you can also put generics in Suggests and use delayed assignment:

DESCRIPTION
Suggests:
  generics
fit.myclass.R
#' @exportS3Method generics::fit
fit.myclass <- function(x, ...) {
  # your code here
}
NAMESPACE
# Generated by roxygen2: do not edit by hand

S3method(generics::fit,myclass)

Method in the generic package

Alternatively, you can define the method in the package defining the generic. This is the approach taken in the report package from example, which defines the report() generic and methods for various model outputs produced by different package.

In theory, no Imports or Suggests is required here:

#' @export
mygeneric <- function(x, ...) { 
  UseMethod(x)
}

#' @export
mygeneric.externalclass <- function(x, ...) {
  # your code here
}

However, if you end up providing many methods for a specific class, you could put the package defining it in the uncommon Enhances field. Enhances is defined in ‘Writing R Extensions’ as:

The ‘Enhances’ field lists packages “enhanced” by the package at hand, e.g., by providing methods for classes from these packages.

It may be a good idea to explicitly signal the strong relationship between both packages so that the package defining the method is checked as a reverse dependency, and informed of potential breaking changes as discussed below. You may see an example of this in the slam package, which provides his methods for both base matrices and sparse matrices, as defined in the Matrix and the spam packages.

Coordination between maintainers

No matter the strategy you end up choosing, we strongly recommend you keep an open communication channel between the class package and the generic package developer (provided they are not the same person) as breaking changes will impact both parties.

Conclusion

As we’ve seen here, there are clear benefits to converting your standard function to an S3 generic. This can be done almost transparently but we’ve highlighting some subtle changes you may want to consider before pulling the switch.

Spreading the S3 love

If you like S3 and find it helpful to convert your function to an S3 class, you should keep propagating the S3 love by also adding an S3 class to your function output.

With this in mind, in the very first example where we converted our centroid() function to an S3 generic to handle pointset objects, we could also make our output a pointset object.

Footnotes

  1. Note that we focus here on the S3 framework but R has other object orientation frameworks, as discussed in the relevant section of the ‘Advanced R’ book by Hadley Wickham↩︎

  2. Every rule has its exceptions though such as the generics package, built by prominent members of the R developer community, which overloads base R functions such as as.factor() or as.difftime().↩︎

To leave a comment for the author, please follow the link and comment on their blog: Epiverse-TRACE developer space.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)