This blog post aims to give a brief introduction to R7, a new R package for OOP in R. It’s not a tutorial on how to write code using R7: the documentation provides great instructions for getting started if you’re ready to start programming in R7.
Note: there is an ongoing discussion about the name of R7, and it will likely change at some point in the future.
What is OOP?
Before we talk about R7, we should probably talk about OOP. OOP (short for object-oriented programming) is a programming paradigm that focuses on objects and their interactions, rather than on the evaluation of functions (as in functional programming). If you’re an R user, you’ve almost certainly used OOP approaches even if you haven’t realised it yet. For example, calling print() on a vector produces very different output from calling print() on a plot.
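The print() behaviour described above can be reproduced with base R alone. The sketch below is only an illustration: the receipt class and its print method are hypothetical names made up for this example, and a factor stands in for the plot so that nothing needs to be drawn.

```r
# Two objects of different classes
x = c(1.5, 2.5, 3.5)                 # a plain numeric vector
f = factor(c("low", "high", "low"))  # an object with class "factor"

class(x)
class(f)

print(x)  # falls through to the default print method
print(f)  # dispatches to print.factor(), which also lists the levels

# A hypothetical class of our own, with its own print method
receipt = structure(list(total = 12.5), class = "receipt")

print.receipt = function(x, ...) {
  cat("Total due: ", format(x$total, nsmall = 2), "\n", sep = "")
  invisible(x)
}

print(receipt)  # print() now finds print.receipt() via the class attribute
```

The same generic, print(), ends up running different code depending on the class of its argument, which is the core idea behind S3.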
OOP in R
In typical object-oriented systems, each object is of a particular class (type) and has data and methods (object-specific functions) associated with it. The behaviour observed when a method is called depends on the class of the object that the method is associated with. Several OOP systems already exist in R, including:
- S3: the simplest and most commonly used object-oriented system, where the class attribute defines the type of an object. S3 is widely used throughout base R, so it’s important to know about it if you want to extend functions to work with different inputs. Its name comes from version 3 of the S language!
- S4: similar to S3 but includes more formal class definitions and validation. In S4, the data contained in an object is defined by the slots in the class definition. S4 is a bit more complicated than S3 but results in better guarantees. S4 isn’t quite as widely used as S3, though the Bioconductor community is a long-term user of S4, so it’s important to know if you want to contribute to Bioconductor packages. The {lme4} package and some spatial packages (including {sp}, {rgdal}, and {rgeos}) also make use of S4 classes.
- Reference Classes (RC): a special type of S4 that also allows objects to be modified in place. Reference Classes have seen very little adoption within the R community.
- R6: similar to RC but simpler to use, and built on S3 instead of S4. Unlike the previous OOP systems mentioned, R6 is a package rather than part of base R. It’s primarily been developed by Posit (formerly RStudio) and is used within the Shiny package.
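To make the differences between the built-in systems a little more concrete, here is a minimal sketch that defines the same toy object three ways using only base R. The class names (cereal_s3, CerealS4, CerealRC) are hypothetical and chosen just for this illustration. R6 is left out because it needs an extra package.

```r
library(methods)  # provides setClass(), new() and setRefClass()

# S3: the class is just an attribute, so nothing checks the contents
cereal_s3 = structure(
  list(name = "Coco Pops", year_of_launch = "not a year"),
  class = "cereal_s3"
)
inherits(cereal_s3, "cereal_s3")  # still counts as a cereal_s3 object

# S4: slots are declared up front and their types are checked by new()
setClass("CerealS4", slots = c(name = "character", year_of_launch = "numeric"))
coco_s4 = new("CerealS4", name = "Coco Pops", year_of_launch = 1958)
coco_s4@year_of_launch

# Supplying the wrong type for a slot is an error
bad = try(
  new("CerealS4", name = "Coco Pops", year_of_launch = "1958"),
  silent = TRUE
)
inherits(bad, "try-error")

# Reference Classes: objects are modified in place
CerealRC = setRefClass(
  "CerealRC",
  fields = list(name = "character", year_of_launch = "numeric")
)
coco_rc = CerealRC$new(name = "Coco Pops", year_of_launch = 1957)
alias = coco_rc               # not a copy: both names point at one object
alias$year_of_launch = 1958
coco_rc$year_of_launch        # the original sees the change too
```

S3 accepts anything, S4 rejects the mistyped slot, and the Reference Class object changes under both names, which matches the trade-offs described in the list above.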
This blog post doesn’t aim to go into the details about object-oriented systems in R, and I’d recommend reading the Object Oriented chapters of Advanced R for more details.
So if we already have all these OOP systems in R, why do we need another one? You can watch Hadley Wickham’s talk from rstudio::conf(2022) for some more background information on the motivation for developing R7.
Image: xkcd.com/927
What is R7?
The two main OOP systems in R, S3 and S4, both have their advantages and their limitations. For example, in S3 there’s no systematic object validation to make sure an object’s class is correct. In S4, the syntax for defining classes is rather unusual and relies on side effects. Issues such as these mean that, unlike in many other programming languages, there isn’t a dominant approach to OOP in R.
Now imagine you could take the best bits of S3 and the best bits of S4. That’s where R7 comes in. The R7 package is a new OOP system designed to be a successor to S3 and S4, hence the name: 3 + 4 = 7. Unlike S3 and S4 (which were developed for S), R7 is specifically developed for R. R7 is currently being developed by the R Consortium Working Group on OOP. The long-term goal is to merge R7 into base R.
You can install the development version of R7 from GitHub:
remotes::install_github("rconsortium/OOP-WG")
library("R7")
Defining a class in R7
R7 classes are defined formally, and the definition includes a list of properties and an (optional) validator. You can use the (intuitively named) new_class() function to define a new R7 class. For example, to define a simple R7 class with two properties about breakfast cereals (their name and year_of_launch), we can use the following code:
cereal = new_class(
  name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
It’s not a coincidence that we’ve assigned the new class to an object with the same name as the class: that object is how we construct new instances. To construct an instance of the cereal class, call cereal() and pass in the values of the properties as arguments:
coco_pops = cereal(name = "Coco Pops", year_of_launch = 1957)
After you’ve created an R7 object, you can use @ to access and set properties. For example, Coco Pops were actually released in 1958, so you could correct the value using:
coco_pops@year_of_launch = 1958
Alternatively, prop(coco_pops, "year_of_launch") = 1958 does the same thing.
One of the things I really like about R7 is that the type of each property is automatically validated. When I defined cereal() earlier, I specified that name must be a character. If I were to pass in a numeric value when creating a new instance, it would throw an error. You can also supply a validator argument to new_class() to perform more complex checks on inputs.
You’ll also get an error if you try to assign a value to a property that hasn’t been defined. For example, coco_pops@manufacturer <- "Kellogg's" returns an error because manufacturer isn’t in the list of properties defined in cereal().
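The three kinds of check described above (property types, a custom validator, and undefined properties) can be sketched together as follows. This assumes the R7 package is installed under that name as shown earlier. The rule enforced by the validator (a single launch year) is made up for illustration and is not part of the original class definition.

```r
library("R7")  # assumes the development version installed earlier

cereal = new_class(
  name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  ),
  # A validator returns NULL when the object is fine,
  # or a character string describing the problem
  validator = function(self) {
    if (length(self@year_of_launch) != 1) {
      "@year_of_launch must be a single number"
    }
  }
)

coco_pops = cereal(name = "Coco Pops", year_of_launch = 1958)

# 1. Wrong property type: name must be a character
wrong_type = try(cereal(name = 123, year_of_launch = 1958), silent = TRUE)

# 2. Fails the custom validator: two launch years supplied
too_long = try(
  cereal(name = "Coco Pops", year_of_launch = c(1957, 1958)),
  silent = TRUE
)

# 3. A property that was never defined
undefined = try(coco_pops@manufacturer <- "Kellogg's", silent = TRUE)

# Each attempt should have produced an error
c(
  inherits(wrong_type, "try-error"),
  inherits(too_long, "try-error"),
  inherits(undefined, "try-error")
)
```

Wrapping each call in try() is only there so the script keeps running. In interactive use you would simply see the error message.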
If you want a property to be dynamic, i.e. computed each time it’s accessed, then the new_property() function is worth exploring. For example, if you wanted to return the current system time every time you called coco_pops@time, you could use new_property() in the class definition:
cereal = new_class(
  name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric,
    time = new_property(getter = function(self) Sys.time())
  )
)
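A dynamic property can also be derived from other properties. As a rough sketch, assuming the R7 package is installed as above, the hypothetical age property below (not part of the original class) is recomputed from year_of_launch on every access:

```r
library("R7")  # assumes the development version installed earlier

cereal = new_class(
  name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric,
    # Hypothetical read-only property, recomputed whenever it is accessed
    age = new_property(getter = function(self) {
      as.numeric(format(Sys.Date(), "%Y")) - self@year_of_launch
    })
  )
)

coco_pops = cereal(name = "Coco Pops", year_of_launch = 1958)
age_before = coco_pops@age

# Changing the launch year changes the computed age as well
coco_pops@year_of_launch = 1960
age_after = coco_pops@age

age_before - age_after  # the difference between the two launch years
```

Because age has a getter but no setter, it isn’t supplied when constructing the object. It only exists as a view onto the stored properties.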
To me, this already feels a lot more intuitive than some of the other OOP systems in R. For more information on dynamic properties, validation, generics, and methods, read the vignette on R7 basics by calling vignette("R7"), or view the documentation on the package website.
What’s different in R7?
Since R7 is designed to be the successor to S3 and S4, you might be wondering two things: (i) how is R7 different to S3?, and (ii) how is R7 different to S4?
R7 vs S3
The good news is that, since R7 is built on top of S3, R7 objects are S3 objects. However, there are a couple of differences between the two:
- S3 objects have a class attribute. R7 objects also have an R7_class attribute that contains the object that defines the class.
- S3 objects have attributes. R7 objects have properties (which are built on top of attributes). This means that you can still access properties using the attr() function, so old code that does so will keep working. However, when working in R7 you generally shouldn’t use attributes directly.
This means that most R7 code will just work with S3. You can create R7 methods for R7 classes and S3 generics, and vice versa. You can also use R7 classes to extend S3 classes, and vice versa.
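Here is a small sketch of that compatibility, assuming the R7 package is installed as above. It deliberately uses an ordinary S3 method (print.cereal, a name introduced for this example) rather than R7’s own method registration, to show that existing S3 tooling recognises an R7 object.

```r
library("R7")  # assumes the development version installed earlier

cereal = new_class(
  name = "cereal",
  properties = list(
    name = class_character,
    year_of_launch = class_numeric
  )
)
coco_pops = cereal(name = "Coco Pops", year_of_launch = 1958)

# R7 objects carry an S3 class attribute, so S3 tools recognise them
inherits(coco_pops, "cereal")

# Properties are stored as attributes (fine for a quick look, but prefer @)
attr(coco_pops, "year_of_launch")

# An ordinary S3 method for the print() generic, dispatched on the R7 class
print.cereal = function(x, ...) {
  cat(x@name, " (launched ", x@year_of_launch, ")\n", sep = "")
  invisible(x)
}

print(coco_pops)
```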
R7 vs S4
The properties of R7 objects are essentially equivalent to the slots of S4 objects. The main difference between the two is that, in R7 objects, properties can be dynamic. As with S3, you can combine R7 methods with S4 generics, and vice versa. S4 classes can extend S3 classes, and therefore R7 classes too. However, R7 classes cannot be used to extend S4 classes.
Should I switch to R7?
If you’re already using S3, switching to R7 should be fairly seamless. You can keep doing everything you’re already doing, plus you get some extra functionality for free.
As I mentioned above, R7 classes cannot be used to extend S4 classes, so if you’re an existing user of S4 and have a large codebase built primarily in S4 that you wish to continue to extend, switching to R7 might take a little bit more work. However, if you’re unlikely to want to extend existing S4 classes, the change to R7 should also be relatively smooth. R7 also aims to fix some of the problems with the {methods} package, which implements S4, including performance and complexity issues. That is perhaps another reason to give it a go.
If you’re at the point where you think you might need a bit more control than you can achieve with S3, I’d recommend trying R7 before S4. At least from my experience, R7 felt more intuitive and easier to learn than S4.
Note that since R6 is built on encapsulated objects, rather than generic functions like S3 and S4, it’s a very different type of object-oriented system from R7. So if you’re primarily an R6 (or Reference Classes) user, R7 isn’t going to be a replacement for your existing approaches.
We’re excited to see the developments in R7 over the next few months, and we’ll soon be updating the material in our Object Oriented Programming in R training course to cover R7!
For updates and revisions to this article, see the original post