Here’s another rabbit hole where I spent a bit of time this evening. I like OOP and I like the way R uses vectors. I’ve created a few classes and had started to code a function which would plot a set of them. It all seemed straightforward until I realized that the infrastructure for handling collections of your own classes is something that you have to build yourself. Moreover, if you have a slot that holds an object like a data frame, this could get sticky.
Here’s a toy example. Let’s say I’d like to keep track of movies that I borrow from friends. I can do that with a class that has a data frame slot to record the films and a character slot to indicate from whom I borrowed them.
    movie = c("Thunderball", "Goldfinger")
    rating = c(4, 5)
    dfJoe = data.frame(movie = movie, rating = rating)

    movie = c("Manhattan", "Interiors", "Radio Days", "Bananas")
    rating = c(5, 4, 3, 5)
    dfBob = data.frame(movie = movie, rating = rating)

    setClass("BorrowedStuff", representation(stuff = "data.frame", from = "character"))

    JoesStuff = new("BorrowedStuff", from = "Joe", stuff = dfJoe)
    BobsStuff = new("BorrowedStuff", from = "Bob", stuff = dfBob)
I now have two objects which store a set of relevant information (relevant to Joe and Bob, at any rate). But surely I’d like to keep this all in one place. We do that with vectors. When you pass a vector into a function, even a call to new() for an S4 class, the arguments arrive as vectors. The function will attempt to evaluate the output using vector operations and will probably return a vector. Here’s a silly example.
    sillyFunction = function(x){
      x + 1
    }

    sillyFunction(1)
    sillyFunction(1:10)
Note the second call to the function will return a vector. So, if we want a vector of S4 objects, we just pass in vectors, right? That would work for primitive datatypes, but if we try it with something like a data frame, it will fail.
    # Both of these calls raise an error: the stuff slot gets a list, not a data frame
    whatStuff = new("BorrowedStuff", from = c("Joe", "Bob"), stuff = c(dfJoe, dfBob))
    whatStuff = new("BorrowedStuff", from = c("Joe", "Bob"), stuff = list(dfJoe, dfBob))

Neither of those will work because the stuff slot of BorrowedStuff is expecting a data frame. Calling c() on two data frames flattens them into a single list of columns, so the only way to create a “vector” of data frames is by placing them in a list, and when the constructor sees a list, it will complain.
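To see why both attempts end up handing the constructor a list, it may help to look at what c() and list() actually return for two data frames. Here’s a minimal sketch; it re-creates the two data frames so that it runs on its own, and the names combined and wrapped are just mine for illustration.

```r
# Re-create the two data frames from the toy example
dfJoe = data.frame(movie = c("Thunderball", "Goldfinger"), rating = c(4, 5))
dfBob = data.frame(movie = c("Manhattan", "Interiors", "Radio Days", "Bananas"),
                   rating = c(5, 4, 3, 5))

# c() treats each data frame as a list of columns and concatenates the columns
combined = c(dfJoe, dfBob)
class(combined)            # a plain list
names(combined)            # the column names of both data frames, in order
is.data.frame(combined)    # FALSE

# list() keeps each data frame intact, but the container is still a list
wrapped = list(dfJoe, dfBob)
is.data.frame(wrapped[[1]])  # TRUE
is.data.frame(wrapped)       # FALSE
```

Either way, what reaches the stuff slot is a list, which is not a data frame, so the validity check in new() rejects it.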
The answer is to create a “c” method for the class, which will concatenate two or more objects.
    setMethod("c", signature(x = "BorrowedStuff"), function(x, ...){
      elements = list(x, ...)
      stuffList = vector("list", length(elements))
      for (i in seq_along(elements)){
        # [[i]] stores the S4 object itself as the i-th element of the list
        stuffList[[i]] = new("BorrowedStuff",
                             from = slot(elements[[i]], "from"),
                             stuff = slot(elements[[i]], "stuff"))
      }
      # Label the plain list with the class name (see the caveat further down)
      class(stuffList) = "BorrowedStuff"
      stuffList
    })

    whatStuff = c(JoesStuff, BobsStuff)
    whatStuff[[1]]@stuff
    someStuff = whatStuff[[1]]
I’ll point out that I based this code on something similar in the source for the lubridate package. In all of the material I’ve seen about S4 so far, there’s little mention of some of the nuts and bolts about building operators like c, [, + and so forth in a way that’s appropriate for your class. Oh well. Reading Hadley’s code is a fantastic way to see how experts do R right.
Setting the class to “BorrowedStuff” may not be a wise idea, since the result is really a plain list and has no “from” or “stuff” slots of its own. I do that because it means that the object will be shown as “BorrowedStuff” rather than “list” in RStudio’s workspace pane. That’s a dreadful reason for doing something like that, but it looks cool.
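If labelling a plain list with the class name feels too hacky, one alternative is a small container class. What follows is only a sketch under my own assumptions: the name BorrowedStuffList and its validity check are hypothetical, not something taken from lubridate. It extends list, so [[ keeps working, and it replaces the “c” method so that concatenation returns the container.

```r
library(methods)

setClass("BorrowedStuff",
         representation(stuff = "data.frame", from = "character"))

# Hypothetical container: a list whose elements must all be BorrowedStuff objects
setClass("BorrowedStuffList", contains = "list",
         validity = function(object) {
           ok = vapply(object@.Data, is, logical(1), "BorrowedStuff")
           if (all(ok)) TRUE else "every element must be a BorrowedStuff object"
         })

# Concatenating BorrowedStuff objects now yields the container class
setMethod("c", signature(x = "BorrowedStuff"), function(x, ...) {
  new("BorrowedStuffList", list(x, ...))
})

# Usage, with two small objects built inline
joe = new("BorrowedStuff", from = "Joe",
          stuff = data.frame(movie = c("Thunderball", "Goldfinger"),
                             rating = c(4, 5)))
bob = new("BorrowedStuff", from = "Bob",
          stuff = data.frame(movie = c("Manhattan", "Bananas"),
                             rating = c(5, 5)))

both = c(joe, bob)
length(both)       # number of borrowed collections
both[[2]]@from     # who the second collection came from
both[[1]]@stuff    # the data frame for the first collection
```

The trade-off is one more class to maintain, but the validity function guards against accidentally mixing other objects into the collection, and the object carries an honest class name rather than a borrowed one.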