Practicing static typing in R: Prime directive on trusting our functions with object oriented programming

[This article was first published on Memo's Island, and kindly contributed to R-bloggers].
John Chambers, the creator of the S language from which R is derived, wrote in his book Software for Data Analysis: Programming with R:
…This places an obligation on all creators of software to program in such a
way that the computations can be understood and trusted. This obligation
I label the Prime Directive.
He was referring to the Prime Directive from Star Trek. One practice in this direction is to put proper checks in place for the types we use, so that we can trust our function to fail gracefully if we pass it, for example, an argument of the wrong type. The type system of a programming language is therefore quite important in mission-critical numerical computations. Since R is a dynamically typed language, similar to Perl, Python or Matlab/Octave, most R users rarely, if ever, place type checks in their functions. For example, take the following function, which takes a matrix, a vector and a function as arguments. It applies the function to each column of the matrix listed in the given vector. Assuming the function returns a single number, our function will return a vector of numbers.
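myMatrixOperation <- function(A, v, fName) {
  # Select the columns listed in v; drop = FALSE keeps a single column as a matrix
  sliceA <- A[, v, drop = FALSE]
  # Apply fName to each selected column
  apply(sliceA, 2, fName)
}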
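As a quick illustration of how the unchecked version behaves, the sketch below restates the function and calls it once with sensible arguments and once with a list in place of the matrix. The test matrix and the `drop = FALSE` guard are assumptions added for this example. The point is only that the failure comes from inside the indexing code, and its message does not say which argument was wrong.

```r
# Unchecked version, restated so the example is self-contained.
# drop = FALSE (an addition here) keeps a single selected column as a matrix.
myMatrixOperation <- function(A, v, fName) {
  sliceA <- A[, v, drop = FALSE]
  apply(sliceA, 2, fName)
}

# A small illustrative matrix with columns (1,2), (3,4), (5,6).
A <- matrix(1:6, nrow = 2)
okResult <- myMatrixOperation(A, c(1, 3), mean)
print(okResult)  # means of columns 1 and 3

# Passing a list instead of a matrix fails inside `[`, not at the interface.
badResult <- tryCatch(
  myMatrixOperation(list(1, 2, 3), c(1, 3), mean),
  error = function(e) conditionMessage(e)
)
print(badResult)  # an indexing error; it never mentions the argument A
```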

One obvious way is to put an if statement for each argument in our function, so the function may look like:
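myMatrixOperation <- function(A, v, fName) {
  # Fail early, with a clear message, if any argument has the wrong type
  if (!is.matrix(A)) {
    stop("A is not a matrix")
  }
  if (!is.vector(v)) {
    stop("v is not a vector")
  }
  if (!is.function(fName)) {
    stop("fName is not a function")
  }
  # drop = FALSE keeps a single selected column as a matrix
  sliceA <- A[, v, drop = FALSE]
  apply(sliceA, 2, fName)
}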
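To see these checks fail gracefully, the following sketch restates the checked function in compact form and captures the error message for each kind of bad argument. The helper `errorMessageOf` and the test inputs are hypothetical, introduced only for this example. Note that with an `is.function` check, passing the name of a function as a string (`"mean"`) is rejected, even though `apply` itself would accept it.

```r
# Checked version, restated compactly so the example is self-contained.
myMatrixOperation <- function(A, v, fName) {
  if (!is.matrix(A)) stop("A is not a matrix")
  if (!is.vector(v)) stop("v is not a vector")
  if (!is.function(fName)) stop("fName is not a function")
  apply(A[, v, drop = FALSE], 2, fName)
}

# Hypothetical helper: evaluate a call, return its error message (NA if none).
errorMessageOf <- function(expr) {
  tryCatch({ expr; NA_character_ }, error = function(e) conditionMessage(e))
}

A <- matrix(1:6, nrow = 2)
msgA <- errorMessageOf(myMatrixOperation(list(1, 2), c(1, 3), mean))  # A is a list
msgV <- errorMessageOf(myMatrixOperation(A, NULL, mean))              # v is NULL
msgF <- errorMessageOf(myMatrixOperation(A, c(1, 3), "mean"))         # fName is a string
print(c(msgA, msgV, msgF))
```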
The problem with this approach is that it is too verbose: if the same pattern of arguments repeats across many functions, we end up copying and pasting the checks many times. This not only looks ugly but also wastes our time. Luckily, there is a mechanism to address this: the S4 class system. Let's define an S4 class for our set of arguments, followed by an example instantiation.
setClass("mySlice", representation(A="matrix", v="vector", fName="function"))

myS <- new("mySlice", A = matrix(rnorm(9), 3, 3), v = c(1, 2), fName = mean)
str(myS)
Formal class 'mySlice' [package ".GlobalEnv"] with 3 slots
  ..@ A    : num [1:3, 1:3] 0.356 -0.34 -0.642 -0.466 2.915 ...
  ..@ v    : num [1:2] 1 2
  ..@ fName:function (x, ...)
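What makes the class useful for type checking is that S4 verifies the declared slot types for us. The sketch below repeats the class definition so it runs on its own, and shows that both constructing an object with a wrong slot type and assigning a wrong type to an existing slot raise an error. The bad values are illustrative assumptions.

```r
library(methods)

# Same class definition as above, repeated so the example is self-contained.
setClass("mySlice", representation(A = "matrix", v = "vector", fName = "function"))

# A character string is not a matrix, so new() refuses to build the object.
slotError <- tryCatch(
  new("mySlice", A = "not a matrix", v = c(1, 2), fName = mean),
  error = function(e) conditionMessage(e)
)
print(slotError)

# Slot assignment on an existing object is checked in the same way.
myS <- new("mySlice", A = matrix(rnorm(9), 3, 3), v = c(1, 2), fName = mean)
assignError <- tryCatch(
  { myS@A <- "oops"; NA_character_ },
  error = function(e) conditionMessage(e)
)
print(assignError)
```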

Now we can rewrite the function to use our S4 class, so that type checking is applied only once, to the object passed in:
is.mySlice <- function(obj) {
  # TRUE if obj is an instance of the S4 class mySlice (or of a subclass)
  is(obj, "mySlice")
}

myMatrixOperation <- function(mySliceObject) {
  if (!is.mySlice(mySliceObject)) {
    stop("argument is not class of mySlice")
  }
  # Slot types were already checked when the object was created
  sliceA <- mySliceObject@A[, mySliceObject@v, drop = FALSE]
  apply(sliceA, 2, mySliceObject@fName)
}
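A short usage sketch of the class-based version follows. It redefines the class and function so it can be run on its own, and uses a fixed matrix (an assumption for this example) rather than random numbers so the result is easy to verify by hand.

```r
library(methods)

setClass("mySlice", representation(A = "matrix", v = "vector", fName = "function"))

myMatrixOperation <- function(mySliceObject) {
  if (!is(mySliceObject, "mySlice")) {
    stop("argument is not class of mySlice")
  }
  sliceA <- mySliceObject@A[, mySliceObject@v, drop = FALSE]
  apply(sliceA, 2, mySliceObject@fName)
}

# Columns of the matrix are (1,2,3), (4,5,6), (7,8,9); we select the first two.
myS <- new("mySlice", A = matrix(1:9, 3, 3), v = c(1, 2), fName = mean)
colMeansOfSlice <- myMatrixOperation(myS)
print(colMeansOfSlice)

# A bare matrix is not a mySlice object, so the single check rejects it.
wrongArg <- tryCatch(
  myMatrixOperation(matrix(1:9, 3, 3)),
  error = function(e) conditionMessage(e)
)
print(wrongArg)
```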
This simple example demonstrates how we can bring good organization to our R code in a way that obeys the Prime Directive. A more modern approach to object orientation, called Reference Classes, was also introduced by John Chambers. If you practice this kind of approach in your R code, then I can only say: live long and prosper.
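For readers curious about Reference Classes, here is a minimal sketch of how the same idea might look with `setRefClass`. The class name `mySliceRef`, the method name `applyToColumns`, and the choice of `"numeric"` for `v` are assumptions made for illustration and are not part of the original example.

```r
library(methods)

# Hypothetical Reference Class mirroring the mySlice S4 class.
MySliceRef <- setRefClass(
  "mySliceRef",
  fields = list(A = "matrix", v = "numeric", fName = "function"),
  methods = list(
    applyToColumns = function() {
      # Fields are visible by name inside methods
      apply(A[, v, drop = FALSE], 2, fName)
    }
  )
)

sliceRef <- MySliceRef$new(A = matrix(1:9, 3, 3), v = c(1, 2), fName = mean)
refResult <- sliceRef$applyToColumns()
print(refResult)

# Declared field types are checked when fields are set.
fieldError <- tryCatch(
  MySliceRef$new(A = "not a matrix", v = c(1, 2), fName = mean),
  error = function(e) conditionMessage(e)
)
print(fieldError)
```

Here the data and the operation travel together in one object, so the type check happens where the fields are set rather than in each function that consumes them.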
