
Evolving R Tools and Practices

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

One of the distinctive features of the R platform is how explicit and user-controllable everything is. This allows the way R is used to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion in your view of the evolving “current best practice style” of working with R.

Background

Let’s place R (or the S programming language) into context.

Strict Languages

Programming language semantics are often described by an analogy that separates the user-observable behavior from the underlying implementation.

For example, it makes sense to say that in C++ the choice of which implementation runs during a virtual method call behaves as if a search were made at runtime up the object's type hierarchy until a match is found. In practice, however, the C++ compiler implements this dynamic dispatch through a hidden data structure (not visible to the programmer) called a vtable. This leads me to say that languages like C++ and Java implement strong object-oriented programming: they work hard to enforce meaningful invariants and hide implementation details from the user.

Tolerant Languages

In the Python programming language we also see object-oriented semantics, but the implementation details are somewhat user visible: the programmer has direct access to the structures that implement the object-oriented effects (such as self, __dict__, __doc__, __name__, __module__, and __bases__). Python’s object semantics are defined in terms of lookups against these structures, which are user visible (and alterable). So in some sense Python’s object semantics rely on convention (the convention being that users don’t mess with the “__*__” structures too much).

Wild Languages

Then we get to the case of R, where everything is user visible. In R almost nothing is implemented “as if” a given lookup were performed; the described lookup is almost always explicit, user visible, and alterable. For example, R’s common object-oriented system, S3, is visibly implemented by pasting method names together with class names (the method summary is specialized to models of class lm by declaring a function named “summary.lm”). And to invoke dynamic dispatch there must be an explicit generic function (summary() itself, in this case) whose body calls “UseMethod()” to re-route the call.
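
To make the S3 mechanics concrete, here is a minimal sketch. The generic describe(), its methods, and the classes temperature and celsius are hypothetical names chosen for illustration; only UseMethod(), structure(), unclass(), and utils::getS3method() are standard R. It shows that dispatch amounts to finding a function by its pasted-together name, and that the class used for that lookup is an ordinary, overwritable attribute.

```r
# A generic is an ordinary function whose body explicitly calls UseMethod().
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions; dispatch finds them by the name
# "describe.<class>", falling back to "describe.default".
describe.default <- function(x, ...) "some object"
describe.temperature <- function(x, ...) paste(unclass(x), "degrees")

# The class is just an attribute the user can read and overwrite.
t1 <- structure(21.5, class = "temperature")
print(describe(t1))      # "21.5 degrees"
print(describe(21.5))    # "some object" (no describe.numeric exists)

class(t1) <- "celsius"   # re-label the object by hand
print(describe(t1))      # "some object": no describe.celsius yet

# Defining a function with the right name is all it takes to add a method.
describe.celsius <- function(x, ...) paste(unclass(x), "degrees C")
print(describe(t1))      # "21.5 degrees C"

# Existing methods, such as summary.lm from stats, can be looked up directly.
print(is.function(utils::getS3method("summary", "lm")))   # TRUE
```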

Further, under R’s “everything is a function” rubric, things you would expect to be language constructs controlled by the interpreter (operators, if, even parentheses) are actually user-visible (and modifiable) functions. For an example, see the “evil rebind parenthesis” example found here.

R’s user-visible semantics are therefore wholly a matter of convention: they hold only as long as nobody has tinkered with the underlying functions.
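
A small sketch of the “everything is a function” point, assuming a standard R session. Operators, `if`, and `[` can all be called with ordinary prefix syntax, and because the parenthesis is looked up by name like any other function, a local binding changes what “(...)” does. The rebinding inside local() is a deliberately perverse, hypothetical example in the spirit of the one referenced above (not a copy of it), confined to a throwaway environment so the rest of the session is unaffected.

```r
# Operators and control constructs are ordinary functions that can be
# called with prefix syntax.
print(`+`(2, 3))                 # 5
print(`if`(TRUE, "yes", "no"))   # "yes"
print(`[`(c(10, 20, 30), 2))     # 20

# Even the grouping parenthesis is a function looked up by name, so a
# local binding changes what "(...)" means. Illustrative only.
evil <- local({
  `(` <- function(e) e + 1       # hypothetical rebinding: "(" now adds one
  (2) * 10
})
print(evil)                      # 30 rather than the usual 20

# Outside that local environment, base's "(" is still in effect.
print((2) * 10)                  # 20
```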

So Why Does R Work?

Language extensions that in most languages would require the cooperation of the core development team can be implemented in R through user-definable functions and packages. This means users can redefine and extend the R language pretty much at will. Given this extreme malleability of the R runtime, it is a legitimate question to ask: “why hasn’t R fractured into a million incompatible domain-specific languages and died?”
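
As a sketch of how far user-level code can reach, the following adds a new infix operator and a function that inspects its argument as unevaluated code. The operator name %then% and the helper show_code() are hypothetical; packages such as magrittr build their pipe (%>%) notation on the same kind of mechanism, with far more care. Any function whose name has the form %name% can be used between its arguments, and substitute() gives a function access to the expression it was called with.

```r
# Any function named %something% can be used as an infix operator.
# Hypothetical operator: pass the left-hand value to the right-hand function.
`%then%` <- function(lhs, rhs) rhs(lhs)

# %...% operators group left to right: (c(1, 4, 9) %then% sqrt) %then% sum
result <- c(1, 4, 9) %then% sqrt %then% sum
print(result)                  # 6

# A function can also see the unevaluated code it was called with.
# show_code() is a hypothetical helper built on substitute() and deparse().
show_code <- function(expr) deparse(substitute(expr))
print(show_code(x + y * 2))    # "x + y * 2"; x and y are never evaluated
```

Note that %then% calls rhs(lhs) directly, so its right-hand side must be a function rather than an expression such as sum(.); supporting that would require capturing the right-hand side with substitute(), which is exactly the kind of design choice real pipe packages have to make.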

I think R’s survival and success stem from four things:

  1. Most R users have the same goal: analyzing data. So they are mostly working in the same domain.
  2. The open nature of the R ecosystem allows competitive evolution of notations and language extensions. We retain the winning ideas and paradigms, regardless of their original source.
  3. R is probably a lot less constant than we choose to perceive it to be. Package maintainers work hard so that “things just work” and keep working over time.
  4. The amazing efforts of open-source non-profits such as CRAN, The R Foundation, and The R Consortium.

Some relevant examples that help illustrate how the R ecosystem works include:

What Am I Advocating?

The uses of R’s plasticity that my group (Win-Vector LLC) distributes, educates on, and advocates include the following:

Conclusion

I think these techniques will make your work as an analyst or data scientist much easier. If so, I hope you will help teach and promote these methods.
