Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
My title here is in the sense of “Bad dog, bad dog!”, a scolding I sometimes see dog owners use to tame their pets, and is also an allusion to Bad Reporter, a sometimes hilarious and always irreverent political comic strip in the San Francisco Chronicle. And my title is intended to convey the point that I think that “good programming practice” rules are sometimes taken overly seriously.
I’ll comment on two aspects here:
- Variable names, spacing etc.
- “Side effects.”
The second is the more unsettling of the two, but the first is interesting because Hadley Wickham is involved, which always gets people’s attention.
What prompted this is that there is currently a discussion in the ASA Statistical Learning and Data Mining Section e-mail list on Hadley’s coding style guidelines. Some of you may know that Google has also had its own R style guidelines for some years now.
Much as I admire Hadley and Google, I really think this is much ado about nothing. I do have my own personal style (if you are curious, just look at my code), but seriously, folks, I don’t think it really matters. As long as a programmer is consistent and has given careful thought as to how to be clear — not only clear to others who read the code, but also clear to the programmer herself when she reads the code cold two months from now — it’s fine.
I would like to see better style in programmers’ English, though. Please STOP using the terms “curly braces” and “square brackets”! Braces ARE curly, by definition, and brackets ARE square, right? Or, at least be consistent in your redundancy, speaking of “round parentheses,” “dotty periods,” “long dashes,” “big capitals,” “elevated overbars” and so on.
And now, for the hard stuff, side effects. R is a functional language, meaning among other things that no variable is changed upon executing a function, other than assignment of the return value. In other words,
y <- f(x)
will change y but neither x nor any other variables, e.g. globals, will be changed.
For many in the R community, this restriction is followed with religious-like fervor. They consider it safe, clear and easy to debug.
Yes, yes, I understand the arguments, and the No Side Effecters have a point. But it is much too dogmatic, in my opinion, and often causes one to incur BIG performance penalties.
I am not here to praise Caeser, but I’m not here to bury him either.
- Reference classes. R has always had some ways to create side effects for the antisocial
, one notable example being environments. Reference classes carry that loophole one step further. - The bigmemory package. Yes, it was developed as a workaround to R’s 32-bit memory constraints (somewhat but not completely resolved in more recent versions of R), but I’m told that one of the reasons for the package’s popularity is its ability change data in-place in memory. This is basically a side effects issue, as traditionally the assignment to y above has created a new, separate copy of y in memory, often a big performance killer. (Again, this too has been ameliorated somewhat in recent versions of R.)
- The data.table package. Same comments here as for the “insidious” use of bigmemory above, but also consider the operation> setkey(mtc1,cyl).Ever notice that no assignment is made? If you rave about data.table, you should keep in mind what factors underly that lightning speed.
I won’t get into the global variables issue here, other than to say again that I think the arguments against them often tend to be more ideological than logical.
In summary, I believe that careful thought, rather than an obsession with rules, is your ticket to good — clear and efficient — code.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.