Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
No, it's not a Jaqen H'ghar quote. Recently, Hadley Wickham tweeted the following image:
While this image isn't included in Hadley's Advanced R book, he does discuss many of the implications there. The most significant of these is that creating a copy of an object in R doesn't consume any additional memory. (Most of the time, anyway: there are exceptions, but I'm not going into them here.) The simplest example is the following:
a <- matrix(0, ncol=1000, nrow=1000)
b <- a
Creating a allocates 8 MB of memory (one million doubles at 8 bytes each), but creating b requires essentially nothing: all that gets created is a new name, b, that points to the same data as a. This isn't like a pointer in C, though: if you later modify a or b, a new 8 MB matrix is created in memory to preserve the semantics of there being two separate copies. As a user, you don't have to worry about any of this: it all happens behind the scenes, seamlessly, thanks to the "names have objects" concept Hadley illustrates above.
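A rough way to watch this happen is to check R's own memory accounting before and after each step. The sketch below uses only base R; the helper vcells_mb is a name made up for this illustration, and it simply reads the "used" figure for vector memory that gc() reports (in units of 1024^2 bytes, so one matrix shows up as a little under 8). Treat the numbers as approximate, since other allocations in the session can nudge them.

```r
# Hypothetical helper (not part of R): vector memory currently in use,
# as reported by gc(), in units of 1024^2 bytes.
vcells_mb <- function() gc()["Vcells", 2]

base <- vcells_mb()

a <- matrix(0, ncol = 1000, nrow = 1000)
after_a <- vcells_mb()        # grows by about one matrix

b <- a
after_b <- vcells_mb()        # essentially unchanged: b is just a second name

b[1, 1] <- 1
after_modify <- vcells_mb()   # grows by about one matrix again: b now has its own data

cat("creating a:  ", after_a - base, "\n")
cat("creating b:  ", after_b - after_a, "\n")
cat("modifying b: ", after_modify - after_b, "\n")

# a is untouched by the change to b
a[1, 1]
```

The copy is deferred until the first modification, which is why the middle figure stays near zero.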
The same thing applies to the call-by-value semantics of functions: if you pass an object to a function, as in eigen(a), the semantics of R are that a copy of the object a is passed to the function eigen, which can do as it pleases with that copy without modifying the global one. In practice, though, no new memory is needed for the copy as long as the function doesn't actually modify the object passed to it. You can prove this to yourself by using the profvis package, or by using the profiling tools in the preview release of RStudio.
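If you'd rather not install a profiler, the same effect can be seen roughly with base R alone. In this sketch the functions peek and poke and the helper vcells_mb are hypothetical names invented for the illustration: peek only reads its argument, poke modifies it, and each reports how much vector memory (as reported by gc(), in units of 1024^2 bytes) grew while it ran. This is a coarse measurement under those assumptions, not a substitute for proper profiling.

```r
# Hypothetical helper (not part of R): vector memory currently in use,
# as reported by gc(), in units of 1024^2 bytes.
vcells_mb <- function() gc()["Vcells", 2]

a <- matrix(0, ncol = 1000, nrow = 1000)

# Reads its argument but never modifies it
peek <- function(m) {
  before <- vcells_mb()
  s <- sum(m)
  vcells_mb() - before
}

# Modifies its argument, which forces R to make a local copy
poke <- function(m) {
  before <- vcells_mb()
  m[1, 1] <- 1
  vcells_mb() - before
}

peek_growth <- peek(a)   # close to zero: no copy was needed
poke_growth <- poke(a)   # about one matrix: the function got its own copy

cat("read-only function: ", peek_growth, "\n")
cat("modifying function: ", poke_growth, "\n")

# The caller's matrix is unchanged either way
a[1, 1]
```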
This is all likely second nature to experienced R programmers, but it can be a bit of a shock to programmers coming to R from other languages. Miles McBain (who learned to program in C++) had a bit of an epiphany on seeing Hadley's diagram, and explored many of the implications in a blog post that is well worth reading. His take-away:
If you’re trying to optimise R while thinking like a C++ coder, you may well be doing more harm than good. I myself have fallen foul of this in an attempt to modify data frames in place with my pushr package. It ended up just being syntactic sugar, with no observable performance boost.
If you think you might be in the same boat, take a look at Miles's post linked below.
One weiRd tip: R Has No Primitives