[This article was first published on R – ipub, and kindly contributed to R-bloggers].
Question
I recently got a mail from Václav on reference semantics in data.tree, reading as follows:
Dear Christoph,
I am rather inexperienced when it comes to environments in R and henceforth I apologize if my question is basic; however, my colleagues are no better than me to answer my question.
I would have a question iro the following behavior of your data.tree package. Is it correct that if I create a function which uses some data.tree structure as a parameter, the input value would get changed too? In the following case I would assume that acme’s values should not get changed.
Thank you, Vaclav
The code he provided was similar to this:
library(data.tree)
data(acme)
acme$val <- 5
acme$val
DoAssign <- function(tr) {
  a <- tr
  a$val <- 33
  return (a)
}
acme2 <- DoAssign(acme)
acme$val
[1] 33
Answer
My answer was as follows:
Well observed, that is indeed the behavior of data.tree. From the manual:
Node and Reference Semantics
The entry point to the package is Node. Each tree is composed of a number of Nodes, referencing each other. One of the most important things to note about data.tree is that it exhibits reference semantics. In a nutshell, this means that you can modify your tree along the way, without having to reassign it to a variable after each modification. By and large, this is a rather exceptional behavior in R, where value semantics is king most of the time.
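If the caller's tree should stay untouched, a function can work on an explicit copy instead. The following is a minimal sketch, assuming data.tree's Clone() is used to make a deep copy of the tree; DoAssignCopy is a hypothetical name introduced here as a variant of the DoAssign function from the question:

```r
library(data.tree)
data(acme)

acme$val <- 5

# Hypothetical variant of DoAssign: it modifies a deep copy,
# so the tree passed in by the caller is left as it was.
DoAssignCopy <- function(tr) {
  a <- Clone(tr)   # independent copy of the whole tree
  a$val <- 33      # only the copy is changed
  a
}

acme2 <- DoAssignCopy(acme)
acme$val    # the original still holds 5
acme2$val   # the copy holds 33
```

The copy comes at a cost: cloning walks the entire tree, which is exactly the overhead that reference semantics avoids by default.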
Reference Semantics Explained
In a nutshell, reference semantics can be understood by the following analogy: If I give you a URL, I provide you with a reference to a web page. You, I and the owner of the web page can access that web page with that URL. And if the owner changes the content, then you will see these changes next time you connect to the URL.
Conversely, if I print out the web page and give you that printout, then I provide you with a disconnected copy of the web page. You may modify that copy (e.g. by highlighting passages with a marker), but I will not see these changes, nor will you see changes made to the original page by the owner. This is value semantics.
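The same contrast can be reproduced in base R without any package. This is only an illustrative sketch: a list plays the role of the printout (value semantics), an environment plays the role of the URL (reference semantics), and all names are made up for the example:

```r
# Value semantics: a list behaves like the printout.
# The function modifies its own copy; the caller's list is unchanged.
set_val_list <- function(x) {
  x$val <- 33
  x
}

printout <- list(val = 5)
marked_up <- set_val_list(printout)
printout$val    # 5
marked_up$val   # 33

# Reference semantics: an environment behaves like the URL.
# The function and the caller look at the very same object.
set_val_env <- function(e) {
  e$val <- 33
  invisible(e)
}

page <- new.env()
page$val <- 5
set_val_env(page)
page$val        # 33
```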
Why data.tree uses reference semantics
The main reason why we chose to do it that way in data.tree is that we treat each Node as a unit. When modifying a Node, or when adding a field to a Node, we do not want to create a deep copy of the entire tree for performance reasons.
Another reason is that it greatly simplifies the API of the package. For example, we can do:
library(data.tree)
data(acme)
# get a list of Nodes
traversal <- Traverse(acme, filterFun = function(node) !is.null(node$cost))
# modify a field
Do(traversal, function(node) node$cost2 <- node$cost * 1.2)
# the value is now modified in the original tree:
print(acme, "cost", "cost2")
Reference Semantics in R
While very common in object-oriented languages (e.g. C++, Java, C#), this paradigm is not very widespread in R, though it is gaining more and more acceptance. Check out, for instance, the := operator in data.table, or search for R Reference Classes or R6.
The downside is, of course, that it might seem confusing at first.
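To see the same behavior outside data.tree, here is a small sketch using base R Reference Classes (setRefClass from the methods package). The Counter class and its fields are invented purely for illustration:

```r
library(methods)

# A hypothetical reference class with one field and one method
Counter <- setRefClass("Counter",
  fields = list(count = "numeric"),
  methods = list(
    increment = function() {
      count <<- count + 1   # modifies the object in place
      invisible(.self)
    }
  )
)

a <- Counter$new(count = 0)
b <- a            # b refers to the same object as a
b$increment()
a$count           # 1: the change made through b is visible through a

d <- a$copy()     # an explicit, independent copy
d$increment()
a$count           # still 1
d$count           # 2
```

As with data.tree, plain assignment shares the object, and an independent copy has to be requested explicitly.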
Hope that helps!
The post Reference semantics in R appeared first on ipub.