[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When producing regression or classification trees in GNU R (standard rpart, or ctree from the party package), I am often unsatisfied with the default plots they produce. One of many possible solutions is to export the tree plot to Asymptote.
The code I have prepared generates an Asymptote file from a fitted ctree object. Here is the procedure that does the conversion.
treeAsy <- function(tree,       # ctree to be plotted
                    off.f,      # tree plot fixed shift
                    off.v,      # tree plot variable shift
                    file.name,  # output file name
                    preamble) { # preamble for asy
  # range of predictions, updated while walking the tree; used for node colours
  minv <- +Inf
  maxv <- -Inf
  response <- names(tree@responses@variables)
  plot.node <- function(root,
                        nest = 0,           # level in a tree
                        pOffset = 0,        # plotting offset
                        condition = "root", # split condition text
                        id = "root") {      # block name in asy
    if (length(root$prediction) > 1) {
      stop("Only single prediction value supported")
    }
    if (root$prediction < minv) minv <<- root$prediction
    if (root$prediction > maxv) maxv <<- root$prediction
    child.l <- ""
    child.r <- ""
    if (!root$terminal) {
      if (inherits(root$psplit, "orderedSplit")) {
        varN <- root$psplit$variableName
        point <- root$psplit$splitpoint
        left <- paste(varN, "$\\leq$", point, sep="")
        right <- paste(varN, "$>$", point, sep="")
      } else {
        stop("Only orderedSplit supported")
      }
      # Asymptote code that draws a connector from a node to its child
      add <- "add(new void(picture pic, transform t) {
    blockconnector operator --=blockconnector(pic,t);\n    "
      child.l <- paste(plot.node(root$left, nest + 1,
                                 pOffset - off.f - 1 / off.v ^ nest,
                                 left, paste(id,"l",sep="")),
                       add,id,"--Down--Left--Down--",id,"l;\n});\n\n", sep="")
      child.r <- paste(plot.node(root$right, nest + 1,
                                 pOffset + off.f + 1 / off.v ^ nest,
                                 right, paste(id,"r",sep="")),
                       add,id,"--Down--Right--Down--",id,"r;\n});\n\n", sep="")
    }
    # block for this node, followed by the code for its subtrees
    paste("block ", id, " = rectangle(Label(\"",
          condition, "\"),
    pack(Label(\"n=", sum(root$weights), "\"),
    Label(\"", response, "=",
          format(root$prediction), "\")),
    (", pOffset, ",", -nest, "), lightgray, col(",
          root$prediction, "));",
          "\ndraw(", id,");\n\n",
          child.l, child.r, sep="")
  }
  treestruct <- plot.node(tree@tree)

  cat(file=file.name,
      preamble,
      "\nimport flowchart;\n",
      "pen col(real x) {
    real minv = ", minv, ";
    real maxv = ", maxv, ";
    real ratio = 1 - (x - minv) / (maxv - minv);
    return rgb(1, ratio, ratio);
}\n\n",
      treestruct, "\n", sep="")
  # system() is used instead of the Windows-only shell(); asy must be on the PATH
  system(paste("asy -f png", file.name))
}
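To get a feeling for how off.f and off.v shape the layout, the horizontal shift that plot.node applies at each level can be computed on its own. The sketch below only reproduces the expression off.f + 1 / off.v ^ nest from the procedure; the helper name levelShift is hypothetical and is not part of the original code.

```r
# Hypothetical helper (not part of treeAsy): horizontal distance between a
# node at depth `nest` and each of its children, as computed in plot.node.
levelShift <- function(nest, off.f, off.v) {
  off.f + 1 / off.v ^ nest
}

# Shifts for the first four levels, using the values from the example call
# in this post (off.f = -0.25, off.v = 1.4).
shifts <- levelShift(0:3, off.f = -0.25, off.v = 1.4)
print(round(shifts, 3))
```

With off.v greater than 1 the shift shrinks at every level, so subtrees are drawn closer together the deeper they are. With these particular values the shift would turn negative from depth 5 onwards, which is one reason the two arguments may need tuning for deeper trees.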
Each node on the plot contains the condition leading to it, the number of observations, and the predicted value of the response variable; the intensity of red indicates the relative size of that prediction.
The code is deliberately kept simple. It currently handles only regression trees with continuous predictors, and it will generate errors if variable names contain TeX special characters (like &). Additionally, the tree layout can be controlled only manually, by setting the off.f and off.v arguments or by changing the picture size in the preamble (although one could write code to lay out the plot automatically).
The code produces PNG output because I needed this format to show the picture on the blog, but of course you can generate an EPS or PDF file instead, which is probably a more suitable option.
Here is an example of its use, based on the standard ctree example:
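The colour rule can be tried directly in R. This is a minimal sketch that mirrors the Asymptote function col() written to the output file; the name nodeColour and the sample numbers are made up for illustration.

```r
# Hypothetical R counterpart of the Asymptote col() function: white for the
# smallest prediction, pure red for the largest, shades of red in between.
nodeColour <- function(x, minv, maxv) {
  ratio <- 1 - (x - minv) / (maxv - minv)
  rgb(1, ratio, ratio)
}

# Illustrative predictions only; they do not come from a fitted tree.
colours <- nodeColour(c(10, 30, 50), minv = 10, maxv = 50)
print(colours)
```

Note that the denominator is zero when all predictions are equal (for example, in a tree with a single node), a case this rule does not cover.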
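One possible way around the TeX limitation is to escape variable names before they are pasted into the labels. The helper below is only a sketch: escapeTeX is a hypothetical name, it is not used by the procedure, and it covers only the characters that can be escaped with a single backslash.

```r
# Hypothetical helper: backslash-escape characters that TeX treats specially.
# Backslash, ~ and ^ are left alone because they need different replacements.
escapeTeX <- function(x) {
  gsub("([&%$#_{}])", "\\\\\\1", x)
}

# Prints R\&D\_cost
cat(escapeTeX("R&D_cost"), "\n")
```

It could be applied to the variable name before the split condition text is built, assuming the escaped name is needed only for display.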
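Switching the format comes down to the value passed to the -f option of asy. As a rough sketch, the command-line arguments could be assembled by a small helper; asyArgs is a hypothetical name, and the call that runs asy is left commented out because it assumes asy is installed and on the PATH.

```r
# Hypothetical helper: build the argument vector for the asy command line.
asyArgs <- function(file.name, format = c("png", "eps", "pdf")) {
  format <- match.arg(format)  # rejects formats not listed above
  c("-f", format, file.name)
}

cmdArgs <- asyArgs("tree.asy", "pdf")
print(cmdArgs)
# system2("asy", cmdArgs)  # uncomment to run Asymptote
```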
library(party)
airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)
treeAsy(airct, -0.25, 1.4, "tree.asy",
        "size(22cm,12cm, keepAspect=false);")
It gives the following output:
To my eye, this is much nicer than the default plot generated by plot(airct):
To leave a comment for the author, please follow the link and comment on their blog: R snippets.