[This article was first published on R snippets, and kindly contributed to R-bloggers].
Recently on R-bloggers I found a post from the chem-bla-ics blog concerning the conversion of factors to integer vectors. At the end it posed the problem of converting a factor variable to a class-membership matrix. In the comments several nice solutions were provided. Notably, the function classvec2classmat from the kohonen package does the trick and is very fast.
Interestingly, this problem can be solved simply with the basic rep and matrix functions:
f <- factor(sample(c("A", "B", "C"), 8, replace = TRUE))
matrix(as.integer(rep(levels(f), each = length(f)) == f),
       nrow = length(f), dimnames = list(f, levels(f)))
In the code we use the fact that R automatically recycles f in the comparison. However, classvec2classmat is faster than the solution proposed here. This is easily checked using system.time. On my computer it is roughly two times faster for a large number of observations.
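To make the recycling explicit, here is a minimal sketch on a small fixed factor. The wrapper name rep_classmat is hypothetical and only serves to make the one-liner reusable. The sketch then times the wrapper with system.time on a larger sample, and times classvec2classmat only if the kohonen package happens to be installed. No timings are shown because they depend on the machine.

```r
# Hypothetical wrapper around the one-liner above.
rep_classmat <- function(f) {
  f <- factor(f)
  # Each level is repeated length(f) times, so the left-hand side has
  # nlevels(f) * length(f) elements; f is recycled nlevels(f) times to match.
  matrix(as.integer(rep(levels(f), each = length(f)) == f),
         nrow = length(f), dimnames = list(f, levels(f)))
}

f <- factor(c("A", "C", "B", "A"))
lhs <- rep(levels(f), each = length(f))
length(lhs)            # 12, that is nlevels(f) * length(f)
m <- rep_classmat(f)
print(m)               # one row per observation, one column per level
print(rowSums(m))      # every row contains exactly one 1

# Timing on a larger sample (results vary by machine).
big <- factor(sample(c("A", "B", "C"), 1e5, replace = TRUE))
print(system.time(rep_classmat(big)))
if (requireNamespace("kohonen", quietly = TRUE)) {
  print(system.time(kohonen::classvec2classmat(big)))
}
```

Because the levels are laid out with each = length(f), filling the matrix column by column puts the indicator for the first level in column one, the second level in column two, and so on.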
Both solutions are fast enough for practical applications. However, I wanted to understand the reason for this speed difference, so I checked the classvec2classmat source:
function (yvec) {
  yvec <- factor(yvec)
  nclasses <- nlevels(yvec)
  outmat <- matrix(0, length(yvec), nclasses)
  dimnames(outmat) <- list(NULL, levels(yvec))
  for (i in 1:nclasses) outmat[which(as.integer(yvec) == i), i] <- 1
  outmat
}
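A quick way to see what this function returns is to run a local copy on a small factor. The name classvec2classmat_copy is hypothetical; its body is the listing above, copied so that the sketch runs without the kohonen package installed.

```r
# Local copy of the listing above, under a hypothetical name.
classvec2classmat_copy <- function(yvec) {
  yvec <- factor(yvec)
  nclasses <- nlevels(yvec)
  outmat <- matrix(0, length(yvec), nclasses)
  dimnames(outmat) <- list(NULL, levels(yvec))
  for (i in 1:nclasses) outmat[which(as.integer(yvec) == i), i] <- 1
  outmat
}

f <- factor(c("A", "C", "B", "A"))
m <- classvec2classmat_copy(f)
print(m)
typeof(m)        # "double": the matrix is filled with 0 and 1, not 0L and 1L
rownames(m)      # NULL: row names are not set, unlike in the rep-based version
```

Apart from the storage type and the missing row names, the 0/1 pattern matches what the rep-based one-liner produces for the same factor.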
The performance gain is due to two reasons:
- my code compares factors, not integers (this could be easily fixed, but that does not fully solve the problem);
- classvec2classmat uses assignment operation only for indices that need to be set to 1, whereas my code first creates a vector using rep and then transforms it.
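Both points can be sketched in code. The first function below, rep_classmat_int, is the rep-based approach with the comparison done on integer codes instead of factors. The second, index_classmat, comes neither from the original one-liner nor from kohonen: it is an alternative that applies the "assign only the ones" idea without a loop, using matrix indexing with a two-column index. Both names are hypothetical, and the sketch assumes the input has no missing values.

```r
# Variant 1 (hypothetical name): compare integer codes rather than factors.
rep_classmat_int <- function(f) {
  f <- factor(f)
  codes <- as.integer(f)
  # codes is recycled against the repeated level indices, as before.
  matrix(as.integer(rep(seq_len(nlevels(f)), each = length(f)) == codes),
         nrow = length(f), dimnames = list(NULL, levels(f)))
}

# Variant 2 (hypothetical name): start from zeros, set only the needed cells.
index_classmat <- function(f) {
  f <- factor(f)
  out <- matrix(0L, length(f), nlevels(f),
                dimnames = list(NULL, levels(f)))
  # Each row of the index matrix is a (row, column) pair to set to 1.
  out[cbind(seq_along(f), as.integer(f))] <- 1L
  out
}

f <- factor(sample(c("A", "B", "C"), 8, replace = TRUE))
print(identical(rep_classmat_int(f), index_classmat(f)))
```

Wrapping either function in system.time on a large factor is a simple way to check how much of the gap each change closes on a given machine.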