Site icon R-bloggers

Generating all subsets of a set

[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Recently I have calculated Banzhaf power index. I required generation of all subsets of a given set. The code given there was a bit complex and I have decided to write a simple function calculating it. As an example of its application I reproduce Figure 3.5 from Hastie et al. (2009).
The figure shows RSS for all possible linear regressions for prostate cancer data on training subset. The standard approach for such a problem in R is to use leaps package, but I simply wanted to test my function generating all subsets of the set.

Here is the code with all.subsets function generating all subsets and its application to prostate cancer data:

library(plyr)< o:p>
library(ggplot2)< o:p>

all.subsets <- function(set) {< o:p>
    n <- length(set)< o:p>
    bin <- expand.grid(rlply(n, c(F, T)))< o:p>
    mlply(bin, function() { set[c()] })< o:p>
}< o:p>

file.url <- “http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data”< o:p>
data.set <- read.table(file.url, sep = “\t”, header = TRUE)< o:p>
varlist <- all.subsets(names(data.set)[2:9])< o:p>

get.reg <- function(vars) {< o:p>
    if (length(vars) == 0) {< o:p>
        vars = “1”< o:p>
    }< o:p>
    vars.form <- paste(“lpsa ~”, paste(vars, collapse = ” + “))< o:p>
    lm(vars.form, data = data.set, subset = train)< o:p>
}< o:p>

models <- llply(varlist, get.reg)< o:p>
models.RSS <- ldply(models, function(x) {< o:p>
    c(RSS = sum(x$residuals ^ 2), k = length(x$coeff)) })< o:p>

min.models <- ddply(models.RSS, .(k), function(x) {< o:p>
    x[which.min(x$RSS),]})< o:p>
qplot(k, RSS, data = models.RSS, ylim= c(0,100),< o:p>
    xlab = “Subset Size k”, ylab = “ResidualSum-of-Squares”,) +< o:p>
geom_point(data = min.models, aes(x = k, y = RSS), colour = “red”) +< o:p>
geom_line(data = min.models, aes(x = k, y = RSS), colour = “red”) +< o:p>
theme_bw()< o:p>

And here is the plot it generates:


To leave a comment for the author, please follow the link and comment on their blog: R snippets.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.