[This article was first published on R snippets, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A very typical task in data analysis is calculating summary statistics for each variable in a data frame. The standard lapply and sapply functions work very nicely for this, but they apply only a single function at a time. The problem is that I often want to calculate several different statistics of the data. For example, assume that we want to calculate the minimum, maximum and mean value of each variable in a data frame.
The simplest solution for this is to write a function that does all the calculations and returns a vector. The sample code is:
multi.fun <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}
> sapply(cars, multi.fun)
     speed   dist
min    4.0   2.00
mean  15.4  42.98
max   25.0 120.00
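The result above has one row per statistic and one column per variable. If one row per variable is preferred, a minimal sketch (assuming only base R and the built-in cars data set; the name by.variable is introduced here for illustration) is to transpose the matrix that sapply returns:

```r
# Same helper as in the post: several statistics returned as one named vector
multi.fun <- function(x) {
  c(min = min(x), mean = mean(x), max = max(x))
}

# sapply() binds the per-column vectors into a matrix (statistics in rows);
# t() flips it so that each variable becomes a row
by.variable <- t(sapply(cars, multi.fun))
print(by.variable)
```

This is only a change of layout; the numbers are the same as in the output above.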
However, when I work in interactive mode I would prefer to have a function that would accept multiple functions as arguments. I came up with the following solution to this problem:
multi.sapply <- function(...) {
  # capture the arguments unevaluated so they can be turned into labels
  arglist <- match.call(expand.dots = FALSE)$...
  var.names <- sapply(arglist, deparse)
  # an explicitly named argument overrides its deparsed expression
  has.name <- (names(arglist) != "")
  var.names[has.name] <- names(arglist)[has.name]
  # evaluate the arguments in the caller's environment
  arglist <- lapply(arglist, eval.parent, n = 2)
  # the first argument is the data; the remaining ones are functions
  x <- arglist[[1]]
  arglist[[1]] <- NULL
  result <- sapply(arglist, function (FUN, x) sapply(x, FUN), x)
  colnames(result) <- var.names[-1]
  return(result)
}
> multi.sapply(cars, min, mean, max)
      min  mean max
speed   4 15.40  25
dist    2 42.98 120
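The column labels come from the unevaluated arguments. The sketch below isolates just that step. The helper show.labels is hypothetical (it is not part of the original code), and it adds an explicit guard for the case where no argument is named, which is an assumption about how one might prefer to write it rather than something the original function does.

```r
# Hypothetical helper, for illustration only: it reports the labels
# that would be derived from the arguments, without evaluating them.
show.labels <- function(...) {
  # Unevaluated argument expressions (symbols or calls)
  arglist <- match.call(expand.dots = FALSE)$...
  # Text form of each expression, e.g. the symbol min becomes "min"
  labels <- sapply(arglist, deparse)
  # names() is NULL when no argument is named, so fall back to empty strings
  arg.names <- names(arglist)
  if (is.null(arg.names)) arg.names <- rep("", length(arglist))
  # An explicit name takes precedence over the deparsed expression
  has.name <- arg.names != ""
  labels[has.name] <- arg.names[has.name]
  unname(labels)
}

print(show.labels(cars, min, mean))
print(show.labels(cars, lo = min))
```

Note that deparse can return more than one string for a long expression, so a lengthy anonymous function is probably best passed with an explicit name.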
If a function argument is given a name, that name is used as the column name instead of the deparsed expression. The following example shows this by summarizing several statistics of the EuStockMarkets data set:
> log.returns <- data.frame(diff(log(EuStockMarkets)))
> multi.sapply(log.returns, sd, min,
+              VaR10 = function(x) quantile(x, 0.1))
              sd         min        VaR10
DAX  0.010300837 -0.09627702 -0.010862458
SMI  0.009250036 -0.08382500 -0.009696908
CAC  0.011030875 -0.07575318 -0.012354424
FTSE 0.007957728 -0.04139903 -0.009139666
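One detail worth knowing when passing quantile this way: it labels its result with the probability (for example "10%"), and sapply merges that label into the element names. The sketch below uses the built-in cars data set rather than the stock data, and the object names named and plain are introduced here for illustration. It shows the documented names = FALSE argument of quantile as one way to keep the labels clean. Whether this matters in practice likely depends on where such a function appears in the argument list, which is an assumption not tested in the original post.

```r
# quantile() attaches the probability as a name; sapply() then combines it
# with the column name
named <- sapply(cars, function(x) quantile(x, 0.1))

# names = FALSE is a documented quantile() argument that drops that label
plain <- sapply(cars, function(x) quantile(x, 0.1, names = FALSE))

print(names(named))
print(names(plain))
```

The values are identical in both cases; only the names differ.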