str Implementation for Data Frames
The str function is perhaps the most useful function in R. It gives a compact summary of the structure of an object. When I teach R, especially to people coming from SPSS, str for data frames provides the information they are used to seeing on the Variable View tab. However, sometimes I want to display the information str returns in a better format (e.g. as an HTML or LaTeX table). I wrote a function, strtable, that provides the same information as str.data.frame but returns the results as a data.frame. This provides much more flexibility for controlling how the output is formatted. Specifically, it returns a data.frame with four columns: variable, class, levels, and examples.
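Because the result is an ordinary data frame, it can be handed to any table-rendering function. The following is a minimal sketch, assuming the knitr package is installed. The tab object is a hand-built stand-in with the same four columns, not actual strtable output, and knitr is only one possible choice of renderer.

```r
# Hand-built stand-in with the four columns strtable returns
# (illustrative values only, not real strtable output).
tab <- data.frame(
    variable = c("Sepal.Length", "Species"),
    class    = c("numeric", "Factor w/ 3 levels"),
    levels   = c(NA, '"setosa", "versicolor", "virginica"'),
    examples = c("5.1, 4.9, 4.7, 4.6, ...",
                 '"setosa", "setosa", "setosa", "setosa", ...'),
    stringsAsFactors = FALSE
)

# Render as HTML and LaTeX, but only if knitr is available (assumption).
if (requireNamespace("knitr", quietly = TRUE)) {
    old <- options(knitr.kable.NA = "")   # show NA cells as blanks
    print(knitr::kable(tab, format = "html", row.names = FALSE))
    print(knitr::kable(tab, format = "latex", row.names = FALSE))
    options(old)                          # restore the previous setting
}
```

Once strtable has been sourced, its return value could be passed to the same calls in place of tab.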
The function can be sourced from Gist using the devtools package.
devtools::source_gist('4a0a5ab9fe7e1cf3be0e')
For the first example, we’ll use the iris data frame.
data(iris)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
##  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
In addition to the data frame itself, the strtable function has five parameters:
n: the first n elements to show (default 4).
width: maximum width in characters for the examples to show (default 60).
n.levels: the first n levels of a factor to show (defaults to n).
width.levels: maximum width in characters for the levels to show (defaults to width).
factor.values: function defining how factor examples should be printed. Possible values are as.character (the default) or as.integer.
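To make the interaction between n and width concrete, here is a simplified sketch of the truncation idea. The helper collapse_examples is hypothetical and is not the gist's code; unlike the gist, it appends the ellipsis whenever any value was left out.

```r
# Hypothetical helper (not part of the gist): show at most `n` values,
# dropping trailing values until the string fits in `width` characters.
collapse_examples <- function(x, n = 4, width = 60) {
    shown <- head(x, n)
    # Quote non-numeric values so strings are visibly delimited
    if (!is.numeric(shown)) shown <- paste0('"', shown, '"')
    # Try the longest prefix first, then progressively shorter ones
    for (k in rev(seq_along(shown))) {
        out <- paste(shown[seq_len(k)], collapse = ", ")
        if (nchar(out) <= width) {
            # Append an ellipsis when some values were left out
            if (k < length(x)) out <- paste0(out, ", ...")
            return(out)
        }
    }
    NA_character_  # nothing fits (or x is empty)
}

collapse_examples(c(5.1, 4.9, 4.7, 4.6, 5.0))   # "5.1, 4.9, 4.7, 4.6, ..."
collapse_examples(c(1, 2, 3), width = 4)        # "1, 2, ..."
collapse_examples(c("a", "b"))                  # "\"a\", \"b\""
```

A smaller width shortens the list before n is reached, while a smaller n caps the list even when there is room to spare.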
print(strtable(iris), na.print='')
##     variable              class                              levels
## Sepal.Length            numeric
##  Sepal.Width            numeric
## Petal.Length            numeric
##  Petal.Width            numeric
##      Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
##                                    examples
##                     5.1, 4.9, 4.7, 4.6, ...
##                       3.5, 3, 3.2, 3.1, ...
##                     1.4, 1.4, 1.3, 1.5, ...
##                     0.2, 0.2, 0.2, 0.2, ...
## "setosa", "setosa", "setosa", "setosa", ...

print(strtable(iris, factor.values=as.integer), na.print='')
##     variable              class                              levels
## Sepal.Length            numeric
##  Sepal.Width            numeric
## Petal.Length            numeric
##  Petal.Width            numeric
##      Species Factor w/ 3 levels "setosa", "versicolor", "virginica"
##                examples
## 5.1, 4.9, 4.7, 4.6, ...
##   3.5, 3, 3.2, 3.1, ...
## 1.4, 1.4, 1.3, 1.5, ...
## 0.2, 0.2, 0.2, 0.2, ...
##         1, 1, 1, 1, ...
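The difference between the two calls above comes down to how R stores factors: as integer codes that index into a vector of level labels. A small illustration with a made-up factor (the sizes variable is not from the post):

```r
# A made-up factor with an explicit level order
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))

as.character(sizes)               # the labels: "small" "large" "small" "medium"
as.integer(sizes)                 # the codes:  1 3 1 2
levels(sizes)[as.integer(sizes)]  # codes index into levels, recovering the labels
```

Passing factor.values=as.integer therefore shows the underlying codes, much like str does, while the default as.character shows the labels.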
Here’s a second example using the diamonds data from the ggplot2 package.
library(ggplot2)  # provides the diamonds data
data(diamonds)
str(diamonds)
## 'data.frame': 53940 obs. of 10 variables:
##  $ carat  : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

print(strtable(diamonds), na.print='')
## variable              class                                      levels
##    carat            numeric
##      cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
##    color Factor w/ 7 levels                     "D", "E", "F", "G", ...
##  clarity Factor w/ 8 levels              "I1", "SI2", "SI1", "VS2", ...
##    depth            numeric
##    table            numeric
##    price            integer
##        x            numeric
##        y            numeric
##        z            numeric
##                                   examples
##                0.23, 0.21, 0.23, 0.29, ...
## "Ideal", "Premium", "Good", "Premium", ...
##                    "E", "E", "E", "I", ...
##            "SI2", "SI1", "VS1", "VS2", ...
##                61.5, 59.8, 56.9, 62.4, ...
##                        55, 61, 65, 58, ...
##                    326, 326, 327, 334, ...
##                 3.95, 3.89, 4.05, 4.2, ...
##                3.98, 3.84, 4.07, 4.23, ...
##                2.43, 2.31, 2.31, 2.63, ...

print(strtable(diamonds, factor.values=as.integer), na.print='')
## variable              class                                      levels
##    carat            numeric
##      cut Factor w/ 5 levels "Fair", "Good", "Very Good", "Premium", ...
##    color Factor w/ 7 levels                     "D", "E", "F", "G", ...
##  clarity Factor w/ 8 levels              "I1", "SI2", "SI1", "VS2", ...
##    depth            numeric
##    table            numeric
##    price            integer
##        x            numeric
##        y            numeric
##        z            numeric
##                    examples
## 0.23, 0.21, 0.23, 0.29, ...
##             5, 4, 2, 4, ...
##             2, 2, 2, 6, ...
##             2, 3, 5, 4, ...
## 61.5, 59.8, 56.9, 62.4, ...
##         55, 61, 65, 58, ...
##     326, 326, 327, 334, ...
##  3.95, 3.89, 4.05, 4.2, ...
## 3.98, 3.84, 4.07, 4.23, ...
## 2.43, 2.31, 2.31, 2.63, ...
Here’s the source code from Gist:
#' Creates a \code{data.frame} version of the str function for data.frames.
#'
#' Note that this function only works with \code{data.frames}. The function
#' will throw an error for any other object types.
#'
#' @param n the first n element to show
#' @param width maximum width in characters for the examples to show
#' @param n.levels the first n levels of a factor to show.
#' @param width.levels maximum width in characters for the number of levels to show.
#' @param factor.values function defining how factor examples should be printed.
#'        Possible values are \code{as.character} or \code{as.integer}.
#' @export
#' @examples
#' data(iris)
#' str(iris)
#' strtable(iris)
#' strtable(iris, factor.values=as.integer)
strtable <- function(df, n=4, width=60,
                     n.levels=n, width.levels=width,
                     factor.values=as.character) {
    stopifnot(is.data.frame(df))
    tab <- data.frame(variable=names(df),
                      class=rep(as.character(NA), ncol(df)),
                      levels=rep(as.character(NA), ncol(df)),
                      examples=rep(as.character(NA), ncol(df)),
                      stringsAsFactors=FALSE)
    collapse.values <- function(col, n, width) {
        result <- NA
        for(j in 1:min(n, length(col))) {
            el <- ifelse(is.numeric(col),
                         paste0(col[1:j], collapse=', '),
                         paste0('"', col[1:j], '"', collapse=', '))
            if(nchar(el) <= width) {
                result <- el
            } else {
                break
            }
        }
        if(length(col) > n) {
            return(paste0(result, ', ...'))
        } else {
            return(result)
        }
    }
    for(i in seq_along(df)) {
        if(is.factor(df[,i])) {
            tab[i,]$class <- paste0('Factor w/ ', nlevels(df[,i]), ' levels')
            tab[i,]$levels <- collapse.values(levels(df[,i]), n=n.levels, width=width.levels)
            tab[i,]$examples <- collapse.values(factor.values(df[,i]), n=n, width=width)
        } else {
            tab[i,]$class <- class(df[,i])[1]
            tab[i,]$examples <- collapse.values(df[,i], n=n, width=width)
        }
    }
    class(tab) <- c('strtable', 'data.frame')
    return(tab)
}

#' Prints the results of \code{\link{strtable}}.
#' @param x result of code \code{\link{strtable}}.
#' @param ... other parameters passed to \code{\link{print.data.frame}}.
#' @export
print.strtable <- function(x, ...) {
    NextMethod(x, row.names=FALSE, ...)
}
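The print method relies on S3 dispatch: because the class vector is c('strtable', 'data.frame'), NextMethod hands the object on to print.data.frame. Here is a toy sketch of the same pattern, using a made-up class name (quiet_df) so that it runs without the gist:

```r
# Toy S3 class layered on top of data.frame (the name quiet_df is made up).
quiet <- data.frame(a = c(10, 20), b = c("x", "y"))
class(quiet) <- c("quiet_df", "data.frame")

# Delegate to the next method in the class vector (print.data.frame),
# overriding a single argument on the way.
print.quiet_df <- function(x, ...) {
    NextMethod(row.names = FALSE)
}

print(quiet)                    # printed without the 1, 2 row names
inherits(quiet, "data.frame")   # TRUE: it still behaves as a data frame
```

Keeping 'data.frame' in the class vector is what lets the result be subset, merged, or passed to table-formatting functions like any other data frame.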