How to Create Summary Tables in R

This article was first published on Data Science Tutorials and kindly contributed to R-bloggers.

The post How to Create Summary Tables in R appeared first on Data Science Tutorials

The describe() and describeBy() functions from the psych package are the simplest way to create summary tables in R.

library(psych)

Let’s create a summary table

describe(df)

We can now create a summary table that is organized by a certain variable.

describeBy(df, group=df$var_name)

The practical application of these features is demonstrated in the examples that follow.

Example 1: Create a simple summary table

Let’s say we have the R data frame shown below:

Create the data frame:

df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
points=c(150, 222, 229, 421, 330, 211, 219),
rebounds=c(17, 28, 36, 16, 17, 29, 15),
steals=c(11, 151, 152, 73, 85, 79, 58))

Now we can view the data frame

df
   team points rebounds steals
1   P1    150       17     11
2   P1    222       28    151
3   P1    229       36    152
4   P2    421       16     73
5   P2    330       17     85
6   P2    211       29     79
7   P1    219       15     58

For each variable in the data frame, a summary table can be made using the describe() function.

library(psych)

Now we will create a summary table

describe(df)
         vars n   mean    sd median trimmed   mad min max range skew kurtosis
team*       1 7   1.43  0.53      1    1.43  0.00   1   2     1 0.23    -2.20
points      2 7 254.57 90.56    222  254.57 16.31 150 421   271 0.71    -1.03
rebounds    3 7  22.57  8.30     17   22.57  2.97  15  36    21 0.44    -1.73
steals      4 7  87.00 50.34     79   87.00 31.13  11 152   141 0.08    -1.47
            se
team*     0.20
points   34.23
rebounds  3.14
steals   19.03

Here’s how to interpret each value in the output:

vars: column number

n: Number of valid cases

mean: The mean value

sd: The standard deviation

median: The median value

trimmed: The trimmed mean (default trims 10% of observations from each end)

mad: The median absolute deviation (from the median)

min: The minimum value

max: The maximum value

range: The range of values (max – min)

skew: The skewness

kurtosis: The kurtosis

se: The standard error
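To make these definitions concrete, here is a minimal base R sketch that recomputes several of the statistics for the "points" column by hand. It assumes describe() relies on the standard definitions, including the default scaling constant of 1.4826 used by mad(). The variable names points, n, and stats are illustrative only.

```r
# Same "points" values as in the example data frame
points <- c(150, 222, 229, 421, 330, 211, 219)
n <- length(points)

stats <- c(
  mean    = mean(points),
  sd      = sd(points),
  median  = median(points),
  trimmed = mean(points, trim = 0.1),   # drops floor(0.1 * n) values from each end
  mad     = mad(points),                # median absolute deviation, scaled by 1.4826
  min     = min(points),
  max     = max(points),
  range   = max(points) - min(points),
  se      = sd(points) / sqrt(n)        # standard error of the mean
)

round(stats, 2)
```

With only 7 observations, floor(0.1 * 7) is 0, so nothing is trimmed. That is why the trimmed column equals the mean column in the output above.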

Any variable marked with an asterisk (*) is a categorical or logical variable that has been converted to a numeric variable, with values that represent the ordering of its categories.
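The conversion can be illustrated in base R. The sketch below assumes the coding is equivalent to turning the column into a factor and taking its integer codes, which is consistent with the mean of 1.43 reported for team* above. The name team_codes is illustrative.

```r
# The "team" column from the example data frame
team <- c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1')

# Factor levels are sorted alphabetically, so P1 becomes 1 and P2 becomes 2
team_codes <- as.numeric(factor(team))

team_codes
mean(team_codes)   # 10 / 7, which rounds to 1.43
```

The codes only reflect the order of the labels, so their mean or standard deviation carries no real meaning.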

For this reason, the summary statistics for the variable “team”, which has been converted to a numeric variable, should not be interpreted.

Also note that setting fast=TRUE computes only the most common summary statistics.

Now we can create a smaller summary table

describe(df, fast=TRUE)
         vars n   mean    sd min  max range    se
team        1 7    NaN    NA Inf -Inf  -Inf    NA
points      2 7 254.57 90.56 150  421   271 34.23
rebounds    3 7  22.57  8.30  15   36    21  3.14
steals      4 7  87.00 50.34  11  152   141 19.03

Additionally, we have the option of only computing the summary statistics for a subset of the data frame’s variables:

Create a summary table using only the columns “points” and “rebounds”:

describe(df[ , c('points', 'rebounds')], fast=TRUE)
         vars n   mean    sd min max range    se
points      1 7 254.57 90.56 150 421   271 34.23
rebounds    2 7  22.57  8.30  15  36    21  3.14
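Rather than typing column names by hand, the numeric columns can also be selected programmatically, which keeps the non-numeric “team” row out of the table. This is a minimal base R sketch. The names numeric_cols and df_numeric are illustrative, and the final describe() call is left as a comment because it requires the psych package.

```r
# Recreate the example data frame
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# Names of the columns that hold numeric data
numeric_cols <- names(df)[sapply(df, is.numeric)]
numeric_cols

# Keep only those columns
df_numeric <- df[ , numeric_cols]

# describe(df_numeric, fast=TRUE)   # requires library(psych)
```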

Example 2: Create a summary table grouped by a specific variable

The describeBy() function can be used to group the data frame’s summary table by the variable “team” using the following code.

Build the summary table, grouped by team:

describeBy(df, group=df$team, fast=TRUE)

Descriptive statistics by group

group: P1
         vars n mean    sd min  max range    se
team        1 4  NaN    NA Inf -Inf  -Inf    NA
points      2 4  205 36.91 150  229    79 18.45
rebounds    3 4   24  9.83  15   36    21  4.92
steals      4 4   93 70.22  11  152   141 35.11
-------------------------------------------------------------
group: P2
         vars n   mean     sd min  max range    se
team        1 3    NaN     NA Inf -Inf  -Inf    NA
points      2 3 320.67 105.31 211  421   210 60.80
rebounds    3 3  20.67   7.23  16   29    13  4.18
steals      4 3  79.00   6.00  73   85    12  3.46

The summary statistics for each of the two teams in the data frame (P1 and P2) are displayed in the output.
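As a quick cross-check on the grouped means, the same figures can be reproduced with base R's aggregate(). This is a sketch that does not use psych. The name group_means is illustrative.

```r
# Recreate the example data frame
df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
                 points=c(150, 222, 229, 421, 330, 211, 219),
                 rebounds=c(17, 28, 36, 16, 17, 29, 15),
                 steals=c(11, 151, 152, 73, 85, 79, 58))

# Mean of each numeric column within each team
group_means <- aggregate(cbind(points, rebounds, steals) ~ team,
                         data = df, FUN = mean)
group_means
```

The result has one row per team, and its values should agree with the mean column of the describeBy() output.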
