Creating Beautiful and Flexible Summary Statistics Tables in R With gtsummary

[This article was first published on R, and kindly contributed to R-bloggers].

gtsummary is a great package for creating summary statistics tables in R. It offers a lot of functionality and flexibility, and building a table takes very little code. I like it almost as much as the arsenal package; only "almost" because it is not yet as mature, but I expect it to become as good as or better than arsenal for summary statistics tables in R.

Beyond summary statistics tables, gtsummary can also create other tables, such as regression tables, and it takes only a few lines of code to produce a nice-looking table.

First, we load some libraries and prepare the gapminder data set. Many tools for summary statistics tables make it difficult to display missing values properly, and often there is only one default method that cannot be changed. The gtsummary package offers many options for customizing how a summary statistics table handles them. To have missing values to display, the code below randomly sets about 20% of the values in each column to NA and recodes gdpPercap as "high" or "low" relative to its median.

Let’s get started.

Creating A Basic Summary Statistics Table in R

library(tidyverse)
library(gtsummary)
library(gapminder)

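set.seed(123) # arbitrary seed (not in the original post) so the random NAs are reproducible

gap <- gapminder %>%
  # Keep each value with probability 0.8, otherwise set it to NA;
  # ifelse() needs a common type, so every column becomes character first
  dplyr::mutate(dplyr::across(dplyr::everything(), ~ ifelse(
    sample(c(TRUE, FALSE), size = length(.x), replace = TRUE, prob = c(0.8, 0.2)),
    as.character(.x),
    NA
  ))) %>%
  # Convert the numeric columns back from character
  dplyr::mutate(dplyr::across(year:gdpPercap, as.numeric)) %>%
  # Dichotomize GDP per capita at its median
  dplyr::mutate(gdpPercap = ifelse(gdpPercap > median(gdpPercap, na.rm = TRUE), "high", "low"))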
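
The step that injects missing values can also be written so that column types are preserved and the character round trip is not needed. The following is only a sketch of that idea: `add_random_na()` is a hypothetical helper (not part of any package), and it is shown on the built-in `iris` data rather than on gapminder so that it runs on its own.

```r
library(dplyr)

set.seed(42) # arbitrary seed, only for reproducibility

# Hypothetical helper: set roughly `prop` of the values in each column to NA.
# replace() keeps the original column type (numeric stays numeric, factor stays factor).
add_random_na <- function(df, prop = 0.2) {
  df %>%
    mutate(across(everything(), ~ replace(.x, runif(length(.x)) < prop, NA)))
}

iris_na <- add_random_na(iris, prop = 0.2)

# Share of missing values per column; each should be close to 0.2
na_share <- colMeans(is.na(iris_na))
na_share
```

Checking `colMeans(is.na(...))` after such a step is a quick way to confirm that the amount of missingness is what you intended before building any table.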

The default summary statistics table looks pretty good for only one line of code.

gap <- gap %>% select(-country)
table1 <- tbl_summary(gap)
table1
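
How missing values are displayed can be controlled with the `missing` and `missing_text` arguments of `tbl_summary()`. The sketch below uses `trial`, an example data set bundled with gtsummary (an assumption made here so the snippet is self-contained; it is not the gapminder data used in this post), and builds the same table three ways.

```r
library(gtsummary)
library(dplyr)

# Default behaviour: a missing row is shown only for variables that have NAs
tbl_ifany <- trial %>%
  tbl_summary(include = c(age, grade), missing = "ifany")

# Always show a missing row, with a custom row label
tbl_always <- trial %>%
  tbl_summary(include = c(age, grade), missing = "always", missing_text = "Missing")

# Never show a missing row
tbl_no <- trial %>%
  tbl_summary(include = c(age, grade), missing = "no")

tbl_always
```

Printing `tbl_ifany` and `tbl_no` side by side with `tbl_always` shows the difference between the three settings.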

Customizing a Summary Statistics Table in R

We can also customize the table by changing labels, adding more summary statistics, and adjusting other details. In the example below, we add more summary statistics, rename the variables, make the labels bold, and modify the header.

gap %>%
  gtsummary::tbl_summary(
    label = list(
      continent ~ "Continent", year ~ "Year",
      lifeExp ~ "Life Expectancy", pop ~ "Population",
      gdpPercap ~ "GDP per Capita"
    ),
    type = all_continuous() ~ "continuous2",
    statistic = all_continuous() ~ c(
      "{median} ({p25}, {p75})",
      "{min}, {max}"
    )
  ) %>%
  add_n() %>%
  bold_labels() %>%
  modify_header(label ~ "**Variable**")
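
The `statistic` argument takes glue-style templates, and different templates can be given for continuous and categorical variables. As a rough sketch, again on gtsummary's bundled `trial` data (an assumption for self-containment, with made-up labels), the following also sets the number of decimals with `digits`:

```r
library(gtsummary)
library(dplyr)

tbl <- trial %>%
  tbl_summary(
    include = c(age, marker, grade),
    # One template per variable type
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    # Round continuous statistics to one decimal place
    digits = all_continuous() ~ 1,
    label = list(age ~ "Age (years)", marker ~ "Marker level")
  ) %>%
  bold_labels()

tbl
```

Anything inside curly braces is replaced by the corresponding statistic, so the surrounding punctuation can be arranged freely.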

Setting Themes for Summary Statistics Tables in R and Creating A Table By Group

The gtsummary package also lets us set a theme for our tables. This is convenient when we have to create many tables, because the table settings are controlled globally. In the theme below, I choose which summary statistics to display and how the numbers are formatted. The theme is then activated with gtsummary::set_gtsummary_theme(my_theme). Next, we display the summary table by group, here by continent.

my_theme <-
  list(
    "tbl_summary-str:default_con_type" = "continuous2",
    "tbl_summary-str:continuous_stat" = c(
      "{median} ({p25} - {p75})",
      "{mean} ({sd})",
      "{min} - {max}"
    ),
    "tbl_summary-str:categorical_stat" = "{n} / {N} ({p}%)",
    "style_number-arg:big.mark" = "",
    "tbl_summary-fn:percent_fun" = function(x) style_percent(x, digits = 3)
  )

gtsummary::set_gtsummary_theme(my_theme)

gap %>%
  gtsummary::tbl_summary(
    by = continent,
    missing = "always",
    missing_text = "Missing",
    label = list(
      year ~ "Year",
      lifeExp ~ "Life Expectancy", pop ~ "Population",
      gdpPercap ~ "GDP per Capita"
    )
  ) %>%
  add_n() %>%
  bold_labels() %>%
  modify_header(label ~ "**Variable**") %>%
  add_p()
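
Because a theme is set globally, it stays active for every table created afterwards in the same session. A minimal sketch of switching a theme on and off, using the built-in `theme_gtsummary_compact()` and the bundled `trial` data (both chosen here for illustration, not taken from the example above):

```r
library(gtsummary)
library(dplyr)

# Activate one of gtsummary's built-in themes for all following tables
theme_gtsummary_compact()

tbl_compact <- trial %>%
  tbl_summary(include = c(age, grade), by = trt)

# Restore the package defaults so later tables are not affected
reset_gtsummary_theme()

tbl_default <- trial %>%
  tbl_summary(include = c(age, grade), by = trt)
```

Calling `reset_gtsummary_theme()` at the end of a script or report section is a simple way to avoid a custom theme leaking into tables where it was not intended.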

If we always want bold labels and p-values in our summary statistics tables, we can wrap these steps in our own function.

my_modified_gtsummary_tbl <- function(...) {
  gtsummary::tbl_summary(
    ...
  ) %>%
    add_n() %>%
    bold_labels() %>%
    modify_header(label ~ "**Variable**") %>%
    add_p()
}

gap %>%
  my_modified_gtsummary_tbl(
    by = continent,
    missing = "always",
    missing_text = "Missing",
    label = list(
      year ~ "Year",
      lifeExp ~ "Life Expectancy", pop ~ "Population",
      gdpPercap ~ "GDP per Capita"
    )
  )

This is another way to extend your theme for summary statistics tables with the gtsummary package.

Summary Statistics Regression Tables in R

The gtsummary package can also summarize regression models (for example, linear or logistic regression) and survival output. The table below shows a linear regression.

gap %>%
  lm(lifeExp ~ ., data = .) %>%
  gtsummary::tbl_regression()
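
A logistic regression can be summarized in the same way. The sketch below is an illustration on gtsummary's bundled `trial` data (not the gapminder data), with `exponentiate = TRUE` so that odds ratios are reported instead of log-odds coefficients. Depending on the installed gtsummary version, `tbl_regression()` may ask for additional packages such as broom.helpers.

```r
library(gtsummary)

# Logistic regression of tumor response on age and grade
fit <- glm(response ~ age + grade, data = trial, family = binomial)

# exponentiate = TRUE reports odds ratios with confidence intervals
tbl_or <- tbl_regression(fit, exponentiate = TRUE)

tbl_or
```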

Additional Resources

I hope you enjoyed this short tutorial about summary statistics tables in R with the gtsummary package.
