New formatting features in the parameters package

[This article was first published on R on easystats, and kindly contributed to R-bloggers.]
    You have probably already heard of the parameters package, a lightweight package to extract, compute and explore the parameters of statistical models using R (if not, there is a related publication introducing the package’s main features).

    In this post, we would like to introduce a new feature that facilitates nicely rendered output in markdown or HTML format (including PDFs). This allows you to easily create pretty tables of model summaries for a large variety of models.

    The parameters package, together with the insight package, provides the tools to format the layout and style of tables of model parameters. The easy way is to use the model_parameters() function, where you usually don’t have to take care of formatting and layout, at least not for simple purposes like printing to the console or inside rmarkdown documents. However, sometimes you may want to do the formatting steps manually. This blog post introduces the various functions that are used for formatting parameters tables.

    An Example Model

    We start with a model that does not make much sense, but it is useful for demonstrating the formatting functions.

    data(iris)
    iris$Petlen <- cut(iris$Petal.Length, breaks = c(0, 3, 7))
    model <- lm(Sepal.Width ~ poly(Sepal.Length, 2) + Species + Petlen, data = iris)
    
    summary(model)
    ## 
    ## Call:
    ## lm(formula = Sepal.Width ~ poly(Sepal.Length, 2) + Species + 
    ##     Petlen, data = iris)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -0.7742 -0.1490 -0.0056  0.1666  0.6973 
    ## 
    ## Coefficients:
    ##                        Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)              3.8127     0.0582   65.50  < 2e-16 ***
    ## poly(Sepal.Length, 2)1   4.0602     0.4668    8.70    7e-15 ***
    ## poly(Sepal.Length, 2)2  -1.3024     0.3149   -4.14    6e-05 ***
    ## Speciesversicolor       -1.0056     0.2781   -3.62  0.00041 ***
    ## Speciesvirginica        -0.9913     0.2851   -3.48  0.00067 ***
    ## Petlen(3,7]             -0.1360     0.2818   -0.48  0.63019    
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 0.28 on 144 degrees of freedom
    ## Multiple R-squared:  0.615,  Adjusted R-squared:  0.602 
    ## F-statistic:   46 on 5 and 144 DF,  p-value: <2e-16

    Formatting Parameter Names

    As we can see, the standard R output looks a bit cryptic in such cases, although all necessary and important information is included in the summary. The coefficient names for polynomial terms are difficult to read, factors grouped with cut() always require a moment of thought to work out which of the bounds (in this case 3 and 7, from Petlen(3,7]) is included in the range, and the names of factor levels are directly concatenated to the name of the factor variable.
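To make the point about cut() bounds concrete, here is a small base R sketch. The vector x and the object names are made up for illustration; only the breaks match the example model.

```r
# Base R only: inspect how cut() labels its intervals and assigns values.
x <- c(1, 3, 3.1, 7)

# Default (right = TRUE): intervals are open on the left, closed on the right,
# so a value of exactly 3 falls into the first interval.
binned <- cut(x, breaks = c(0, 3, 7))
levels(binned)
as.character(binned)

# With right = FALSE the intervals are closed on the left instead,
# and a value equal to the highest break (7) is no longer covered.
binned_left <- cut(x, breaks = c(0, 3, 7), right = FALSE)
levels(binned_left)
as.character(binned_left)
```

The round parenthesis marks the excluded bound and the square bracket the included one, which is exactly the detail that is easy to misread in a coefficient name like Petlen(3,7].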

    The first step is therefore to format the parameter names, which can be done with format_parameters() from the parameters package:

    library(parameters)
    format_parameters(model)
    ##                 (Intercept)      poly(Sepal.Length, 2)1 
    ##               "(Intercept)" "Sepal.Length [1st degree]" 
    ##      poly(Sepal.Length, 2)2           Speciesversicolor 
    ## "Sepal.Length [2nd degree]"      "Species [versicolor]" 
    ##            Speciesvirginica                 Petlen(3,7] 
    ##       "Species [virginica]"              "Petlen [4-7]"

    format_parameters() returns a (named) character vector with the original coefficients as names of each character element, and the formatted names of the coefficients as values of the character vector. Let’s look at the results again:

    cat(format_parameters(model), sep = "\n")
    ## (Intercept)
    ## Sepal.Length [1st degree]
    ## Sepal.Length [2nd degree]
    ## Species [versicolor]
    ## Species [virginica]
    ## Petlen [4-7]

    Now variable names and factor levels, but also polynomial terms and even factors grouped with cut(), are much more readable. Factor levels are separated from the variable name and placed inside brackets. The same applies to the coefficients of the different polynomial degrees, and the exact range for cut() factors is also clearer now.
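Because format_parameters() returns a named character vector, it can be used as a lookup table from original to formatted names. The following is a minimal sketch of that idea, assuming the parameters package is installed; the object names pretty_names and coefs are made up for illustration.

```r
library(parameters)

# Same example model as above.
data(iris)
iris$Petlen <- cut(iris$Petal.Length, breaks = c(0, 3, 7))
model <- lm(Sepal.Width ~ poly(Sepal.Length, 2) + Species + Petlen, data = iris)

pretty_names <- format_parameters(model)

names(pretty_names)    # the original coefficient names
unname(pretty_names)   # the formatted names only

# Look up a single formatted name by its original coefficient name.
pretty_names[["Speciesversicolor"]]

# Relabel the rows of the base R coefficient matrix with the formatted names.
coefs <- summary(model)$coefficients
rownames(coefs) <- pretty_names[rownames(coefs)]
round(coefs, 2)
```

Indexing by the original names rather than by position keeps the relabelling correct even if the rows are reordered or subsetted first.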

    Standardizing Column Names of Parameter Tables

    As seen above, summary() returns columns named Estimate, t value or Pr(>|t|). While Estimate is not specific to certain models, t value is. For logistic regression models, you would get z value instead. Some packages alter the names, so you get just t or t-value etc.
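The difference is easy to check in base R. The sketch below uses two small models on mtcars that are chosen only for illustration and are not part of the original example.

```r
# Base R only: the column names of the coefficient table depend on the model type.
lin   <- lm(mpg ~ wt, data = mtcars)
logit <- glm(am ~ wt, data = mtcars, family = binomial())

lin_cols   <- colnames(summary(lin)$coefficients)
logit_cols <- colnames(summary(logit)$coefficients)

lin_cols     # contains "t value" and "Pr(>|t|)"
logit_cols   # contains "z value" and "Pr(>|z|)"
```

Code that extracts the test statistic by column name therefore has to know which kind of model it is dealing with.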

    model_parameters() also uses context-specific column names, where applicable:

    colnames(model_parameters(model))
    ## [1] "Parameter"   "Coefficient" "SE"          "CI_low"      "CI_high"    
    ## [6] "t"           "df_error"    "p"

    For Bayesian models, Coefficient is usually named Median etc. While this makes sense from a user perspective, because you instantly know which type of statistic or coefficient you have, it becomes difficult when you need a generic naming scheme to access model parameters when the input model is unknown. This is the typical approach from the broom package, where you get “standardized” column names:

    library(broom)
    colnames(tidy(model))
    ## [1] "term"      "estimate"  "std.error" "statistic" "p.value"

    To deal with such situations, the insight package provides a standardize_names() function, which does exactly that: standardizing the column names of the input. In the following example, you see that the statistic column is no longer named t, but Statistic. df_error or df_residuals will be renamed to df.

    library(insight)
    library(magrittr)
    model %>% 
      model_parameters() %>% 
      standardize_names() %>% 
      colnames()
    ## [1] "Parameter"   "Coefficient" "SE"          "CI_low"      "CI_high"    
    ## [6] "Statistic"   "df"          "p"

    Furthermore, you can request “broom”-style for column names:

    model %>% 
      model_parameters() %>% 
      standardize_names(style = "broom") %>% 
      colnames()
    ## [1] "term"      "estimate"  "std.error" "conf.low"  "conf.high" "statistic"
    ## [7] "df.error"  "p.value"
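Standardized names are mostly useful when writing helper code that should not care about the model type. The following is a minimal sketch under that assumption; significant_terms() is a hypothetical helper written for this post, not a function from parameters or insight, and the two mtcars models are chosen only for illustration.

```r
library(parameters)
library(insight)

# Hypothetical helper: return the names of all parameters with p < alpha.
# It relies only on the standardized column names "Parameter" and "p".
significant_terms <- function(model, alpha = 0.05) {
  params <- standardize_names(model_parameters(model))
  as.character(params$Parameter[params$p < alpha])
}

# Works the same way for a linear and a logistic regression model.
lin_terms   <- significant_terms(lm(mpg ~ wt, data = mtcars))
logit_terms <- significant_terms(glm(am ~ wt, data = mtcars, family = binomial()))

lin_terms
logit_terms
```

The helper only touches columns whose names stay the same after standardizing, which is why it does not need a separate branch per model class.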

    Formatting Column Names and Columns

    Besides formatting parameter names (coefficient names) using format_parameters(), we can do even more to make the output more readable. Let’s look at an example that includes confidence intervals.

    cbind(summary(model)$coefficients, confint(model))
    ##                        Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
    ## (Intercept)                3.81      0.058   65.50 4.6e-109  3.70   3.93
    ## poly(Sepal.Length, 2)1     4.06      0.467    8.70  7.0e-15  3.14   4.98
    ## poly(Sepal.Length, 2)2    -1.30      0.315   -4.14  6.0e-05 -1.92  -0.68
    ## Speciesversicolor         -1.01      0.278   -3.62  4.1e-04 -1.56  -0.46
    ## Speciesvirginica          -0.99      0.285   -3.48  6.7e-04 -1.55  -0.43
    ## Petlen(3,7]               -0.14      0.282   -0.48  6.3e-01 -0.69   0.42

    We can get a similar tabular output using broom.

    tidy(model, conf.int = TRUE)
    ## # A tibble: 6 x 7
    ##   term                 estimate std.error statistic   p.value conf.low conf.high
    ##   <chr>                   <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
    ## 1 (Intercept)             3.81     0.0582    65.5   4.61e-109    3.70      3.93 
    ## 2 poly(Sepal.Length, ~    4.06     0.467      8.70  7.00e- 15    3.14      4.98 
    ## 3 poly(Sepal.Length, ~   -1.30     0.315     -4.14  5.98e-  5   -1.92     -0.680
    ## 4 Speciesversicolor      -1.01     0.278     -3.62  4.12e-  4   -1.56     -0.456
    ## 5 Speciesvirginica       -0.991    0.285     -3.48  6.72e-  4   -1.55     -0.428
    ## 6 Petlen(3,7]            -0.136    0.282     -0.482 6.30e-  1   -0.693     0.421

    Readability could be improved further by collapsing and formatting the confidence intervals, and maybe the p-values. Doing this by hand takes some effort, for instance to format the lower and upper confidence limits and collapse them into one column. However, format_table() is a convenient function that does all the work for you.

    format_table() requires a data frame with model parameters as input. However, there are some requirements to make format_table() work: in particular, the column names must follow a certain pattern to be recognized, and this pattern may be either the naming convention from broom or that of the easystats packages.

    model %>% 
      tidy(conf.int = TRUE) %>% 
      format_table()
    ##                     term estimate std.error statistic p.value       conf.int
    ## 1            (Intercept)     3.81      0.06     65.50  < .001 [ 3.70,  3.93]
    ## 2 poly(Sepal.Length, 2)1     4.06      0.47      8.70  < .001 [ 3.14,  4.98]
    ## 3 poly(Sepal.Length, 2)2    -1.30      0.31     -4.14  < .001 [-1.92, -0.68]
    ## 4      Speciesversicolor    -1.01      0.28     -3.62  < .001 [-1.56, -0.46]
    ## 5       Speciesvirginica    -0.99      0.29     -3.48  < .001 [-1.55, -0.43]
    ## 6            Petlen(3,7]    -0.14      0.28     -0.48  0.630  [-0.69,  0.42]
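For comparison, the manual route might look roughly like the following base R sketch. It uses a small mtcars model of our own choosing (not the iris model above), and the rounding and the "< .001" threshold are assumptions made for illustration, not necessarily the exact rules format_table() applies.

```r
# Base R only: build a formatted coefficient table by hand.
model <- lm(mpg ~ wt + hp, data = mtcars)

coefs <- summary(model)$coefficients
ci    <- confint(model)
p     <- coefs[, "Pr(>|t|)"]

manual_table <- data.frame(
  term      = rownames(coefs),
  estimate  = sprintf("%.2f", coefs[, "Estimate"]),
  std.error = sprintf("%.2f", coefs[, "Std. Error"]),
  statistic = sprintf("%.2f", coefs[, "t value"]),
  # Very small p-values are shown as "< .001", all others with three decimals.
  p.value   = ifelse(p < 0.001, "< .001", sprintf("%.3f", p)),
  # Collapse lower and upper confidence limits into a single column.
  conf.int  = sprintf("[%.2f, %.2f]", ci[, 1], ci[, 2]),
  row.names = NULL,
  stringsAsFactors = FALSE
)

manual_table
```

Every column needs its own formatting call, and the result is a table of character strings, which is the bookkeeping that format_table() takes over.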

    When the parameters table also includes degrees of freedom, and the degrees of freedom are the same for each parameter, then this information is included in the statistic-column. This is usually the default for model_parameters():

    model %>% 
      model_parameters() %>% 
      format_table()
    ##                   Parameter Coefficient   SE         95% CI t(144)      p
    ## 1               (Intercept)        3.81 0.06 [ 3.70,  3.93]  65.50 < .001
    ## 2 Sepal.Length [1st degree]        4.06 0.47 [ 3.14,  4.98]   8.70 < .001
    ## 3 Sepal.Length [2nd degree]       -1.30 0.31 [-1.92, -0.68]  -4.14 < .001
    ## 4      Species [versicolor]       -1.01 0.28 [-1.56, -0.46]  -3.62 < .001
    ## 5       Species [virginica]       -0.99 0.29 [-1.55, -0.43]  -3.48 < .001
    ## 6              Petlen [4-7]       -0.14 0.28 [-0.69,  0.42]  -0.48 0.630

    Exporting the Parameters Table

    Finally, export_table() from insight formats the data frame and returns a character vector that can be printed to the console or inside rmarkdown documents. The data frame then looks more “table-like”.

    data(mtcars)
    cat(export_table(mtcars[1:8, 1:5]))
    ##   mpg | cyl |   disp |  hp | drat
    ## ---------------------------------
    ## 21.00 |   6 | 160.00 | 110 | 3.90
    ## 21.00 |   6 | 160.00 | 110 | 3.90
    ## 22.80 |   4 | 108.00 |  93 | 3.85
    ## 21.40 |   6 | 258.00 | 110 | 3.08
    ## 18.70 |   8 | 360.00 | 175 | 3.15
    ## 18.10 |   6 | 225.00 | 105 | 2.76
    ## 14.30 |   8 | 360.00 | 245 | 3.21
    ## 24.40 |   4 | 146.70 |  62 | 3.69

    Putting all this together allows us to create nice tabular outputs of parameters tables. This can be done using broom:

    model %>% 
      tidy(conf.int = TRUE) %>% 
      format_table() %>% 
      export_table() %>% 
      cat()
    ## term                   | estimate | std.error | statistic | p.value |       conf.int
    ## ------------------------------------------------------------------------------------
    ## (Intercept)            |     3.81 |      0.06 |     65.50 |  < .001 | [ 3.70,  3.93]
    ## poly(Sepal.Length, 2)1 |     4.06 |      0.47 |      8.70 |  < .001 | [ 3.14,  4.98]
    ## poly(Sepal.Length, 2)2 |    -1.30 |      0.31 |     -4.14 |  < .001 | [-1.92, -0.68]
    ## Speciesversicolor      |    -1.01 |      0.28 |     -3.62 |  < .001 | [-1.56, -0.46]
    ## Speciesvirginica       |    -0.99 |      0.29 |     -3.48 |  < .001 | [-1.55, -0.43]
    ## Petlen(3,7]            |    -0.14 |      0.28 |     -0.48 |  0.630  | [-0.69,  0.42]

    Or, in a simpler way and with many more options (like standardizing, robust standard errors, bootstrapping, …), using model_parameters(), whose print() method does all these steps automatically:

    model_parameters(model)
    ## Parameter                 | Coefficient |   SE |         95% CI | t(144) |      p
    ## ---------------------------------------------------------------------------------
    ## (Intercept)               |        3.81 | 0.06 | [ 3.70,  3.93] |  65.50 | < .001
    ## Sepal.Length [1st degree] |        4.06 | 0.47 | [ 3.14,  4.98] |   8.70 | < .001
    ## Sepal.Length [2nd degree] |       -1.30 | 0.31 | [-1.92, -0.68] |  -4.14 | < .001
    ## Species [versicolor]      |       -1.01 | 0.28 | [-1.56, -0.46] |  -3.62 | < .001
    ## Species [virginica]       |       -0.99 | 0.29 | [-1.55, -0.43] |  -3.48 | < .001
    ## Petlen [4-7]              |       -0.14 | 0.28 | [-0.69,  0.42] |  -0.48 | 0.630

    Formatting the Parameters Table in Markdown

    export_table() provides a few options to generate tables in markdown format. This allows you to easily render nice-looking tables inside markdown documents. First of all, use format = "markdown" to activate the markdown formatting. caption can be used to add a table caption. Furthermore, align lets you choose an alignment for all table columns, or specify the alignment for each column individually.

    The following table has six columns. Using align = "lcccrr" would left-align the first column, center columns two to four, and right-align the last two columns.

    model %>% 
      tidy(conf.int = TRUE) %>% 
      # parentheses look better in markdown tables, so we use them as "brackets" here
      format_table(ci_brackets = c("(", ")")) %>% 
      export_table(format = "markdown", caption = "My Table", align = "lcccrr")
    My Table

    | term                   | estimate | std.error | statistic | p.value |       conf.int |
    |:-----------------------|:--------:|:---------:|:---------:|--------:|---------------:|
    | (Intercept)            |   3.81   |   0.06    |   65.50   |  < .001 |  ( 3.70, 3.93) |
    | poly(Sepal.Length, 2)1 |   4.06   |   0.47    |   8.70    |  < .001 |  ( 3.14, 4.98) |
    | poly(Sepal.Length, 2)2 |  -1.30   |   0.31    |   -4.14   |  < .001 | (-1.92, -0.68) |
    | Speciesversicolor      |  -1.01   |   0.28    |   -3.62   |  < .001 | (-1.56, -0.46) |
    | Speciesvirginica       |  -0.99   |   0.29    |   -3.48   |  < .001 | (-1.55, -0.43) |
    | Petlen(3,7]            |  -0.14   |   0.28    |   -0.48   |   0.630 |  (-0.69, 0.42) |
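The per-column alignment string also works for any plain data frame. Here is a minimal sketch, assuming the insight package is installed; the data frame scores and its contents are invented for illustration.

```r
library(insight)

# A small made-up data frame with three columns.
scores <- data.frame(
  Model = c("baseline", "extended"),
  AIC   = c(101.2, 98.7),
  Note  = c("no interaction", "with interaction"),
  stringsAsFactors = FALSE
)

# One letter per column: left-align "Model", center "AIC", right-align "Note".
md <- export_table(
  scores,
  format  = "markdown",
  caption = "Model comparison",
  align   = "lcr"
)

md
```

Inside an rmarkdown document, printing md in a chunk lets the result be rendered as a table rather than as console text.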

    print_md() is a convenient wrapper around format_table() and export_table(format = "markdown"), and lets you directly format the output of functions like model_parameters(), simulate_parameters() or other parameters functions in markdown format.

    These tables are also nicely formatted when knitting markdown-documents into Word or PDF. print_md() applies some default settings that have proven to work well for markdown, PDF or Word tables.

    model_parameters(model) %>% print_md()
    | Parameter                 | Coefficient | SE   | 95% CI         | t(144) | p      |
    |---------------------------|-------------|------|----------------|--------|--------|
    | (Intercept)               | 3.81        | 0.06 | (3.70, 3.93)   | 65.50  | < .001 |
    | Sepal.Length (1st degree) | 4.06        | 0.47 | (3.14, 4.98)   | 8.70   | < .001 |
    | Sepal.Length (2nd degree) | -1.30       | 0.31 | (-1.92, -0.68) | -4.14  | < .001 |
    | Species (versicolor)      | -1.01       | 0.28 | (-1.56, -0.46) | -3.62  | < .001 |
    | Species (virginica)       | -0.99       | 0.29 | (-1.55, -0.43) | -3.48  | < .001 |
    | Petlen (4-7)              | -0.14       | 0.28 | (-0.69, 0.42)  | -0.48  | 0.630  |

    A similar option is print_html(), which is a convenient wrapper for format_table() and export_table(format = "html"). Using HTML in markdown has the advantage that it will be properly rendered when exporting to PDF.

    model_parameters(model) %>% print_html()
    Regression Model

    | Parameter                 | Coefficient | SE   | 95% CI         | t(144) | p      |
    |---------------------------|-------------|------|----------------|--------|--------|
    | (Intercept)               | 3.81        | 0.06 | (3.70, 3.93)   | 65.50  | < .001 |
    | Sepal.Length (1st degree) | 4.06        | 0.47 | (3.14, 4.98)   | 8.70   | < .001 |
    | Sepal.Length (2nd degree) | -1.30       | 0.31 | (-1.92, -0.68) | -4.14  | < .001 |
    | Species (versicolor)      | -1.01       | 0.28 | (-1.56, -0.46) | -3.62  | < .001 |
    | Species (virginica)       | -0.99       | 0.29 | (-1.55, -0.43) | -3.48  | < .001 |
    | Petlen (4-7)              | -0.14       | 0.28 | (-0.69, 0.42)  | -0.48  | 0.630  |

    print_md() and print_html() are the main functions for users who want to generate nicely rendered tables inside markdown documents. A wrapper around both is display(), which calls either print_md() or print_html().

    model_parameters(model) %>% display(format = "html")
    Regression Model

    | Parameter                 | Coefficient | SE   | 95% CI         | t(144) | p      |
    |---------------------------|-------------|------|----------------|--------|--------|
    | (Intercept)               | 3.81        | 0.06 | (3.70, 3.93)   | 65.50  | < .001 |
    | Sepal.Length (1st degree) | 4.06        | 0.47 | (3.14, 4.98)   | 8.70   | < .001 |
    | Sepal.Length (2nd degree) | -1.30       | 0.31 | (-1.92, -0.68) | -4.14  | < .001 |
    | Species (versicolor)      | -1.01       | 0.28 | (-1.56, -0.46) | -3.62  | < .001 |
    | Species (virginica)       | -0.99       | 0.29 | (-1.55, -0.43) | -3.48  | < .001 |
    | Petlen (4-7)              | -0.14       | 0.28 | (-0.69, 0.42)  | -0.48  | 0.630  |

    Get Involved

    easystats is a new project in active development, looking for contributors and supporters. Thus, do not hesitate to contact us if you want to get involved 🙂

    • Check out our other blog posts here!

    Stay tuned

    To stay updated about upcoming features and cool R or data science stuff, you can follow the packages on GitHub (click on one of the easystats packages and then on the Watch button in the top right corner), as well as the easystats team on Twitter and online.
