Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Convert the elements of a numerical vector or data frame column to character strings in which the numbers are formatted using powers-of-ten notation in scientific or engineering form and delimited for rendering as inline equations in an rmarkdown document.
Initial release of the formatdown R package providing tools for formatting output in rmarkdown or quarto markdown documents. This first version has one function only, `format_power()`, for converting numbers to character strings formatted in powers-of-ten notation and delimited in `$...$` for rendering as inline equations in `.Rmd` or `.qmd` output documents. It provides two powers-of-ten formatting options, scientific notation and engineering notation, with an option to omit powers-of-ten notation for a specified range of exponents.
To illustrate the different formats, I show in Table 1 the same number rendered using different formats, all with 4 significant digits.
The R code for the post is listed under the “R code” pointers. In the examples, I use data.table syntax for data manipulation, though the code can be translated into base R or dplyr syntax if desired.

```r
library("formatdown")
library("data.table")

x <- 4.567E-4  # value

x1 <- format_power(x, 4, omit_power = c(-6, 0))  # omit power-of-ten
x2 <- format_power(x, 4, format = "sci")         # scientific
x3 <- format_power(x, 4)                         # engineering

# render in markdown table below
```
Notation | Name | Value | Rendered as |
---|---|---|---|
without | `x1` | `"$0.0004567$"` | $0.0004567$ |
scientific | `x2` | `"$4.567\\times{10}^{-4}$"` | $4.567\times{10}^{-4}$ |
engineering | `x3` | `"$456.7\\times{10}^{-6}$"` | $456.7\times{10}^{-6}$ |
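The two notations differ only in how the exponent is chosen: scientific notation uses $\lfloor \log_{10}|x| \rfloor$, while engineering notation rounds that exponent down to a multiple of 3. A minimal base-R sketch of the arithmetic follows. `pow10_string()` is a hypothetical helper written for illustration, not formatdown's implementation, and it assumes nonzero, finite inputs.

```r
# Hypothetical helper, for illustration only: not formatdown's code.
# Assumes x is numeric, nonzero, and finite.
pow10_string <- function(x, digits = 3, format = c("engr", "sci")) {
  format <- match.arg(format)
  e <- floor(log10(abs(x)))                    # scientific exponent
  if (format == "engr") e <- 3 * floor(e / 3)  # engineering: multiple of 3
  m <- x / 10^e                                # mantissa
  # decimal places needed to show `digits` significant digits
  dec <- pmax(digits - 1 - floor(log10(abs(m))), 0)
  m_chr <- sprintf(paste0("%.", dec, "f"), m)
  sprintf("$%s\\times{10}^{%d}$", m_chr, as.integer(e))
}

pow10_string(4.567e-4, digits = 4, format = "sci")   # scientific
pow10_string(4.567e-4, digits = 4, format = "engr")  # engineering
```

The two calls give the same notation as `x2` and `x3` in Table 1. One case the sketch does not handle is a mantissa that rounds up to the next power: 9.9996 at 4 significant digits would come out as 10.000 rather than $1.000\times{10}^{1}$.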
Background
My first attempt to provide powers-of-ten formatting was in my 2016 package, docxtools. That implementation has several shortcomings.
I wrote its formatting function to accept a data frame as input, which entailed a lot of programming overhead to separate numerical from non-numerical variable classes and to reassemble them after the numerical columns were formatted. This could have been simplified with judicious use of `lapply()`, with which I was not sufficiently experienced at the time. I also failed to take advantage of `formatC()` in constructing the output.
With formatdown, my goal is to provide similar functionality but with more concise code, greater flexibility, and a more balanced approach to package dependencies.
Improvements
The primary design change is that the `format_power()` function operates on a numerical vector instead of a data frame. The benefits of this change are: 1) simpler code that should be easier to revise and maintain; 2) scalar values can be formatted for rendering inline; and 3) data frames can still be formatted, by column, using `lapply()`.
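The third benefit replaces the data-frame bookkeeping I did in docxtools with a short pattern: select the numeric columns, then apply a vector formatter to each with `lapply()`. In the base-R sketch below, `to_sci()` is a hypothetical stand-in (built on `formatC()`) used in place of `format_power()` so that the block runs without formatdown installed.

```r
# Column-wise formatting of a data frame with lapply().
# to_sci() is a hypothetical stand-in for a vector formatter such as
# format_power(); it uses base R formatC() so this runs on its own.
to_sci <- function(x) sprintf("$%s$", formatC(x, format = "e", digits = 2))

df <- data.frame(
  trial = c("a", "b"),
  T_K   = c(294.05, 294.15),
  p_Pa  = c(101100, 101000),
  stringsAsFactors = FALSE
)

num_cols <- vapply(df, is.numeric, logical(1))  # TRUE for numeric columns
df[num_cols] <- lapply(df[num_cols], to_sci)    # format each numeric column
df
```

Non-numeric columns such as `trial` pass through untouched, so there is nothing to separate and reassemble afterwards.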
To illustrate formatting a scalar value inline, the markup for Avogadro’s number (`x = 6.0221E+23`) in engineering format is given by,
$N_A =$ `r format_power(x, digits = 5, format = "engr")`
which is rendered (in this output document) as $N_A = 602.21\times{10}^{21}$.
The second improvement is the addition of an option for scientific notation. For example, the markup for Avogadro’s number in scientific notation is given by,
$N_A =$ `r format_power(x, digits = 5, format = "sci")`
which renders as $N_A = 6.0221\times{10}^{23}$.
The third improvement is the addition of an option for omitting powers-of-ten notation over a range of exponents. For example, the markup for x = 1.234E-4 in decimal notation is given by,
$x =$ `r format_power(x = 1.234E-4, omit_power = c(-4, 0))`
which renders as $x = 0.000123$.
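The omit-power rule amounts to a range test on the scientific exponent. A minimal sketch, assuming that is how the range is applied: values whose exponent falls inside `omit_power` are written as decimals, and everything else falls back to engineering notation. `format_omit()` and `fmt_sig()` are hypothetical names, not formatdown's code.

```r
# Hypothetical sketch of an omit_power rule: if the scientific exponent
# of x lies in omit_power = c(lo, hi), write x as a plain decimal;
# otherwise use engineering notation. Not formatdown's implementation.
fmt_sig <- function(v, digits) {
  # decimal places that show `digits` significant digits
  dec <- pmax(digits - 1 - floor(log10(abs(v))), 0)
  sprintf(paste0("%.", dec, "f"), v)
}

format_omit <- function(x, digits = 3, omit_power = c(-1, 2)) {
  e <- floor(log10(abs(x)))                          # scientific exponent
  e_engr <- 3 * floor(e / 3)                         # engineering exponent
  in_range <- e >= omit_power[1] & e <= omit_power[2]
  ifelse(in_range,
         sprintf("$%s$", fmt_sig(x, digits)),        # decimal form
         sprintf("$%s\\times{10}^{%d}$",             # powers-of-ten form
                 fmt_sig(x / 10^e_engr, digits), as.integer(e_engr)))
}

format_omit(1.234e-4, omit_power = c(-4, 0))  # exponent -4 is in range
format_omit(1.234e-4, omit_power = c(-1, 0))  # exponent -4 is not
```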
A final (internal) improvement is a more balanced approach to package dependencies. With a tighter focus on what formatdown is to accomplish compared to docxtools, I have reduced the dependencies to checkmate, wrapr, and data.table.
The package vignette illustrates package usage in detail.
However, having successfully submitted the package to CRAN, I started working on this post and immediately (!) uncovered an issue that had not appeared while working on the package vignettes.
Delimiter issue
I wrote the package vignette using the `rmarkdown::html_vignette` output style as usual. All the formatted output rendered as expected in that document. I write this blog using quarto. As seen in the examples above, inline math renders as expected.
The issue arises when using `knitr::kable()` and `kableExtra::kbl()` to display data tables in this blog post. To illustrate, consider this data frame, included with formatdown (ideal gas properties of air at room temperature).
```r
density
```

```
         date  trial humidity    T_K   p_Pa     R  density
       <Date> <char>   <fctr>  <num>  <num> <int>    <num>
1: 2018-06-12      a      low 294.05 101100   287 1.197976
2: 2018-06-13      b     high 294.15 101000   287 1.196384
3: 2018-06-14      c   medium 294.65 101100   287 1.195536
4: 2018-06-15      d      low 293.35 101000   287 1.199647
5: 2018-06-16      e     high 293.85 101100   287 1.198791
```
After formatting the pressure column, the markup looks OK.
```r
DT <- copy(density)
DT$p_Pa <- format_power(DT$p_Pa, 4)
DT
```

```
         date  trial humidity    T_K                   p_Pa     R  density
       <Date> <char>   <fctr>  <num>                 <char> <int>    <num>
1: 2018-06-12      a      low 294.05 $101.1\\times{10}^{3}$   287 1.197976
2: 2018-06-13      b     high 294.15 $101.0\\times{10}^{3}$   287 1.196384
3: 2018-06-14      c   medium 294.65 $101.1\\times{10}^{3}$   287 1.195536
4: 2018-06-15      d      low 293.35 $101.0\\times{10}^{3}$   287 1.199647
5: 2018-06-16      e     high 293.85 $101.1\\times{10}^{3}$   287 1.198791
```
`knitr::kable()` yields the expected output with pressure formatted in engineering notation.
```r
knitr::kable(DT, align = "r")
```

date | trial | humidity | T_K | p_Pa | R | density |
---|---|---|---|---|---|---|
2018-06-12 | a | low | 294.05 | $101.1\times{10}^{3}$ | 287 | 1.197976 |
2018-06-13 | b | high | 294.15 | $101.0\times{10}^{3}$ | 287 | 1.196384 |
2018-06-14 | c | medium | 294.65 | $101.1\times{10}^{3}$ | 287 | 1.195536 |
2018-06-15 | d | low | 293.35 | $101.0\times{10}^{3}$ | 287 | 1.199647 |
2018-06-16 | e | high | 293.85 | $101.1\times{10}^{3}$ | 287 | 1.198791 |
Problem
`kableExtra::kbl()` does not render the math markup as expected.

```r
kableExtra::kbl(DT, align = "r")
```
date | trial | humidity | T_K | p_Pa | R | density |
---|---|---|---|---|---|---|
2018-06-12 | a | low | 294.05 | `$101.1\times{10}^{3}$` | 287 | 1.197976 |
2018-06-13 | b | high | 294.15 | `$101.0\times{10}^{3}$` | 287 | 1.196384 |
2018-06-14 | c | medium | 294.65 | `$101.1\times{10}^{3}$` | 287 | 1.195536 |
2018-06-15 | d | low | 293.35 | `$101.0\times{10}^{3}$` | 287 | 1.199647 |
2018-06-16 | e | high | 293.85 | `$101.1\times{10}^{3}$` | 287 | 1.198791 |
In fact, now that kableExtra has been loaded above, `knitr::kable()` fails in the same way.

```r
knitr::kable(DT, align = "r")
```
date | trial | humidity | T_K | p_Pa | R | density |
---|---|---|---|---|---|---|
2018-06-12 | a | low | 294.05 | `$101.1\times{10}^{3}$` | 287 | 1.197976 |
2018-06-13 | b | high | 294.15 | `$101.0\times{10}^{3}$` | 287 | 1.196384 |
2018-06-14 | c | medium | 294.65 | `$101.1\times{10}^{3}$` | 287 | 1.195536 |
2018-06-15 | d | low | 293.35 | `$101.0\times{10}^{3}$` | 287 | 1.199647 |
2018-06-16 | e | high | 293.85 | `$101.1\times{10}^{3}$` | 287 | 1.198791 |
Solution
I found a suggestion from MathJax to replace the `$ ... $` delimiters with `\\( ... \\)`. I wrote a short function (below) to do that.

```r
# Substitute math delimiters
sub_delim <- function(x) {
  x <- sub("\\$", "\\\\(", x)  # first $
  x <- sub("\\$", "\\\\)", x)  # second $
  x                            # return the edited strings
}

DT$p_Pa <- sub_delim(DT$p_Pa)
DT
```

```
         date  trial humidity    T_K                     p_Pa     R  density
       <Date> <char>   <fctr>  <num>                   <char> <int>    <num>
1: 2018-06-12      a      low 294.05 \\(101.1\\times{10}^{3}\\)   287 1.197976
2: 2018-06-13      b     high 294.15 \\(101.0\\times{10}^{3}\\)   287 1.196384
3: 2018-06-14      c   medium 294.65 \\(101.1\\times{10}^{3}\\)   287 1.195536
4: 2018-06-15      d      low 293.35 \\(101.0\\times{10}^{3}\\)   287 1.199647
5: 2018-06-16      e     high 293.85 \\(101.1\\times{10}^{3}\\)   287 1.198791
```
`knitr::kable()` yields the expected output.

```r
knitr::kable(DT, align = "c")
```
date | trial | humidity | T_K | p_Pa | R | density |
---|---|---|---|---|---|---|
2018-06-12 | a | low | 294.05 | \(101.1\times{10}^{3}\) | 287 | 1.197976 |
2018-06-13 | b | high | 294.15 | \(101.0\times{10}^{3}\) | 287 | 1.196384 |
2018-06-14 | c | medium | 294.65 | \(101.1\times{10}^{3}\) | 287 | 1.195536 |
2018-06-15 | d | low | 293.35 | \(101.0\times{10}^{3}\) | 287 | 1.199647 |
2018-06-16 | e | high | 293.85 | \(101.1\times{10}^{3}\) | 287 | 1.198791 |
`kableExtra::kbl()` also yields the expected output.

```r
kableExtra::kbl(DT, align = "c")
```
date | trial | humidity | T_K | p_Pa | R | density |
---|---|---|---|---|---|---|
2018-06-12 | a | low | 294.05 | \(101.1\times{10}^{3}\) | 287 | 1.197976 |
2018-06-13 | b | high | 294.15 | \(101.0\times{10}^{3}\) | 287 | 1.196384 |
2018-06-14 | c | medium | 294.65 | \(101.1\times{10}^{3}\) | 287 | 1.195536 |
2018-06-15 | d | low | 293.35 | \(101.0\times{10}^{3}\) | 287 | 1.199647 |
2018-06-16 | e | high | 293.85 | \(101.1\times{10}^{3}\) | 287 | 1.198791 |
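`sub_delim()` replaces the first two `$` characters it finds, whatever they are. A slightly more defensive variant, sketched below as a hypothetical `swap_delim()`, only rewrites strings that begin and end with `$` and rebuilds them with `paste0()`, which avoids escaping backslashes inside a regex replacement string. It assumes inline `$...$` math only (display math `$$...$$` is not handled); the defaults produce `\(` and `\)`, and other `left`/`right` strings can be passed if a toolchain needs different escaping.

```r
# Hypothetical variant of sub_delim(): only strings wrapped in $...$
# are rewritten; everything else (numbers, NA, plain text) is untouched.
swap_delim <- function(x, left = "\\(", right = "\\)") {
  is_math <- grepl("^\\$.*\\$$", x)                      # starts and ends with $
  inner <- substr(x[is_math], 2, nchar(x[is_math]) - 1)  # drop the $ pair
  x[is_math] <- paste0(left, inner, right)               # add new delimiters
  x
}

swap_delim(c("$101.1\\times{10}^{3}$", "287"))
```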
I can use the features from kableExtra to print a pretty table.

```r
library("kableExtra")

var_names <- c("Date", "Trial", "Humidity", "Temperature",
               "Pressure", "Gas constant", "Density")
var_units <- c("", "", "", "[K]", "[Pa]", "[J/(kg K)]", "[kg/m\\(^3\\)]")
var_align <- "r"

DT |>
  kbl(align = var_align, col.names = var_units) |>
  column_spec(1:6, color = "black", background = "white") |>
  add_header_above(header = var_names, align = var_align,
                   background = "#c7eae5", line_sep = 0) |>
  kable_paper(lightable_options = "basic", full_width = TRUE)
```

| Date | Trial | Humidity | Temperature | Pressure | Gas constant | Density |
|---|---|---|---|---|---|---|
| | | | [K] | [Pa] | [J/(kg K)] | [kg/m\(^3\)] |
| 2018-06-12 | a | low | 294.05 | \(101.1\times{10}^{3}\) | 287 | 1.197976 |
| 2018-06-13 | b | high | 294.15 | \(101.0\times{10}^{3}\) | 287 | 1.196384 |
| 2018-06-14 | c | medium | 294.65 | \(101.1\times{10}^{3}\) | 287 | 1.195536 |
| 2018-06-15 | d | low | 293.35 | \(101.0\times{10}^{3}\) | 287 | 1.199647 |
| 2018-06-16 | e | high | 293.85 | \(101.1\times{10}^{3}\) | 287 | 1.198791 |
Follow up
To address this issue, the next version of `format_power()` will include a new `delim` argument,

`format_power(x, digits, format, omit_power, delim)`

that allows a user to set the math delimiters to `$ ... $` or `\\( ... \\)`, or even to custom left and right markup to suit their environment.
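One way a single `delim` argument could cover all three cases is to map shorthand values to a left/right pair and to accept a length-2 vector as custom markup. The sketch below is hypothetical: `resolve_delim()` and `wrap_math()` are illustrative names, not the planned formatdown API.

```r
# Hypothetical sketch: how a single `delim` argument could map to a
# left/right pair of math delimiters. Not formatdown's API.
resolve_delim <- function(delim = "$") {
  if (length(delim) == 2) return(delim)   # custom c(left, right) markup
  switch(delim,
         "$"   = c("$", "$"),             # TeX-style inline math
         "\\(" = c("\\(", "\\)"),         # MathJax-style \( ... \)
         stop("unsupported delim: ", delim))
}

wrap_math <- function(body, delim = "$") {
  d <- resolve_delim(delim)
  paste0(d[1], body, d[2])
}

wrap_math("101.1\\times{10}^{3}")
wrap_math("101.1\\times{10}^{3}", delim = "\\(")
wrap_math("101.1", delim = c("<span class='math'>", "</span>"))
```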
Fixed exponents
Preparing this post, I adapted a table of water properties from the hydraulics package to use as an example and discovered another, more subtle issue. First, I’ll construct the data frame.
```r
# Construct a table of water properties
temperature     <- seq(0, 45, 10) + 273.15
density         <- c(1000, 1000, 998, 996, 992)
specific_weight <- c(9809, 9807, 9793, 9768, 9734)
viscosity       <- c(173, 131, 102, 81.7, 67.0) * 1E-8
bulk_modulus    <- c(202, 210, 218, 225, 228) * 1E+7

water <- data.table(temperature, density, specific_weight, viscosity, bulk_modulus)
water
```

```
   temperature density specific_weight viscosity bulk_modulus
         <num>   <num>           <num>     <num>        <num>
1:      273.15    1000            9809  1.73e-06     2.02e+09
2:      283.15    1000            9807  1.31e-06     2.10e+09
3:      293.15     998            9793  1.02e-06     2.18e+09
4:      303.15     996            9768  8.17e-07     2.25e+09
5:      313.15     992            9734  6.70e-07     2.28e+09
```
Problem
I format all the columns, change the delimiters as described earlier, and display the result. The viscosity column reveals the problem.

```r
DT <- copy(water)

# 5 signif digits
cols_to_format <- c("temperature")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x, 5)),
   .SDcols = cols_to_format]

# 4 signif digits
cols_to_format <- c("specific_weight")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x, 4)),
   .SDcols = cols_to_format]

# 3 signif digits
cols_to_format <- c("viscosity", "bulk_modulus")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x)),
   .SDcols = cols_to_format]

# 3 signif digits omit powers
cols_to_format <- c("density")
DT[, (cols_to_format) := lapply(.SD, function(x) format_power(x, omit_power = c(0, 3))),
   .SDcols = cols_to_format]

# change the delimiters
DT <- DT[, lapply(.SD, function(x) sub_delim(x))]

# Table
DT |>
  kbl(align = "cclrrrr") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
```
temperature | density | specific_weight | viscosity | bulk_modulus |
---|---|---|---|---|
\(273.15\) | \(1000\) | \(9.809\times{10}^{3}\) | \(1.73\times{10}^{-6}\) | \(2.02\times{10}^{9}\) |
\(283.15\) | \(1000\) | \(9.807\times{10}^{3}\) | \(1.31\times{10}^{-6}\) | \(2.10\times{10}^{9}\) |
\(293.15\) | \(998\) | \(9.793\times{10}^{3}\) | \(1.02\times{10}^{-6}\) | \(2.18\times{10}^{9}\) |
\(303.15\) | \(996\) | \(9.768\times{10}^{3}\) | \(817\times{10}^{-9}\) | \(2.25\times{10}^{9}\) |
\(313.15\) | \(992\) | \(9.734\times{10}^{3}\) | \(670\times{10}^{-9}\) | \(2.28\times{10}^{9}\) |
The viscosity column displays three values using ${10}^{-6}$ and two using ${10}^{-9}$. I would prefer every value in the column to use the same power of ten, so I edit the last two strings manually to illustrate.

```r
# Manually edit strings to illustrate
DT$viscosity[4] <- "\\(0.82\\times{10}^{-6}\\)"
DT$viscosity[5] <- "\\(0.67\\times{10}^{-6}\\)"

# Table
DT |>
  kbl(align = "cclrrrr") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
```
temperature | density | specific_weight | viscosity | bulk_modulus |
---|---|---|---|---|
\(273.15\) | \(1000\) | \(9.809\times{10}^{3}\) | \(1.73\times{10}^{-6}\) | \(2.02\times{10}^{9}\) |
\(283.15\) | \(1000\) | \(9.807\times{10}^{3}\) | \(1.31\times{10}^{-6}\) | \(2.10\times{10}^{9}\) |
\(293.15\) | \(998\) | \(9.793\times{10}^{3}\) | \(1.02\times{10}^{-6}\) | \(2.18\times{10}^{9}\) |
\(303.15\) | \(996\) | \(9.768\times{10}^{3}\) | \(0.82\times{10}^{-6}\) | \(2.25\times{10}^{9}\) |
\(313.15\) | \(992\) | \(9.734\times{10}^{3}\) | \(0.67\times{10}^{-6}\) | \(2.28\times{10}^{9}\) |
This revision satisfies two conventions for tabulating empirical engineering information.

- Units. With all the values reported to the same power of ten, the units can all be interpreted in the same way. In this case, for example, the units of the viscosity coefficients (1.73, 1.31, etc.) are all micro-Pascal-seconds (μPa-s).
- Uncertainty. In rewriting the two viscosity values, I changed from three significant digits to two decimal places, consistent with the assumption that empirical information is reported to the same level of uncertainty unless noted otherwise.
Potential revision
Add the `water` data to formatdown and the following functionality to `format_power()`.

- A new argument (perhaps `fixed_power`) that automatically selects a fixed exponent for a numerical vector or permits the user to assign a fixed exponent directly.

  `format_power(x, digits, format, omit_power, delim, fixed_power)`
In conjunction with the fixed power-of-ten, I would also round all numbers in a column to the same number of decimal places to address the uncertainty assumption. This could be a separate argument.
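A minimal sketch of the proposed fixed exponent combined with a common number of decimal places. The name `fixed_power` follows the proposal above; `format_fixed()` itself is hypothetical, and defaulting to the engineering exponent of the largest magnitude in the column is an assumption.

```r
# Hypothetical sketch of the fixed-exponent idea; the argument name
# follows the proposal above, but this is not formatdown code.
format_fixed <- function(x, fixed_power = NULL, decimals = 2) {
  if (is.null(fixed_power)) {
    # default (assumed): engineering exponent of the largest magnitude
    e <- floor(log10(max(abs(x), na.rm = TRUE)))
    fixed_power <- 3 * floor(e / 3)
  }
  m <- x / 10^fixed_power                           # common scaling
  fmt <- paste0("$%.", decimals, "f\\times{10}^{%d}$")
  sprintf(fmt, m, as.integer(fixed_power))          # same decimals everywhere
}

viscosity <- c(173, 131, 102, 81.7, 67.0) * 1E-8
format_fixed(viscosity)
```

With the defaults this reproduces the hand-edited viscosity column above (1.73, 1.31, 1.02, 0.82, and 0.67, all times ${10}^{-6}$); passing `fixed_power = -9` would instead give 1730.00, 1310.00, and so on.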
Units
And now for something completely different!
Thinking about measurement units, I looked for relevant R packages and found units. With appropriate units, powers-of-ten notation can be practically eliminated. For example, a pressure reading of
To illustrate, I start with the basic water data,
```r
water
```

```
   temperature density specific_weight viscosity bulk_modulus
         <num>   <num>           <num>     <num>        <num>
1:      273.15    1000            9809  1.73e-06     2.02e+09
2:      283.15    1000            9807  1.31e-06     2.10e+09
3:      293.15     998            9793  1.02e-06     2.18e+09
4:      303.15     996            9768  8.17e-07     2.25e+09
5:      313.15     992            9734  6.70e-07     2.28e+09
```
With tools from the units package, I can define a symbol `uP` to represent micropoise (a non-SI viscosity unit equal to ${10}^{-7}$ Pa-s).

```r
library("units")

# Define the uP units
install_unit("uP", "micropoise", "micropoise")

# Function to assign and convert units
assign_units <- function(x, base_unit, display_unit) {
  # convert x to "Units" class in base units
  units(x) <- base_unit
  # convert from basic to display units
  units(x) <- as_units(display_unit)
  # return value
  x
}
```
Convert each column and output the results.
```r
# Apply to one variable at a time
DT <- copy(water)
DT$temperature     <- assign_units(DT$temperature, "K", "degree_C")
DT$density         <- assign_units(DT$density, "kg/m^3", "kg/m^3")
DT$specific_weight <- assign_units(DT$specific_weight, "N/m^3", "kN/m^3")
DT$viscosity       <- assign_units(DT$viscosity, "Pa*s", "uP")
DT$bulk_modulus    <- assign_units(DT$bulk_modulus, "Pa", "GPa")

# Output
DT |>
  kbl(align = "r") |>
  kable_paper(lightable_options = "basic", full_width = TRUE) |>
  row_spec(0, background = "#c7eae5") |>
  column_spec(1:5, color = "black", background = "white")
```
temperature | density | specific_weight | viscosity | bulk_modulus |
---|---|---|---|---|
0 [°C] | 1000 [kg/m^3] | 9.809 [kN/m^3] | 17.30 [uP] | 2.02 [GPa] |
10 [°C] | 1000 [kg/m^3] | 9.807 [kN/m^3] | 13.10 [uP] | 2.10 [GPa] |
20 [°C] | 998 [kg/m^3] | 9.793 [kN/m^3] | 10.20 [uP] | 2.18 [GPa] |
30 [°C] | 996 [kg/m^3] | 9.768 [kN/m^3] | 8.17 [uP] | 2.25 [GPa] |
40 [°C] | 992 [kg/m^3] | 9.734 [kN/m^3] | 6.70 [uP] | 2.28 [GPa] |
The entries in the data frame are still numeric but are of the “Units” class, enabling math operations among values with compatible units. See the units website for details.
```r
str(DT)
```

```
Classes 'data.table' and 'data.frame':	5 obs. of  5 variables:
 $ temperature    : Units: [°C] num  0 10 20 30 40
 $ density        : Units: [kg/m^3] num  1000 1000 998 996 992
 $ specific_weight: Units: [kN/m^3] num  9.81 9.81 9.79 9.77 9.73
 $ viscosity      : Units: [uP] num  17.3 13.1 10.2 8.17 6.7
 $ bulk_modulus   : Units: [GPa] num  2.02 2.1 2.18 2.25 2.28
 - attr(*, ".internal.selfref")=<externalptr>
```
If I were to refine this table further, I would report the numerical values without labels in each cell, moving the unit labels to a sub-header row. Possible future work.
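The units package does the bookkeeping, but the displayed numbers follow from simple scale factors: 1 µP = ${10}^{-7}$ Pa-s, 1 GPa = ${10}^{9}$ Pa, 1 kN = ${10}^{3}$ N, and a 273.15 offset between kelvin and degrees Celsius. As a cross-check on the converted table, a base-R sketch without the units package (the variable names are illustrative):

```r
# Plain scale factors behind the units conversions in the table above
# (1 uP = 1e-7 Pa s; 1 GPa = 1e9 Pa; 1 kN = 1e3 N; degC = K - 273.15).
temperature     <- seq(0, 45, 10) + 273.15              # K
specific_weight <- c(9809, 9807, 9793, 9768, 9734)      # N/m^3
viscosity       <- c(173, 131, 102, 81.7, 67.0) * 1E-8  # Pa s
bulk_modulus    <- c(202, 210, 218, 225, 228) * 1E+7    # Pa

temp_C   <- temperature - 273.15    # degree Celsius
gamma_kN <- specific_weight / 1e3   # kN/m^3
mu_uP    <- viscosity / 1e-7        # micropoise
K_GPa    <- bulk_modulus / 1e9      # GPa

round(mu_uP, 2)
```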
Potential revision
Incorporate tools from the units package to create a new function (perhaps `format_units()`) that would convert basic units to display units that can substitute for powers-of-ten notation.
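One way such a function might choose display units is to take the engineering exponent and replace it with the matching SI prefix. The sketch below is hypothetical (`si_prefix()` is not part of formatdown or units); it uses `"u"` for micro, as the `uP` symbol above does, and only covers prefixes from pico to tera.

```r
# Hypothetical sketch: replace a power of ten with an SI prefix.
# Covers pico (1e-12) to tera (1e12); "u" stands in for micro.
si_prefix <- function(x, unit, digits = 3) {
  prefixes <- c("p", "n", "u", "m", "", "k", "M", "G", "T")
  e <- 3 * floor(floor(log10(abs(x))) / 3)           # engineering exponent
  e <- pmin(pmax(e, -12), 12)                        # stay within the table
  m <- x / 10^e                                      # value in prefixed units
  dec <- pmax(digits - 1 - floor(log10(abs(m))), 0)  # keep `digits` sig. digits
  paste0(sprintf(paste0("%.", dec, "f"), m), " ", prefixes[e / 3 + 5], unit)
}

si_prefix(101100, "Pa", digits = 4)
si_prefix(2.02e9, "Pa")
si_prefix(1.73e-6, "Pa s")
```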
Closing
The new formatdown package formats numbers in powers-of-ten notation for inline math markup. A new argument for managing the math delimiters is already in the works. Potential new features include a fixed power-of-ten option as well as replacing powers-of-ten notation with deliberate manipulation of physical units.
Additional software credits