Introducing the htmlTable-package

Posted on April 22, 2015 by Max Gordon

[This article was first published on G-Forge » R, and kindly contributed to R-bloggers.]

How should we convey complex data? The image is CC by Sacha Fernandez.

My htmlTable-function has perhaps been one of my most successful projects. I developed it to produce tables that match those found in top medical journals. As the function has grown, I have moved it out of my Gmisc-package into a package of its own, and at the time of writing I have just released version 1.3. While htmlTable can create plain tables without any fancy formatting (see the usage vignette), it is primarily aimed at complex tables. In this post I will show what you can do and how to tame some of the more advanced features.

Objective: visualize migration patterns between Swedish counties over the last 15 years

In this example I will try to convey a table with 240 values (15 years × 16 columns) without overwhelming the reader. The dataset is from Statistics Sweden (downloaded using pxweb) and comes with the htmlTable-package. Our first job is to reshape the tidy (long) dataset into a wide format that is better suited for a table.

library(htmlTable)
data(SCB)
 
# The vignette includes Uppsala county, but that makes the table
# too wide for the blog, so we drop it here
SCB <- subset(SCB, region != "Uppsala county")
 
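# SCB has three key columns (year, region, sex) and one value column
library(reshape)
# Put Sweden first so that the national columns come first
SCB$region <- relevel(SCB$region, "Sweden")
# One row per year and one column per region/sex combination
SCB <- cast(SCB, year ~ region + sex, value = "values")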
 
# Set rownames to be year
rownames(SCB) <- SCB$year
SCB$year <- NULL
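For readers without the reshape package, the same long-to-wide step can be sketched in base R with tapply. This is an alternative technique, not the author's code: the toy data frame below is hypothetical (a few values copied from the tables further down), and its column names year, region, sex and values are assumed to match those used in the cast call above.

```r
# Hypothetical long ("tidy") data: one row per year/region/sex
long <- data.frame(
  year   = rep(c(1999, 2000), times = 4),
  region = rep(c("Sweden", "Norrbotten county"), each = 4),
  sex    = rep(rep(c("men", "women"), each = 2), times = 2),
  values = c(38.9, 39.0, 41.5, 41.6, 39.7, 40.0, 41.9, 42.2)
)

# "region_sex" keys; the factor levels fix the column order (Sweden first)
keys <- paste(long$region, long$sex, sep = "_")
key  <- factor(keys, levels = unique(keys))

# One row per year, one column per key; each cell holds exactly one value
wide <- tapply(long$values, list(long$year, key), identity)
wide
```

Unlike cast, this returns a matrix with the years already as row names; wrap it in as.data.frame() if you want to reuse the loop below, which indexes columns by name.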

The next step is to calculate two new columns:

Δ_int = the change within each group since the start of the observation period.
Δ_std = the difference from the national (Sweden) average age for the same sex and year.
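As a worked example of the two definitions, the snippet below recomputes both columns for Norrbotten men in 1999–2003, using values copied from the rounded table further down. It only illustrates the arithmetic; the loop in the next code block does the same for every column.

```r
# Average age for men, 1999-2003, copied from the (rounded) table below
sweden_men     <- c(38.9, 39.0, 39.2, 39.3, 39.4)
norrbotten_men <- c(39.7, 40.0, 40.2, 40.5, 40.7)

# Δint: change since the first year of observation
delta_int <- norrbotten_men - norrbotten_men[1]
# Δstd: difference from the national average for the same sex and year
delta_std <- norrbotten_men - sweden_men

round(delta_int, 1)  # 0.0 0.3 0.5 0.8 1.0
round(delta_std, 1)  # 0.8 1.0 1.0 1.2 1.3
```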

To separate these layers of information we use stacked column spanners:

County
Men			Women
Age	Δ_int	Δ_std	Age	Δ_int	Δ_std

These are created by passing cgroup and n.cgroup as matrices with one row per spanner level:

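# For each region/sex column add three columns: the age, the change
# since the first year (Δint) and the difference from Sweden for the
# same sex and year (Δstd)
mx <- NULL
for (n in names(SCB)){
  # The matching national column, e.g. "Sweden_men"
  tmp <- paste0("Sweden_", strsplit(n, "_")[[1]][2])
  mx <- cbind(mx,
              cbind(SCB[[n]], 
                    SCB[[n]] - SCB[[n]][1],
                    SCB[[n]] - SCB[[tmp]]))
}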
rownames(mx) <- rownames(SCB)
colnames(mx) <- rep(c("Age", 
                      "Δ<sub>int</sub>",
                      "Δ<sub>std</sub>"), 
                    times = ncol(SCB))
# Drop Sweden's own Δstd columns (3 and 6), which are always zero
mx <- mx[,c(-3, -6)]
 
# This automated generation of cgroup elements is 
# somewhat of an overkill
cgroup <- 
  unique(sapply(names(SCB), 
                function(x) strsplit(x, "_")[[1]][1], 
                USE.NAMES = FALSE))
n.cgroup <- 
  sapply(cgroup, 
         function(x) sum(grepl(paste0("^", x), names(SCB))), 
         USE.NAMES = FALSE)*3
n.cgroup[cgroup == "Sweden"] <-
  n.cgroup[cgroup == "Sweden"] - 2
 
cgroup <- 
  rbind(c(cgroup, rep(NA, ncol(SCB) - length(cgroup))),
        Hmisc::capitalize(
          sapply(names(SCB), 
                 function(x) strsplit(x, "_")[[1]][2],
                 USE.NAMES = FALSE)))
n.cgroup <- 
  rbind(c(n.cgroup, rep(NA, ncol(SCB) - length(n.cgroup))),
        c(2,2, rep(3, ncol(cgroup) - 2)))
 
print(cgroup)

##      [,1]     [,2]                [,3]               [,4]    [,5]  [,6]   
## [1,] "Sweden" "Norrbotten county" "Stockholm county" NA      NA    NA     
## [2,] "Men"    "Women"             "Men"              "Women" "Men" "Women"

print(n.cgroup)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    4    6    6   NA   NA   NA
## [2,]    2    2    3    3    3    3
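Judging from the printed matrices, both rows of n.cgroup count table columns: each row (ignoring the NA padding) adds up to ncol(mx) = 16, and every upper spanner ends where a lower spanner ends. The helper check_spanners below is a hypothetical sanity check written for this example, not part of htmlTable; with the objects above you would call check_spanners(n.cgroup, ncol(mx)).

```r
# Hypothetical helper: does a two-row n.cgroup matrix describe properly
# nested column spanners for a table with n_cols columns?
check_spanners <- function(n.cgroup, n_cols) {
  upper <- n.cgroup[1, !is.na(n.cgroup[1, ])]
  lower <- n.cgroup[2, !is.na(n.cgroup[2, ])]
  # Both levels must cover every column exactly once
  if (sum(upper) != n_cols || sum(lower) != n_cols) return(FALSE)
  # Every upper spanner must end on a lower spanner boundary
  all(cumsum(upper) %in% cumsum(lower))
}

# The matrix printed above
printed_n.cgroup <- rbind(c(4, 6, 6, NA, NA, NA),
                          c(2, 2, 3, 3, 3, 3))
check_spanners(printed_n.cgroup, n_cols = 16)  # TRUE
```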

The next step is to output the table after rounding to the desired number of decimals. The txtRound function helps with this: it uses sprintf instead of round, so the resulting strings keep the requested number of decimals. With round, 1.02 becomes 1, whereas in a table we generally want to keep the last decimal and display it as 1.0.
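The difference between round and sprintf is easy to see in base R. The round_txt function below is a hypothetical stand-in that mimics the idea behind txtRound (format with sprintf, keep the matrix layout); it is not txtRound's actual source and ignores its other options.

```r
# Hypothetical sketch of the idea behind txtRound
round_txt <- function(x, digits = 1) {
  fmt <- paste0("%.", digits, "f")
  out <- x
  # Assigning into out[] keeps dim/dimnames but converts to character
  out[] <- sprintf(fmt, as.numeric(x))
  out
}

as.character(round(1.02, 1))  # "1"   (the trailing zero is lost)
round_txt(1.02, 1)            # "1.0" (the trailing zero is kept)

m <- matrix(c(38.94, 0, -1.62, 40), nrow = 2,
            dimnames = list(c("1999", "2000"), c("Age", "Delta")))
round_txt(m, 1)
```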

htmlTable(txtRound(mx, 1), 
          cgroup = cgroup,
          n.cgroup = n.cgroup,
          rgroup = c("First period", 
                     "Second period",
                     "Third period"),
          n.rgroup = rep(5, 3),
          tfoot = txtMergeLines("Δ<sub>int</sub> corresponds to the change since start",
                                "Δ<sub>std</sub> corresponds to the change compared to national average"))

	Sweden				Norrbotten county						Stockholm county
	Men		Women		Men			Women			Men			Women
	Age	Δ_int	Age	Δ_int	Age	Δ_int	Δ_std	Age	Δ_int	Δ_std	Age	Δ_int	Δ_std	Age	Δ_int	Δ_std
First period
1999	38.9	0.0	41.5	0.0	39.7	0.0	0.8	41.9	0.0	0.4	37.3	0.0	-1.6	40.1	0.0	-1.4
2000	39.0	0.1	41.6	0.1	40.0	0.3	1.0	42.2	0.3	0.6	37.4	0.1	-1.6	40.1	0.0	-1.5
2001	39.2	0.3	41.7	0.2	40.2	0.5	1.0	42.5	0.6	0.8	37.5	0.2	-1.7	40.1	0.0	-1.6
2002	39.3	0.4	41.8	0.3	40.5	0.8	1.2	42.8	0.9	1.0	37.6	0.3	-1.7	40.2	0.1	-1.6
2003	39.4	0.5	41.9	0.4	40.7	1.0	1.3	43.0	1.1	1.1	37.7	0.4	-1.7	40.2	0.1	-1.7
Second period
2004	39.6	0.7	42.0	0.5	40.9	1.2	1.3	43.1	1.2	1.1	37.8	0.5	-1.8	40.3	0.2	-1.7
2005	39.7	0.8	42.0	0.5	41.1	1.4	1.4	43.4	1.5	1.4	37.9	0.6	-1.8	40.3	0.2	-1.7
2006	39.8	0.9	42.1	0.6	41.3	1.6	1.5	43.5	1.6	1.4	37.9	0.6	-1.9	40.2	0.1	-1.9
2007	39.8	0.9	42.1	0.6	41.5	1.8	1.7	43.8	1.9	1.7	37.8	0.5	-2.0	40.1	0.0	-2.0
2008	39.9	1.0	42.1	0.6	41.7	2.0	1.8	44.0	2.1	1.9	37.8	0.5	-2.1	40.1	0.0	-2.0
Third period
2009	39.9	1.0	42.1	0.6	41.9	2.2	2.0	44.2	2.3	2.1	37.8	0.5	-2.1	40.0	-0.1	-2.1
2010	40.0	1.1	42.1	0.6	42.1	2.4	2.1	44.4	2.5	2.3	37.8	0.5	-2.2	40.0	-0.1	-2.1
2011	40.1	1.2	42.2	0.7	42.3	2.6	2.2	44.5	2.6	2.3	37.9	0.6	-2.2	39.9	-0.2	-2.3
2012	40.2	1.3	42.2	0.7	42.4	2.7	2.2	44.6	2.7	2.4	37.9	0.6	-2.3	39.9	-0.2	-2.3
2013	40.2	1.3	42.2	0.7	42.4	2.7	2.2	44.7	2.8	2.5	38.0	0.7	-2.2	39.9	-0.2	-2.3
Δ_int corresponds to the change since start
Δ_std corresponds to the change compared to national average

To increase readability we may want to separate the Sweden columns from the county columns. One way is to add a | to the align option, which draws a vertical border at that position. Note that as of version 1.0 the function continues with the last alignment until the end, so you no longer need to count columns to get exactly the right number of characters in your alignment argument.
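The recycling rule described above can be illustrated with a small parser. expand_align is a hypothetical helper that only mirrors the behaviour as described in the text (one letter per column, a | marks a border after the preceding column, the last letter is repeated); it is not htmlTable's internal code.

```r
# Hypothetical helper: expand an align string such as "rrrr|r" into one
# alignment per column and record after which columns a border appears
expand_align <- function(align, n_cols) {
  chars  <- strsplit(align, "")[[1]]
  is_bar <- chars == "|"
  # Column index of each character; a "|" gets the index of the
  # letter before it
  col_of  <- cumsum(!is_bar)
  aligns  <- chars[!is_bar]
  borders <- col_of[is_bar]
  # Repeat the last alignment letter for the remaining columns
  if (length(aligns) < n_cols) {
    aligns <- c(aligns, rep(aligns[length(aligns)], n_cols - length(aligns)))
  }
  # Drop any surplus letters beyond n_cols
  list(align = aligns[seq_len(n_cols)], border_after = borders)
}

res <- expand_align("rrrr|r", n_cols = 16)
res$border_after   # 4, i.e. between the Sweden and Norrbotten columns
length(res$align)  # 16
```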

htmlTable(txtRound(mx, 1), 
          align="rrrr|r",
          cgroup = cgroup,
          n.cgroup = n.cgroup,
          rgroup = c("First period", 
                     "Second period",
                     "Third period"),
          n.rgroup = rep(5, 3),
          tfoot = txtMergeLines("Δ<sub>int</sub> corresponds to the change since start",
                                "Δ<sub>std</sub> corresponds to the change compared to national average"))

[Rendered table: the same values as in the previous table, now with a vertical border separating the Sweden columns from the county columns.]

If we want still more separation, we can always add background colors to the columns with col.columns.
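Instead of hard-coding the first four columns, the col.columns vector can be derived from the top row of n.cgroup so that alternate spanner groups are shaded. This is a hypothetical generalization that assumes, as in the call below, that col.columns takes one color per column with "none" meaning no background.

```r
# Widths of the top-level spanners (Sweden, Norrbotten, Stockholm),
# as printed for n.cgroup above
group_widths <- c(4, 6, 6)

# Alternate between shaded and unshaded groups: shaded, none, shaded, ...
group_colors <- rep_len(c("#E6E6F0", "none"), length(group_widths))

# Expand to one color per table column
col_colors <- rep(group_colors, times = group_widths)
col_colors
```

With the objects from above, group_widths would be n.cgroup[1, !is.na(n.cgroup[1, ])], and col_colors would be passed as col.columns.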

htmlTable(txtRound(mx, 1), 
          col.columns = c(rep("#E6E6F0", 4),
                          rep("none", ncol(mx) - 4)),
          align="rrrr|r",
          cgroup = cgroup,
          n.cgroup = n.cgroup,
          rgroup = c("First period", 
                     "Second period",
                     "Third period"),
          n.rgroup = rep(5, 3),
          tfoot = txtMergeLines("Δ<sub>int</sub> corresponds to the change since start",
                                "Δ<sub>std</sub> corresponds to the change compared to national average"))

[Rendered table: the same values and vertical border as above, with the four Sweden columns given a light background color (#E6E6F0).]

If we also color alternating row groups (col.rgroup) and restrict the row-group header to the first column (cspan.rgroup = 1), we get an even stronger visual aid.
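To see which data rows end up highlighted, the sketch below expands the row-group colors to one color per year. It assumes that the col.rgroup colors are recycled across the row groups in order; that assumption, and the variable names, are illustrative rather than taken from htmlTable's documentation.

```r
# Assumption: row-group colors are recycled across the groups in order
rgroup_colors  <- c("none", "#FFFFCC")
rows_per_group <- rep(5, 3)

# One color per row group, then one color per data row
group_bg <- rep_len(rgroup_colors, length(rows_per_group))
row_bg   <- rep(group_bg, times = rows_per_group)
names(row_bg) <- 1999:2013

row_bg[c("2003", "2004", "2009")]  # 2003: none, 2004: #FFFFCC, 2009: none
```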

htmlTable(txtRound(mx, 1),
          col.rgroup = c("none", "#FFFFCC"),
          col.columns = c(rep("#EFEFF0", 4),
                          rep("none", ncol(mx) - 4)),
          align="rrrr|r",
          cgroup = cgroup,
          n.cgroup = n.cgroup,
          # I use &nbsp; (a non-breaking space) as I don't want a line
          # break within the row-group label. This adds a little space in
          # the table when used together with cspan.rgroup = 1.
          rgroup = c("1st&nbsp;period", 
                     "2nd&nbsp;period",
                     "3rd&nbsp;period"),
          n.rgroup = rep(5, 3),
          tfoot = txtMergeLines("Δ<sub>int</sub> corresponds to the change since start",
                                "Δ<sub>std</sub> corresponds to the change compared to national average"),
          cspan.rgroup = 1)

[Rendered table: the same values as above, with the row-group headers shortened to "1st period", "2nd period" and "3rd period" and restricted to the first column, the Sweden columns shaded (#EFEFF0), and the second row group highlighted in light yellow (#FFFFCC).]

If you want to add further visual hints, you can insert your own HTML code into the cells. Here we color each Δ_std value according to its size, from green for the lowest values through black to red for the highest.
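The scaling used in the loop below can be isolated into a small function to show how a value maps to a color: the lowest value gets the first palette entry (green, #009900), the highest the last (dark red, #990033), and values in between are interpolated through black. value_to_color is a hypothetical name; the loop below inlines the same arithmetic. The guard for constant input is an addition of this sketch.

```r
# Hypothetical helper: map numbers onto a 101-step green-black-red palette
value_to_color <- function(x, lo = min(x), hi = max(x)) {
  palette <- grDevices::colorRampPalette(c("#009900", "#000000", "#990033"))(101)
  # Constant input: avoid dividing by zero and use the middle color
  if (hi == lo) return(rep(palette[51], length(x)))
  # Same scaling as in the loop: lo maps to index 1, hi to index 101
  idx <- round((x - lo) / (hi - lo) * 100 + 1)
  palette[idx]
}

# The Δstd values in the table range from -2.3 to 2.5
value_to_color(c(-2.3, 0, 2.5))
```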

cols_2_clr <- grep("Δ<sub>std</sub>", colnames(mx))
# We need a copy, as the formatting causes the matrix to lose
# its numerical property
out_mx <- txtRound(mx, 1)

# Scale all Δstd columns to a common range so that the same value
# gets the same color in every column
min_delta <- min(mx[,cols_2_clr])
span_delta <- max(mx[,cols_2_clr]) - min_delta
# 101 colors from green (lowest) through black to red (highest)
clr_palette <- colorRampPalette(c("#009900", "#000000", "#990033"))(101)
for (col in cols_2_clr){
  # Wrap each formatted value in a bold, colored <span>
  out_mx[, col] <- mapply(function(val, strength)
    paste0("<span style='font-weight: 900; color: ", 
           clr_palette[strength],
           "'>",
           val, "</span>"), 
    val = out_mx[,col], 
    strength = round((mx[,col] - min_delta)/span_delta*100 + 1),
    USE.NAMES = FALSE)
}
 
htmlTable(out_mx,
          caption = "Average age in Swedish counties over a period of
                     15 years. The Norrbotten county is typically known
                     for having a negative migration pattern compared to
                     Stockholm.",
          pos.rowlabel = "bottom",
          rowlabel="Year", 
          col.rgroup = c("none", "#FFFFCC"),
          col.columns = c(rep("#EFEFF0", 4),
                          rep("none", ncol(mx) - 4)),
          align="rrrr|r",
          cgroup = cgroup,
          n.cgroup = n.cgroup,
          rgroup = c("1st&nbsp;period", 
                     "2nd&nbsp;period",
                     "3rd&nbsp;period"),
          n.rgroup = rep(5, 3),
          tfoot = txtMergeLines("Δ<sub>int</sub> corresponds to the change since start",
                                "Δ<sub>std</sub> corresponds to the change compared to national average"),
          cspan.rgroup = 1)

Average age in Swedish counties over a period of 15 years. The Norrbotten county is typically known for having a negative migration pattern compared to Stockholm.
	Sweden				Norrbotten county						Stockholm county
	Men		Women		Men			Women			Men			Women
Year	Age	Δ_int	Age	Δ_int	Age	Δ_int	Δ_std	Age	Δ_int	Δ_std	Age	Δ_int	Δ_std	Age	Δ_int	Δ_std
1st period
1999	38.9	0.0	41.5	0.0	39.7	0.0	0.8	41.9	0.0	0.4	37.3	0.0	-1.6	40.1	0.0	-1.4
2000	39.0	0.1	41.6	0.1	40.0	0.3	1.0	42.2	0.3	0.6	37.4	0.1	-1.6	40.1	0.0	-1.5
2001	39.2	0.3	41.7	0.2	40.2	0.5	1.0	42.5	0.6	0.8	37.5	0.2	-1.7	40.1	0.0	-1.6
2002	39.3	0.4	41.8	0.3	40.5	0.8	1.2	42.8	0.9	1.0	37.6	0.3	-1.7	40.2	0.1	-1.6
2003	39.4	0.5	41.9	0.4	40.7	1.0	1.3	43.0	1.1	1.1	37.7	0.4	-1.7	40.2	0.1	-1.7
2nd period
2004	39.6	0.7	42.0	0.5	40.9	1.2	1.3	43.1	1.2	1.1	37.8	0.5	-1.8	40.3	0.2	-1.7
2005	39.7	0.8	42.0	0.5	41.1	1.4	1.4	43.4	1.5	1.4	37.9	0.6	-1.8	40.3	0.2	-1.7
2006	39.8	0.9	42.1	0.6	41.3	1.6	1.5	43.5	1.6	1.4	37.9	0.6	-1.9	40.2	0.1	-1.9
2007	39.8	0.9	42.1	0.6	41.5	1.8	1.7	43.8	1.9	1.7	37.8	0.5	-2.0	40.1	0.0	-2.0
2008	39.9	1.0	42.1	0.6	41.7	2.0	1.8	44.0	2.1	1.9	37.8	0.5	-2.1	40.1	0.0	-2.0
3rd period
2009	39.9	1.0	42.1	0.6	41.9	2.2	2.0	44.2	2.3	2.1	37.8	0.5	-2.1	40.0	-0.1	-2.1
2010	40.0	1.1	42.1	0.6	42.1	2.4	2.1	44.4	2.5	2.3	37.8	0.5	-2.2	40.0	-0.1	-2.1
2011	40.1	1.2	42.2	0.7	42.3	2.6	2.2	44.5	2.6	2.3	37.9	0.6	-2.2	39.9	-0.2	-2.3
2012	40.2	1.3	42.2	0.7	42.4	2.7	2.2	44.6	2.7	2.4	37.9	0.6	-2.3	39.9	-0.2	-2.3
2013	40.2	1.3	42.2	0.7	42.4	2.7	2.2	44.7	2.8	2.5	38.0	0.7	-2.2	39.9	-0.2	-2.3
Δ_int corresponds to the change since start
Δ_std corresponds to the change compared to national average

Although a graph would most likely do the visualization job better, tables are good at conveying detailed information. In my mind there is no doubt that the pattern in the data is easier to find in the last table.

Lastly, I would like to thank Frank Harrell for the Hmisc::latex function that inspired me to start this project. Other important sources of inspiration have been Stephen Few, ThinkUI, ACAPS, and LabWrite.
