Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I love a good visualization to assist in telling the story of your data. Because of this I am completely hooked on a variety of data visualization packages and tooling. But what happens with you need to visualize the raw numbers? Do you open up the data set in the viewer and screenshot? Do you save the summarized data set locally and add a bit of formatting in excel? That’s what I used to do with my R summary tables. But it got me thinking; why can’t tables be treated as a first class data visualization too? Tables need a little pizazz as much as the next data object!
Enter the r package formattable! The formattable package is used to transform vectors and data frames into more readable and impactful tabular formats.
I’m going to walk you through a step-by-step example of using the formattable R package to make your data frame more presentable for data storytelling.
Set Up R
In terms of setting up the R working environment, we have a couple of options open to us. We can use something like R Studio for a local analytics on our personal computer. Or we can use a free, hosted, multi-language collaboration environment like Watson Studio. If you’d like to get started with R in IBM Watson Studio, please have a look at the tutorial I wrote.
Install and load packages, Set variables
R packages contain a grouping of R data functions and code that can be used to perform your analysis. We need to install and load them in your environment so that we can call upon them later. We are also going to assign a few custom color variables that we will use when setting the colors on our table. If you are in Watson Studio, enter the following code into a cell (or multiple cells), highlight the cell and hit the “run cell” button.
# Install the relevant libraries - do this one time install.packages("data.table") install.packages("dplyr") install.packages("formattable") install.packages("tidyr") #Load the libraries library(data.table) library(dplyr) library(formattable) library(tidyr)
#Set a few color variables to make our table more visually appealing customGreen0 = "#DeF7E9" customGreen = "#71CA97" customRed = "#ff7f7f"
Download the Data
For our tutorial we are going to be using a data set from the Austin Open Data Portal. It’s a website designed to make facilitate easy access to open government data. I’ve been playing around with it somewhat frequently and I’m really impressed with the consistency of design and features per data set. A lot of other open data portals do not make it this easy to find and download data from.
We are going to be using formattable on the Imagine Austin Indicators dataset. As per the Imagine Austin website, the data set tracks key performance indicators (KPIs) of Austins progress in creating a connected, vibrant and livable city.
#Download the Austin indicator data set austinData= fread('https://raw.githubusercontent.com/lgellis/MiscTutorial/master/Austin/Imagine_Austin_Indicators.csv', data.table=FALSE, header = TRUE, stringsAsFactors = FALSE) head(austinData) attach(austinData)
Modify the Data Set
We are going to narrow down the data set to focus on 4 key health metrics. Specifically the prevalence of obesity, tobacco use, cardiovascular disease and obesity. We are then going to select only the indicator name and yearly KPI value columns. Finally we are going to make extra columns to display the 2011 to 2016 yearly average and the 2011 to 2016 metric improvements.
i1 <- austinData %>% filter(`Indicator Name` %in% c('Prevalence of Obesity', 'Prevalence of Tobacco Use', 'Prevalence of Cardiovascular Disease', 'Prevalence of Diabetes')) %>% select(c(`Indicator Name`, `2011`, `2012`, `2013`, `2014`, `2015`, `2016`)) %>% mutate (Average = round(rowMeans( cbind(`2011`, `2012`, `2013`, `2014`, `2015`, `2016`), na.rm=T),2), `Improvement` = round((`2011`-`2016`)/`2011`*100,2)) i1
View the table data in it’s raw format
We now have the data in the table we want, so let’s display it to our audience. We can start by viewing the table in it’s raw format.
i2
View the Data with the Formattable Package
Viewing the data by simply printing it did not produce a nice looking table. Let’s see what formattable gives us out of the box.
#0) Throw it in the formattable function formattable(i1)
Not bad! But let’s spruce it up a little. We will left align the first column, right align the last column and center align the rest. Additionally we will bold and make grey the the row title: Indicator Name.
#1) First Data Table formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list(`Indicator Name` = formatter( "span", style = ~ style(color = "grey",.weight = "bold")) ))
Add a Color Tile for All Year Columns
We will add the color_tile function to all year columns. This creates the effect of a column by column row wise heat map, and it looks great! Note that we are using our own custom colors declared in the very beginning of the code to ensure our table has the look and feel we want.
#2) Add the color mapping for all 2011 to 2016. formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list( `Indicator Name` = formatter("span", style = ~ style(color = "grey",.weight = "bold")), `2011`= color_tile(customGreen, customGreen0), `2012`= color_tile(customGreen, customGreen0), `2013`= color_tile(customGreen, customGreen0), `2014`= color_tile(customGreen, customGreen0), `2015`= color_tile(customGreen, customGreen0), `2016`= color_tile(customGreen, customGreen0) ))
Add a Color Bar for the Average Column
We will now add the color_bar function to the average column. Rather than using a heat map, it will display the same background color each time. However, it will have a bar line to indicate relative row wise size of the values.
#3) Add the color bar to the average column formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list( `Indicator Name` = formatter("span", style = ~ style(color = "grey",.weight = "bold")), `2011`= color_tile(customGreen, customGreen0), `2012`= color_tile(customGreen, customGreen0), `2013`= color_tile(customGreen, customGreen0), `2014`= color_tile(customGreen, customGreen0), `2015`= color_tile(customGreen, customGreen0), `2016`= color_tile(customGreen, customGreen0), `Average` = color_bar(customRed) ))
Add Our Own Format Function
One great tip that I learned from the vignette is that you can make your own formatting functions really easily. Using their examples in the vignette and on bioinfo.irc.ca, I made a slight modification to create our own improvement_formatter function that bolds the text and colors it our custom red or green depending on it’s value.
#4) Add sign formatter to improvement over time improvement_formatter <- formatter("span", style = x ~ style( .weight = "bold", color = ifelse(x > 0, customGreen, ifelse(x < 0, customRed, "black")))) formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list( `Indicator Name` = formatter("span", style = ~ style(color = "grey",.weight = "bold")), `2011`= color_tile(customGreen, customGreen0), `2012`= color_tile(customGreen, customGreen0), `2013`= color_tile(customGreen, customGreen0), `2014`= color_tile(customGreen, customGreen0), `2015`= color_tile(customGreen, customGreen0), `2016`= color_tile(customGreen, customGreen0), `Average` = color_bar(customRed), `Improvement` = improvement_formatter ))
Modify the Format Function To Display Images
We are going to slightly modify the format to display the up and down arrow depending on the value of improvement. Note, that in the video above, I also change the formatter to display a thumbs up symbol on the winning improvement value. The code for this and other examples are available on my github repo.
#5) For improvement formatter add icons # Up and down arrow with greater than comparison from the vignette improvement_formatter <- formatter("span", style = x ~ style(.weight = "bold", color = ifelse(x > 0, customGreen, ifelse(x < 0, customRed, "black"))), x ~ icontext(ifelse(x>0, "arrow-up", "arrow-down"), x) ) formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list( `Indicator Name` = formatter("span", style = ~ style(color = "grey",.weight = "bold")), `2011`= color_tile(customGreen, customGreen0), `2012`= color_tile(customGreen, customGreen0), `2013`= color_tile(customGreen, customGreen0), `2014`= color_tile(customGreen, customGreen0), `2015`= color_tile(customGreen, customGreen0), `2016`= color_tile(customGreen, customGreen0), `Average` = color_bar(customRed), `Improvement` = improvement_formatter ))
Add an Icon in The Row Title Based on the Value in Another Column
We are going to make one last modification to append an image to the indicator name column based on a value located in another column. This is an important departure from our previous behavior, because previously we were only assigning the format of a single column based on it’s own values. In order to enable cross column compare, we just need to remove the x in front of the ~ style and the ~ icontext conditions. This will allow us to explicitly specify the columns we want to reference.
#6) Add a star to the max value. Use if/else value = max(value) improvement_formatter <- formatter("span", style = x ~ style(.weight = "bold", color = ifelse(x > 0, customGreen, ifelse(x < 0, customRed, "black"))), x ~ icontext(ifelse(x == max(x), "thumbs-up", ""), x) ) ## Based on Name formattable(i1, align =c("l","c","c","c","c", "c", "c", "c", "r"), list( `Indicator Name` = formatter("span", style = x ~ style(color = "gray"), x ~ icontext(ifelse(x == "Prevalence of Tobacco Use", "star", ""), x)), `2011`= color_tile(customGreen, customGreen0), `2012`= color_tile(customGreen, customGreen0), `2013`= color_tile(customGreen, customGreen0), `2014`= color_tile(customGreen, customGreen0), `2015`= color_tile(customGreen, customGreen0), `2016`= color_tile(customGreen, customGreen0), `Average` = color_bar(customRed), `Improvement` = improvement_formatter ))
Compare Column to Column
Finally, we are going to just do a simple cross column row wise comparison. We’ll take our same data set but strip it back to just 2015 and 2016 data. We will then compare the values and mark up the 2016 column as up/down and green/red based on comparing the 2016 value to the 2015 value.
##7) Compare column to column #Drop the rest and show just 2015 and 2016 i2 <- austinData %>% filter(`Indicator Name` %in% c('Prevalence of Obesity', 'Prevalence of Tobacco Use', 'Prevalence of Cardiovascular Disease', 'Prevalence of Diabetes')) %>% select(c(`Indicator Name`, `2015`, `2016`)) head(i2) formattable(i2, align =c("l","c","c"), list( `Indicator Name` = formatter("span", style = ~ style(color = "gray")), `2016`= formatter("span", style = ~ style(color = ifelse(`2016` >`2015`, "red", "green")), ~ icontext(ifelse(`2016` >`2015`,"arrow-up", "arrow-down"), `2016`)) ))
Extras
In the full github code, you will see a number of other examples. As a bonus, I’ve also included the code to create the animation using the magick package!
Thank You
Thanks for reading along while we explored the formattable package. Please share your thoughts and creations with me on twitter.
Note that the full code is available on my github repo. If you have trouble downloading the file from github, go to the main page of the repo and select “Clone or Download” and then “Download Zip”.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.