Yes, it started with a tweet:
Nice graphic on urine components via https://t.co/sfuXNB02sF pic.twitter.com/vhVLahQ8su
— Metabolomics (@metabolomics) January 31, 2017
By what measure is this a “nice graphic”? First, the JPEG itself is low-quality. Second, it contains spelling and numerical errors (more on that later). And third…do I have to spell this out…those are 3D pie charts.
Can it be fixed?
So far as I know, there isn’t a tool to generate data by extracting labels from images, so I sat down and typed in the numbers manually. Here they are for download. The top and bottom pie charts are identified by “all” and “other”, respectively.
Better make sure those percentages total 100, before we get into charts.
```r
library(ggplot2)
library(dplyr)
library(readr)

urine1 <- read_csv("urine1.csv", col_names = FALSE)
colnames(urine1) <- c("component", "all_other", "percent")

# top chart - good!
urine1 %>%
  filter(all_other == "all") %>%
  summarise(total = sum(percent)) %>%
  glimpse()
# Observations: 1
# Variables: 1
# $ total <dbl> 99.9

# bottom chart - not good
urine1 %>%
  filter(all_other == "other") %>%
  summarise(total = sum(percent)) %>%
  glimpse()
# Observations: 1
# Variables: 1
# $ total <dbl> 113.61
```
Slices in the bottom chart sum to 113.61%. Problem. Not being an expert in urine composition, I have no idea which figures might be incorrect, so I’ll just have to discard that data. However, on the subject of accuracy, I do know that it is lysozyme, not lyzozyme, and immunoglobulins, not immunoglobulines.
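The same sanity check can be run on both charts in one pass by grouping on the chart identifier. The sketch below uses a small made-up data frame (`toy`) in place of the real CSV, and the 0.5 tolerance for rounding is an arbitrary assumption, not something taken from the original graphic.

```r
library(dplyr)

# Made-up illustrative values, NOT the figures from the original graphic
toy <- tibble(
  component = c("a", "b", "c", "d", "e"),
  all_other = c("all", "all", "other", "other", "other"),
  percent   = c(60, 40, 50, 40, 23.61)
)

# One row per chart: the total, and whether it is within 0.5 of 100
# (the tolerance is an assumption to allow for rounding in the labels)
totals <- toy %>%
  group_by(all_other) %>%
  summarise(total = sum(percent), .groups = "drop") %>%
  mutate(ok = abs(total - 100) <= 0.5)

print(totals)
```

Any group with `ok == FALSE` is worth a second look before plotting.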
Back to the top chart. Why are pie charts bad? Because we are poor at visually assessing relative areas (but good at assessing relative heights). And why are 3D pie charts bad? Because they are nothing but a gimmick, adding nothing to the visualisation and, in fact, distorting it in the attempt to render perspective. The commonly heard rejoinder is “but business people like them.” Well, that doesn’t make them right.
So we could try a bar chart, sorted by value.
```r
urine1 %>%
  filter(all_other == "all") %>%
  ggplot(aes(reorder(component, -percent), percent)) +
  geom_col(fill = "skyblue3") +
  theme_bw() +
  labs(x = "component",
       y = "percent",
       title = "Composition of human urine",
       subtitle = "50 g dry weight / L")
```
Which is OK: it makes it easy to see and compare the relative proportion of each component. There’s a lot of white space though. We could stack the bars, but that would create problems in choosing a colour palette. So here’s another alternative: a treemap, created using the rather wonderful highcharter package.
```r
library(treemap)
library(highcharter)

urine1_tm <- treemap(filter(urine1, all_other == "all"),
                     index = "component",
                     vSize = "percent",
                     palette = "Spectral")

urine1_tmhc <- highchart() %>%
  hc_add_series_treemap(urine1_tm,
                        name = "urine",
                        layoutAlgorithm = "squarified") %>%
  hc_title(text = "Composition of human urine (50 g dry weight / L)")
```
Result: a nice interactive chart. Published straight to RPubs from the RStudio viewer pane, by the way. RStudio is just great. Here’s the non-interactive screenshot.
I’d suggest that if you must present proportions by area, this is a much nicer way to do it.
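For comparison, the stacked-bar option mentioned earlier might look something like the sketch below. It uses a made-up four-row data frame (`toy`) rather than the real data, and the "Spectral" Brewer palette is just one possible choice. With only four components the colours are easy to tell apart; with a longer list of components the fill scale gets crowded, which is the palette problem noted above.

```r
library(ggplot2)

# Made-up illustrative values, NOT the figures from the original graphic
toy <- data.frame(
  component = c("A", "B", "C", "D"),
  percent   = c(50, 30, 15, 5)
)

# Order the fill levels by value so the largest segment sits at one end
toy$component <- reorder(toy$component, toy$percent)

# A single bar, stacked by component and flipped to run horizontally
p <- ggplot(toy, aes(x = "urine", y = percent, fill = component)) +
  geom_col(width = 0.5) +
  coord_flip() +
  scale_fill_brewer(palette = "Spectral") +
  theme_bw() +
  labs(x = NULL, y = "percent", title = "Stacked bar (illustrative data)")

print(p)
```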
In summary then:
- pie charts bad
- 3D pie charts awful
- columns functional, if not always compelling
- there are many wonderful tools for visualising data beyond the tired old options