Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In my current mammoth work project, I’m generating many plots. The titles are very descriptive (they tell you what the plot is about), but they are not really telling a story.
That’s simply because there are so many on the production line.
What we’d like is to analyse the data and extract the salient points.
Better still, we’d want this to adjust dynamically for each plot:
something along the lines of “some place” is higher/lower than
“another place” but not higher than “some council”.
However, I’ve had to park that for now because it seemed like an entire project in itself.
Enter {headliner}
At the weekend I discovered the {headliner} package from Jake Riley.
I’ve had a play around with it and it’s brilliant: such a clever solution to a potentially cumbersome problem.
You’ll find the repo here (https://github.com/rjake/headliner) and the package website here.
I’ve uploaded some trial data relating to the population projections for Inverness (this is data already in the public domain courtesy of the Improvement Service).
I’ll avoid putting too much wrangling code in the blog post, but you’ll find the code in the repo here. It’s a minimal dataset showing the start (2018) and end (2030) projections for various age bands.
I’ve wrangled this table into a wider format for the particular chart that I want to make:
```r
t1
##    year ageband   pop pop2030 year2
## 1: 2018    0-15 14528   13165  2030
## 2: 2018   16-44 28859   29621  2030
## 3: 2018   45-64 22986   22577  2030
## 4: 2018   65-74  8372   10343  2030
## 5: 2018   75-84  4916    6901  2030
## 6: 2018     85+  1914    2611  2030
```
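If you want to follow along without downloading the repo data, here is a sketch that rebuilds `t1` from the printed output above. The column types (integer years, numeric populations) are my assumption; the figures are exactly those shown in the table.

```r
library(data.table)

# Rebuild the wide Inverness table from the printed output, so the later
# steps can be run without the repo's wrangling code
t1 <- data.table(
  year    = 2018L,
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),   # 2018 projection
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611),  # 2030 projection
  year2   = 2030L
)

t1
```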
Now I want to make some text for use in my chart.
Write your own headlines
I use the add_headline_column() function to compare the 2030 values with the 2018 values, and the {trend} placeholder to state whether this is an increase or a decrease. {delta_p} returns the difference as a percentage. Finally, Jake showed me how to format the values nicely, rather than the boring, hard-to-read raw numbers I had on my first attempt:
```r
library(headliner)
library(dplyr)  # for %>% (and if_else() later on)

chart_text <- t1 %>%
  add_headline_column(
    x = pop2030,
    y = pop,
    headline = "population in the {ageband} ageband will {trend} by {delta_p}% ({f(x)} vs {f(y)})",
    # f is made available to the template, so {f(x)} and {f(y)} print formatted values
    f = scales::number_format(big.mark = ",")
  )

chart_text$headline
## [1] "population in the 0-15 ageband will decrease by 9.4% (13,165 vs 14,528)"
## [2] "population in the 16-44 ageband will increase by 2.6% (29,621 vs 28,859)"
## [3] "population in the 45-64 ageband will decrease by 1.8% (22,577 vs 22,986)"
## [4] "population in the 65-74 ageband will increase by 23.5% (10,343 vs 8,372)"
## [5] "population in the 75-84 ageband will increase by 40.4% (6,901 vs 4,916)"
## [6] "population in the 85+ ageband will increase by 36.4% (2,611 vs 1,914)"
```
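Working backwards from the output above, {delta_p} appears to be the absolute change relative to the 2018 value (y), rounded to one decimal place, with the direction carried separately by {trend}. This is inferred from the printed numbers rather than taken from the {headliner} source; the 0-15 row checks out:

$$
\Delta_p = \frac{|x - y|}{y} \times 100,
\qquad
\frac{|13{,}165 - 14{,}528|}{14{,}528} \times 100 = \frac{1{,}363}{14{,}528} \times 100 \approx 9.4
$$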
Hopefully you can already see how useful this is. It’s going to save me so many if / if_else() statements.
The reason I’m doing this is so that I can tailor specific sentences in my plots.
I decided to add each of these placeholders as a separate column. There may well be a slicker way of doing this, but I haven’t had a lot of time to delve into it.
```r
# One column per placeholder, so each piece can be reused on its own
chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop, headline = "{delta_p}%", .name = "headline2")

chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop, headline = "{delta_p}", .name = "headline3")

chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop, headline = "{trend}", .name = "headline4",
                      f = scales::number_format(big.mark = ","))

chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop, headline = "{f(x)} vs {f(y)}", .name = "headline5",
                      f = scales::number_format(big.mark = ","))
```
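One possible slicker route, if you only need the raw ingredients, is to compute them directly with plain {dplyr} instead of {headliner}. This is a sketch, not how {headliner} works internally: it assumes {delta_p} is the absolute change relative to the 2018 value, rounded to one decimal place, which matches the output above. The column names `delta_p`, `trend` and `x_vs_y` are my own. A side benefit is that `delta_p` comes out numeric, so there is no `as.numeric()` step later.

```r
library(dplyr)

# Inverness figures from the table above (pop = 2018, pop2030 = 2030)
t1 <- tibble(
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611)
)

chart_parts <- t1 %>%
  mutate(
    # absolute percentage change relative to 2018, one decimal place
    delta_p = round(abs(pop2030 - pop) / pop * 100, 1),
    # direction of change, kept separate from the (always positive) size
    trend = case_when(
      pop2030 > pop ~ "increase",
      pop2030 < pop ~ "decrease",
      TRUE          ~ "no change"
    ),
    # formatted "x vs y" text, equivalent in spirit to headline5
    x_vs_y = paste(
      scales::number(pop2030, accuracy = 1, big.mark = ","),
      "vs",
      scales::number(pop, accuracy = 1, big.mark = ",")
    )
  )

chart_parts
```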
Find the story
I decided to focus specifically on the 75-84 ageband, as that has the biggest increase across the patch, in percentage terms at least. (Arguably, the smaller increase in the 16-44 ageband is of more interest to planners or public health officials, because there are so many people in that band.)
First, let’s find the row I need. Rather than just grabbing the row number, I’ll grab the row itself:
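```r
library(data.table)

# headline3 holds text (glue output), so convert it before taking the max:
# max() on strings compares alphabetically, where "9.4" beats "40.4"
source_row <- chart_text[
  chart_text[, .I[as.numeric(headline3) == max(as.numeric(headline3))], by = headline4]$V1
][headline4 == "increase"]
```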
Yet another nifty data.table trick!
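For anyone who hasn’t met it, here is the same `.I` pattern on a tiny made-up table (the `group` and `pct` columns are hypothetical). `.I` holds row numbers of the whole table, so `.I[condition]` inside `by` returns, for each group, the row numbers where the condition holds; those can then be used to subset the original table. The second half shows why the percentages need converting to numbers first.

```r
library(data.table)

# Hypothetical toy data: percentages stored as text, like glue output
dt <- data.table(
  group = c("a", "a", "b", "b"),
  pct   = c("9.4", "40.4", "2.6", "23.5")
)

# Row numbers (in the full table) of each group's numeric maximum
idx <- dt[, .I[as.numeric(pct) == max(as.numeric(pct))], by = group]$V1
dt[idx]

# Without the conversion, max() compares strings alphabetically
text_max <- max(dt[group == "a", pct])
text_max
```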
Then I create some more variables to drop into my text:
```r
tar_age    <- source_row$ageband
tar_trend  <- source_row$headline4
tar_amount <- source_row$headline2

# HSCPval (the area name) and year_end are set elsewhere in the full script
para_text <- glue::glue(
  "The {HSCPval} population in the {tar_age} ageband is projected to {tar_trend} by {tar_amount} by {year_end}"
)
para_text
## The Inverness population in the 75-84 ageband is projected to increase by 40.4% by 2030
```
Show the story
This is a style of plot I’ve been wanting to make for ages, inspired by the work of my mate Ryo (I think I first saw something like this for the Liverpool squad profiles).
I won’t be able to do this at work, as this kind of thing would not gain mass acceptance, but I like it, so here goes.
First, some set-up work:
```r
# year_end_col and year_start_col are colour values defined elsewhere in the full script
t1$percent   <- as.numeric(chart_text$headline3) / 100
t1$direction <- chart_text$headline4
t1$colours   <- if_else(t1$direction == "increase", year_end_col, year_start_col)
# {delta_p} is always positive, so flip the sign for decreases
t1$percent   <- if_else(t1$direction == "increase", t1$percent, t1$percent * -1)
t1$direction <- if_else(t1$direction == "increase", "Increase", "Decrease")
index <- c(0, 0.25, 0.5, 0.75, 1)
```
Behold, the final plot
I really like the use of geom_link from {ggforce} to give an impression of movement.
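The full plotting code is in the repo, so this is not a reconstruction of the chart above, just a minimal sketch of the geom_link() idea. geom_link() from {ggforce} interpolates points along each segment and exposes a computed `index` (0 at the start, 1 at the end); mapping it to alpha makes each bar fade in from zero towards its end value, which is what gives that sense of movement. The `plot_df` name is hypothetical; the percentages are the signed changes from the headline output above.

```r
library(ggplot2)
library(ggforce)

# Signed percentage changes from the headline output (decreases negative)
plot_df <- data.frame(
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  percent = c(-0.094, 0.026, -0.018, 0.235, 0.404, 0.364)
)

p <- ggplot(plot_df) +
  # each segment runs from 0 to the projected change; alpha follows the
  # interpolation index, so segments fade in along their length
  geom_link(
    aes(x = 0, y = ageband, xend = percent, yend = ageband,
        alpha = after_stat(index)),
    show.legend = FALSE
  ) +
  scale_x_continuous(labels = scales::percent) +
  labs(x = "Projected change, 2018 to 2030", y = NULL) +
  theme_minimal()

p
```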
There’s more that could be done to improve this, but I find that in the real world, people don’t care too much about whether you’ve followed all the rules of data visualization.
They just want to know what they need to know, and you need to be able to tell them.
This package helps you help them.
This seems like a bit of work for a single plot, but using {targets} or {purrr}, I can add in some more variables and easily cycle through each of my 13 areas and pick out the relevant populations. I could even use pmap() to vary whether I am looking for increases or decreases, or min / max values, on a case-by-case basis.
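A minimal sketch of that idea, assuming a long table with hypothetical columns `area_name`, `ageband`, `pop_start` and `pop_end`. Only the Inverness figures from this post are filled in; the other areas would be extra rows. `describe_area()` is a hypothetical helper that computes the percentage with plain {dplyr} rather than {headliner}, and `pmap_chr()` feeds it one row of `specs` at a time, so each area/direction combination can be chosen case by case.

```r
library(dplyr)
library(purrr)
library(glue)

# Hypothetical long table: one row per area and age band
pop_data <- tibble(
  area_name = "Inverness",
  ageband   = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop_start = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop_end   = c(13165, 29621, 22577, 10343, 6901, 2611)
)

# Hypothetical helper: find the age band with the largest percentage change
# in the requested direction and turn it into a sentence
describe_area <- function(area, direction, data, year_end = 2030) {
  row <- data %>%
    filter(area_name == .env$area) %>%
    mutate(
      delta_p = round(abs(pop_end - pop_start) / pop_start * 100, 1),
      trend   = if_else(pop_end > pop_start, "increase", "decrease")
    ) %>%
    filter(trend == .env$direction) %>%
    slice_max(delta_p, n = 1, with_ties = FALSE)

  as.character(glue(
    "The {area} population in the {row$ageband} ageband is projected to ",
    "{row$trend} by {row$delta_p}% by {year_end}"
  ))
}

# One row per sentence wanted: area and direction vary case by case
specs <- tibble(
  area      = c("Inverness", "Inverness"),
  direction = c("increase", "decrease")
)

out <- pmap_chr(specs, describe_area, data = pop_data)
out
```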
When you consider I already have 50+ plots (for 13 different areas), each needing a bespoke title ideally, you can hopefully understand how impactful {headliner} could be.
I’m very excited by this package. I believe it’s a real game-changer for deriving insight.
(Well, at least until ChatGPT beats us to it.)
Go and star it and install it. I’m sure you will find it very worthwhile.