Making headlines

HighlandR


[This article was first published on HighlandR, and kindly contributed to R-bloggers.]


In my current mammoth work project, I’m generating many plots. The titles are very descriptive (they tell you what the plot is about), but they are not really telling a story.

That’s simply because there are so many on the production line.

What we’d like is to analyse the data and extract the salient points.
Better still, we’d want this to adjust dynamically for each plot.
Something along the lines of “some place” is higher/lower than “another place”, but not higher than “some council”.

However, I’ve had to park that for now because it seemed like an entire project in itself.

Enter {headliner}

At the weekend I discovered the {headliner} package from Jake Riley.

I’ve had a play around with it and it’s brilliant: such a clever solution to a potentially cumbersome problem.

You’ll find the repo here (https://github.com/rjake/headliner) and the package website here.

I’ve uploaded some trial data relating to the population projections for Inverness (this is data already in the public domain courtesy of the Improvement Service).

I’ll avoid putting too much wrangling code in the blog post, but you’ll find the code in the repo. It’s a minimal dataset showing the start and end projections, for 2018 and 2030, across various agebands.

I’ve wrangled this table into a wider format for the particular chart that I want to make:

t1

##    year ageband   pop pop2030 year2
## 1: 2018    0-15 14528   13165  2030
## 2: 2018   16-44 28859   29621  2030
## 3: 2018   45-64 22986   22577  2030
## 4: 2018   65-74  8372   10343  2030
## 5: 2018   75-84  4916    6901  2030
## 6: 2018     85+  1914    2611  2030
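If you want to follow along without the wrangling code from the repo, the printed table is small enough to rebuild by hand. This is a minimal sketch that recreates `t1` from the values shown above as a plain data.frame. The original object is a data.table (note the `1:` row labels), so the data.table syntax used later in the post would need `data.table::as.data.table(t1)` first.

```r
# Rebuild t1 from the printed output above (values copied verbatim)
t1 <- data.frame(
  year    = 2018,
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611),
  year2   = 2030,
  stringsAsFactors = FALSE
)

# Total projected change across all agebands
sum(t1$pop2030) - sum(t1$pop)
# 3643
```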

Now I want to make some text for use in my chart.

Write your own headlines

I use the add_headline_column() function to compare the 2030 values with the 2018 values, and the {trend} placeholder to state whether this is an increase or a decrease. {delta_p} returns the difference as a percentage of the 2018 value. Finally, Jake showed me how to format the values nicely, rather than the boring, hard-to-read raw numbers I had on my first attempt:

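library(headliner)
library(dplyr)  # provides %>%

# f is passed through to the template and used as {f(x)} and {f(y)}
chart_text <- t1 %>%
  add_headline_column(x = pop2030,
                      y = pop,
                      headline = "population in the {ageband} ageband will {trend} by {delta_p}%  ({f(x)} vs {f(y)})",
                      f = scales::number_format(big.mark = ","))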

chart_text$headline

## [1] "population in the 0-15 ageband will decrease by 9.4%  (13,165 vs 14,528)" 
## [2] "population in the 16-44 ageband will increase by 2.6%  (29,621 vs 28,859)"
## [3] "population in the 45-64 ageband will decrease by 1.8%  (22,577 vs 22,986)"
## [4] "population in the 65-74 ageband will increase by 23.5%  (10,343 vs 8,372)"
## [5] "population in the 75-84 ageband will increase by 40.4%  (6,901 vs 4,916)" 
## [6] "population in the 85+ ageband will increase by 36.4%  (2,611 vs 1,914)"
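To make the arithmetic behind {trend}, {delta_p} and the formatted values concrete, here is a base-R sketch that reproduces two of the headlines above by hand. The helper `describe_change()` is hypothetical and is my approximation of what the placeholders evaluate to for this data, not headliner's own code.

```r
# Hypothetical helper approximating {trend}, {delta_p}, {f(x)} and {f(y)}
describe_change <- function(x, y) {
  # unsigned percentage change of x relative to y, to one decimal place
  delta_p <- sprintf("%.1f", abs(x - y) / y * 100)
  # direction word; ties are not handled specially in this sketch
  trend <- ifelse(x > y, "increase", "decrease")
  # thousands separator, similar to scales::number_format(big.mark = ",")
  fx <- formatC(x, format = "d", big.mark = ",")
  fy <- formatC(y, format = "d", big.mark = ",")
  sprintf("will %s by %s%%  (%s vs %s)", trend, delta_p, fx, fy)
}

describe_change(x = c(13165, 6901), y = c(14528, 4916))
# "will decrease by 9.4%  (13,165 vs 14,528)"
# "will increase by 40.4%  (6,901 vs 4,916)"
```

Note that the percentage is always relative to `y` (the 2018 value), which is why the 0-15 band shows 9.4% rather than 10.4%.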

Hopefully you can already see how useful this is. It’s going to save me so many if / if_else() statements.

The reason I’m doing this is so that I can tailor specific sentences in my plots.

I decided to add each of these placeholders as a separate column. There may well be a slicker way of doing this, but I haven’t had a lot of time to delve into it.

chart_text <- chart_text %>%
  # percentage change with a % sign, e.g. "9.4%"
  add_headline_column(x = pop2030, y = pop,
                      headline = "{delta_p}%",
                      .name = "headline2") %>%
  # bare percentage change, e.g. "9.4"
  add_headline_column(x = pop2030, y = pop,
                      headline = "{delta_p}",
                      .name = "headline3") %>%
  # "increase" or "decrease"
  add_headline_column(x = pop2030, y = pop,
                      headline = "{trend}",
                      .name = "headline4") %>%
  # formatted values, e.g. "13,165 vs 14,528"
  add_headline_column(x = pop2030, y = pop,
                      headline = "{f(x)} vs {f(y)}",
                      .name = "headline5",
                      f = scales::number_format(big.mark = ","))
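On the question of a slicker way: if you only need the intermediate pieces as columns, they can also be computed directly in base R with a single `within()` call. This sketch swaps headliner out for plain arithmetic, reuses the same `headline2` to `headline5` column names, and rounds to one decimal place to match the output above. Treat it as an approximation and check it against the package output before relying on it.

```r
# Minimal version of the table: only the columns the helpers need
t1 <- data.frame(
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611),
  stringsAsFactors = FALSE
)

chart_text <- within(t1, {
  pct       <- abs(pop2030 - pop) / pop * 100           # unsigned % change
  headline3 <- sprintf("%.1f", pct)                     # e.g. "9.4"
  headline2 <- paste0(headline3, "%")                   # e.g. "9.4%"
  headline4 <- ifelse(pop2030 > pop, "increase", "decrease")
  headline5 <- paste(formatC(pop2030, format = "d", big.mark = ","), "vs",
                     formatC(pop, format = "d", big.mark = ","))
  rm(pct)                                               # drop the helper column
})
```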

Find the story

I decided to focus specifically on the 75-84 ageband, as it has the biggest increase across the patch, in percentage terms at least. (Arguably, the smaller increase in the 16-44 ageband is of more interest to planners or public health officials, because there are so many people in it.)

First, let’s find the row I need. Rather than just grabbing the row number, I’ll grab the row itself:

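library(data.table)
# Row index (.I) of the largest % change within each trend; headline3 is
# character, so convert it first or max() compares alphabetically
source_row <- chart_text[chart_text[, .I[as.numeric(headline3) == max(as.numeric(headline3))],
                                    by = headline4]$V1][headline4 == "increase"]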
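One thing worth checking at this step: `headline3` comes out of a glue template, so it is a character column, and `max()` on character strings compares them alphabetically rather than numerically. That happens to give the right answer among the increases here, but "9.4" sorts above "40.4", which is why converting with `as.numeric()` before taking the maximum matters. A minimal base-R sketch of the pitfall and of the same row selection done numerically, using the values printed earlier:

```r
# Values from the chart_text columns printed earlier
chart_text <- data.frame(
  ageband   = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  headline3 = c("9.4", "2.6", "1.8", "23.5", "40.4", "36.4"),
  headline4 = c("decrease", "increase", "decrease",
                "increase", "increase", "increase"),
  stringsAsFactors = FALSE
)

# Character comparison: "9.4" beats "40.4" because "9" sorts after "4"
max(chart_text$headline3)
# "9.4"

# Numeric comparison, restricted to the increases
increases  <- chart_text[chart_text$headline4 == "increase", ]
source_row <- increases[which.max(as.numeric(increases$headline3)), ]
source_row$ageband
# "75-84"
```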

Yet another nifty data.table trick!

Then I create some more variables to drop into my text.

tar_age    <- source_row$ageband
tar_trend  <- source_row$headline4
tar_amount <- source_row$headline2

# HSCPval and year_end are defined elsewhere (not shown in this post);
# here they hold "Inverness" and 2030
para_text <- glue::glue("The {HSCPval} population in the {tar_age} ageband is ",
                        "projected to {tar_trend} by {tar_amount} by {year_end}")

## The Inverness population in the 75-84 ageband is projected to increase by 40.4% by 2030

Show the story

This is a style of plot I’ve been wanting to make for ages, inspired by the work of my mate Ryo (I think I first saw something like this for the Liverpool squad profiles).

I won’t be able to do this at work, as this kind of thing would not gain mass acceptance, but I like it, so here goes.

First, some setup work:

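library(dplyr)  # for if_else()

# headline3 holds the unsigned % change as text, e.g. "40.4"; convert to a proportion
t1$percent   <- as.numeric(chart_text$headline3) / 100
t1$direction <- chart_text$headline4
# year_end_col and year_start_col are colour values defined elsewhere (not shown)
t1$colours   <- if_else(t1$direction == "increase", year_end_col, year_start_col)
# make decreases negative so the proportion carries its sign
t1$percent   <- if_else(t1$direction == "increase", t1$percent, t1$percent * -1)
t1$direction <- if_else(t1$direction == "increase", "Increase", "Decrease")

index <- c(0, 0.25, 0.5, 0.75, 1)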
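The two `if_else()` steps turn headliner's unsigned {delta_p} back into a signed proportion. A quick way to sanity-check that column (a sketch using the population values printed earlier) is to compute the signed change directly; it should agree with `t1$percent` up to headliner's one-decimal rounding.

```r
# Population values from the table printed earlier
pop     <- c(14528, 28859, 22986, 8372, 4916, 1914)
pop2030 <- c(13165, 29621, 22577, 10343, 6901, 2611)

# Signed proportional change: negative for decreases, positive for increases
signed_change <- (pop2030 - pop) / pop

# Same labels as the setup code produces
direction <- ifelse(signed_change > 0, "Increase", "Decrease")

round(signed_change * 100, 1)
# -9.4  2.6 -1.8 23.5 40.4 36.4
direction
# "Decrease" "Increase" "Decrease" "Increase" "Increase" "Increase"
```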

Behold, the final plot

I really like the use of geom_link from {ggforce} to give an impression of movement.
There’s more that could be done to improve this, but I find that in the real world, people don’t care too much about whether you’ve followed all the rules of data visualization.

They just want to know what they need to know, and you need to be able to tell them.

This package helps you help them.

This seems like a bit of work for a single plot, but using {targets} or {purrr}, I can add in some more variables and easily cycle through each of my 13 areas, picking out the relevant populations. I could even use purrr::pmap() to vary whether I am looking for increases or decreases, or min/max values, on a case-by-case basis.
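Here is a minimal sketch of that idea in base R, using `Map()` rather than `purrr::pmap()` so it runs without extra packages. The function `pick_headline()` and its `direction`/`pick` arguments are hypothetical; it only uses the Inverness figures from this post, and in practice you would also iterate over the area names.

```r
# Hypothetical helper: choose one ageband and describe it in a sentence
pick_headline <- function(df, area, direction = "increase", pick = "max",
                          year_end = 2030) {
  pct   <- (df$pop2030 - df$pop) / df$pop * 100      # signed % change
  # zero change would be labelled "decrease" in this sketch
  trend <- ifelse(pct > 0, "increase", "decrease")
  keep  <- which(trend == direction)                 # rows matching direction
  if (length(keep) == 0) return(NA_character_)
  size  <- abs(pct[keep])
  i     <- keep[if (pick == "max") which.max(size) else which.min(size)]
  sprintf("The %s population in the %s ageband is projected to %s by %.1f%% by %s",
          area, df$ageband[i], trend[i], abs(pct[i]), year_end)
}

inverness <- data.frame(
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611),
  stringsAsFactors = FALSE
)

# Vary direction and min/max together, as pmap() would
sentences <- Map(pick_headline,
                 direction = c("increase", "decrease", "increase"),
                 pick      = c("max", "max", "min"),
                 MoreArgs  = list(df = inverness, area = "Inverness"))
unlist(sentences, use.names = FALSE)
```

The first sentence matches the one built with glue above; the other two pick out the largest decrease (0-15) and the smallest increase (16-44).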

When you consider that I already have 50+ plots (for 13 different areas), each ideally needing a bespoke title, you can hopefully see how impactful {headliner} could be.

I’m very excited by this package – I believe it’s a real game-changer for deriving insight.

(Well, at least until ChatGPT beats us to it.)

Go and star it and install it. I’m sure you will find it very worthwhile.


