Making headlines

[This article was first published on Data By John, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

In my current mammoth work project, I’m generating many plots. The titles are very descriptive (they tell you what the plot is about), but they are not very interesting. What we’d like is to analyse the data, bring out the salient points, and have our titles update dynamically: something along the lines of “some place” is higher/lower than “another place” but not higher than “some council”.

However, I’ve had to park that for now because it simply seemed like an entire project in itself.

At the weekend I discovered the {headliner} package from Jake Riley.

I have had a play around with it and it’s brilliant: such a clever solution to a potentially cumbersome problem.

You’ll find the repo here (https://github.com/rjake/headliner) and the package website here.

I’ve uploaded some trial data relating to the population projections for Inverness (this is data already in the public domain from the Improvement Service).

I’ll avoid putting too much wrangling code on the blog post, but you’ll find the code on the repo here. It’s a minimal dataset showing the start and end projections, for 2018 and 2030, at various age-bands.

I’ve wrangled this table into a wider format for the particular chart that I want to make:

t1

##    year ageband   pop pop2030 year2
## 1: 2018    0-15 14528   13165  2030
## 2: 2018   16-44 28859   29621  2030
## 3: 2018   45-64 22986   22577  2030
## 4: 2018   65-74  8372   10343  2030
## 5: 2018   75-84  4916    6901  2030
## 6: 2018     85+  1914    2611  2030
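The wrangling itself lives in the repo, but for readers who want to follow along, here is a minimal base R sketch of one way to get from a long table to the wide one above. The long layout (one row per year and ageband) and the names `long`, `start`, `end` and `wide` are assumptions for illustration only, and the original uses data.table rather than a plain data frame.

```r
# Assumed long-format input: one row per year and ageband.
# The figures are the ones printed in the table above.
bands <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
long <- data.frame(
  year    = rep(c(2018, 2030), each = length(bands)),
  ageband = rep(bands, times = 2),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914,
              13165, 29621, 22577, 10343, 6901, 2611),
  stringsAsFactors = FALSE
)

# Split into start and end years, renaming the end-year columns
start <- long[long$year == 2018, ]
end   <- long[long$year == 2030, c("ageband", "pop", "year")]
names(end) <- c("ageband", "pop2030", "year2")

# Join side by side on ageband, then restore the column order shown above
wide <- merge(start, end, by = "ageband")
wide <- wide[, c("year", "ageband", "pop", "pop2030", "year2")]
wide
```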

Now I want to make some text for use in my chart.

The starting block

I use the add_headline_column function to compare the 2030 values with the 2018 values, and then state whether this is an increase or decrease with the {trend} placeholder. {delta_p} returns the percentage change, and finally, Jake showed me how to nicely format the actual values with thousands separators, rather than the boring, hard-to-read raw values I had on my first attempt:

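library(dplyr)      # for the %>% pipe
library(headliner)  # for add_headline_column()

# x is the new value (2030), y is the baseline (2018);
# f is passed through to the template to format the raw counts
chart_text <- t1 %>% 
      add_headline_column( x = pop2030, 
                           y = pop, 
                           headline = "population in the {ageband} ageband will {trend} by {delta_p}%  ({f(x)} vs {f(y)})", 
                           f = scales::number_format(big.mark = ","))  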

chart_text$headline

## [1] "population in the 0-15 ageband will decrease by 9.4%  (13,165 vs 14,528)" 
## [2] "population in the 16-44 ageband will increase by 2.6%  (29,621 vs 28,859)"
## [3] "population in the 45-64 ageband will decrease by 1.8%  (22,577 vs 22,986)"
## [4] "population in the 65-74 ageband will increase by 23.5%  (10,343 vs 8,372)"
## [5] "population in the 75-84 ageband will increase by 40.4%  (6,901 vs 4,916)" 
## [6] "population in the 85+ ageband will increase by 36.4%  (2,611 vs 1,914)"
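To make the placeholders less of a black box, here is a rough base R sketch that rebuilds the same pieces by hand. It is not how {headliner} works internally, just the arithmetic behind the output above: the percentage change relative to the 2018 value, rounded to one decimal place, plus a direction word. The names `fmt`, `by_hand` and friends are made up for this illustration.

```r
# Figures from the table earlier in the post
ageband <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
pop     <- c(14528, 28859, 22986, 8372, 4916, 1914)
pop2030 <- c(13165, 29621, 22577, 10343, 6901, 2611)

# Size of the change as a percentage of the 2018 value (sign dropped)
delta_p <- round(abs(pop2030 - pop) / pop * 100, 1)

# Direction word, standing in for the {trend} placeholder
trend <- ifelse(pop2030 > pop, "increase", "decrease")

# Thousands separators, standing in for scales::number_format()
fmt <- function(v) format(v, big.mark = ",", trim = TRUE)

by_hand <- sprintf("population in the %s ageband will %s by %s%% (%s vs %s)",
                   ageband, trend, delta_p, fmt(pop2030), fmt(pop))
by_hand
```

Even for two placeholders that is a fair amount of boilerplate, which is exactly what the package takes off your hands.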

Hopefully you can already see how useful this is. This is going to save me so many if / if_else statements.

Next, I want to be able to tailor specific sentences in my plots, so I decided to add each of these placeholders as a separate column. There may well be a slicker way of doing this, but I haven’t had a lot of time to delve into it.

# percentage change with a % sign, e.g. "40.4%"
chart_text <- chart_text  %>% 
      add_headline_column(x = pop2030, 
                          y = pop,
                          headline = "{delta_p}%", 
                          .name = "headline2")

# percentage change without the sign, e.g. "40.4" (still text at this point)
chart_text <- chart_text  %>% 
      add_headline_column(x = pop2030, 
                          y = pop,
                          headline = "{delta_p}", 
                          .name = "headline3")

# direction of change: "increase" or "decrease"
chart_text <- chart_text  %>% 
      add_headline_column(x = pop2030, 
                          y = pop,
                          headline = "{trend}", 
                          .name = "headline4")

# formatted raw values, e.g. "6,901 vs 4,916"
chart_text <- chart_text  %>% 
      add_headline_column(x = pop2030, 
                          y = pop,
                          headline = "{f(x)} vs {f(y)}", 
                          .name = "headline5",  
                          f = scales::number_format(big.mark = ","))

I decided I wanted to specifically focus on the 75-84 ageband, as that has the biggest increase across the patch.

First, let’s figure out the row I need. Instead of just grabbing the row number, I’ll grab the row itself:

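# headline3 is text, so convert it before taking the maximum;
# .I gives the row numbers of the largest change within each trend group
source_row <- chart_text[chart_text[, .I[as.numeric(headline3) == max(as.numeric(headline3))], 
                                    by = headline4]$V1
                         ][headline4 == "increase"]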

tar_age <- source_row$ageband
tar_trend <- source_row$headline4
tar_amount <- source_row$headline2
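One thing worth knowing when picking out the biggest change: headline3 comes out of a text template, so it holds text rather than numbers (which is why the set-up code further down wraps it in as.numeric). A small base R sketch of why that matters, using the values from the output above; the object names here are invented for the example.

```r
# headline3 and headline4 values as they appear above, stored as text
ageband   <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
headline3 <- c("9.4", "2.6", "1.8", "23.5", "40.4", "36.4")
headline4 <- c("decrease", "increase", "decrease",
               "increase", "increase", "increase")

# Text is compared character by character, so "9.4" beats "40.4"
text_max    <- max(headline3)
# Converting first gives the comparison we actually want
numeric_max <- max(as.numeric(headline3))

# Largest increase: restrict to the increases, then compare as numbers
increases    <- which(headline4 == "increase")
biggest      <- increases[which.max(as.numeric(headline3[increases]))]
biggest_band <- ageband[biggest]
biggest_band
```

Within the increases alone the text comparison happens to land on the right row for this data, but that is luck rather than design, so converting to numeric first is the safer habit.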

# HSCPval and year_end are set elsewhere in the full script (not shown here)
para_text <- glue::glue(
  "The {HSCPval} population in the {tar_age} ageband is projected to ",
  "{tar_trend} by {tar_amount} by {year_end}"
)

## The Inverness population in the 75-84 ageband is projected to increase by 40.4% by 2030

Now for a plot. This is a style of plot I’ve been wanting to make for ages, inspired by work by my mate Ryo (I think I first saw something like this for the Liverpool squad profiles).

I won’t be able to do this at work, as this kind of thing would not gain mass acceptance, but I like it, so here goes.

First, some set-up work:

# year_end_col and year_start_col are colour values set elsewhere in the full script
t1$percent <- as.numeric(chart_text$headline3) / 100
t1$direction <- chart_text$headline4
t1$colours <- if_else(t1$direction == "increase", year_end_col, year_start_col)
# make decreases negative so they point the other way on the chart
t1$percent <- if_else(t1$direction == "increase", t1$percent, t1$percent * -1)
t1$direction <- if_else(t1$direction == "increase", "Increase", "Decrease")

index <- c(0, 0.25, 0.5, 0.75, 1)

Then the plot

This seems like a bit of work for a single plot, but using targets or purrr, I can add in some more variables and easily cycle through each of my 13 areas and pick out the relevant populations. I could even use pmap to vary whether I am looking for increases or decreases, or min / max values, on a case by case basis.
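As a rough idea of what that cycling could look like, here is a sketch that wraps the sentence-building in a function and calls it once per row of a small specification table. It is only an illustration under stated assumptions: `make_sentence` and `specs` are hypothetical names, base R's Map() stands in for purrr::pmap() to keep the example dependency-free, and only the Inverness figures from this post are used, so the "cycling" here varies the direction rather than the area.

```r
# Inverness projections from the table earlier in the post
t1 <- data.frame(
  ageband = c("0-15", "16-44", "45-64", "65-74", "75-84", "85+"),
  pop     = c(14528, 28859, 22986, 8372, 4916, 1914),
  pop2030 = c(13165, 29621, 22577, 10343, 6901, 2611),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one sentence for a given area, direction and picker
make_sentence <- function(area, direction, pick, data = t1, year_end = 2030) {
  change <- (data$pop2030 - data$pop) / data$pop * 100
  trend  <- ifelse(change > 0, "increase", "decrease")
  keep   <- which(trend == direction)
  # choose the largest or smallest change within that direction
  chooser <- if (pick == "max") which.max else which.min
  row <- keep[chooser(abs(change[keep]))]
  sprintf("The %s population in the %s ageband is projected to %s by %s%% by %s",
          area, data$ageband[row], direction,
          round(abs(change[row]), 1), year_end)
}

# One row per sentence wanted; Map() walks the columns in parallel
specs <- data.frame(
  area      = c("Inverness", "Inverness"),
  direction = c("increase", "decrease"),
  pick      = c("max", "max"),
  stringsAsFactors = FALSE
)
sentences <- unlist(
  Map(make_sentence, specs$area, specs$direction, specs$pick),
  use.names = FALSE
)
sentences
```

With more areas, the specification table would simply gain more rows, and the data argument would need to be filtered to the matching area before the calculation.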

I’m very excited by this package. I believe it’s a real game-changer for deriving insight.

Well, at least until ChatGPT beats us to it.

Go and star it, download it and use it. I’m sure you will find it very worthwhile.
