In my current mammoth work project, I’m generating many plots. The titles are very descriptive (they tell you what the plot is about), but they are not very interesting. What we’d like is to analyse the data, bring out the salient points, and have our titles update dynamically, something along the lines of “some place” is higher/lower than “another place” but not higher than “some council”.
However, I’ve had to park that for now because it simply seemed like an entire project in itself.
At the weekend I discovered the {headliner} package from Jake Riley.
I have had a play around with it and it’s brilliant – such a clever solution to a potentially cumbersome problem.
You’ll find the repo here (https://github.com/rjake/headliner) and the package website here.
I’ve uploaded some trial data relating to the population projections for Inverness (this is data already in the public domain from the Improvement Service).
I’ll avoid putting too much wrangling code on the blog post, but you’ll find the code on the repo here. It’s a minimal dataset showing the start and end projections, for 2018 and 2030, at various age-bands.
I’ve wrangled this table into a wider format for the particular chart that I want to make:
t1
##    year ageband   pop pop2030 year2
## 1: 2018    0-15 14528   13165  2030
## 2: 2018   16-44 28859   29621  2030
## 3: 2018   45-64 22986   22577  2030
## 4: 2018   65-74  8372   10343  2030
## 5: 2018   75-84  4916    6901  2030
## 6: 2018     85+  1914    2611  2030
Now I want to make some text for use in my chart.
The starting block
I use the add_headline_column function to compare the 2030 values with the 2018 values, and then state whether this is an increase or decrease with the {trend} placeholder. {delta_p} returns the difference as a percentage, and finally, Jake showed me how to nicely format the actual values with thousands separators, rather than the boring, hard to read raw numbers I had on my first attempt:
library(dplyr)      # for %>% (and if_else later on)
library(headliner)

# Compare 2030 (x) with 2018 (y); f formats the raw values with thousands separators
chart_text <- t1 %>%
  add_headline_column(
    x = pop2030,
    y = pop,
    headline = "population in the {ageband} ageband will {trend} by {delta_p}% ({f(x)} vs {f(y)})",
    f = scales::number_format(big.mark = ",")
  )

chart_text$headline
## [1] "population in the 0-15 ageband will decrease by 9.4% (13,165 vs 14,528)"
## [2] "population in the 16-44 ageband will increase by 2.6% (29,621 vs 28,859)"
## [3] "population in the 45-64 ageband will decrease by 1.8% (22,577 vs 22,986)"
## [4] "population in the 65-74 ageband will increase by 23.5% (10,343 vs 8,372)"
## [5] "population in the 75-84 ageband will increase by 40.4% (6,901 vs 4,916)"
## [6] "population in the 85+ ageband will increase by 36.4% (2,611 vs 1,914)"
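To make it clearer what the placeholders are doing, here is a rough base R sketch of the same arithmetic, using the figures from the table above. It is only an approximation of what {headliner} works out for you, not the package's actual implementation: the rounding to one decimal place is inferred from the output, and the helper f here is a plain base R stand-in for scales::number_format.

```r
# Populations copied from the t1 table above
ageband <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
pop     <- c(14528, 28859, 22986, 8372, 4916, 1914)   # 2018
pop2030 <- c(13165, 29621, 22577, 10343, 6901, 2611)  # 2030

# Percentage change relative to 2018, reported as an absolute value
# (assumption: one decimal place, to match the output shown above)
delta_p <- round(abs(pop2030 - pop) / pop * 100, 1)

# Direction of change (ties are not handled in this sketch)
trend <- ifelse(pop2030 > pop, "increase", "decrease")

# Thousands separators, a base R stand-in for the scales formatter
f <- function(n) format(n, big.mark = ",", trim = TRUE)

headline <- paste0(
  "population in the ", ageband, " ageband will ", trend, " by ", delta_p,
  "% (", f(pop2030), " vs ", f(pop), ")"
)
headline
```

Doing this by hand for one sentence is fine, but every variation in wording means more ifelse calls, which is exactly the chore the package takes away.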
Hopefully you can already see how useful this is. This is going to save me so many if / if_else statements.
The reason I’m doing this is so I can choose to tailor specific sentences in my plots.
I decided I would add in each of these placeholders as a separate column. There may well be a slicker way of doing this, but I haven’t had a lot of time to delve into it.
# Percentage change with the % sign, e.g. "40.4%"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop,
                      headline = "{delta_p}%",
                      .name = "headline2")

# Percentage change as a bare number (stored as text), e.g. "40.4"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop,
                      headline = "{delta_p}",
                      .name = "headline3")

# Direction of change: "increase" or "decrease"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop,
                      headline = "{trend}",
                      .name = "headline4",
                      f = scales::number_format(big.mark = ","))

# Formatted values, e.g. "6,901 vs 4,916"
chart_text <- chart_text %>%
  add_headline_column(x = pop2030, y = pop,
                      headline = "{f(x)} vs {f(y)}",
                      .name = "headline5",
                      f = scales::number_format(big.mark = ","))
I decided I wanted to focus specifically on the 75-84 ageband, as that has the biggest increase across the patch.
First, let’s figure out the row I need, but instead of just grabbing the row number, I’ll grab the row itself:
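# headline3 is text, so convert it to numeric before taking the maximum
# (a character max() would rank "9.4" above "40.4")
source_row <- chart_text[
  chart_text[, .I[as.numeric(headline3) == max(as.numeric(headline3))],
             by = headline4]$V1
][headline4 == "increase"]

tar_age    <- source_row$ageband
tar_trend  <- source_row$headline4
tar_amount <- source_row$headline2

# HSCPval and year_end are set in the wrangling code on the repo
# ("Inverness" and 2030 here, as the output shows)
para_text <- glue::glue(
  "The {HSCPval} population in the {tar_age} ageband is projected to ",
  "{tar_trend} by {tar_amount} by {year_end}"
)
para_text
## The Inverness population in the 75-84 ageband is projected to increase by 40.4% by 2030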
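For a single area, the same row can also be picked out with which.max in base R. This is just a sketch of the selection logic, with the values copied from the tables above rather than taken from chart_text. The names hscp_val and change_p are made up for this example, and the "Inverness" and 2030 values are read off the output above.

```r
# Populations copied from the t1 table above
ageband <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
pop     <- c(14528, 28859, 22986, 8372, 4916, 1914)   # 2018
pop2030 <- c(13165, 29621, 22577, 10343, 6901, 2611)  # 2030

# Signed percentage change, so decreases are negative
change_p <- (pop2030 - pop) / pop * 100

# Index of the largest change; stop if nothing actually increased
i <- which.max(change_p)
stopifnot(change_p[i] > 0)

tar_age    <- ageband[i]
tar_amount <- paste0(round(change_p[i], 1), "%")

# Hypothetical stand-ins for values set in the wrangling code
hscp_val <- "Inverness"
year_end <- 2030

para_text <- sprintf(
  "The %s population in the %s ageband is projected to increase by %s by %s",
  hscp_val, tar_age, tar_amount, year_end
)
para_text
```

Working from the signed change avoids comparing percentages stored as text, at the cost of redoing a calculation that {headliner} has already done.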
Now for a plot. This is a style of plot I’ve been wanting to make for ages, inspired by work by my mate Ryo (I think I first saw something like this for the Liverpool squad profiles).
I won’t be able to do this at work, as this kind of thing would not gain mass acceptance, but I like it, so here goes.
First, some set up work:
# year_end_col and year_start_col are colour values defined elsewhere in the code on the repo
t1$percent   <- as.numeric(chart_text$headline3) / 100
t1$direction <- chart_text$headline4
t1$colours   <- if_else(t1$direction == "increase", year_end_col, year_start_col)

# delta_p is unsigned, so make the decreases negative
t1$percent   <- if_else(t1$direction == "increase", t1$percent, t1$percent * -1)
t1$direction <- if_else(t1$direction == "increase", "Increase", "Decrease")

index <- c(0, 0.25, 0.5, 0.75, 1)
Then the plot
This seems like a bit of work for a single plot, but using targets or purrr, I can add in some more variables and easily cycle through each of my 13 areas and pick out the relevant populations. I could even use pmap to vary whether I am looking for increases or decreases, or min / max values, on a case by case basis.
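As a rough idea of how that case by case variation might look, here is a sketch that uses base R's Map as a stand-in for purrr::pmap, so it runs without any packages. It only uses the Inverness figures from above; the params list and the describe helper are hypothetical names made up for this example.

```r
# Populations copied from the t1 table above
ageband <- c("0-15", "16-44", "45-64", "65-74", "75-84", "85+")
pop     <- c(14528, 28859, 22986, 8372, 4916, 1914)   # 2018
pop2030 <- c(13165, 29621, 22577, 10343, 6901, 2611)  # 2030

# Signed percentage change, rounded to one decimal place
change_p <- round((pop2030 - pop) / pop * 100, 1)

# One entry per sentence wanted: the direction to look at, and whether
# to pick the largest or the smallest change in that direction
params <- list(
  direction = c("increase", "decrease", "increase"),
  pick      = c("max", "max", "min")
)

# Hypothetical helper: build one sentence for one direction/pick pair
describe <- function(direction, pick) {
  keep <- if (direction == "increase") change_p > 0 else change_p < 0
  size <- abs(change_p[keep])
  i <- if (pick == "max") which.max(size) else which.min(size)
  paste0("The ", ageband[keep][i], " ageband is projected to ",
         direction, " by ", size[i], "%")
}

# Map walks the parameter vectors in parallel, much as pmap would
sentences <- unlist(Map(describe, params$direction, params$pick),
                    use.names = FALSE)
sentences
```

Adding an area column to params and to the helper would extend the same pattern to all 13 areas.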
I’m very excited by this package – I believe it’s a real game-changer for deriving insight.
Well, at least until ChatGPT beats us to it.
Go and star it, download it and use it, I’m sure you will find it very worthwhile.