UK Energy company acquisitions – a great use case for Sankey plots
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Like many others in the ‘United’ Kingdom, I’ve found my energy supplier has gone bust, due to ongoing price hikes, and I’ve been moved over to another one, whom ordinarily I wouldn’t have signed up with.
Fantastic.
Because I’m not always sure you get the sarcasm
How many folk have been affected by this sort of thing since the energy market prices went haywire in 2021?
Here’s what I’m using:
library(data.table) library(dplyr) library(tidyr) library(networkD3) library(htmlwidgets)
I found some data relating to the closures of energy companies in the UK and this has been saved here
Here’s how the data looks:
data <- data.table::fread('https://raw.githubusercontent.com/johnmackintosh/energy-closures/main/rawdata.csv') head(data,10) ## # A tibble: 10 x 4 ## Date `Failed_supplier` Customers`Acquired_b~ ## <dttm> <chr> <dbl> <chr> ## 1 2022-01-01 00:00:00 Together Energy Retail Ltd. 176000 British Gas ## 2 2021-12-01 00:00:00 Zog Energy 11700 EDF ## 3 2021-11-01 00:00:00 Entice Energy Supply Ltd 5400 Scottish Power ## 4 2021-11-01 00:00:00 Orbit Energy 65000 Scottish Power ## 5 2021-11-01 00:00:00 Bulb 1700000 *In special admini~ ## 6 2021-11-01 00:00:00 Neon Reef Ltd. 30000 British Gas ## 7 2021-11-01 00:00:00 Social Energy Supply Ltd. 5500 British Gas ## 8 2021-11-01 00:00:00 CNG Energy Ltd. 41000 Pozitive Energy ## 9 2021-11-01 00:00:00 Omni Energy Ltd. 6000 Utilita ## 10 2021-11-01 00:00:00 MA Energy Ltd. 300 SmartestEnergy Bus~
I don’t like the column headers, and I want to restrict my final plot to just plotting by year rather than individual date, so I create a ‘year_acquired’ column:
data2 <- data %>% rename(failed = `Failed_supplier`, acquired = `Acquired_by`) %>% mutate(year_acquired = lubridate::year(Date)) %>% relocate(year_acquired, .before = 'Date') %>% select(-Date)
I want a sankey plot, so I need to get my data into shape , and while I’m at it, I’ll filter it for 2021 and onwards:
links <- data2 %>% filter(year_acquired >= 2021) %>% mutate(row = row_number()) %>% gather('column', 'source', -row) %>% mutate(column = match(column, names(data2))) %>% group_by(row) %>% arrange(column) %>% mutate(target = lead(source)) %>% ungroup() %>% filter(!is.na(target))
Network plots are all about links and nodes. So here’s some more jiggery-pokery with links and nodes. It’s all dplyr, and as that’s so easy to understand, I won’t delve into this too much
links <- links %>% mutate(source = paste0(source, '_', column)) %>% mutate(target = paste0(target, '_', column + 1)) %>% select(source, target) nodes <- data.frame(name = unique(c(links$source, links$target))) links$source <- match(links$source, nodes$name) - 1 links$target <- match(links$target, nodes$name) - 1 links$value <- 1 nodes$name <- sub('_[0-9]+$', '', nodes$name)
Time for the Sankey plot.
It’s interactive! No mean feat on Jekyll - I don’t want to tell you how much time this took to get working.
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source', Target = 'target', Value = 'value', NodeID = 'name')
Looks like some big players are doing well in the acquisition stakes, as the smaller (cheaper, with good customer service) suppliers get squeezed out.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.