Let it flow, let it flow, let it flow……
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Dots..
This is not the blog post I’d originally intended to write. But I’m glad – because this one is so much better.
Some background. I’m one of the few Scotland based members of AphA – the Association of Professional Healthcare Analysts. They’ve had a couple of events recently that I was keeping my eye on via Twitter and it became apparent that a session demonstrating R had made some waves – in a good way.
I’d been having a wee exchange with Neil Pettinger regarding R and took the opportunity to ask permission to use one of his Excel files. This featured a dot plot chart that demonstrated patient flow. I wanted to show an alternative way of creating the plot using R.
Neil is an “information training consultant” – at least that’s what it says on his website but for me that is shorthand for guru of statistics, information and analysis with a huge breadth of experience working in and alongside analytical teams in the UK NHS. There can’t be many boards/ trusts within the NHS that Neil has not worked with at some point.
Neil pointed out that instead of the 2014 version I was referring to, there was a new updated version, and he kindly forwarded me that instead, along with a guide on how to produce the plot using Excel.
The plot in question shows admissions / discharges and transfers in/out on a fictional “good day” in a fictional hospital. I’m not going to try and explain Neil’s philosophy and vision regarding patient flow – (he sent me a brilliant presentation featuring works of art & sculpture -which if you attend one of his training sessions you may be fortunate to see) but suffice to say this plot is merely a starting point.
Here it is:
We have red dots for admissions (extra work/resource required), green for discharges (reduction in workload/ resource requirements).
Transfers (moving from one area/ward within the hospital to another) are represented by grey dots.
Finally, the above are split out across 3 “staging posts” – Accident & Emergency (A&E), Assessment areas and Wards. For example, if a patient comes to A&E, gets sent for assessment, and is then sent to a ward for further care, then they have 5 moves recorded in the dataset( into A&E, out of A&E, into the Assessment area, out of the Assessment area, and then in to the ward).
If you want to follow along (go on) then you should head over to Neil’s site, download the excel file and take a look at the “how to” guide on the same page. Existing R users are already likely to be shuddering at all the manual manipulation required.
For the first attempt, I followed Neil’s approach pretty closely, resulting in a lot of code to sort and group, although ggplot2 made the actual plotting much simpler. I shared my very first attempt ( produce with barely any ggplot2 code) which was quite good, but there were a few issues – the ins/ outs being coloured blue instead of grey, and overplotting of several points.
Here’s the code:
# Set limits for plotting lims <- as.POSIXct(strptime(c("2014-09-03 00:00","2014-09-03 24:00") , format = "%Y-%m-%d %H:%M")) ggplot(plot_data,aes(Movement15,Movement_15_SEQNO, colour=Movement_Type))+ geom_point()+ facet_grid(Staging_Post~.)+ scale_x_datetime(date_labels="%H:%M",date_breaks = "3 hours", limits = lims, timezone = "CET",expand = c(0,0))+ theme_minimal()+ theme(legend.position="bottom")+ theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())
And here’s what you get
If you try and run that code as is - it won’t work, because we need to do some data prep first..
The next day, I came back to the task with a view to coming up with a “slicker” process. I realised the bulk of the prep work required in Excel was actually straightforward in SQL. If you don’t know SQL, that’s OK - because this is R, and we have dplyr.
Here’s how to do it:
data <- read_xlsx("RedGreenGreyDots.xlsx", sheet = 1) #read raw data from Excel plot_data <- data %>% mutate(Movement15 = lubridate::floor_date(MovementDateTime,"15 minutes")) %>% group_by(IN_OUT, Movement_Type,Staging_Post,Movement15) %>% mutate(counter = case_when ( IN_OUT == 'IN' ~ 1, IN_OUT == 'OUT' ~ -1)) %>% mutate(Movement_15_SEQNO =cumsum(counter)) %>% ungroup() # Change "Tranfer In" or "Transfer Out" to "Transfer" plot_data$Movement_Type <- gsub("Transfer.*","Transfer",x=plot_data$Movement_Type)
NOW you can try running the ggplot2 code above and you should get a very similar looking plot.
If you’re new to R / dplyr and wondering what the heck’s going on:
-
First we read the data in using the readxlsx function from the readxl package
-
(Not shown) check the structure of the dataframe and make sure the data types all check out (always an especially good idea when importing from Excel)
-
use dplyr’s mutate function to create a new variable to group all the movements into 15 minute intervals, which is a piece of cake with lubridate’s floordate function.
-
Next we group the data by the IN_OUT, Movement_Type, Staging_Post and our new Movement15 variable
-
We then create another new column, this time to create a counter field, with a value of 1 when the IN_OUT column = “IN” (so that these points appear above the x axis horizon) and -1 when the value is “OUT” (so the points display below the horizon)
-
We create yet another variable, giving the cumulative sum of the counter field, so that we have a dots that stack upon each other at each 15 minute interval ( rather than just having one point representing the maximum / minimum values at each stage)
The last thing to do is to tidy up the Movement_Type field - we don’t want 4 movement types on our final plot, so we just change the values to “Transfer” - they are colour coded grey regardless of whether they are a transfer in or transfer out.
Plots..
Here is how the plot looks once we work some ggplot2 magic:
ggplot(plot_data,aes(Movement15,Movement_15_SEQNO, colour=Movement_Type))+ geom_jitter(width=0.10)+ scale_colour_manual(values=c("#D7100D","#40B578","grey60"))+ facet_grid(Staging_Post~.,switch = "y")+ scale_x_datetime(date_labels="%H:%M",date_breaks = "3 hours", limits = lims, timezone = "CET", expand = c(0,0))+ ggtitle(label = "Anytown General Hospital | Wednesday 3rd September 2014 00:00 to 23:59\n", subtitle="A&E AND INPATIENT ARRIVALS, DEPARTURES AND TRANSFERS")+ labs(x= NULL, y= NULL)+ theme_ipsum(base_family = "Arial Narrow")+ theme(axis.text.y=element_blank(), axis.ticks.y=element_blank())+ theme(axis.text.x=element_text(size=7)) + theme(axis.ticks.x=element_blank())+ theme(legend.position="bottom")+ theme(panel.grid.minor=element_blank(), panel.grid.major=element_blank())+ theme(strip.text.y = element_text(angle = 0))+ guides(color=guide_legend("Movement Type"))
Not bad. Being really picky - I haven’t replicated the colour coding behind each of the facet labels that Neil had on his original plot (A&E should have a yellow background for instance). Similarly, the clever colour coding of the title text on the original chart does away with the need for a legend. This is kind of annoying because its very easy to do this in Excel but a real faff to emulate in ggplot2 (probably for good reason). I have found a couple of solutions to resolve both problems, but they involve a lot of mucking about with grid settings and I’m not sure its worth the aggro.
A key change to the original ggplot2 code, apart from manually specifying the colours, is that I’m using geom_jitter instead of geom_point. This helps get rid of over-plotting issues when 2 different Movement Types occur at exactly the same 15 minute interval.
During our initial discussion, Neil asked if it was possible to animate plots in R. I reassured him it was, but had to be honest and told him it was not something I had experience of. There was a jokey aside along the lines of “ lets get the static plots working, we can make movies later”.
Well, here is the animated version of the above plot:
This was produced using very similar ggplot2 code to the static version. The key difference was the use of the gganimate package, and the use of the “frame” and cumulative arguments, which basically instruct ggplot2 to make a plot showing the cumulative picture of movements from midnight up to each 15 minute segment. For example, the first image shows midnight to quarter past, the second midnight to half past, then midnight to quarter to midnight, then midnight to 1 AM, and so on. The resulting plots are then magically stitched together and saved to the desired output - in this case as a gif.
There are a couple of ways of setting the animation options for the timescale (interval- basically how fast you want the plot to cycle through from start to finish).
You can simply tag on an “ interval = x” argument when saving the plot, or you can use the animation package, and set the options, plot size etc up front. This is the approach I used, as you’ll see in the accompanying code, but either should work - you can uncomment /comment out the relevant lines in the code and try both. There’s a link to the repo at the top of the post.
..& Dr Cox
To finish up - I let Neil have a sneak peak at this post and he came up with the seasonally inspired title. As it’s that time of year, and this is likely to be my last post before then - seasons greetings to all / both my readers, hope Santa is good to you, stay safe and well.
All the best,
John
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.