Site icon R-bloggers

Using {sf} to calculate catchment areas

[This article was first published on R tips – NHS-R Community, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

By Tom Jemmett

R is a highly capable GIS tool, thanks to the {sf} pacakge. In this post I am going to show you how you can use {sf} with openly available data to calculate catchment areas and population sizes, and then how to calculate travel time isochrone’s with {osrm}.

knitr::opts_chunk$set(echo = TRUE)

suppressPackageStartupMessages({
  library(tidyverse)
  library(readxl)
  library(glue, include.only = "glue_data")
  library(scales, exclude = c("discard", "col_factor"))
  library(sf)
  library(osrm) 
  library(leaflet)
  library(leaflet.extras)
})

Firstly, I have prepared a dataset containing the 23 A&E departments that are Adult Major Trauma Centre’s (MTC’s) in England and Wales. This dataset is an {sf} object which contains a special column called geometry – this column contains the locations of the hospitals. The data is sourced from this, the Organisation Data Service (ODS) and the NHS Postcode Database.

When we print this table out we get a bit of information telling us how many features (rows) and fields (columns) there are, along with what the type of the geometry is (in this case, we have points). We also get the “bounding box” (bbox) – this is like a rectangle that encloses all of our geometries. Finally, we get the coordinate reference system (CRS) that is being used.

Below this we see a table of data – like any other dataframe in R.

major_trauma_centres <- read_sf(
  paste0(
    "https://gist.githubusercontent.com/tomjemmett/",
    "ac111a045e1b0a60854d3d2c0fb0baa8/raw/",
    "a1a0fb359d1bc764353e8d55c9d804f47a95bfe4/",
    "major_trauma_centres.geojson"
  ),
  agr = "identity"
)
major_trauma_centres
## Simple feature collection with 23 features and 2 fields
## Attribute-geometry relationship: 0 constant, 0 aggregate, 2 identity
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -4.113684 ymin: 50.41672 xmax: 0.1391145 ymax: 54.98021
## geographic CRS: WGS 84
## # A tibble: 23 x 3
##    name                                           org_id                geometry
##    <chr>                                          <chr>              <POINT [°]>
##  1 ADDENBROOKE'S HOSPITAL                         RGT01     (0.1391145 52.17374)
##  2 DERRIFORD HOSPITAL                             RK950     (-4.113684 50.41672)
##  3 HULL ROYAL INFIRMARY                           RWA01    (-0.3581464 53.74411)
##  4 JAMES COOK UNIVERSITY HOSPITAL                 RTDCR     (-1.214805 54.55176)
##  5 JOHN RADCLIFFE HOSPITAL                        RTH08     (-1.219806 51.76387)
##  6 KING'S COLLEGE HOSPITAL (DENMARK HILL)         RJZ01   (-0.09391574 51.46808)
##  7 LEEDS GENERAL INFIRMARY                        RR801     (-1.551744 53.80145)
##  8 NORTHERN GENERAL HOSPITAL                      RHQNG      (-1.455966 53.4098)
##  9 NOTTINGHAM UNIVERSITY HOSPITALS NHS TRUST - Q~ RX1RA      (-1.185957 52.9438)
## 10 QUEEN ELIZABETH HOSPITAL                       RRK02     (-1.938476 52.45318)
## # ... with 13 more rows

This dataframe can be used like any other dataframe in R, for example, we can use dplyr to filter rows out we aren’t interested in, or join to another dataframe to bring in extra information.

For example, we can load the LSOA population estimates and join it to the LSOA boundaries file from the ONS geoportal. If you aren’t familiar with Output Areas see the ONS Census Geography documentation.

First, let’s load the population estimate file.

# download the LSOA population estimates from ons website: for some reason they
# provide this download as an excel file in a zip file, so we need to download
# the zip then extract the file, but we don't need to keep the zip after. withr
# handles this temporary like file for us
if (!file.exists("SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx")) {
  withr::local_file("lsoa_pop_est.zip", {
    download.file(
      paste0(
        "https://www.ons.gov.uk/file?uri=/peoplepopulationandcommunity/",
        "populationandmigration/populationestimates/datasets/",
        "lowersuperoutputareamidyearpopulationestimates/mid2019sape22dt2/",
        "sape22dt2mid2019lsoasyoaestimatesunformatted.zip"
      ),
      "lsoa_pop_est.zip",
      mode = "wb"
    )
    unzip("lsoa_pop_est.zip")
  })
}

lsoa_pop_estimates <- read_excel(
  "SAPE22DT2-mid-2019-lsoa-syoa-estimates-unformatted.xlsx",
  "Mid-2019 Persons",
  skip = 3
) %>%
  select(LSOA11CD = `LSOA Code`, pop = `All Ages`)

head(lsoa_pop_estimates)
## # A tibble: 6 x 2
##   LSOA11CD    pop
##   <chr>     <dbl>
## 1 E01011949  1954
## 2 E01011950  1257
## 3 E01011951  1209
## 4 E01011952  1740
## 5 E01011953  2033
## 6 E01011954  2210

Now we can load the boundary data and join to our population estimates.

lsoa_boundaries <- read_sf(
  "https://opendata.arcgis.com/datasets/e9d10c36ebed4ff3865c4389c2c98827_0.geojson"
) %>%
  # there are other fields in the lsoa data, but we only need the LSOA11CD field
  select(LSOA11CD) %>%
  inner_join(lsoa_pop_estimates, by = "LSOA11CD") %>%
  # this helps when combining different sf objects and get's rid of some messages
  st_set_agr(c(LSOA11CD = "identity", pop = "aggregate"))

We can now use ggplot to create a simple plot showing the populations of the LSOA’s and add points to show the MTC’s.

ggplot() +
  # first plot the lsoa boundaries and colour by the population
  geom_sf(data = lsoa_boundaries, aes(fill = pop), colour = NA) +
  geom_sf(data = major_trauma_centres) +
  scale_fill_distiller(type = "div", palette = "Spectral",
                       # tidy up labels in the legend, display as thousands
                       labels = number_format(accuracy = 1e-1, scale = 1e-3, suffix = "k")) +
  theme_void() +
  theme(legend.position = c(0, 0.98),
        legend.justification = c(0, 1)) +
  labs(fill = "Population (thousands)")

We can now try to calculate the population estimates for each of the MTC’s. We can take advantage of {sf} to perform a spatial join between our two datasets. A spatial join is similar to a join in a regular Sql database, except instead of joining on columns we join on geospatial features.

In this case we want to take each LSOA and find which MTC it is closest to, so we can use the st_nearest_feature predicate function with st_join. This will give us a dataset containing all of the rows from lsoa_boundaries, but augmented with the columns from major_trauma_centres.

We can then use standard {dplyr} functions to group by the MTC name and org_id fields, and calculate the sum of the pop column.

{sf} is clever – it knows that when we are summarising a geospatial table we need to combine the geometries somehow. What it will do is call st_union and return us a single geometry per MTC.

One thing we need to do before joining our data though is to transform our data temporarily to a different CRS. Our data currently is using latitude and longitude in a spherical projection, but st_nearest_points assumes that the points are planar. That is, it assumes that the world is flat. This can lead to incorrect results.

But, we can transform the data to use the British National Grid. This instead specifies points as how far east and north from an origin. This has a CRS number of 27700. Once we are done summarising our data we can again project back to the previous CRS (4326). This article explains these numbers in a bit more detail.

mtcv_pop <- lsoa_boundaries %>%
  st_transform(crs = 27700) %>%
  # assign each LSOA to a single, closest, MTC
  st_join(
    st_transform(major_trauma_centres, crs = 27700),
    join = st_nearest_feature
  ) %>%
  # now, summarise our data frame based on MTC's to find the total population
  group_by(name, org_id) %>%
  # note: summarise will automatically call st_union on the geometries
  # this gives us the lsoa's as a combined multipolygon
  summarise(across(pop, sum), .groups = "drop") %>%
  st_transform(crs = 4326)

We can now visualise this map. This time we will use the {leaflet} package to create an interactive html map.

First we need a function for colouring the areas.

pal_fun <- colorNumeric("Spectral", NULL, n = 7, reverse = TRUE)

As this is an interactive map we can add popups to the area’s to make it easier to identify which MTC we are looking at and to show us the estimated population.

p_popup <- glue_data(mtcv_pop,
                     "<strong>{name}</strong> ({org_id})",
                     "Estimated Population: {comma(pop)}",
                     .sep = "\n")

Now we can create our map

mtcv_pop %>%
  leaflet() %>%
  # add the areas that have been assigned to each MTC
  addPolygons(stroke = TRUE,
              color = "black",
              opacity = 1,
              weight = 1,
              fillColor = ~pal_fun(pop),
              fillOpacity = 1,
              popup = p_popup) %>%
  # add markers for the MTC's
  addCircleMarkers(data = major_trauma_centres,
                   color = "black",
                   stroke = TRUE,
                   fillColor = "white",
                   weight = 1,
                   radius = 3,
                   opacity = 1,
                   fillOpacity = 1) %>%
  setMapWidgetStyle(list(background= "white"))

Click here to view an interactive version of this map.

This method certainly is not perfect. For example, we can see that the closest MTC for most of Somerset is Cardiff, but in order to reach Cardiff you need to cross the Bristol Channel! Not so easy by a Land Ambulance!

A better approach would be to use some method to calculate travel time from each LSOA to the MTC’s. Using the {osrm} package, we can get isochrone’s from Open Street Map. An isochrone is an area of equal travel time from a point. So, if we calculate isochrone’s for each of the MTC’s, then join to the LSOA’s, we can more accurately assign our LSOA’s to the MTC’s.

This code will calculate the isochrone’s for our MTC’s, in 30 minute travel time increments, from 0 to 150 minutes. This is calculated as per usual car travel times, and not for ambulance travel times. Note, this can take a few minutes to run.

mtc_iso <- major_trauma_centres %>%
  # we need to calculate the isochrone's for each row individually, so we need
  # to map over the geometries
  mutate(iso = map(geometry,
                   osrmIsochrone,
                   breaks = seq(0, 150, by = 30),
                   returnclass = "sf")) %>%
  # we are going to replace the existing geometry column (the MTC location) with
  # the isochrones
  st_drop_geometry() %>%
  unnest(iso) %>%
  st_set_geometry("geometry") %>%
  # transform to a planar projection to make calculations more reliable
  st_transform(crs = 27700)

Now we can do the join between our LSOA’s and the MTC isochrones. Potentially an LSOA may have more than one isochrone for two or more MTC’s. In these cases we need to select the closest isochrone. The first part of the code above handles this by filtering the data by the minimum value for the max column (maximum time of the isochrone).

However, this may not return unique results; if an LSOA is matched by two isochrones of the same time for two different MTC’s then our filter will find both! In these cases we select the closest MTC based on straight line distance. This can be quite slow, so to reduce the computational burden I split my data into two groups: those that have more than 1 match (which we apply the closest distance filter to) and those that have just 1 match.

Once we have done this filtering we can recombine the data.

lsoa_tmp <- local({
  x <- lsoa_boundaries %>%
    st_transform(crs = 27700) %>%
    st_join(mtc_iso) %>%
    group_by(LSOA11CD) %>%
    filter(max == min(max)) %>%
    mutate(nx = n() > 1) %>%
    ungroup() %>%
    group_nest(nx)
  
  # splitting this up reduces the amount of computation required: just work out
  # the distance and filter where we have more than 1 match
  x$data[[which(x$nx)]] <- x$data[[which(x$nx)]] %>%
    inner_join(major_trauma_centres %>%
                 st_transform(crs = 27700) %>%
                 as_tibble() %>%
                 rename(hospital = geometry),
               by = c("name", "org_id")) %>%
    mutate(dist = map2_dbl(st_centroid(geometry), hospital, st_distance)) %>%
    group_by(LSOA11CD) %>%
    filter(dist == min(dist)) %>%
    select(-dist, -hospital)
  
  # recombine our data
  x <- bind_rows(x$data)
  
  # some lsoa's are missed out in the steps above, add these back in by copying
  # the data from their closest neighbour in x
  y <- lsoa_boundaries %>%
    st_transform(crs = 27700) %>%
    anti_join(st_drop_geometry(x), by = "LSOA11CD") %>%
    st_join(select(x, -LSOA11CD), join = st_nearest_feature)
  
  # join x and y together
  bind_rows(x, y)
})

Now that we have assigned the LSOA’s to a MTC and isochrone we can summarise our data to give us the combined isochrones.

mtc_iso_fixed <- lsoa_tmp %>%
  group_by(name, org_id, max) %>%
  summarise(.groups = "drop") %>%
  st_transform(4326) %>%
  mutate(time = glue::glue("{max - 30} to {max} mins") %>%
           fct_reorder(max))

And we can summarise the MTC’s to find the revised catchment populations.

mtc_points <- major_trauma_centres %>%
  inner_join(lsoa_tmp %>%
               st_drop_geometry() %>%
               group_by(name, org_id) %>%
               summarise(mean_time = sum(center * pop) / sum(pop),
                         across(pop, sum),
                         .groups = "drop"),
             by = c("name", "org_id"))

Finally we can put this together into a map.

mtc_times <- levels(mtc_iso_fixed$time)
mtc_pal <- colorFactor(viridis::viridis(length(mtc_times)), mtc_times)
                       
leaflet(mtc_iso_fixed) %>%
 addTiles() %>%
 addPolygons(fillColor = ~mtc_pal(time),
             color = "#000000",
             weight = .5,
             fillOpacity = 0.5,
             popup = ~glue::glue("<b>{name}</b><br>{time}")) %>%
 addMarkers(data = mtc_points,
            popup = ~glue::glue("<b>{name}</b> ({org_id})",
                                "Mean Time: {round(mean_time, 0)} minutes",
                                "Pop: {scales::comma(pop)}",
                                .sep = "<br>"))

Click here to view an interactive version of this map.

We could improve this by making the isochrones be smaller, for example 15 minute groups rather than 30 minutes. However, this would take longer to download from Open Streetmap, and it will take a lot longer to do the LSOA assingment. Though, having smaller groups would mean we need more colours which would make the map harder to read!

I hope that you have found this useful! If you want to learn more about geocomputation in R then you should read the book “Geocomputation with R” which is available to read free online.

The post Using {sf} to calculate catchment areas appeared first on NHS-R Community.

To leave a comment for the author, please follow the link and comment on their blog: R tips – NHS-R Community.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.