Simply Mapping
First attempts with simple features
The latest edition of the Scottish Index of Multiple Deprivation (SIMD) was released last year, and has been getting a bit more promotion recently as part of Scotland’s open data – for example, an R package has been made available to calculate the SIMD rankings.
The purpose of the SIMD is that it “identifies small area concentrations of multiple deprivation across all of Scotland in a consistent way. It allows effective targeting of policies and funding where the aim is to wholly or partly tackle or take account of area concentrations of multiple deprivation.
SIMD ranks small areas (called data zones) from most deprived (ranked 1) to least deprived (ranked 6,976). People using SIMD will often focus on the data zones below a certain rank, for example, the 5%, 10%, 15% or 20% most deprived data zones in Scotland.
SIMD provides a wealth of information to help improve the understanding about the outcomes and circumstances of people living in the most deprived areas in Scotland” (from the main website linked above).
To put this into more practical terms, each Local Authority area in Scotland is divided into DataZones, with a roughly equal population in each. Each zone is ranked from low to high on a variety of metrics covering health, education, access to services, housing and so on, where a higher rank means less deprived. Various spreadsheets, database files and shapefiles are now available for analysis.
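To make the "5%, 10%, 15% or 20% most deprived" cut-offs concrete, here is a minimal base R sketch. It only assumes the 6,976 data zones quoted above; the helper name in_most_deprived() is made up for illustration, and the official quantile columns in the dataset may draw the boundaries slightly differently.

```r
# Number of data zones in SIMD 2016 (from the text above)
n_zones <- 6976

# Percentages commonly used to pick out the "most deprived" data zones
pct <- c(5, 10, 15, 20)

# Highest rank that still falls inside each share (rank 1 = most deprived)
cutoffs <- floor(n_zones * pct / 100)
names(cutoffs) <- paste0(pct, "%")
cutoffs

# Hypothetical helper: is a given rank within the pct% most deprived zones?
in_most_deprived <- function(rank, pct, n = n_zones) {
  rank <= floor(n * pct / 100)
}

in_most_deprived(c(1, 500, 6976), pct = 20)
```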
There are already some great examples of mapping online, including a fantastic interactive map, but I wanted to try mapping it myself using the “sf” package.
If you want to follow along, a couple of things you need to bear in mind:
- You’ll need to download shapefiles from this link on the SIMD page
- You’ll need to install the dev version of ggplot2 in order to benefit from geom_sf().
This involves devtools and, on Windows, Rtools, so that the package can be downloaded and built successfully.
First install the devtools package from your usual package repository, and then run devtools::install_github("tidyverse/ggplot2") to install the development version.
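Before installing anything, it may be worth checking whether the ggplot2 you already have exports geom_sf(). This is a small sketch using base R's getNamespaceExports(); the variable name has_geom_sf is just for illustration.

```r
# TRUE if the installed ggplot2 already provides geom_sf()
has_geom_sf <- "geom_sf" %in% getNamespaceExports("ggplot2")

if (has_geom_sf) {
  message("geom_sf() is available, no need to install the dev version.")
} else {
  message("geom_sf() not found: install the dev version as described above.")
}
```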
# install.packages("devtools")                   # run once if devtools is missing
# devtools::install_github("tidyverse/ggplot2")  # run once: dev version, for geom_sf()
library(sf)
library(dplyr)
library(ggplot2)
library(stringr)
library(viridis)
library(hrbrthemes) # optional - but awesome and highly recommended
library(extrafont)  # optional, might be required if you are a Windows user and want to use hrbrthemes
You’ll see I’m using hrbrthemes – which is very, very nice indeed. You are free to use other themes of course, but if you haven’t at least tried hrbrthemes then do yourself a favour. This was the first time I’d used it and it’s definitely my default for the foreseeable future.
I’m also using extrafont. Windows users may see errors about missing fonts when trying hrbrthemes and other custom themes without this package installed, and it makes it easier to add additional fonts to your system. Installing the package also installs “extrafontdb”. Then follow the help to register the fonts on your system: this takes a while the first time but is worth doing (it’s just one command, font_import(), so not too onerous).
Right – to the actual mapping.
First, read in the shape file:
scot <- st_read("SG_SIMD_2016.shp")
This results in a dataframe and sf object incorporating 6976 observations and 50 variables.
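If you want to see what "a dataframe and sf object" means before downloading the SIMD files, the sketch below reads the small demo shapefile that ships with the sf package (North Carolina counties). It is only a stand-in for SG_SIMD_2016.shp; the same inspection calls apply to scot.

```r
library(sf)

# Demo shapefile bundled with sf, used here purely as a stand-in for the SIMD file
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

class(nc)   # includes both "sf" and "data.frame"
dim(nc)     # number of features (rows) and of columns, geometry included
st_crs(nc)  # coordinate reference system picked up from the shapefile
names(nc)   # attribute columns plus the geometry column
```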
Then, for simplicity, I’m going to change all the variable names to lowercase, and create a smaller dataframe holding only the observations for the Highland region (by filtering on ‘laname’):
colnames(scot) <- colnames(scot) %>% str_to_lower()
highland <- filter(scot, laname == "Highland")
The code below will give a map of the Highland region coloured by quintile (that is, a score of 1-5, with 1 indicating the most deprived areas and 5 the least):
ggplot(highland) +
  geom_sf(aes(fill = quintile)) +
  scale_fill_viridis("quintile", option = "C",
                     guide = guide_legend(title = "Quintile")) +
  ggtitle("SIMD 2016 - Highland Council Area by Quintile",
          subtitle = "1 = most deprived, 5 = least deprived") +
  theme_ipsum(base_size = 10) +
  theme(plot.title = element_text(hjust = 0))
Here is the resulting plot:
There are quite a few interesting variables within the dataset. The SIMD scores are available by quintile, decile and vigintile, plus rankings by domain (such as health and education), and the main SIMD ranking, where 1 is the most deprived and 6976 is the least deprived datazone (in Renfrewshire and East Renfrewshire respectively, if you’re interested).
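Picking out the two extremes is a one-line filter. The sketch below runs on a made-up toy table so that it is self-contained; the column names datazone, laname and rank are assumptions about the lower-cased shapefile, and the middle rows are invented purely for illustration.

```r
library(dplyr)

# Toy stand-in for the attribute table (rows are illustrative, not real SIMD data)
toy <- data.frame(
  datazone = c("A", "B", "C", "D"),
  laname   = c("Renfrewshire", "Highland", "Highland", "East Renfrewshire"),
  rank     = c(1, 3200, 4100, 6976)
)

# Keep only the most deprived and least deprived rows
extremes <- toy %>%
  filter(rank %in% range(rank)) %>%
  arrange(rank) %>%
  select(datazone, laname, rank)

extremes

# On the real data, dropping the geometry first keeps the printout compact:
# scot %>% st_drop_geometry() %>% filter(rank %in% range(rank))
```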
Here are some of the other plots I produced - the code being remarkably similar to the above example in all cases:
Scotland by quintile:
ggplot(scot) +
  geom_sf(aes(fill = quintile)) +
  scale_fill_viridis("quintile", option = "C") +
  ggtitle("SIMD 2016 - Scotland by Quintile",
          subtitle = "1 = most deprived, 5 = least deprived") +
  theme_ipsum(base_size = 10)
The problem with this plot is that quite a large chunk of the central belt does not appear to have rendered correctly - too much absence of colour.
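One possible cause, which I haven't confirmed against this dataset, is that the polygon outlines swamp the fill where the datazones are very small and tightly packed. The sketch below uses a toy grid of tiny squares (not SIMD data) to compare the default borders with colour = NA, which turns the outlines off.

```r
library(sf)
library(ggplot2)

# Toy data: a 40 x 40 grid of small squares standing in for dense urban datazones
unit_box <- st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 1, ymax = 1)))
cells <- st_make_grid(unit_box, n = c(40, 40))
grid <- st_sf(quintile = rep(1:5, length.out = length(cells)), geometry = cells)

# Default: every polygon gets an outline, which can dominate very small shapes
p_border <- ggplot(grid) +
  geom_sf(aes(fill = quintile))

# colour = NA removes the outlines so that only the fill is drawn
p_noborder <- ggplot(grid) +
  geom_sf(aes(fill = quintile), colour = NA)
```

If this is what is happening, adding colour = NA to the geom_sf() call in the Scotland plot should bring the central belt back.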
So let’s have a look at Edinburgh and Glasgow, the 2 main cities, using a different viridis palette:
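The code for these two plots isn't reproduced here, so the following is only a sketch of how it could look. The helper plot_area() is hypothetical, the local authority names "City of Edinburgh" and "Glasgow City" are assumed spellings (check them with unique(scot$laname)), and viridis option "D" is just one choice of a palette other than "C".

```r
library(sf)
library(dplyr)
library(ggplot2)
library(viridis)

# Hypothetical helper: quintile map for a single local authority.
# Expects an sf object with lower-cased columns laname and quintile.
plot_area <- function(data, which_la, palette = "D") {
  data %>%
    filter(laname %in% which_la) %>%
    ggplot() +
    geom_sf(aes(fill = quintile)) +
    scale_fill_viridis("Quintile", option = palette) +
    ggtitle(paste("SIMD 2016 -", which_la, "by Quintile"),
            subtitle = "1 = most deprived, 5 = least deprived")
}

# With the SIMD data loaded as scot (column names lower-cased as above):
# plot_area(scot, "City of Edinburgh")
# plot_area(scot, "Glasgow City")
```

Plotting each city on its own, rather than faceting, lets each map zoom to its own extent.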
I mentioned that the area around Renfrew had the lowest and highest ranked datazones, so let’s take a look at them:
renfrew <- filter(scot, laname %in% c("East Renfrewshire", "Renfrewshire"))
ggplot(renfrew) +
  geom_sf(aes(fill = quintile)) +
  scale_fill_viridis("quintile", option = "C",
                     guide = guide_legend(title = "Quintile")) +
  ggtitle("SIMD 2016 - Renfrewshire & East Renfrewshire by Quintile",
          subtitle = "1 = most deprived, 5 = least deprived") +
  theme_ipsum(base_size = 10) +
  facet_wrap(~laname)
Here is the unfaceted version:
A focus on the Highlands now. Amongst the many variables are some relating to the availability of central heating within homes, and average drive times to access local services, such as retail, GPs, and petrol stations:
I’d imagine a lot of older houses in the Highlands still have open fireplaces providing plenty heat and water, but which may not qualify as “central heating” compared to a conventional boiler / oil fired system.
Again, I can see this being quite a meaningful metric in more densely populated areas, but it is less meaningful in the Highlands. You don’t just get on the road and drive in a straight line to your destination: we have hills, moors, and scenic winding roads that you can’t drive very quickly on. And of course there are generally fewer filling stations in the less populated areas.
I’ve also plotted other areas within Scotland and these results are not out of the ordinary. All the info is available to you within the dataframe, so you can try other areas to check.
Finally, the overall Highland rankings (remember, higher is better):
Main takeaway - the ‘sf’ package makes mapping pretty easy - if you can get your hands on a shapefile then you should be able to import it and plot it without too much difficulty, and the dev version of ggplot2 makes it even simpler.