Clipping polygons in R
Clipping polygons is a basic GIS task. It involves removing unneeded polygons outside of an area of interest. For example, you might want to study all local authorities (LADs) in the Yorkshire and the Humber region but can only obtain shapefiles that contain all the LADs in the UK. Removing the LADs outside of Yorkshire and the Humber could be achieved by ‘clipping’ the shapefile of LADs, using the extent of the larger region as a template. In R this takes a bit of work, but is quite possible with a few lines of code and has the advantage of being fully reproducible.
Getting started
Let’s start by obtaining administrative boundary shapefiles in England, loading necessary packages, and quickly checking the shapefiles are plotting correctly. We will need both regions and LADs. I’m using the freely-available boundaries from the UK Data Service:
# The default behaviour of this script is to create a folder called 'clip-example' # and download and run everything from here. # You can change this by modifying the next two lines dir.create("clip-example") setwd("clip-example") # download and unzip UK administrative boundary shapefiles download.file("https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_gor_2011.zip", destfile = "gor.zip") # ~6.4MB download.file("https://census.edina.ac.uk/ukborders/easy_download/prebuilt/shape/England_lad_2011.zip", destfile = "lad.zip") # ~25MB unzip("gor.zip", exdir = ".") # '.' just means current directory unzip("lad.zip", exdir = ".") # I recommend loading the shapefiles with readOGR() in the rgdal package # install.packages("rgdal") # uncomment and run if not already installed # install.packages("rgeos") # as above # If you run in to trouble installing rgdal and rgeos on Linux (Ubuntu) see: # https://philmikejones.wordpress.com/2014/07/14/installing-rgdal-in-r-on-linux/ require("rgdal") require("rgeos") regions <- readOGR(".", "england_gor_2011") plot(regions) # worth a quick check everything looks like it should! lads <- readOGR(".", "england_lad_2011") plot(lads)
So now we have everything we need to begin: a shapefile of English regions, and a shapefile of English LADs.
Extract Yorkshire and the Humber
You’ve hopefully seen from your test plot that the shapefiles are plotting all boundaries in England, so first let’s remove all but Yorkshire and the Humber from the regions shapefile. We can do this by looking for the names of the regions in the shapefiles attributes file and using standard subsetting syntax. If you’ve used other GIS software before hopefully this idea will be familiar to you, but if not, don’t worry!
regions <- regions[regions@data$name == "Yorkshire and The Humber", ] # Another plot confirms just the Yorkshire and The Humber region is left plot(regions)
Clip the LADs
The regions object now just contains Yorkshire and The Humber, so we can use this as our bounds to extract LADs:
yh_lads <- gIntersection(regions, lads, byid = TRUE, drop_lower_td = TRUE) plot(yh_lads)
Easy, but we’re not finished yet. The clipped polygons no longer contain a data frame because the gIntersection doesn’t (and can’t) know which data frame items to save in to the new object. This means we must add them back in manually, but even this is relatively straight-forward.
Recreate the data frame
The row names for the clipped polygons are a concatenation of the regions row name and the LAD row names. These are in the form x yyyy where x is the region row name (in our case always 5, because Yorkshire and The Humber was row 5) and yyyy is the LAD row name. This can be confirmed with:
row.names(yh_lads) ## [1] "5 2" "5 8" "5 9" "5 37" "5 39" "5 43" "5 105" "5 124" "5 162" "5 165" ## [11] "5 175" "5 204" "5 207" "5 214" "5 225" "5 245" "5 248" "5 251" "5 254" "5 297" ## [21] "5 320"
So that we can create a data frame based on the LAD data frame, we must first ensure our row names match the LAD row names so they can be merged across. I do this by removing the “5 ” with gsub():
row.names(yh_lads) <- gsub("5 ", "", row.names(yh_lads)) # Confirm it worked row.names(yh_lads) ## [1] "2" "8" "9" "37" "39" "43" "105" "124" "162" "165" "175" "204" "207" ## [14] "214" "225" "245" "248" "251" "254" "297" "320"
So far, so good. Now we create a variable – “keep” – that contains the row names that we want to keep from the LADs data frame.
keep <- row.names(yh_lads)
For the sake of simplicity I also change the polygon IDs so they are consistent with their respective row names. This step is probably optional, but I prefer to ensure they match so I have a consistent set of IDs to work with later on. If you choose not to do this I don’t think you’ll run into any problems but YMMV.
yh_lads <- spChFIDs(yh_lads, keep)
Now we create a copy of the LAD dataframe and filter it so that it only contains the rows we want to keep, using row names to perform the match:
yh_data <- as.data.frame(lads@data[keep, ])
Finally we create a SpatialPolygonsDataFrame object with our clipped polygons and our subsetted data frame:
yh_lads <- SpatialPolygonsDataFrame(yh_lads, yh_data)
All being well, your final plot should look like this:
plot(yh_lads)
And your yh_lads object should contain attribute data, which can be viewed with:
yh_lads@data ## CODE NAME ALTNAME ## 2 E07000164 Hambleton ## 8 E07000169 Selby ## 9 E08000017 Doncaster ## 37 E07000168 Scarborough ## 39 E08000032 Bradford ## 43 E07000165 Harrogate ## 105 E07000167 Ryedale ## 124 E07000166 Richmondshire ## 162 E06000012 North East Lincolnshire ## 165 E08000035 Leeds ## 175 E07000163 Craven ## 204 E08000033 Calderdale ## 207 E06000013 North Lincolnshire ## 214 E08000019 Sheffield ## 225 E06000010 Kingston upon Hull, City of ## 245 E06000011 East Riding of Yorkshire ## 248 E06000014 York ## 251 E08000036 Wakefield ## 254 E08000034 Kirklees ## 297 E08000016 Barnsley ## 320 E08000018 Rotherham # If further confirmation is required: class(yh_lads) ## [1] "SpatialPolygonsDataFrame" ## attr(,"package") ## [1] "sp"
Congratulations, a fully reproducible polygon clip!