Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The commute to my workplace is 90 minutes each way. Podcasts are my friend. I’m a long-time listener of In Our Time and enjoyed the recent episode about The Danelaw.
Melvyn and I hail from the same part of the world, and I learned as a child that many of the local place names there were derived from Old Norse or Danish. Notably: places ending in -by denote a farmstead, settlement or village; those ending in -thwaite mean a clearing or meadow.
So how local are those names? Time for some quick and dirty maps using R.
First, we’ll need a dataset of British place names. There are quite a few of these online, but top of my Google search was Index of Place Names in Great Britain (July 2016). It comes in several formats including CSV, easy to read into R like so:
library(tidyverse)
library(maps)

gbplaces <- read_csv("https://opendata.arcgis.com/datasets/a6c138d17ac54532b0ca8ee693922f10_0.csv?outSR=%7B%22latestWkid%22%3A27700%2C%22wkid%22%3A27700%7D")
A quick inspection of the data reveals that, whilst there is a unique identifier, objectid_1, each row is not a unique place as such (the dataset is based on grid locations). We can reduce the number of rows a little by taking distinct(placesort, lat, long_), but that will still retain duplicate place names with slightly different coordinates. For our purposes, it doesn’t really matter – we just want an indication of distribution, rather than a highly accurate map.
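To see why distinct() only helps a little, here is a minimal sketch on a toy table. The rows are made up for illustration and are not taken from the dataset; only the column names placesort, lat and long_ come from the real data.

```r
library(dplyr)

# Hypothetical rows: one place recorded several times on a grid.
# Values are invented for illustration, not copied from the dataset.
toy <- tibble(
  placesort = c("whitby", "whitby", "whitby", "selby"),
  lat       = c(54.48, 54.48, 54.49, 53.78),
  long_     = c(-0.62, -0.62, -0.61, -1.07)
)

# Exact duplicates collapse; rows that differ slightly in position remain
deduped <- toy %>% distinct(placesort, lat, long_)
nrow(deduped)

# Counting names afterwards shows which ones still appear more than once
deduped %>% count(placesort)
```

The two identical rows are merged, but the third row with slightly different coordinates is kept, so the same name still appears more than once.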
We’ll start by looking at places ending in -by. For this example, we’ll let the points themselves define the outline of Great Britain rather than drawing one. We’ll emphasise the -by places and try to de-emphasise the rest.
gbplaces %>%
  distinct(placesort, lat, long_) %>%
  # flag names made of at least one character followed by a final "by"
  mutate(isBy = grepl("^.+by$", placesort)) %>%
  # not the territories!
  filter(lat > 40) %>%
  ggplot(aes(long_, lat)) +
  geom_point(aes(color = isBy, alpha = isBy), size = 0.5) +
  scale_colour_viridis_d(direction = -1, name = "ends in -by", option = "inferno") +
  scale_alpha_manual(values = c(0.3, 1)) +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_blank()) +
  labs(title = "Distribution of GB place names ending -by") +
  # "none" replaces the deprecated guides(alpha = FALSE)
  guides(alpha = "none") +
  coord_map()
Here’s the result. Not bad. Lots of locations in Cumbria and eastern England. I like how the “plotting by points only” approach emphasises the empty mountainous regions in Scotland, Northern England and Wales.
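It’s worth checking what the pattern "^.+by$" actually matches before trusting the map. This is a small base R sketch using made-up names (not rows from the dataset). I’m assuming nothing about how placesort is capitalised, so the last line shows a case-insensitive variant as well.

```r
# Made-up names for illustration; not taken from the dataset
candidates <- c("whitby", "selby", "by", "bylaugh", "Whitby", "WHITBY")

# Same pattern as in the pipeline: one or more characters, then "by" at the end
ends_in_by <- grepl("^.+by$", candidates)
setNames(ends_in_by, candidates)

# Case-insensitive variant, in case the name column is not all lower case
ends_in_by_ci <- grepl("^.+by$", candidates, ignore.case = TRUE)
setNames(ends_in_by_ci, candidates)
```

The bare string "by" and names that merely start with "by" are excluded, because the pattern needs at least one character before the suffix. An upper-case "BY" is only matched by the case-insensitive version.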
Now we’ll look at -thwaite. This time we’ll use map_data() to pull an outline from the maps package.
# filter out N Ireland
ggplot(data = map_data("world", "UK") %>% filter(group != 3),
       aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "darkolivegreen") +
  coord_map() +
  geom_point(data = gbplaces %>%
               filter(grepl("^.+thwaite$", placesort), lat > 40),
             aes(long_, lat), color = "yellow", size = 0.5) +
  theme(axis.title = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        panel.grid = element_blank(),
        panel.border = element_blank()) +
  labs(title = "Distribution of GB place names ending -thwaite")
Result below. We see that -thwaite is much more localised to Cumbria and parts of Yorkshire.
Summary
I find mapping languages quite fascinating, but of course it’s not an original idea. Here’s an interactive map of Norse-derived place names in the UK, developed for an exhibition at the British Museum. I’m sure there are many others.
If you want to put data on a map, R offers many options using base R, ggplot2 or interactive JavaScript libraries such as Leaflet. I think it’s never been quicker or easier to do.