Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I wanted to draw my own nice, clean map of the Pacific Island countries and territories that I work with in my day job; and a workflow I could use for producing statistical graphics with it, particularly choropleth maps. I am going to build this into my frs R package to make it easier to re-use, but today’s blog is my prototype putting it together from first principles.
First, here’s the end product. As sample data I’ve used population per square kilometre of the exclusive economic zone, which is a slightly unusual metric that I was interested in at the time.
So here’s how I made that. There was quite a bit involved, but having worked it out I will package it up nicely (future post) so it is easy to do in future.
Land masses
To start with I need a simplified set of polygons showing where the Earth is land rather than sea. In my polished map the land is going to be coloured pale grey, nearly white, in the background. This would be easy if I wanted to centre my map on the Atlantic Ocean, but because I’m at the other end of the Earth I need to centre the map somewhere in the Pacific. This comes up against the well-known anti-meridian mapping problem – the annoying glitch where many published spatial datasets representing the Earth have a problem with polygons that cross 180 degrees of longitude, causing all sorts of ugliness. This problem is so common that it clutters up Google searches for anything to do with how to draw maps centred in the Pacific.
There are of course many ways of dealing with this, but I used this nice method from a Stack Overflow answer, which basically creates a data frame of all the points to connect to draw the world, and does it twice (with the second set having 360 added to all the longitude values), before filtering down to the area we actually want, which will include some values at their original longitude (between -180 and 180) and some that have had 360 added to them – so the final longitudes after filtering are all between 0 and 360. Thinking about this as though a map is a scatter plot (which is basically the case), with the 0 degree line going through Greenwich in the UK, having longitudes of 0 to 360 means we can have the UK on the far left of the plot area where we want it, whereas having them from -180 to 180 means the UK has to be in the centre.
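To make the duplicate-and-shift idea concrete, here is a minimal sketch in base R. It uses a tiny made-up data frame of three points (approximate coordinates, standing in for a full set of world polygon vertices), so the object names and values are illustrative assumptions rather than the data used for the real map.

```r
# Toy stand-in for a data frame of world map vertices (approximate coordinates)
world <- data.frame(
  name = c("Greenwich", "Noumea", "Rarotonga"),
  long = c(0, 166.4, -159.8),
  lat  = c(51.5, -22.3, -21.2)
)

# Second copy of the same points with 360 added to every longitude
world_shifted <- transform(world, long = long + 360)

# Stack both copies, then keep only the window we want to plot (0 up to 360)
world2 <- rbind(world, world_shifted)
world2 <- world2[world2$long >= 0 & world2$long < 360, ]

world2
```

Each point survives exactly once: Greenwich and Noumea at their original longitudes, and Rarotonga at its shifted longitude of 200.2, which puts it to the right of Noumea as it should be on a Pacific-centred plot.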
Here’s the code that sets everything up for the session and gets going on that map:
That looks like this, which is pretty much perfect for my purposes:
International Date Line
Next, you may have noticed my target map has the International Date Line drawn through it. In our part of the world, this is important! Areas on the right (eastern) side of that line are a day behind those on the left; for example, while I am sitting in my office in Noumea on a Monday, if I pick up the phone to talk to someone in the Cook Islands, it will be Sunday there.
The Date Line is not a simple straight line because of the understandable desire of some countries not to be split in two by it. Most obviously, Kiribati has arranged for the Date Line to leap out to the east to include the area around Kiritimati atoll so they can be in the same day as the majority of the people on Tarawa, in the west.
The exact location of the International Date Line and how to draw it is one of those things that’s difficult to google because of clutter from the polygons-split-by-the-antimeridian problem referenced above, but eventually I tracked down an easy version to use, in “Natural Earth”. Natural Earth is a fantastic resource, providing free vector and raster map data at a range of scales. The rnaturalearth package makes it super easy to download so long as you know exactly what you are looking for. This snippet of code does the job for the Date Line:
You’ll see I’ve wrapped this, and some other bits of code here, in an if(!exists... statement so the downloading only happens the first time in an R session, saving valuable bits of download bandwidth and some time when I hit the “source” button to run my whole script.
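The caching pattern itself is easy to show without any real download. In this minimal sketch, slow_download() is a hypothetical stand-in for an expensive call (such as one to rnaturalearth), and glines and download_count are illustrative names I have made up for the example.

```r
download_count <- 0

# Hypothetical stand-in for a slow download; counts how often it is called
slow_download <- function() {
  download_count <<- download_count + 1
  data.frame(name = "International Date Line")
}

# First run of the script: the object does not exist yet, so we fetch it
if (!exists("glines")) {
  glines <- slow_download()
}

# Re-sourcing the script in the same session: the object is already there,
# so the download is skipped
if (!exists("glines")) {
  glines <- slow_download()
}

download_count
```

The catch with this approach is that a stale object hangs around until you remove it with rm() or restart the session, which is a fair trade for a script that is re-run many times while polishing a chart.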
Polygons of the exclusive economic zones
Now it gets a bit more complicated. I need data on the actual shapes of the world’s exclusive economic zones, including the Pacific Island Countries and Territories I want to show. This is available from the Pacific Data Hub, in several formats including KML or Keyhole Markup Language, a useful XML variant for geographic data used by the likes of Google Earth.
In this next chunk of code I:
- download the EEZ polygons as a KML file and read them into R as a “simple features” object (if you don’t know what simple features is and how it has changed the world of geocomputation for the much, much better then Google it and be amazed),
- shift its longitude so it will be centred on the Pacific (using the useful st_shift_longitude() function from the sf R package, that function being built for just this purpose),
- knock it down to just the countries and territories of interest in the Pacific (which I established as polygons numbered 67 and 245 to 282 by inspecting it visually),
- cut it down further to exclude a few bits and pieces like Hawaii, Australia and Wake Island that, while they might be useful one day, I decided were not for today,
- tidy up the names from reading (for example) “Fijian Exclusive Economic Zone” to be “Fiji”. The code that does this using regex and some ad hoc fixes is a bit clunky, but works,
- do a clever (at least I think so) group_by and summarise trick to merge together a few shapes which are otherwise separated by our old friend the antimeridian, which would show up as a white line in our map without this fix,
- join the final data frame to the official ISO 3166 country codes so I can use it later in combination with the statistical data from the Pacific Data Hub.
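The name-tidying step in the list above can be sketched in base R as a regex followed by a small lookup of ad hoc fixes. This is only an illustration of the approach: apart from “Fijian Exclusive Economic Zone”, the input names and the contents of the fixes lookup are assumptions made up for the example, not the actual values in the EEZ file.

```r
# Example input names; only the first is taken from the text above
eez_names <- c("Fijian Exclusive Economic Zone",
               "Tongan Exclusive Economic Zone",
               "Cook Islands Exclusive Economic Zone")

# Step 1: a regex strips the common suffix
short <- gsub(" Exclusive Economic Zone$", "", eez_names)

# Step 2: ad hoc fixes for adjectival forms that the regex cannot handle
fixes <- c("Fijian" = "Fiji", "Tongan" = "Tonga")
needs_fix <- short %in% names(fixes)
short[needs_fix] <- unname(fixes[short[needs_fix]])

short
```

A lookup vector like fixes is clunky in the sense that every awkward name needs its own entry, but it keeps the exceptions visible in one place.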
Population and EEZ area
So we’ve done the hardest spatial stuff, and now we can start thinking about colouring in the EEZs and adding nice labels of country names.
I need to extract the centroids of all of my polygons, so I’ve got x and y coordinates to place the labels on the map. In the code below this object is called pac_c.
Then I got the data on latest population and the area (as a number, in square kilometres) of the exclusive economic zones from the “POCKET” (as in “pocket summary”) data flow in the “.Stat” indicator database of the Pacific Data Hub, PDH.Stat, which is maintained by my team at work. I wrote a bit about this data source in my last post. The rsdmx R package gives easy access to the data from this, and other similar .Stat tools around the world that use the SDMX standard for disseminating statistical indicators.
In the code below I download that data, pick the indicators I want and pivot it wider into a format with one row per country or territory called pop_pdh, and calculate the number of people per thousand square kilometres of EEZ. Then I take the simple features data object of EEZ zones (pac), join it to the coordinates of the polygons’ centres (pac_c), join that to the statistical data in pop_pdh, and I have a straightforward data frame complete with simple features polygon of all the data I want to represent with my map.
Finally (in this chunk), I calculate the upper and lower limit of seven categories I am going to use for setting the actual colour used on the map, and I store these for use in the object quantiles.
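As a rough illustration of the seven-category idea, here is a minimal base R sketch. The density values are invented for the example (they are not the PDH.Stat figures), and using equally spaced quantiles as the break points is my assumption about one reasonable way to get the limits.

```r
# Hypothetical densities (people per 1,000 square km of EEZ), not real data
dens <- c(2, 5, 9, 14, 30, 55, 80, 120, 300, 650, 1200, 2500, 4000, 9000)

# Eight break points give the lower and upper limits of seven categories
quantiles <- quantile(dens, probs = (0:7) / 7)

# Assign each value to a category 1 to 7; include.lowest = TRUE makes sure
# the minimum value lands in the first category rather than becoming NA
category <- cut(dens, breaks = quantiles, include.lowest = TRUE, labels = FALSE)

table(category)
```

With quantile-based breaks each colour gets roughly the same number of countries, which suits a heavily skewed variable like this one better than equal-width bins would.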
How much of the world is in the Pacific?
There are a few calculated numbers I wanted to include in my subtitles:
- the proportion of the world’s surface that is Pacific Island EEZs
- the proportion of the world’s surface that is Pacific Island land area
- the proportion of the world’s population that lives in Pacific Island Countries and territories
These are all interesting numbers that I think should be more widely known! So here’s the calculation of them, and storing them in the objects prop_surface, prop_land and prop_pop respectively. I had to calculate the land area of the countries by dividing the population by the population density, as I don’t believe the land area (as opposed to the EEZ area) is in the “Pocket” summary data I downloaded from the PDH.
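The land area trick is just a rearrangement of density = population / area. Here is a minimal worked sketch with invented numbers for a single hypothetical country; the only real-world figure is the Earth’s total surface area, taken as roughly 510 million square kilometres, and all object names are illustrative.

```r
# Hypothetical figures for one country, for illustration only
pop         <- 900000   # people
pop_density <- 50       # people per square km of land

# density = pop / area, so area = pop / density
land_area <- pop / pop_density   # 18,000 square km

# Approximate total surface area of the Earth (land and sea), in square km
earth_surface <- 510e6

# Share of the Earth's surface taken up by this country's land
prop_land_example <- land_area / earth_surface
prop_land_example
```

Summing land_area across all the countries and territories before dividing would give the overall proportion in the same way.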
Let’s draw a map
OK, finally we are ready to draw our map. All the statistical calculations are done apart from the final calculation of which colour to shade each polygon, which is done in the code below just before the ggplot statement. All the rest of the code is polish and annotations of the map. A few things to note:
- the coloured EEZ polygons are drawn first, and because they are a simple features object they are drawn with the geom_sf geom
- the pale grey landmasses that perceptually are in the background are drawn second and are in fact semi-transparent white. Because these are not simple features but a straight data frame (remember we created this earlier as the very first step, a data frame of polygons of land masses centred on the Pacific) these land forms are drawn with geom_polygon. Having the white land mass drawn subsequently (so on top of) the coloured EEZ polygons was counter-intuitive to me but gives a better visual effect given how little land there is in the area of interest.
- the International Date Line is a simple features object (albeit just a line) so it is drawn with geom_sf
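The layer ordering described above can be sketched with toy geometry. This is a minimal illustration assuming the ggplot2 and sf packages are installed; the shapes, coordinates, colours and object names are all made up and bear no relation to the real EEZ data or to the polish in the finished map.

```r
library(ggplot2)
library(sf)

# Toy EEZ: one square polygon as a simple features object (made-up coordinates)
eez <- st_sf(
  dens = 10,
  geometry = st_sfc(st_polygon(list(rbind(
    c(170, -20), c(178, -20), c(178, -10), c(170, -10), c(170, -20)
  ))), crs = 4326)
)

# Toy land mass as a plain data frame of vertices, as used by geom_polygon
land <- data.frame(
  long  = c(173, 175, 175, 173),
  lat   = c(-16, -16, -14, -14),
  group = 1
)

# Toy "date line" as a simple features line
dateline <- st_sf(geometry = st_sfc(
  st_linestring(rbind(c(179, -25), c(179, -5))), crs = 4326
))

p <- ggplot() +
  # 1. coloured EEZ polygons first
  geom_sf(data = eez, aes(fill = dens), colour = NA) +
  # 2. semi-transparent white land drawn on top of them
  geom_polygon(data = land, aes(x = long, y = lat, group = group),
               fill = "white", alpha = 0.8) +
  # 3. the date line last
  geom_sf(data = dateline, colour = "steelblue", linetype = 2)
```

Printing p draws the three layers in the order they were added, so the white land sits visibly over the coloured square.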
Feedback is always appreciated! And to be clear, although I am in fact responsible for one of the key concentrations of experts on Pacific Island statistics, my actual job doesn’t involve making maps and I am blogging very much in my personal capacity. None of the experts who work with me have checked this work. So all errors are my personal fault, not that of the institution I work for!