Visualising Italian Wine Production
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In case you hadn’t spotted it already, a major new release of ggplot2 hit CRAN last week. Among the many new features, the one that caught my attention was support for simple features. Or in other words, it’s now really easy to plot spatial data in ggplot2.
On Friday afternoon I spent a couple of hours playing with this new functionality as well as mapview, a package I hadn’t come across before but discovered on Friday. This blog is a quick walkthrough to show you how simple spatial visualisation in R really is.
In case you want to follow along, the full list of packages used were:
library(tidyverse) library(ggplot2) library(sf) library(mapview) library(rvest) library(magrittr)
The Data
Whenever I start to play with new functions I always try to use my own data. In this case, thanks to a recent project, I happened to have a shape file for Italy to hand (You can download it for yourself from Istat).
Initially, I just plotted the region level data with random colours (as you may have already seen on Twitter) but to reflect my usual workflow I found some data on the amount of wine produced by region – it was Friday afternoon after all!
I won’t go into all the detail of the web scraping and data manipulation, but the code is included throughout so that you can replicate this example.
wine <- read_html("https://italianwinecentral.com/wine-production-in-italy-by-region/") wine %<>% html_table() %>% extract2(1) %>% slice(-21) wine[,-1] %<>% map_df(parse_number)
This gave me an interesting dataset to play with that wasn’t actually too dissimilar to what I often end up trying to plot i.e. change over time.
head(wine)
Region
|
2012
|
2013
|
2014
|
2015
|
2016
|
|
---|---|---|---|---|---|---|
1 | Abruzzo | 2443 | 2728 | 2273 | 2936 | 2937 |
2 | Basilicata | 189 | 178 | 102 | 87 | 93 |
3 | Calabria | 400 | 370 | 314 | 404 | 391 |
4 | Campania | 1542 | 1644 | 1183 | 1614 | 1286 |
5 | Emilia Romagna | 6273 | 7396 | 6958 | 6752 | 7039 |
6 | Friuli-Venezia Giulia | 1281 | 1073 | 1367 | 1872 | 1856 |
Importing Shape Files
Now for the interesting part.
To import the spatial data I used the sf package. This package makes reading the data very simple and stores it in a way that is much easier to reason about, especially for data that is intended to provide regions.
To import the shape file we can use the st_read
function, in this case pointing to the file location, although you can also point to a database. Once the data is imported we have a single row for each region. The data includes a column geometry
. This is where the information related to the grouping is defined. In this case the geometry stores (as a list-column) all of the information relating to polygon of the region. But, this could also be a single point or a whole host of other information. Check out the vignettes for more details.
italy <- st_read("./Limiti_2016_ED50_g/Reg2016_ED50_g/Reg2016_ED50_g.shp") head(italy)
COD_REG
|
REGIONE
|
SHAPE_Leng
|
SHAPE_Area
|
geometry
|
|
---|---|---|---|---|---|
1 | 1 | Piemonte | 1235685.5 | 25394259406 | |
2 | 2 | Valle D’Aosta | 311141.5 | 3258954558 | |
3 | 3 | Lombardia | 1410619.8 | 23862504702 | |
4 | 5 | Veneto | 1058657.7 | 18406022144 | |
5 | 4 | Trentino-Alto Adige | 800899.4 | 13607742436 | |
6 | 7 | Liguria | 825474.4 | 5415107538 |
As this is just a simple data frame, it can be manipulated in the same way as any other dataframe. In this case that includes a little change to the region names to make them match exactly and joining the data so I have everything I want in one data set, ready to start plotting.
levels(italy$REGIONE) <- wine$Region WineLocation <- full_join(italy, wine, by = c("REGIONE" = "Region"))
New Features of ggplot2
Now for the new features of ggplot2. There are a lot of them this time around. Many related to the specification of variables for programming purposes but I was most interested in how easy it was to plot my Italy data. And it turns out that the answer was extremely.
Point to the data, tell it which column to use as the fill and add the new geom_sf
layer. Just to try it out, I have added the new viridis scale layer. But that is all I needed to do. All of the projections have been managed for me, the colours look much more impressive than the defaults and all of the axis labels and titles have been handled the way I would want them, everything has just worked without me having to do any thinking.
ggplot(data = WineLocation, aes(fill = `2016`)) + geom_sf() + scale_fill_viridis_c()
Taking this even further, I happened to pick a data set with multiple years in it. A little manipulation to get the data into a better format to plot and an additional line in my plotting code and I have facetted maps. Again, something I didn’t need to do but I have tried out the new vars
specification for facetting. Don’t worry, the formula interface is still there, and this is one of the features that will make programming with ggplot2 much easier. But this alternative is certainly very simple and may even be easier for some ggplot2 novices.
wineLong <- gather(wine, Year, Quantity, -Region) fullWineLong <- full_join(italy, wineLong, by = c("REGIONE" = "Region")) ggplot(data = fullWineLong, aes(fill = Quantity)) + geom_sf() + facet_wrap(vars(Year)) + scale_fill_viridis_c()
I tried this with all the shape files I could get my hands on and I am pleased to report it was just as easy. As it’s ggplot2 I was able to add layers as I wanted them, including additional shape files and all the projections were handled for me – although plotting the US and Italy on one map did look a little odd by default!
A quick look at mapview
Whilst I was reviewing the vignettes for sf I came across an example of using mapview. Despite this package having been on CRAN for a few years I had never come across it before, but I am very glad I have now.
In a nutshell, it is a wrapper to leaflet. The good thing is that it works very easily with simple features data. There are a huge number of options but to recreate an interactive version of my first map I simply needed to define the data, the column that defines the colouring (zcol
) and as an extra the data that gives labels when I hover on the map, in this case the region names.
fullWineLong %>% filter(Year == 2016) %>% mapview(zcol = "Quantity", label = .$REGIONE)
You can take a look at the map created on RPubs.
There are a huge number of options to mapview
and I certainly haven’t tried them all out yet, but some of the notable features include:
- Adding additional shape files by simply adding two mapview objects together
- Adding leaflet layers, using the usual leaflet functions
- Including multiple layers of data
- Defining the pop-up functionality
This last one is particularly interesting as you can not only pop-up tables of data but also graphics. As an example, suppose that we want to be able to click on the region of interest and see the change in wine production over time. All we have to do is create a list object that includes all of the graphics that we need (in the same order as the data) and pass this to the popup
option using the popupGraph
function.
plotByRegion <- function(region){ fullWineLong %>% filter(REGIONE == region) %>% ggplot() + geom_line(aes(Year, Quantity, group = REGIONE)) + labs(title = region) } p <- lapply(WineLocation$REGIONE, plotByRegion) fullWineLong %>% filter(Year == 2016) %>% mapview(zcol = "Quantity", label = .$REGIONE, popup = popupGraph(p))
The documentation for mapview does state that it is aimed at quick visualisation for exploratory purposes rather than for production-ready graphics and it really suited that purpose for me. Personally, I can see my usage becoming a combination of both mapview and leaflet for interactive graphics and ggplot2 for static.
Right now I am very happy that I have an incredibly quick and easy workflow for getting my spatial data into R and visualising it without having to worry about map projections and little details that distract me from analysis.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.