Site icon R-bloggers

Visualising Italian Wine Production

[This article was first published on RBlog – Mango Solutions, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In case you hadn’t spotted it already, a major new release of ggplot2 hit CRAN last week. Among the many new features, the one that caught my attention was support for simple features. Or in other words, it’s now really easy to plot spatial data in ggplot2.

On Friday afternoon I spent a couple of hours playing with this new functionality as well as mapview, a package I hadn’t come across before but discovered on Friday. This blog is a quick walkthrough to show you how simple spatial visualisation in R really is.

In case you want to follow along, the full list of packages used were:

library(tidyverse)
library(ggplot2)
library(sf)
library(mapview)
library(rvest)
library(magrittr)

The Data

Whenever I start to play with new functions I always try to use my own data. In this case, thanks to a recent project, I happened to have a shape file for Italy to hand (You can download it for yourself from Istat).

Initially, I just plotted the region level data with random colours (as you may have already seen on Twitter) but to reflect my usual workflow I found some data on the amount of wine produced by region – it was Friday afternoon after all!

I won’t go into all the detail of the web scraping and data manipulation, but the code is included throughout so that you can replicate this example.

wine <- read_html("https://italianwinecentral.com/wine-production-in-italy-by-region/")
wine %<>% 
  html_table() %>% 
  extract2(1) %>%
  slice(-21)
wine[,-1] %<>% map_df(parse_number)

This gave me an interesting dataset to play with that wasn’t actually too dissimilar to what I often end up trying to plot i.e. change over time.

head(wine)
Region
<chr>
2012
<dbl>
2013
<dbl>
2014
<dbl>
2015
<dbl>
2016
<dbl>
1 Abruzzo 2443 2728 2273 2936 2937
2 Basilicata 189 178 102 87 93
3 Calabria 400 370 314 404 391
4 Campania 1542 1644 1183 1614 1286
5 Emilia Romagna 6273 7396 6958 6752 7039
6 Friuli-Venezia Giulia 1281 1073 1367 1872 1856
6 rows

Importing Shape Files

Now for the interesting part.

To import the spatial data I used the sf package. This package makes reading the data very simple and stores it in a way that is much easier to reason about, especially for data that is intended to provide regions.

To import the shape file we can use the st_read function, in this case pointing to the file location, although you can also point to a database. Once the data is imported we have a single row for each region. The data includes a column geometry. This is where the information related to the grouping is defined. In this case the geometry stores (as a list-column) all of the information relating to polygon of the region. But, this could also be a single point or a whole host of other information. Check out the vignettes for more details.

italy <- st_read("./Limiti_2016_ED50_g/Reg2016_ED50_g/Reg2016_ED50_g.shp")
head(italy)
COD_REG
<int>
REGIONE
<fctr>
SHAPE_Leng
<dbl>
SHAPE_Area
<dbl>
geometry
<S3: sfc_MULTIPOLYGON>
1 1 Piemonte 1235685.5 25394259406 <S3: sfc_MULTIPOLYGON>
2 2 Valle D’Aosta 311141.5 3258954558 <S3: sfc_MULTIPOLYGON>
3 3 Lombardia 1410619.8 23862504702 <S3: sfc_MULTIPOLYGON>
4 5 Veneto 1058657.7 18406022144 <S3: sfc_MULTIPOLYGON>
5 4 Trentino-Alto Adige 800899.4 13607742436 <S3: sfc_MULTIPOLYGON>
6 7 Liguria 825474.4 5415107538 <S3: sfc_MULTIPOLYGON>
6 rows

As this is just a simple data frame, it can be manipulated in the same way as any other dataframe. In this case that includes a little change to the region names to make them match exactly and joining the data so I have everything I want in one data set, ready to start plotting.

levels(italy$REGIONE) <- wine$Region
WineLocation <- full_join(italy, wine, by = c("REGIONE" = "Region"))

New Features of ggplot2

Now for the new features of ggplot2. There are a lot of them this time around. Many related to the specification of variables for programming purposes but I was most interested in how easy it was to plot my Italy data. And it turns out that the answer was extremely.

Point to the data, tell it which column to use as the fill and add the new geom_sf layer. Just to try it out, I have added the new viridis scale layer. But that is all I needed to do. All of the projections have been managed for me, the colours look much more impressive than the defaults and all of the axis labels and titles have been handled the way I would want them, everything has just worked without me having to do any thinking.

ggplot(data = WineLocation, aes(fill = `2016`)) + 
  geom_sf() + 
  scale_fill_viridis_c()

Taking this even further, I happened to pick a data set with multiple years in it. A little manipulation to get the data into a better format to plot and an additional line in my plotting code and I have facetted maps. Again, something I didn’t need to do but I have tried out the new vars specification for facetting. Don’t worry, the formula interface is still there, and this is one of the features that will make programming with ggplot2 much easier. But this alternative is certainly very simple and may even be easier for some ggplot2 novices.

wineLong <- gather(wine, Year, Quantity, -Region)
fullWineLong <- full_join(italy, wineLong, by = c("REGIONE" = "Region"))

ggplot(data = fullWineLong, aes(fill = Quantity)) + 
  geom_sf() + 
  facet_wrap(vars(Year)) +
  scale_fill_viridis_c() 

I tried this with all the shape files I could get my hands on and I am pleased to report it was just as easy. As it’s ggplot2 I was able to add layers as I wanted them, including additional shape files and all the projections were handled for me – although plotting the US and Italy on one map did look a little odd by default!

A quick look at mapview

Whilst I was reviewing the vignettes for sf I came across an example of using mapview. Despite this package having been on CRAN for a few years I had never come across it before, but I am very glad I have now.

In a nutshell, it is a wrapper to leaflet. The good thing is that it works very easily with simple features data. There are a huge number of options but to recreate an interactive version of my first map I simply needed to define the data, the column that defines the colouring (zcol) and as an extra the data that gives labels when I hover on the map, in this case the region names.

fullWineLong %>% 
  filter(Year == 2016) %>%
  mapview(zcol = "Quantity", label = .$REGIONE)

You can take a look at the map created on RPubs.

There are a huge number of options to mapview and I certainly haven’t tried them all out yet, but some of the notable features include:

  • Adding additional shape files by simply adding two mapview objects together
  • Adding leaflet layers, using the usual leaflet functions
  • Including multiple layers of data
  • Defining the pop-up functionality

This last one is particularly interesting as you can not only pop-up tables of data but also graphics. As an example, suppose that we want to be able to click on the region of interest and see the change in wine production over time. All we have to do is create a list object that includes all of the graphics that we need (in the same order as the data) and pass this to the popup option using the popupGraph function.

plotByRegion <- function(region){
  fullWineLong %>%
    filter(REGIONE == region) %>%
  ggplot() +
    geom_line(aes(Year, Quantity, group = REGIONE)) +
    labs(title = region)
}

p <- lapply(WineLocation$REGIONE, plotByRegion)

fullWineLong %>% 
  filter(Year == 2016) %>%
  mapview(zcol = "Quantity", label = .$REGIONE, 
          popup = popupGraph(p))

The documentation for mapview does state that it is aimed at quick visualisation for exploratory purposes rather than for production-ready graphics and it really suited that purpose for me. Personally, I can see my usage becoming a combination of both mapview and leaflet for interactive graphics and ggplot2 for static.

Right now I am very happy that I have an incredibly quick and easy workflow for getting my spatial data into R and visualising it without having to worry about map projections and little details that distract me from analysis.

To leave a comment for the author, please follow the link and comment on their blog: RBlog – Mango Solutions.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.