Building Interactive World Maps in Shiny

R Views

2 years ago

[This article was first published on R Views, and kindly contributed to R-bloggers].

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics, she applies her interdisciplinary knowledge to computationally address societal problems of inequality.

In this post, I will show you how to create interactive world maps and how to show these in the form of an R Shiny app. As the Shiny app cannot be embedded into this blog, I will direct you to the live app; a separate post on my GitHub shows how to embed a Shiny app in your R Markdown files, which is a really cool and innovative way of preparing interactive documents. To show you how to adapt the interface of the app to the choices of the users, we’ll make use of two data sources, so that the user can choose what data they want to explore and the app adapts the possible input choices to the user’s previous choices. The data sources here are about childlessness and gender inequality, which is the focus of my PhD research, in which I computationally analyse the effects of gender and parental status on socio-economic inequalities.

We’ll start by loading and cleaning the data, whereafter we will build our interactive world maps in R Shiny. Let’s first load the required packages into RStudio.
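The package-loading code itself is not reproduced here, so the following is a minimal sketch rather than the original chunk. The package list is inferred from the library() calls that appear later in this post, plus shiny for the app; the names pkgs and missing_pkgs are only illustrative.

```r
# Packages attached by the library() calls later in this post, plus shiny for the app
# (assumed list; adjust to your own setup).
pkgs <- c("magrittr", "rvest", "readxl", "dplyr", "maps", "ggplot2",
          "reshape2", "RColorBrewer", "ggiraph", "shiny")

# Report anything missing up front instead of failing halfway through.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(missing_pkgs) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}

# Attach the packages that are available.
for (p in setdiff(pkgs, missing_pkgs)) {
  library(p, character.only = TRUE)
}
```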

Importing, exploring and cleaning the data

Now, we can continue with loading our data. As we’ll make world maps, we need a way to map our data sets to geographical data containing coordinates (longitude and latitude). As different data sets have different formats for country names (e.g., “United Kingdom of Great Britain and Northern Ireland” versus “United Kingdom”), we’ll match country names to ISO3 codes to easily merge all data sets later on. Therefore, we first scrape an HTML table of country names, ISO3, ISO2 and UN codes for all countries worldwide. We use the rvest package, with an XPath expression that indicates which part of the web page contains our table of interest. We use the pipe (%>%) from the magrittr package to feed our URL of interest into functions that read the HTML table using the XPath and convert that to a data frame in R. One can obtain the XPath by inspecting the HTML table in the browser’s developer tools and copying its XPath from there.

The first element in the resulting list contains our table of interest, and as the first column is empty, we delete it. Also, as you can see from the HTML table in the link, there are some rows that show the letter of the alphabet before starting with a list of countries of which the name starts with that letter. As these rows contain the particular letter in all columns, we can delete these by deleting all rows for which all columns have equal values.

library(magrittr)
library(rvest)
url <- "https://www.nationsonline.org/oneworld/country_code_list.htm"
iso_codes <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="CountryCode"]') %>%
  html_table()
iso_codes <- iso_codes[[1]][, -1]
iso_codes <- iso_codes[!apply(iso_codes, 1, function(x){all(x == x[1])}), ]
names(iso_codes) <- c("Country", "ISO2", "ISO3", "UN")
head(iso_codes)
##          Country ISO2 ISO3  UN
## 2    Afghanistan   AF  AFG 004
## 3  Aland Islands   AX  ALA 248
## 4        Albania   AL  ALB 008
## 5        Algeria   DZ  DZA 012
## 6 American Samoa   AS  ASM 016
## 7        Andorra   AD  AND 020
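To see why the all(x == x[1]) filter removes only the alphabet separator rows, here is a small sketch on made-up data. The data frame toy and its values are hypothetical stand-ins for the scraped table.

```r
# Toy table with two "letter" separator rows ("A" and "B") mixed in with real rows.
toy <- data.frame(Country = c("A", "Afghanistan", "Albania", "B", "Bahamas"),
                  ISO2    = c("A", "AF", "AL", "B", "BS"),
                  ISO3    = c("A", "AFG", "ALB", "B", "BHS"),
                  stringsAsFactors = FALSE)

# A separator row repeats the same letter in every column.
is_letter_row <- apply(toy, 1, function(x) all(x == x[1]))

# Keep only the rows that describe an actual country.
cleaned <- toy[!is_letter_row, ]
cleaned
```

Real country rows never have identical values in every column, so they survive the filter.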

Next, we’ll collect our first data set, which is a data set on childlessness provided by the United Nations. We download the file from the link, save it locally, and then load it into RStudio using the read_excel() function in the readxl package.

library(readxl)
url <- "https://www.un.org/en/development/desa/population/publications/dataset/fertility/wfr2012/Data/Data_Sources/TABLE%20A.8.%20%20Percentage%20of%20childless%20women%20and%20women%20with%20parity%20three%20or%20higher.xlsx"
destfile <- "dataset_childlessness.xlsx"
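# mode = "wb" writes the .xlsx as a binary file, which avoids a corrupted download on Windows
download.file(url, destfile, mode = "wb")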
childlessness_data <- read_excel(destfile)
head(childlessness_data)
## # A tibble: 6 x 17
##   `United Nations… ...2  ...3  ...4  ...5  ...6  ...7  ...8  ...9  ...10
##   <chr>            <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 "TABLE  A.8. PE… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 2 Country          ISO … Peri… Refe… Perc… <NA>  <NA>  Perc… <NA>  <NA> 
## 3 <NA>             <NA>  <NA>  <NA>  35-39 40-44 45-49 35-39 40-44 45-49
## 4 Afghanistan      4     Earl… ..    ..    ..    ..    ..    ..    ..   
## 5 Afghanistan      4     Midd… ..    ..    ..    ..    ..    ..    ..   
## 6 Afghanistan      4     Late… 2010  2.6   2.6   2.1   93.8  94.5  94   
## # … with 7 more variables: ...11 <chr>, ...12 <chr>, ...13 <chr>,
## #   ...14 <chr>, ...15 <chr>, ...16 <chr>, ...17 <lgl>

We can see that the childlessness data are a bit messy, especially when it comes to the first couple of rows and column names. We only want to maintain the columns that have country names, periods, and childlessness estimates for different age groups, as well as the rows that refer to data for specific countries. The resulting data look much better. Note that when we convert the childlessness percentage columns to numeric type later on, the “..” values will automatically change to NA.

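# Row 2 holds the real column headers; locate the column whose header mentions "childless"
cols <- which(grepl("childless", childlessness_data[2, ]))
# Drop the three header rows; keep country, period and the three age-group columns
childlessness_data <- childlessness_data[-c(1:3), c(1, 3, cols:(cols + 2))]
names(childlessness_data) <- c("Country", "Period", "35-39", "40-44", "45-49")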
head(childlessness_data)
## # A tibble: 6 x 5
##   Country     Period  `35-39` `40-44` `45-49`
##   <chr>       <chr>   <chr>   <chr>   <chr>  
## 1 Afghanistan Earlier ..      ..      ..     
## 2 Afghanistan Middle  ..      ..      ..     
## 3 Afghanistan Latest  2.6     2.6     2.1    
## 4 Albania     Earlier 7.2     5.5     5.2    
## 5 Albania     Middle  ..      ..      ..     
## 6 Albania     Latest  4.8     4.3     3.3
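As a quick illustration of the “..” to NA conversion mentioned above, the sketch below uses a made-up character vector raw. Note that as.numeric() emits an “NAs introduced by coercion” warning when it meets “..”. Replacing the placeholder explicitly first is one way to avoid that warning.

```r
# Made-up values in the same style as the childlessness columns.
raw <- c("2.6", "..", "4.8")

# Direct conversion: ".." cannot be parsed, so it becomes NA (with a warning,
# silenced here).
values <- suppressWarnings(as.numeric(raw))

# Alternative: mark the placeholder as missing first, then convert without a warning.
explicit <- raw
explicit[explicit == ".."] <- NA
values_explicit <- as.numeric(explicit)
```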

Our second data set is about measures of gender inequality, provided by the World Bank. We read this .csv file directly into RStudio from the URL link.

gender_index_data <- read.csv("https://s3.amazonaws.com/datascope-ast-datasets-nov29/datasets/743/data.csv")
head(gender_index_data)
##   Country.ISO3 Country.Name Indicator.Id
## 1          AGO       Angola        27959
## 2          AGO       Angola        27960
## 3          AGO       Angola        27961
## 4          AGO       Angola        27962
## 5          AGO       Angola        28158
## 6          AGO       Angola        28159
##                                                           Indicator
## 1                                   Overall Global Gender Gap Index
## 2                  Global Gender Gap Political Empowerment subindex
## 3                  Global Gender Gap Political Empowerment subindex
## 4                                   Overall Global Gender Gap Index
## 5 Global Gender Gap Economic Participation and Opportunity Subindex
## 6 Global Gender Gap Economic Participation and Opportunity Subindex
##   Subindicator.Type   X2006    X2007    X2008    X2009   X2010   X2011
## 1             Index  0.6038   0.6034   0.6032   0.6353  0.6712  0.6624
## 2              Rank 81.0000  92.0000 103.0000  36.0000 24.0000 24.0000
## 3             Index  0.0696   0.0696   0.0711   0.2007  0.2901  0.2898
## 4              Rank 96.0000 110.0000 114.0000 106.0000 81.0000 87.0000
## 5              Rank 69.0000  87.0000  87.0000  96.0000 76.0000 96.0000
## 6             Index  0.5872   0.5851   0.5843   0.5832  0.6296  0.5937
##   X2012   X2013    X2014   X2015   X2016   X2018
## 1    NA  0.6659   0.6311   0.637   0.643   0.633
## 2    NA 34.0000  38.0000  38.000  40.000  58.000
## 3    NA  0.2614   0.2402   0.251   0.251   0.206
## 4    NA 92.0000 121.0000 126.000 117.000 125.000
## 5    NA 92.0000 111.0000 116.000 120.000 113.000
## 6    NA  0.6163   0.5878   0.590   0.565   0.602

Luckily, these data are better structured than the childlessness data. The data contain gender inequality measures per year, and for convenience we add a new column, RecentYear, with the values for the most recent year for which data are available. In this post, we’ll only look at the rank indicators rather than indices and normalized scores. We drop the Subindicator.Type and Indicator.Id columns using the select() function from the dplyr package, as we won’t need these further.

library(dplyr)
gender_index_data["RecentYear"] <- apply(gender_index_data, 1, function(x){as.numeric(x[max(which(!is.na(x)))])})
gender_index_data <- gender_index_data[gender_index_data$Subindicator.Type == "Rank", ] %>% 
  select(-Subindicator.Type, -Indicator.Id)
names(gender_index_data) <- c("ISO3", "Country", "Indicator", as.character(c(2006:2016, 2018)), "RecentYear")
head(gender_index_data)
##    ISO3 Country
## 2   AGO  Angola
## 4   AGO  Angola
## 5   AGO  Angola
## 7   AGO  Angola
## 9   AGO  Angola
## 11  AGO  Angola
##                                                                                           Indicator
## 2                                                  Global Gender Gap Political Empowerment subindex
## 4                                                                   Overall Global Gender Gap Index
## 5                                 Global Gender Gap Economic Participation and Opportunity Subindex
## 7                                                 Global Gender Gap Educational Attainment Subindex
## 9                                                    Global Gender Gap Health and Survival Subindex
## 11 Wage equality between women and men for similar work (survey data, normalized on a 0-to-1 scale)
##    2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2018 RecentYear
## 2    81   92  103   36   24   24   NA   34   38   38   40   58         58
## 4    96  110  114  106   81   87   NA   92  121  126  117  125        125
## 5    69   87   87   96   76   96   NA   92  111  116  120  113        113
## 7   107  119  122  127  125  126   NA  127  138  141  138  143        143
## 9     1    1    1    1    1    1   NA    1   61    1    1    1          1
## 11   NA   NA   NA   NA   NA   NA   NA   NA   NA   NA  135   94         94
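The RecentYear column relies on picking the last non-missing value in each row. The sketch below shows the same idea on a made-up data frame toy, restricted to the year columns and with a guard for rows that are missing everywhere; both of these are variations for illustration, not what the code above does.

```r
# Made-up ranks for three countries; NA marks years without data.
toy <- data.frame(Country = c("A", "B", "C"),
                  X2015 = c(10, 12, NA),
                  X2016 = c(11, NA, NA),
                  X2018 = c(NA, NA, 7))
year_cols <- c("X2015", "X2016", "X2018")

# For each row, take the value in the right-most year column that is not NA.
toy$RecentYear <- apply(toy[year_cols], 1, function(x) {
  observed <- which(!is.na(x))
  if (length(observed) == 0) NA_real_ else unname(x[max(observed)])
})
toy
```

Country A falls back to its 2016 value, B to 2015, and C uses 2018.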

Next, we load our world data with geographical coordinates using the map_data() function from the ggplot2 package, which draws on the maps package. These data contain geographical coordinates of all countries worldwide, which we’ll later need to plot the world maps.

library(maps)
library(ggplot2)
world_data <- ggplot2::map_data('world')
world_data <- fortify(world_data)
head(world_data)
##     long   lat group order region subregion
## 1 -69.90 12.45     1     1  Aruba      <NA>
## 2 -69.90 12.42     1     2  Aruba      <NA>
## 3 -69.94 12.44     1     3  Aruba      <NA>
## 4 -70.00 12.50     1     4  Aruba      <NA>
## 5 -70.07 12.55     1     5  Aruba      <NA>
## 6 -70.05 12.60     1     6  Aruba      <NA>

To map our data, we need to merge the childlessness, gender gap index, and world map data. As I said before, these all have different notations for country names, which is why we’ll use the ISO3 codes. However, even between the ISO code data and the other data sets, there are discrepancies in country names. Unfortunately, to solve this, we need to manually change some country names in our data to match those in the ISO code data set. The code for doing so is long and tedious, so I won’t show that here, but for your reference you can find it here.
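To give a flavour of what such renaming can look like, here is a minimal sketch using a named lookup vector. It is not the author’s renaming code: the helper rename_countries() is hypothetical, and the two mappings are only assumed examples of a short name in one data set versus a longer name in the ISO table.

```r
# Hypothetical helper: replace names found in `lookup`, leave all others untouched.
rename_countries <- function(x, lookup) {
  hit <- x %in% names(lookup)
  x[hit] <- unname(lookup[x[hit]])
  x
}

# Assumed example mappings (old name = new name); extend as needed for your data.
lookup <- c("UK" = "United Kingdom", "USA" = "United States")

regions <- c("Aruba", "UK", "USA", "UK")
renamed <- rename_countries(regions, lookup)
renamed
```

Keeping the mappings in one named vector makes it easy to review them and to add new ones when a country fails to match.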

Now that the name changes for countries have been made, we can add the ISO3 codes to our childlessness and world map data. The gender gap index data already contain these codes, so there’s no need for us to add these there.

childlessness_data['ISO3'] <- iso_codes$ISO3[match(childlessness_data$Country, iso_codes$Country)]
world_data["ISO3"] <- iso_codes$ISO3[match(world_data$region, iso_codes$Country)]
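The match() lookup used here returns NA for any country name that has no exact counterpart in the ISO table, which is why the names had to be harmonised first. A small sketch with made-up inputs (iso_toy and countries are illustrative only):

```r
# Tiny stand-in for the ISO code table.
iso_toy <- data.frame(Country = c("Albania", "United Kingdom"),
                      ISO3    = c("ALB", "GBR"),
                      stringsAsFactors = FALSE)

countries <- c("United Kingdom", "Albania", "Channel Islands")

# match() gives the row position of each country in the ISO table, or NA if absent.
codes <- iso_toy$ISO3[match(countries, iso_toy$Country)]

# Listing the unmatched names is a quick way to spot what still needs renaming.
unmatched <- countries[is.na(codes)]
```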

Next, we melt the childlessness and gender gap index data into long format such that they will have similar shape and column names for merging. The melt() function is included in package reshape2. The goal here is to create variables that have different unique values for the different data, such that I can show you how to adapt the R Shiny app input to the users’ choices. For example, we’ll create a DataType column that has value Childlessness for the rows of the childlessness data and value Gender Gap Index for all rows of the gender gap index data. We’ll also create a column Period that contains earlier, middle and later periods for the childlessness data, and different years for the gender gap index data. As such, when the user chooses to explore the childlessness data, the input for the period will only contain the choices relevant to the childlessness data (i.e., earlier, middle, and later periods and no years). When the user chooses to explore the gender gap index data, they will only see different years as choices for the input of the period, and not earlier, middle, and later periods. The same goes for the Indicator column. This may sound slightly vague at this point, but we’ll see this in practice later on when building the R Shiny app.

library(reshape2)
childlessness_melt <- melt(childlessness_data, id = c("Country", "ISO3", "Period"), 
                           variable.name = "Indicator", value.name = "Value")
childlessness_melt$Value <- as.numeric(childlessness_melt$Value)
gender_index_melt <- melt(gender_index_data, id = c("ISO3", "Country", "Indicator"), 
                          variable.name = "Period", value.name = "Value")
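For readers new to melt(), the following sketch shows the wide-to-long reshaping on two made-up rows in the style of the childlessness data. The data frame wide is hypothetical.

```r
library(reshape2)

# Two made-up rows: one column per age group (wide format).
wide <- data.frame(Country = c("Albania", "Albania"),
                   ISO3    = c("ALB", "ALB"),
                   Period  = c("Earlier", "Latest"),
                   `35-39` = c("7.2", "4.8"),
                   `40-44` = c("5.5", "4.3"),
                   check.names = FALSE, stringsAsFactors = FALSE)

# Long format: one row per country, period and age group.
long <- melt(wide, id.vars = c("Country", "ISO3", "Period"),
             variable.name = "Indicator", value.name = "Value")
long$Value <- as.numeric(long$Value)
long
```

Each of the two age-group columns contributes one row per original row, so the result has four rows.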

After melting the data and ensuring they’re in the same format, we merge them together using the rbind() function, which we can do here because both data frames have the same set of column names (rbind() matches data frame columns by name, so their order does not need to agree).

childlessness_melt["DataType"] <- rep("Childlessness", nrow(childlessness_melt))
gender_index_melt["DataType"] <- rep("Gender Gap Index", nrow(gender_index_melt))
df <- rbind(childlessness_melt, gender_index_melt)
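The two melted data frames list their columns in a different order, which is fine because rbind() on data frames lines columns up by name. A minimal sketch with made-up one-row data frames a and b:

```r
# Same column names, different column order.
a <- data.frame(Country = "Albania", ISO3 = "ALB", Value = 3.3,
                stringsAsFactors = FALSE)
b <- data.frame(ISO3 = "AGO", Value = 58, Country = "Angola",
                stringsAsFactors = FALSE)

# The result follows the column order of the first data frame.
combined <- rbind(a, b)
combined
```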

Creating an interactive world map

Next, it’s time to define the function that we’ll use for building our world maps. The inputs to this function are the merged data frame, the world data containing geographical coordinates, and the data type, period and indicator the user will select in the R Shiny app. We first define our own theme, my_theme(), for setting the aesthetics of the plot. Next, we select only the data that the user has selected to view, resulting in plotdf. We keep only the rows for which the ISO3 code has been specified (some countries, e.g., Channel Islands in the childlessness data, are not contained in the ISO code data). We then add the data the user wants to see to the geographical world data. Finally, we plot the world map. The most important part of this plot is the geom_polygon_interactive() layer from the ggiraph package. This layer draws the country outlines in grey, leaves countries without data white, fills the remaining countries according to the value of the data selected (either childlessness or gender gap rank) in a red-to-blue color scheme set using the brewer.pal() function from the RColorBrewer package, and interactively shows the ISO3 code and value in a tooltip when hovering over the plot.

worldMaps <- function(df, world_data, data_type, period, indicator){
  
  # Function for setting the aesthetics of the plot
  my_theme <- function () { 
    theme_bw() + theme(axis.text = element_text(size = 14),
                       axis.title = element_text(size = 14),
                       strip.text = element_text(size = 14),
                       panel.grid.major = element_blank(), 
                       panel.grid.minor = element_blank(),
                       panel.background = element_blank(), 
                       legend.position = "bottom",
                       panel.border = element_blank(), 
                       strip.background = element_rect(fill = 'white', colour = 'white'))
  }
  
  # Select only the data that the user has selected to view
  plotdf <- df[df$Indicator == indicator & df$DataType == data_type & df$Period == period,]
  plotdf <- plotdf[!is.na(plotdf$ISO3), ]
  
  # Add the data the user wants to see to the geographical world data
  world_data['DataType'] <- rep(data_type, nrow(world_data))
  world_data['Period'] <- rep(period, nrow(world_data))
  world_data['Indicator'] <- rep(indicator, nrow(world_data))
  world_data['Value'] <- plotdf$Value[match(world_data$ISO3, plotdf$ISO3)]
  
  # Create caption with the data source to show underneath the map
  capt <- paste0("Source: ", ifelse(data_type == "Childlessness", "United Nations" , "World Bank"))
  
  # Specify the plot for the world map
  library(RColorBrewer)
  library(ggiraph)
  g <- ggplot() + 
    geom_polygon_interactive(data = world_data, color = 'gray70', size = 0.1,
                                    aes(x = long, y = lat, fill = Value, group = group, 
                                        tooltip = sprintf("%s<br/>%s", ISO3, Value))) + 
    scale_fill_gradientn(colours = brewer.pal(5, "RdBu"), na.value = 'white') + 
    scale_y_continuous(limits = c(-60, 90), breaks = c()) + 
    scale_x_continuous(breaks = c()) + 
    labs(fill = data_type, color = data_type, title = NULL, x = NULL, y = NULL, caption = capt) + 
    my_theme()
  
  return(g)
}
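Outside of Shiny, a ggplot object with interactive layers only becomes interactive once it is passed to girafe() from ggiraph. The sketch below shows this on two made-up squares instead of real country polygons. The data frame squares and the codes “AAA” and “BBB” are hypothetical.

```r
library(ggplot2)
library(ggiraph)

# Two made-up "countries": unit squares, one with a value and one without data.
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c(1, 2), each = 4),
  ISO3  = rep(c("AAA", "BBB"), each = 4),
  Value = rep(c(10, NA), each = 4)
)

# Same layer type and tooltip format as in worldMaps(), on the toy polygons.
g <- ggplot() +
  geom_polygon_interactive(
    data = squares, color = "gray70",
    aes(x = long, y = lat, group = group, fill = Value,
        tooltip = sprintf("%s<br/>%s", ISO3, Value))
  ) +
  scale_fill_gradientn(colours = c("red", "blue"), na.value = "white")

# Convert the static plot into an interactive htmlwidget with hover tooltips.
widget <- girafe(ggobj = g)
```

With the objects from this post, the analogous call would be along the lines of girafe(ggobj = worldMaps(df, world_data, "Childlessness", "Latest", "45-49")), assuming those period and indicator values are present in df.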

Building an R Shiny app

Now that we have our data and world mapping function ready and specified, we can start building our R Shiny app. (If you’re not familiar with R Shiny, I recommend that you have a look at the Getting Started guide first.) We can build our app by specifying the UI and server components. In the UI, we include a fixed user input selection where the user can choose whether they want to see the childlessness or gender gap index data. We further include dynamic inputs for the period and indicators the user wants to see. As mentioned before, these are dynamic because the choices shown will depend on the selections made by the user on previous inputs. We then use the ggiraph package to output our interactive world map. We use the sidebarLayout() function to show the input selections on the left side and the world map on the right side, rather than the two stacked vertically.

Everything that depends on the inputs by the user needs to be specified in the server function, which in this case is not only the world map creation, but also the second and third input choices, since these depend on the previous inputs made by the user. For example, when we run the app later, we’ll see that when the user selects the childlessness data for the first input for data type, the third indicator input will only show age groups, and the text above the selector will also show “age group”, whereas when the user selects the gender gap index data, the third indicator will show different measures and the text above the selector will show “indicator” rather than “age group”.
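The UI and server code is not reproduced in this post, so what follows is a minimal sketch of how the description above could be implemented, not necessarily the code behind the live app. It assumes that df, world_data and worldMaps() from the previous sections exist in the session. The helper input_choices(), the output IDs (period_ui, indicator_ui, world_map), the title and the label texts are all made up for illustration.

```r
library(shiny)
library(ggiraph)

# Hypothetical helper: values of `column` available for a data type (and period).
input_choices <- function(data, column, data_type, period = NULL) {
  keep <- data$DataType == data_type
  if (!is.null(period)) keep <- keep & data$Period == period
  unique(as.character(data[[column]][keep]))
}

ui <- fluidPage(
  titlePanel("Interactive world maps"),
  sidebarLayout(
    sidebarPanel(
      # Fixed first input: which data set to explore.
      selectInput("data_type", "Choose the type of data you want to see:",
                  choices = c("Childlessness", "Gender Gap Index")),
      # Placeholders that the server fills in, depending on earlier choices.
      uiOutput("period_ui"),
      uiOutput("indicator_ui")
    ),
    mainPanel(girafeOutput("world_map"))
  )
)

server <- function(input, output, session) {
  # Second input: only the periods that exist for the chosen data type.
  output$period_ui <- renderUI({
    selectInput("period", "Choose the period:",
                choices = input_choices(df, "Period", input$data_type))
  })

  # Third input: label and choices both depend on the first two inputs.
  output$indicator_ui <- renderUI({
    req(input$period)
    label <- if (input$data_type == "Childlessness") "age group" else "indicator"
    selectInput("indicator", paste0("Choose the ", label, ":"),
                choices = input_choices(df, "Indicator",
                                        input$data_type, input$period))
  })

  # The map is redrawn whenever any of the three inputs changes.
  output$world_map <- renderGirafe({
    req(input$period, input$indicator)
    girafe(ggobj = worldMaps(df, world_data, input$data_type,
                             input$period, input$indicator))
  })
}
```

The req() calls simply pause rendering until the dynamic inputs have been created, which avoids errors during the first moments after the app starts.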

Finally, we can run our app by either clicking “Run App” at the top of our RStudio IDE, or by running

shinyApp(ui = ui, server = server)

Below is a screen shot of the app. You can check out the live app here. In this post on my GitHub, you can also see how to embed a Shiny app in your R Markdown files, which is a really cool and innovative way of preparing interactive documents. Finally, the source code used to build the live app can also be found on my GitHub here.

Now try selecting different inputs and see how the input choices change when doing so. Also, don’t forget to try hovering over the world map to see different data values for different countries interactively!
