Experimenting With R – Point to Point Mapping With Great Circles
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve started doodling again… This time, around maps, looking for recipes that make life easier plotting lines to connect points on maps. The most attractive maps seem to use great circles to connect one point with another, these providing the shortest path between two points when you consider the Earth as a sphere.
Here’s one quick experiment (based on the Flowing Data blog post How to map connections with great circles), for an R/Shiny app that allows you to upload a CSV file containing a couple of location columns (at least) and an optional “amount” column, and it’ll then draw lines between the points on each row.
The app requires us to solve several problems, including:
- how to geocode the locations
- how to plot the lines as great circles
- how to upload the CSV file
- how to select the from and two columns from the CSV file
- how to optionally select a valid numerical column for setting line thickness
Let’s start with the geocoder. For convenience, I’m going to use the Google geocoder via the geocode() function from the ggmap library.
#Locations are in two columns, *fr* and *to* in the *dummy* dataframe #If locations are duplicated in from/to columns, dedupe so we don't geocode same location more than once locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F) #Run the geocoder against each location, then transpose and bind the results into a dataframe cbind(locs, t(sapply(locs$place,geocode, USE.NAMES=F)))
The locs data is a vector of locations:
place 1 London, UK 2 Cambridge,UK 3 Paris,France 4 Sydney, Australia 5 Paris, France 6 New York,US 7 Cape Town, South Africa
The sapply(locs$place,geocode, USE.NAMES=F) function returns data that looks like:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] lon -0.1254872 0.121817 2.352222 151.207 2.352222 -74.00597 18.42406 lat 51.50852 52.20534 48.85661 -33.86749 48.85661 40.71435 -33.92487
The transpose (t() gives us:
lon lat [1,] -0.1254872 51.50852 [2,] 0.121817 52.20534 [3,] 2.352222 48.85661 [4,] 151.207 -33.86749 [5,] 2.352222 48.85661 [6,] -74.00597 40.71435 [7,] 18.42406 -33.92487
The cbind() binds each location with its lat and lon value:
place lon lat 1 London, UK -0.1254872 51.50852 2 Cambridge,UK 0.121817 52.20534 3 Paris,France 2.352222 48.85661 4 Sydney, Australia 151.207 -33.86749 5 Paris, France 2.352222 48.85661 6 New York,US -74.00597 40.71435 7 Cape Town, South Africa 18.42406 -33.92487
Code that provides a minimal example for uploading the data from a CSV file on the desktop to the Shiny app, then creating dynamic drop lists containing column names, can be found here: Simple file geocoder (R/shiny app).
The following snippet may be generally useful for getting a list of column names from a data frame that correspond to numerical columns:
#Get a list of column names for numerical columns in data frame df nums <- sapply(df, is.numeric) names(nums[nums])
The code for the full application can be found as a runnable gist in RStudio from here: R/Shiny app – great circle mapping. [In RStudio, install.packages("shiny"); library(shiny); runGist(9690079). The gist contains a dummy data file if you want to download it to try it out...]
Here’s the code explicitly…
The global.R file loads the necessary packages, installing them if they are missing:
#global.R ##This should detect and install missing packages before loading them - hopefully! list.of.packages <- c("shiny", "ggmap","maps","geosphere") new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])] if(length(new.packages)) install.packages(new.packages) lapply(list.of.packages,function(x){library(x,character.only=TRUE)})
The ui.R file builds the Shiny app’s user interface. The drop down column selector lists are populated dynamically with the names of the columns in the data file once it is uploaded. An optional Amount column can be selected – the corresponding list only displays the names of numerical columns. (The lists of location columns to be geocoded should really be limited to non-numerical columns.) The action button prevents the geocoding routines firing until the user is ready – select the columns appropriately before geocoding (error messages are not handled very nicely;-)
#ui.R shinyUI(pageWithSidebar( headerPanel("Great Circle Map demo"), sidebarPanel( #Provide a dialogue to upload a file fileInput('datafile', 'Choose CSV file', accept=c('text/csv', 'text/comma-separated-values,text/plain')), #Define some dynamic UI elements - these will be lists containing file column names uiOutput("fromCol"), uiOutput("toCol"), #Do we want to make use of an amount column to tweak line properties? uiOutput("amountflag"), #If we do, we need more options... conditionalPanel( condition="input.amountflag==true", uiOutput("amountCol") ), conditionalPanel( condition="input.amountflag==true", uiOutput("lineSelector") ), #We don't want the geocoder firing until we're ready... actionButton("getgeo", "Get geodata") ), mainPanel( tableOutput("filetable"), tableOutput("geotable"), plotOutput("geoplot") ) ))
The server.R file contains the server logic for the app. One thing to note is the way we isolate some of the variables in the geocoder reactive function. (Reactive functions fire when one of the external variables they contain changes. To prevent the function firing when a variable it contains changes, we need to isolate it. (See the docs for me; for example, Shiny Lesson 7: Reactive outputs or Isolation: avoiding dependency.)
#server.R shinyServer(function(input, output) { #Handle the file upload filedata <- reactive({ infile <- input$datafile if (is.null(infile)) { # User has not uploaded a file yet return(NULL) } read.csv(infile$datapath) }) #Populate the list boxes in the UI with column names from the uploaded file output$toCol <- renderUI({ df <-filedata() if (is.null(df)) return(NULL) items=names(df) names(items)=items selectInput("to", "To:",items) }) output$fromCol <- renderUI({ df <-filedata() if (is.null(df)) return(NULL) items=names(df) names(items)=items selectInput("from", "From:",items) }) #If we want to make use of an amount column, we need to be able to say so... output$amountflag <- renderUI({ df <-filedata() if (is.null(df)) return(NULL) checkboxInput("amountflag", "Use values?", FALSE) }) output$amountCol <- renderUI({ df <-filedata() if (is.null(df)) return(NULL) #Let's only show numeric columns nums <- sapply(df, is.numeric) items=names(nums[nums]) names(items)=items selectInput("amount", "Amount:",items) }) #Allow different line styles to be selected output$lineSelector <- renderUI({ radioButtons("lineselector", "Line type:", c("Uniform" = "uniform", "Thickness proportional" = "thickprop", "Colour proportional" = "colprop")) }) #Display the data table - handy for debugging; if the file is large, need to limit the data displayed [TO DO] output$filetable <- renderTable({ filedata() }) #The geocoding bit... Isolate variables so we don't keep firing this... geodata <- reactive({ if (input$getgeo == 0) return(NULL) df=filedata() if (is.null(df)) return(NULL) isolate({ dummy=filedata() fr=input$from to=input$to locs=data.frame(place=unique(c(as.vector(dummy[[fr]]),as.vector(dummy[[to]]))),stringsAsFactors=F) cbind(locs, t(sapply(locs$place,geocode, USE.NAMES=F))) }) }) #Weave the goecoded data into the data frame we made from the CSV file geodata2 <- reactive({ if (input$getgeo == 0) return(NULL) df=filedata() if (input$amountflag != 0) { maxval=max(df[input$amount],na.rm=T) minval=min(df[input$amount],na.rm=T) df$b8g43bds=10*df[input$amount]/maxval } gf=geodata() df=merge(df,gf,by.x=input$from,by.y='place') merge(df,gf,by.x=input$to,by.y='place') }) #Preview the geocoded data output$geotable <- renderTable({ if (input$getgeo == 0) return(NULL) geodata2() }) #Plot the data on a map... output$geoplot<- renderPlot({ if (input$getgeo == 0) return(map("world")) #Method pinched from: http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/ map("world") df=geodata2() pal <- colorRampPalette(c("blue", "red")) colors <- pal(100) for (j in 1:nrow(df)){ inter <- gcIntermediate(c(df[j,]$lon.x[[1]], df[j,]$lat.x[[1]]), c(df[j,]$lon.y[[1]], df[j,]$lat.y[[1]]), n=100, addStartEnd=TRUE) #We could possibly do more styling based on user preferences? if (input$amountflag == 0) lines(inter, col="red", lwd=0.8) else { if (input$lineselector == 'colprop') { maxval <- max(df$b8g43bds) minval= min(df$b8g43bds) colindex <- round( (df[j,]$b8g43bds[[1]]/10) * length(colors) ) lines(inter, col=colors[colindex], lwd=0.8) } else if (input$lineselector == 'thickprop') { lines(inter, col="red", lwd=df[j,]$b8g43bds[[1]]) } else lines(inter, col="red", lwd=0.8) } } }) })
So that’s the start of it… this app could be further developed in several ways, for example allowing the user to filter or colour displayed lines according to factor values in a further column (commodity type, for example), or produce a lattice of maps based on facet values in a column.
I also need to figure how to to save maps, and maybe produce zoomable ones. If geocoded points all lay within a blinding box limited to a particular geographical area, scaling the map view to show just that area might be useful.
Other techniques might include using proportional symbols (circles) at line landing points to show the sum of values incoming to that point, or some of values outgoing, or the difference between the two; (maybe use green for incoming outgoing, then size by the absolute difference?)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.