Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The first post of this year 2020, I will dedicate to a question that I was recently asked. The question was how to calculate the shortest distance between different points and how to know which is the closest point. When we work with spatial data in R, currently the easiest thing is to use the sf
package in combination with the tidyverse
collection of packages. We also use the units
package which is very useful for working with units of measurement.
Packages
Package | Description |
---|---|
tidyverse | Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc. |
sf | Simple Feature: import, export and manipulate vector data |
units | Support for measurement units in R vectors, matrices and arrays: propagation, conversion, derivation |
maps | Draw geographical maps |
rnaturalearth | Hold and facilitate interaction with Natural Earth map data |
# install the necessary packages if(!require("tidyverse")) install.packages("tidyverse") if(!require("units")) install.packages("units") if(!require("sf")) install.packages("sf") if(!require("maps")) install.packages("maps") if(!require("rnaturalearth")) install.packages("rnaturalearth") # load packages library(maps) library(sf) library(tidyverse) library(units) library(rnaturalearth)
Measurement units
The use of vectors and matrices with the units
class allows us to calculate and transform units of measurement.
# length l <- set_units(1:10, m) l ## Units: [m] ## [1] 1 2 3 4 5 6 7 8 9 10 # convert units set_units(l, cm) ## Units: [cm] ## [1] 100 200 300 400 500 600 700 800 900 1000 # sum different units set_units(l, cm) + l ## Units: [cm] ## [1] 200 400 600 800 1000 1200 1400 1600 1800 2000 # area a <- set_units(355, ha) set_units(a, km2) ## 3.55 [km2] # velocity vel <- set_units(seq(20, 50, 10), km/h) set_units(vel, m/s) ## Units: [m/s] ## [1] 5.555556 8.333333 11.111111 13.888889
Capital cities of the world
We will use the capital cities of the whole world with the objective of calculating the distance to the nearest capital city and indicating the name/country.
# set of world cities with coordinates head(world.cities) # from the maps package ## name country.etc pop lat long capital ## 1 'Abasan al-Jadidah Palestine 5629 31.31 34.34 0 ## 2 'Abasan al-Kabirah Palestine 18999 31.32 34.35 0 ## 3 'Abdul Hakim Pakistan 47788 30.55 72.11 0 ## 4 'Abdullah-as-Salam Kuwait 21817 29.36 47.98 0 ## 5 'Abud Palestine 2456 32.03 35.07 0 ## 6 'Abwein Palestine 3434 32.03 35.20 0
To convert points with longitude and latitude into a spatial object of class sf
, we use the function st_as_sf()
, indicating the coordinate columns and the coordinate reference system (WSG84, epsg: 4326).
# convert the points into an sf object with CRS WSG84 cities <- st_as_sf(world.cities, coords = c("long", "lat"), crs = 4326) cities ## Simple feature collection with 43645 features and 4 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -178.8 ymin: -54.79 xmax: 179.81 ymax: 78.93 ## CRS: EPSG:4326 ## First 10 features: ## name country.etc pop capital geometry ## 1 'Abasan al-Jadidah Palestine 5629 0 POINT (34.34 31.31) ## 2 'Abasan al-Kabirah Palestine 18999 0 POINT (34.35 31.32) ## 3 'Abdul Hakim Pakistan 47788 0 POINT (72.11 30.55) ## 4 'Abdullah-as-Salam Kuwait 21817 0 POINT (47.98 29.36) ## 5 'Abud Palestine 2456 0 POINT (35.07 32.03) ## 6 'Abwein Palestine 3434 0 POINT (35.2 32.03) ## 7 'Adadlay Somalia 9198 0 POINT (44.65 9.77) ## 8 'Adale Somalia 5492 0 POINT (46.3 2.75) ## 9 'Afak Iraq 22706 0 POINT (45.26 32.07) ## 10 'Afif Saudi Arabia 41731 0 POINT (42.93 23.92)
In the next step, we filter by the capital cities encoded in the column capital with 1. The advantage of the sf
package is the possibility of applying functions of the tidyverse
collection to manipulate the attributes. In addition, we add a column with new labels using the str_c()
function of the stringr
package, which is similar to that of R Base paste()
.
# filter the capital cities capitals <- filter(cities, capital == 1) # create a new label combining name and country capitals <- mutate(capitals, city_country = str_c(name, " (", country.etc, ")")) capitals ## Simple feature collection with 230 features and 5 fields ## geometry type: POINT ## dimension: XY ## bbox: xmin: -176.13 ymin: -51.7 xmax: 179.2 ymax: 78.21 ## CRS: EPSG:4326 ## First 10 features: ## name country.etc pop capital geometry ## 1 'Amman Jordan 1303197 1 POINT (35.93 31.95) ## 2 Abu Dhabi United Arab Emirates 619316 1 POINT (54.37 24.48) ## 3 Abuja Nigeria 178462 1 POINT (7.17 9.18) ## 4 Accra Ghana 2029143 1 POINT (-0.2 5.56) ## 5 Adamstown Pitcairn 51 1 POINT (-130.1 -25.05) ## 6 Addis Abeba Ethiopia 2823167 1 POINT (38.74 9.03) ## 7 Agana Guam 1041 1 POINT (144.75 13.47) ## 8 Algiers Algeria 2029936 1 POINT (3.04 36.77) ## 9 Alofi Niue 627 1 POINT (-169.92 -19.05) ## 10 Amsterdam Netherlands 744159 1 POINT (4.89 52.37) ## city_country ## 1 'Amman (Jordan) ## 2 Abu Dhabi (United Arab Emirates) ## 3 Abuja (Nigeria) ## 4 Accra (Ghana) ## 5 Adamstown (Pitcairn) ## 6 Addis Abeba (Ethiopia) ## 7 Agana (Guam) ## 8 Algiers (Algeria) ## 9 Alofi (Niue) ## 10 Amsterdam (Netherlands)
Calculate distances
Geographical distance (Euclidean or greater circle) is calculated with the st_distance()
function, either between two points, between one point and others or between all points. In the latter case we obtain a symmetric matrix of distances (NxN), taken pairwise between the elements of the capital city set. In the diagonal we find the combinations between the same points giving all null values.
A | B | C | |
A | 0 | 235 | 478 |
B | 235 | 0 | 287 |
C | 478 | 287 | 0 |
For instance, when we want to know the distance from Amsterdam to Abu Dhabi, Washington and Tokyo we pass two spatial objects.
# calculate distance dist_amsterdam <- st_distance(slice(capitals, 10), slice(capitals, c(2, 220, 205))) dist_amsterdam # distance in meters ## Units: [m] ## [,1] [,2] [,3] ## [1,] 5167859 6203802 9316790
The result is a matrix with a single row or column (depending on the order of the spatial objects) with a class of units
. Thus it is possible to convert easily to another unit of measure. If we want to obtain a vector without class units
, we only have to apply the function as.vector()
.
# change from m to km set_units(dist_amsterdam, "km") ## Units: [km] ## [,1] [,2] [,3] ## [1,] 5167.859 6203.802 9316.79 # units class to vector as.vector(dist_amsterdam) ## [1] 5167859 6203802 9316790
In the second step, we estimate the distance matrix between all the capital cities. It is important to convert the null values to NA
to subsequently obtain the correct matrix index.
# calculate distance m_distance <- st_distance(capitals) # matrix dim(m_distance) ## [1] 230 230 # change m to km m_distance_km <- set_units(m_distance, km) # replace the distance of 0 m with NA m_distance_km[m_distance_km == set_units(0, km)] <- NA
units
class, it is necessary to use the same class to be able to make logical queries. For example, set_units(1, m) == set_units(1, m)
vs. set_units(1, m) == 1
.
To obtain the shortest distance, in addition to its position, we use the apply ()
function which in turn allows us to apply the function which.min()
and min()
on each row. It would also be possible to use the function on columns giving the same result. Finally, we add the results as new columns with the mutate()
function. The indices in pos allow us to obtain the names of the nearest cities.
# get the index (position) of the city and the distance pos <- apply(m_distance_km, 1, which.min) dist <- apply(m_distance_km, 1, min, na.rm = TRUE) # add the distance and get the name of the city capitals <- mutate(capitals, nearest_city = city_country[pos], geometry_nearest = geometry[pos], distance_city = dist)
Map of distances to the next capital city
Finally, we build a map representing the distance in proportional circles. To do this, we use the usual grammar of ggplot()
by adding the geometry geom_sf()
, first for the world map as background and then for the cities. In aes()
we indicate, with the argument size = distance_city
, the variable which we want to map proportionally. The theme_void()
function removes all style elements. In addition, we define with the function coord_sf()
a new projection indicating the proj4 format.
# world map world <- ne_countries(scale = 10, returnclass = "sf") # map ggplot(world) + geom_sf(fill = "black", colour = "white") + geom_sf(data = capitals, aes(size = distance_city), alpha = 0.7, fill = "#bd0026", shape = 21, show.legend = 'point') + coord_sf(crs = "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs") + labs(size = "Distance (km)", title = "Distance to the next capital city") + theme_void()
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.