Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Note: at the time of this writing, using the CRAN versions of the packages ggplot2 and ggmap will result in an error. To avoid the error, install both packages from GitHub with the devtools package, and restart R if the problem persists.
devtools::install_github("dkahle/ggmap")
devtools::install_github("hadley/ggplot2")
Introduction
In Part 1 of this series we collected geodata for metro stops from Google and plotted them on maps. In Part 2 we’ll build Delaunay triangulations on top of those maps and compute the centroid of each network. This post includes some fairly advanced use of tidyverse packages; for more information on these calls, see the tidyverse documentation.
Data
As a reminder, our data are the hand-corrected values of the data we pulled down from Google. To see how we got the data, go back to Part 1: Data.
Maps with Delaunay Triangulations and Centroids
With our maps and data points in place, let’s compute the Delaunay triangulation for each city. This lets us find the area a given city’s metro covers and compute a center point, or centroid, for the metro system. We do this with the deldir package. First, though, I am going to use a function from tidyr called nest(), which collapses a group of rows into a single cell. By nesting by city I get one row per city, and the rest of that city’s data sits in a single cell. All of the other columns are collapsed into one column whose name is set with .key; in this case the new column is called location_info. Think of it as a data frame tucked within a cell of a data frame. With my data nested, I can make a new column called deldir that holds the full result of the deldir() call for each city. The deldir() call takes two numeric vectors, the x-coordinates (longitude) and the y-coordinates (latitude) of the points, and computes several things, including the area of the triangulated shape and the edges of all the segments connecting the points. How do we access this information, though? We can pull it out with the purrr function map(). The map() call takes a list (here, a list column) and a function, and applies the function to each element in turn. When a string is given instead of a function, as here, map() simply extracts the element with that name, so map(deldir, "del.area") pulls del.area out of each city’s deldir result. Thanks to the mutate() call we can then save it to a new column. We do the same thing with delsgs (the segments, or edges, of the triangulation) and summary (per-point information, one row per point). See the fully nested data frame below.
library(dplyr)
library(tidyr)
library(purrr)
library(deldir)

# One row per city; all other columns go into the list column location_info
# (with tidyr >= 1.0 this step is written nest(location_info = -city)).
data_deldir = data %>%
  nest(-city, .key = location_info) %>%
  # Triangulate each city's stops: x = longitude, y = latitude
  mutate(deldir = map(location_info, function(df) deldir(df$lon, df$lat))) %>%
  # Extract named components from each deldir object
  mutate(del.area = map(deldir, "del.area")) %>%
  mutate(delsgs = map(deldir, "delsgs")) %>%
  mutate(summary = map(deldir, "summary"))

data_deldir
# A tibble: 4 × 6
       city      location_info       deldir  del.area                 delsgs                summary
     <fctr>             <list>       <list>    <list>                 <list>                 <list>
1     Paris <tibble [298 × 9]> <S3: deldir> <dbl [1]> <data.frame [849 × 6]> <data.frame [287 × 9]>
2    Berlin <tibble [173 × 9]> <S3: deldir> <dbl [1]> <data.frame [499 × 6]> <data.frame [171 × 9]>
3 Barcelona <tibble [149 × 9]> <S3: deldir> <dbl [1]> <data.frame [433 × 6]> <data.frame [148 × 9]>
4    Prague  <tibble [58 × 9]> <S3: deldir> <dbl [1]> <data.frame [161 × 6]>  <data.frame [58 × 9]>
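To see what a deldir() object and the map() extraction look like without the metro data, here is a toy sketch. It assumes the deldir and purrr packages are installed; the five points (the corners of a 2 × 2 square plus its center) and the names toy, tri, and tris are made up for illustration and are not from the post’s data.

```r
library(deldir)
library(purrr)

# Made-up points: the corners of a 2 x 2 square plus its center.
toy <- data.frame(lon = c(0, 2, 0, 2, 1),
                  lat = c(0, 0, 2, 2, 1))

tri <- deldir(toy$lon, toy$lat)

tri$del.area       # total area of the triangulation (its convex hull)
nrow(tri$delsgs)   # one row per edge: x1, y1, x2, y2, ind1, ind2
nrow(tri$summary)  # one row per point, including del.area and del.wts columns

# A string in place of a function makes map() extract that named element,
# which is what map(deldir, "del.area") does for each city above.
tris <- list(toy_a = tri, toy_b = tri)
areas_list <- map(tris, "del.area")      # a list, like the del.area column
areas_dbl  <- map_dbl(tris, "del.area")  # the same values as a numeric vector
```

For this square the triangulation is four triangles of area 1 meeting at the center, so del.area should be 4 and delsgs should have 8 rows (the 4 sides plus the 4 spokes to the center).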
Based on these areas, it looks like the Berlin metro covers the most area at 0.059279 while Barcelona covers the smallest at 0.016332. (Because the inputs are longitude and latitude, these areas are in squared degrees rather than square kilometers.) Now that we have our nested data frame with all pertinent information, we’re going to unnest the data needed for our new plots. First we need the delsgs data, which we use to draw the lines connecting the metro stops. To get it, we make a new data frame, dropping all columns except city and delsgs. Then we unnest() the data frame. This expands the nested delsgs column, giving us many more rows and several more columns. The x1, y1, x2, and y2 values will be used later in our plot to draw the edges of our triangles. See part of the unnested data frame below.
# Keep only the city and its triangle edges, then expand the edges into
# one row per segment (with tidyr >= 1.0: unnest(cols = delsgs)).
data_deldir_delsgs = data_deldir %>%
  select(city, delsgs) %>%
  unnest()

head(data_deldir_delsgs)
# A tibble: 6 × 7
    city       x1       y1       x2       y2  ind1  ind2
  <fctr>    <dbl>    <dbl>    <dbl>    <dbl> <int> <int>
1  Paris 2.366928 48.78793 2.359279 48.79272   283   282
2  Paris 2.433489 48.77262 2.366928 48.78793    72   283
3  Paris 2.450590 48.78984 2.433489 48.77262    74    72
4  Paris 2.450590 48.78984 2.459319 48.77978    74    73
5  Paris 2.455281 48.76805 2.433489 48.77262   198    72
6  Paris 2.455281 48.76805 2.459319 48.77978   198    73
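The nest() and unnest() round trip is easier to see on a tiny table. This sketch uses a made-up stops tibble (not the post’s data) and the tidyr >= 1.0 spellings nest(location_info = -city) and unnest(cols = location_info), which correspond to the older nest(-city, .key = location_info) and bare unnest() calls used above.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for the post's `data`: two cities, five stops.
stops <- tibble(
  city = c("A", "A", "A", "B", "B"),
  lon  = c(0, 1, 0, 5, 6),
  lat  = c(0, 0, 1, 5, 5)
)

# Collapse every column except city into one list column.
nested <- stops %>% nest(location_info = -city)
nested  # 2 rows; each location_info cell holds that city's lon/lat tibble

# Expanding the list column gives back one row per stop.
flat <- nested %>% unnest(cols = location_info)
flat
```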
In addition to the edges of the shape, we also want the centroid. To get it, we first make a new data frame with just the city and summary information. We then unnest() the data frame just as we did for the edges, but we don’t stop there. What we’re really interested in is the centroid, which we need to compute ourselves. To do this we first group_by() city and then summarise() the data. To compute the x-value of the centroid, cent_x, we take the x column, which contains the x-coordinates of all of the points, and multiply each point by its value in the del.wts column. Each del.wts value is that point’s share of the total triangulated area: one third of the area of the triangles that have the point as a vertex, divided by the total area, so the weights sum to 1. This works because a triangle’s centroid is the average of its three corners, so giving every point one third of the area of each triangle it touches reproduces the area-weighted average of the triangle centroids. Adding these products together therefore gives the x-value of the centroid of the entire figure. We do the same thing for the y-value. See the table below for the data summarised to give us the centroids for each city.
data_deldir_cent = data_deldir %>%
  select(city, summary) %>%
  unnest() %>%
  group_by(city) %>%
  # del.wts sums to 1 within each city, so these are weighted means
  summarise(cent_x = sum(x * del.wts),
            cent_y = sum(y * del.wts)) %>%
  ungroup()

data_deldir_cent
# A tibble: 4 × 3
       city    cent_x   cent_y
     <fctr>     <dbl>    <dbl>
1 Barcelona  2.137923 41.38708
2    Berlin 13.402654 52.51054
3     Paris  2.353365 48.85813
4    Prague 14.447439 50.07588
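To check that the del.wts weighted sum really gives the area centroid, here is a small sketch that compares it with the shoelace-formula centroid of the convex hull, which is the region the triangulation covers. The points are made up, hull_centroid() is a hypothetical helper written only for this check, and chull() is base R’s convex hull function.

```r
library(deldir)

# Made-up, asymmetric points: four hull corners plus one interior point.
pts <- data.frame(x = c(0, 4, 0, 1, 3),
                  y = c(0, 0, 3, 1, 2))

s <- deldir(pts$x, pts$y)$summary

# Method used in the post: weight each point by del.wts.
cent_wts <- c(sum(s$x * s$del.wts), sum(s$y * s$del.wts))

# Independent check: shoelace centroid of the convex hull polygon.
hull_centroid <- function(x, y) {
  h  <- chull(x, y)                       # indices of hull vertices, in order
  hx <- x[h]; hy <- y[h]
  nx <- c(hx[-1], hx[1])                  # next vertex, wrapping around
  ny <- c(hy[-1], hy[1])
  cross <- hx * ny - nx * hy
  a <- sum(cross) / 2                     # signed area; the sign cancels below
  c(sum((hx + nx) * cross) / (6 * a),
    sum((hy + ny) * cross) / (6 * a))
}
cent_hull <- hull_centroid(pts$x, pts$y)

all.equal(cent_wts, cent_hull)
```

The two results should agree up to floating-point error, which is a quick sanity check on the weighting argument above.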
Now we can update our figures with the triangulations and centroids. I’ve again made a function to build the four maps. As before, we start with ggmap() and our city-specific map object. Next we use geom_segment() to draw our edges, using x1, y1, x2, and y2 from the data_deldir_delsgs data frame we made earlier. We then plot the actual metro stop points with geom_point(), just as we did in our original map. Finally, we end with one more geom_point() call, this time on our data_deldir_cent data frame, to plot the centroid specific to each city. See the four updated maps below. Again, I’ve left the code for the Paris map visible so you can see how the function works, and hidden the rest.
library(ggmap)  # also attaches ggplot2

del_plot = function(city_name, city_map){
  ggmap(city_map, extent = "device") +
    # Triangle edges (ggplot2 >= 3.4 prefers linewidth = 1 over size = 1)
    geom_segment(data = subset(data_deldir_delsgs, city == city_name),
                 aes(x = x1, y = y1, xend = x2, yend = y2),
                 size = 1, color = "#92c5de") +
    # Metro stops
    geom_point(data = subset(data, city == city_name),
               aes(x = lon, y = lat), color = "#0571b0", size = 3) +
    # Centroid
    geom_point(data = subset(data_deldir_cent, city == city_name),
               aes(x = cent_x, y = cent_y), size = 6, color = "#ca0020")
}

paris_del.plot = del_plot("Paris", paris_map)
paris_del.plot
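If you want to try the layering without downloading map tiles through ggmap, the same three layers can be stacked on a plain ggplot() canvas. This is a sketch on made-up points for one hypothetical city, not the post’s maps; coord_equal() stands in for the map projection, and the segment width is left at its default to sidestep the size/linewidth difference between ggplot2 versions.

```r
library(ggplot2)
library(deldir)

# Made-up stops for one hypothetical city.
stops <- data.frame(lon = c(0, 2, 0, 2, 1, 1.5),
                    lat = c(0, 0, 2, 2, 1, 0.4))

tri   <- deldir(stops$lon, stops$lat)
edges <- tri$delsgs                     # x1, y1, x2, y2 for each edge
cent  <- data.frame(cent_x = sum(tri$summary$x * tri$summary$del.wts),
                    cent_y = sum(tri$summary$y * tri$summary$del.wts))

p <- ggplot() +
  # Triangle edges, as in del_plot()
  geom_segment(data = edges, aes(x = x1, y = y1, xend = x2, yend = y2),
               color = "#92c5de") +
  # Stops
  geom_point(data = stops, aes(x = lon, y = lat),
             color = "#0571b0", size = 3) +
  # Centroid
  geom_point(data = cent, aes(x = cent_x, y = cent_y),
             color = "#ca0020", size = 6) +
  coord_equal()

print(p)
```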
Conclusion
In Part 2 of this series we computed Delaunay triangulations and centroids for each city’s metro system. This included some more complicated tidyverse calls, such as nesting and unnesting our data. In the third and final part of this series we’ll look at how the systems change over time and show it with a .gif.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.