Metro Systems Over Time: Part 3
Note, at the time of this writing, using the packages ggplot2 and ggmap from CRAN will result in an error. To avoid the error be sure to install both packages from GitHub with the package devtools, and restart R if the problem persists.
devtools::install_github("dkahle/ggmap")
devtools::install_github("hadley/ggplot2")
Note, at the time of this writing the package gganimate requires the package cowplot. Be sure to install and load the package before continuing.
install.packages("cowplot")
library(cowplot)
Introduction
In Part 1 and Part 2 of this series we made maps of metro systems across four European cities, and then computed Delaunay triangulations and centroids for each city. In the third and final part, we’ll do these same steps but at multiple time points to make a .gif of how metro systems change over time.
Data
As a reminder, our data are the hand-corrected values of the data we pulled down from Google. To see how we got the data go back to Part 1: Data.
Maps with Change Over Time
We now have a good sense of what each city’s current metro system looks like, but how did these systems come to be this way? Now we’ll look at how these systems have changed and grown over time. That’s why at the beginning we made a column for opened_year. At this point the code gets less elegant but we’ll go through it step by step. It’s all the same principles as when we made our figures earlier.
The main idea of the following code is that we’re going to create unique triangulations for each year within each city. As more metro stations get added each year the triangulation will change. Just as we had data_deldir_delsgs and data_deldir_cent, we’re going to start by creating two empty data frames, time_deldir_delsgs and time_deldir_sum (remember that our centroid data frame was based on the summary data). With our empty data frames initialized we can make a for loop. We want to go through each year, but for each city separately, so our first for loop goes through each city, filtering our data to only the city in question. Next we have our second for loop going through each year, starting with the minimum year in the data for that city and going up to 2015, the maximum year for the full data set. For a given year we filter() to include only metro stops that were opened that year or earlier. We use less than or equal to because we don’t want to ignore metro stops from earlier years; we want the whole metro system as it exists in a given year. Note, we need at least three points to make a triangle, and you may think that a city wouldn’t ever have only one or two metro stops, but you would be wrong (cough Barcelona cough), so we’re going to put in a stopgap saying that if the number of data points is less than three the loop should skip that year and move to the next one.
Okay, assuming there are at least three data points, we’re going to run the deldir() call and save the result in year_deldir. Then we create two new data frames. The first is year_deldir_delsgs, which contains the delsgs information from deldir. We’re going to add two columns too, city and opened_year, so we know which city and year this data comes from. We then add this information to our existing time_deldir_delsgs data frame with a bind_rows() call. We then do the same thing to create year_deldir_sum, only we pull out the summary information from year_deldir instead of the delsgs information. We also add our city and opened_year columns and then bind_rows() it with time_deldir_sum. The loop does this for every city from the minimum year in the data up to 2015. See below the head of the two data frames we created.
time_deldir_delsgs = data.frame()
time_deldir_sum = data.frame()

for(c in c("Paris", "Berlin", "Barcelona", "Prague")) {
  data_city = filter(data, city == c)

  for(year in min(data_city$opened_year):2015) {
    data_year = filter(data_city, opened_year <= year)

    # Add condition to skip if number of stops less than 3
    if(dim(data_year)[1] < 3) next

    year_deldir = deldir(data_year$lon, data_year$lat)

    year_deldir_delsgs = year_deldir$delsgs %>%
      mutate(city = c) %>%
      mutate(opened_year = year)

    time_deldir_delsgs = bind_rows(time_deldir_delsgs, year_deldir_delsgs)

    year_deldir_sum = year_deldir$summary %>%
      mutate(city = c) %>%
      mutate(opened_year = year)

    time_deldir_sum = bind_rows(time_deldir_sum, year_deldir_sum)
  }
}

head(time_deldir_delsgs)
head(time_deldir_sum)

        x1       y1       x2       y2 ind1 ind2  city opened_year
1 2.358279 48.85343 2.369510 48.85302   11    2 Paris        1900
2 2.374377 48.84430 2.369510 48.85302    9    2 Paris        1900
3 2.374377 48.84430 2.358279 48.85343    9   11 Paris        1900
4 2.414484 48.84640 2.369510 48.85302   16    2 Paris        1900
5 2.414484 48.84640 2.374377 48.84430   16    9 Paris        1900
6 2.386581 48.84732 2.369510 48.85302   18    2 Paris        1900

         x        y n.tri del.area  del.wts n.tside nbpt dir.area  dir.wts  city opened_year
1 2.289364 48.87577     4  1.9e-05 0.012510       4    2 0.000066 0.009329 Paris        1900
2 2.369510 48.85302     5  7.0e-05 0.047518       4    2 0.000520 0.073787 Paris        1900
3 2.290121 48.86685     5  3.7e-05 0.024907       5    0 0.000074 0.010541 Paris        1900
4 2.313310 48.86750     4  3.7e-05 0.024692       3    4 0.000388 0.054946 Paris        1900
5 2.294900 48.87393     5  1.6e-05 0.010551       3    2 0.000061 0.008594 Paris        1900
6 2.347301 48.85869     4  1.1e-05 0.007359       3    4 0.000392 0.055588 Paris        1900
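To see what the "opened that year or earlier" filter and the three-station check do on their own, here is a minimal base R sketch. The toy data frame and its station names are invented for illustration and are not part of the real data; the sketch only counts stations per year and does not call deldir().

```r
# Toy stations for one hypothetical city (values invented for illustration)
toy = data.frame(station = c("A", "B", "C", "D"),
                 opened_year = c(1912, 1924, 1924, 1926))

# For each year, count the stations opened that year or earlier
years = min(toy$opened_year):1927
n_open = sapply(years, function(y) sum(toy$opened_year <= y))
names(n_open) = years

# Years the loop would skip (fewer than three stations, so no triangle)
skipped_years = years[n_open < 3]

# Years that would get a triangulation
kept_years = years[n_open >= 3]

print(n_open)
```

In this toy case the years 1912 through 1923 are skipped because only one station exists, and the first triangulation would be computed for 1924.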
As you may recall though, we’re not necessarily interested in all the summary information; we just want it to compute our centroid. So, we make a new data frame, time_deldir_cent. The code is the same as our earlier code for computing centroids; the only difference is that we’ll also group by opened_year, not just city, since we want unique centroids for each year for each city. See part of the data frame of the centroids below.
time_deldir_cent = time_deldir_sum %>%
  group_by(city, opened_year) %>%
  summarise(cent_x = sum(x * del.wts),
            cent_y = sum(y * del.wts)) %>%
  ungroup()

head(time_deldir_cent)

# A tibble: 6 × 4
       city opened_year   cent_x   cent_y
      <chr>       <int>    <dbl>    <dbl>
1 Barcelona        1924 2.116839 41.38132
2 Barcelona        1925 2.120019 41.37782
3 Barcelona        1926 2.121921 41.37834
4 Barcelona        1927 2.121921 41.37834
5 Barcelona        1928 2.122325 41.37384
6 Barcelona        1929 2.113628 41.37543
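The summarise() step is a weighted average of the station coordinates. As a rough sketch of the arithmetic, assuming the del.wts values within a group sum to 1 (the toy coordinates and weights below are invented for illustration):

```r
# Three hypothetical stations with Delaunay weights that sum to 1
toy_sum = data.frame(x = c(2.0, 2.2, 2.4),
                     y = c(48.8, 48.9, 48.7),
                     del.wts = c(0.5, 0.25, 0.25))

# Centroid as the weighted sum of coordinates, as in the summarise() call
cent_x = sum(toy_sum$x * toy_sum$del.wts)
cent_y = sum(toy_sum$y * toy_sum$del.wts)

# Base R's weighted.mean() divides by the total weight,
# so it agrees whenever the weights sum to 1
cent_x_check = weighted.mean(toy_sum$x, toy_sum$del.wts)
cent_y_check = weighted.mean(toy_sum$y, toy_sum$del.wts)

print(c(cent_x, cent_y))
```

Because the station with weight 0.5 counts twice as much as each of the others, the centroid is pulled toward it rather than sitting at the plain average of the three points.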
There’s still one more thing I want to do before we make our figures. Right now the figures will have different start dates depending on when the first metro stop was built in a given city. Instead, I want all figures to start at the same year so we see them change over time with the same start date for each city. To do this we’ll make a new data frame called years that simply lists the years 1900 to 2015 four times, once for each city. We then do a left_join() with our data. As a result, any time the opened_year in question is not found in the data frame for a given city an empty row will be added, empty except for the opened_year and city values. You’ll also notice that I filter()ed to only include decade years (1900, 1910, 1920, etc.) and the year 2015, so it includes the last year of our data. This is because if we include every year our gif will be very large and non-portable. Also, it’s more dramatic to see changes every 10 years.
years = data.frame(opened_year = rep(seq(1900, 2015), 4),
                   city = c(rep("Paris", 116), rep("Berlin", 116),
                            rep("Barcelona", 116), rep("Prague", 116)))

data_time = left_join(years, data) %>%
  mutate(opened_by_year = ifelse(opened_year %% 10 == 0, opened_year,
                                 opened_year + (10 - (opened_year %% 10)))) %>%
  filter(opened_by_year <...)  # the rest of this comparison is missing from the source

time_deldir_delsgs_sub = time_deldir_delsgs %>%
  filter(opened_year %% 10 == 0 | opened_year == 2015)

time_deldir_cent_sub = time_deldir_cent %>%
  filter(opened_year %% 10 == 0 | opened_year == 2015)
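The ifelse() expression that builds opened_by_year rounds each year up to the next decade and leaves decade years alone. A small base R sketch of just that step follows; the helper name round_up_to_decade is made up for illustration and does not appear in the original code.

```r
# Hypothetical helper wrapping the same expression used for opened_by_year
round_up_to_decade = function(year) {
  ifelse(year %% 10 == 0, year, year + (10 - (year %% 10)))
}

example_years = c(1900, 1912, 1924, 2010, 2015)
rounded = round_up_to_decade(example_years)

# An equivalent formulation for whole-number years
rounded_alt = ceiling(example_years / 10) * 10

print(data.frame(opened_year = example_years, opened_by_year = rounded))
```

So a stop opened in 1912 is assigned to the 1920 frame and one opened in 1924 to the 1930 frame. Note that under this expression 2015 rounds up to 2020, not to 2015.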
I kept saying we were going to make maps showing the change over time, but how are we going to do that? Well, instead of building a single static plot for each city we’re going to build an animation where the map changes as the year changes. To do this we’ll use the package gganimate, which works on top of ggplot2 (which is useful since we’re already using ggmap, which also works on top of ggplot2). We build our plot just as we would any other ggplot2 figure, but for each layer’s data we want to add the frame setting. The frame is the thing in the plot that changes, in our case opened_year. Also, while we only want to plot the triangulations and centroids specific to a given year, we want the points for the metro stops to be additive. For example, when frame is 2000 we still want the points from 1990 to be plotted. To do this we add cumulative = TRUE to the call for those points. Finally, since we updated our data to include empty rows so that all plots start in 1900, all plots will have a frame starting at 1900, even if there are no data points to plot. I’ve again made a function to make our plots. See below for the code for the Paris map as well as all four animations. Also, notice that in 1920 (actually 1912) Barcelona gets its first metro stop…but doesn’t get any more until 1930 (actually 1924). Take a look to see if you can find any other interesting things about how the systems changed over time.
devtools::install_github("dgrtwo/gganimate")
library(gganimate)

time_plot = function(city_name, city_map){
  ggmap(city_map, extent = "device") +
    geom_segment(data = subset(time_deldir_delsgs_sub, city == city_name),
                 aes(x = x1, y = y1, xend = x2, yend = y2, frame = opened_year),
                 size = 1, color= "#92c5de") +
    geom_point(data = subset(data_time, city == city_name),
               aes(x = lon, y = lat, frame = opened_by_year, cumulative = TRUE),
               color = "#0571b0", size = 3) +
    geom_point(data = subset(time_deldir_cent_sub, city == city_name),
               aes(x = cent_x, y = cent_y, frame = opened_year),
               size = 6, color= "#ca0020") +
    theme(plot.title = element_text(hjust = 0.5))
}

paris_time.plot = time_plot("Paris", paris_map)
gganimate(paris_time.plot)
Conclusion
In these three posts we looked at how the metro systems of four European cities changed over time. To do this we used a lot of different packages. We used the packages dplyr, tidyr, purrr, and ggplot2, which are all now a part of the package tidyverse. We also used two other plotting packages that build upon ggplot2: ggmap and gganimate. Finally, we used the deldir package to make Delaunay triangulations and compute centroids of city metro systems over time. All of these skills can be applied to any other type of spatial data with unique shapes, and can be used to make your very own gifs. Try your city as a practice exercise!