Plotting GTFS data with R
A few days ago a study claimed that Santiago, the city where I live, has one of the best public transport systems in LATAM (WAT?! Define "best", please!). So I searched for some information and found this. Anyway, I looked for related data to work and play with, and I found the Transantiago GTFS. GTFS stands for General Transit Feed Specification, a format for public transportation schedules and the associated geographic data.
This information comes in a zip file with data about routes, stations (name, location), shapes (route, path) and other elements of the system. For example, the shapes.txt file has the geographic path of each route.
Let's see the files:
    library("dplyr")
    library("readr")

    shapes <- read.csv("data/gtfs/shapes.txt")
    head(shapes)
shape_id | shape_pt_lat | shape_pt_lon | shape_pt_sequence |
---|---|---|---|
225-I-BASE | -33.4 | -70.5 | 0 |
225-I-BASE | -33.4 | -70.5 | 1 |
225-I-BASE | -33.4 | -70.5 | 2 |
225-I-BASE | -33.4 | -70.5 | 3 |
225-I-BASE | -33.4 | -70.5 | 4 |
225-I-BASE | -33.4 | -70.5 | 5 |
It's simple to plot this data with ggplot2.
    library("ggplot2")
    library("ggthemes")

    # One thin, transparent path per shape_id
    p <- ggplot(shapes) +
      geom_path(aes(shape_pt_lon, shape_pt_lat, group = shape_id),
                size = .1, alpha = .1) +
      coord_equal() +
      theme_map()
    p
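A side note on point order: geom_path() connects points in the order the rows appear, so this plot relies on shapes.txt already being sorted by shape_pt_sequence within each shape. If a feed is not sorted, something like the sketch below could be applied first. The toy_shape data frame and its values are made up for illustration; only the column names come from the file above.

```r
# Toy shape with its points deliberately out of order (hypothetical values)
toy_shape <- data.frame(
  shape_id = "A",
  shape_pt_lat = c(-33.42, -33.40, -33.41),
  shape_pt_lon = c(-70.52, -70.50, -70.51),
  shape_pt_sequence = c(2, 0, 1),
  stringsAsFactors = FALSE
)

# Sort by shape, then by position along the shape, before drawing paths
ordered <- toy_shape[order(toy_shape$shape_id, toy_shape$shape_pt_sequence), ]
print(ordered$shape_pt_sequence)
```

The same idea is used later in this post with arrange(shape_id, shape_pt_sequence).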
It's a good plot for a few lines of code. But let's make things more fun: Transantiago includes a subway called Metro, so let's plot it in more detail, showing the stations and the routes (lines) over this plot.
We need to obtain the stops and routes that belong to Metro. In this case, the Metro stop_id values don't contain a number, so we filter the Metro stations with !grepl("\\d", stop_id).
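To see what that filter does, here is a tiny sketch on made-up IDs. The values in toy_stop_id are hypothetical and not taken from the Transantiago feed; the only point is how grepl() with a digit pattern behaves.

```r
# Toy stop IDs (hypothetical; the real feed has its own values)
toy_stop_id <- c("PA433", "PB1", "BAQUEDANO", "LOS HEROES")

# "\\d" is the regex \d (any digit); the backslash is doubled inside an R string
has_digit <- grepl("\\d", toy_stop_id)

# Keep only the IDs without digits, i.e. the Metro-like ones in this toy data
metro_like <- toy_stop_id[!has_digit]
print(metro_like)
```

This is a heuristic that happens to fit this feed; other GTFS feeds may need a different rule to tell subway stops apart.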
Then we need to filter the shapes and routes for the Metro. At the beginning this is a bit complicated; in fact, I needed some time to see the association between all these tables.
    routes <- read_csv("data/gtfs/routes.txt")
    trips <- read.csv("data/gtfs/trips.txt")
    stops <- read.csv("data/gtfs/stops.txt")

    # Metro stops: stop_id without digits
    stops_metro <- stops %>%
      filter(!grepl("\\d", stop_id))

    # Metro routes: route_id starting with "L" followed by a digit
    routes_metro <- routes %>%
      filter(grepl("^L\\d", route_id))

    # Metro shapes: shapes used by trips that belong to Metro routes
    shapes_metro <- shapes %>%
      filter(shape_id %in% trips$shape_id[trips$route_id %in% routes_metro$route_id]) %>%
      arrange(shape_id, shape_pt_sequence)
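The association is routes -> trips -> shapes: a route has many trips (via route_id), and each trip points to one shape (via shape_id). As a sketch, the same filter can be written with joins, which may make the chain easier to follow. The toy_* tables and their values below are invented for illustration; only the column names follow the GTFS files used above.

```r
library("dplyr")

# Toy tables (hypothetical values) mimicking the GTFS key structure
toy_routes <- data.frame(route_id = c("L1", "101"),
                         stringsAsFactors = FALSE)
toy_trips <- data.frame(trip_id = c("t1", "t2", "t3"),
                        route_id = c("L1", "L1", "101"),
                        shape_id = c("L1-I", "L1-R", "101-I"),
                        stringsAsFactors = FALSE)
toy_shapes <- data.frame(shape_id = c("L1-I", "L1-I", "L1-R", "101-I"),
                         shape_pt_sequence = c(1, 0, 0, 0),
                         stringsAsFactors = FALSE)

# routes -> trips: which shape IDs are used by Metro-like routes?
toy_metro_shape_ids <- toy_routes %>%
  filter(grepl("^L\\d", route_id)) %>%
  inner_join(toy_trips, by = "route_id") %>%
  distinct(shape_id)

# trips -> shapes: keep only the points of those shapes, in drawing order
toy_shapes_metro <- toy_shapes %>%
  semi_join(toy_metro_shape_ids, by = "shape_id") %>%
  arrange(shape_id, shape_pt_sequence)

print(toy_shapes_metro)
```

semi_join() keeps the rows of toy_shapes that have a match without adding columns, which is the join equivalent of the %in% filter above.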
Now, get the color for each Metro line.
    # shape_id -> route_id -> route_color, with a leading "#" for ggplot2
    shapes_colors <- left_join(
      left_join(
        shapes %>% select(shape_id) %>% unique(),
        trips %>% select(shape_id, route_id) %>% unique(),
        by = "shape_id"
      ),
      routes %>% select(route_id, route_color) %>% unique(),
      by = "route_id"
    ) %>%
      mutate(route_color = paste0("#", route_color))

    # Keep only the Metro shapes, sorted by shape_id
    shapes_colors_metro <- shapes_colors %>%
      filter(shape_id %in% trips$shape_id[trips$route_id %in% routes_metro$route_id]) %>%
      unique() %>%
      arrange(shape_id)
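The plots in this post pass the colors to scale_color_manual() as a plain vector, which works because the table is sorted by shape_id, the same order ggplot2 uses for the levels. A slightly more defensive variant, sketched here on made-up data, is a named vector, since ggplot2 matches named values to levels by name. The toy_colors table and color_map are hypothetical names, not part of the original code.

```r
# Toy lookup (hypothetical values) with the same columns as shapes_colors_metro,
# deliberately not sorted by shape_id
toy_colors <- data.frame(
  shape_id = c("L2-R", "L1-I"),
  route_color = c("#FFD700", "#FF0000"),
  stringsAsFactors = FALSE
)

# Named vector: names are the shape IDs, values are the colors
color_map <- setNames(toy_colors$route_color, toy_colors$shape_id)
print(color_map[["L1-I"]])
```

With the real data this would be setNames(shapes_colors_metro$route_color, shapes_colors_metro$shape_id), passed as scale_color_manual(values = ...).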
The data is ready. So it's time to make another plot.
    p2 <- ggplot() +
      # All routes as a faint white background
      geom_path(data = shapes,
                aes(shape_pt_lon, shape_pt_lat, group = shape_id),
                color = "white", size = .2, alpha = .05) +
      # Metro lines on top, colored by shape
      geom_path(data = shapes_metro,
                aes(shape_pt_lon, shape_pt_lat, group = shape_id, colour = shape_id),
                size = 2, alpha = .7) +
      scale_color_manual(values = shapes_colors_metro$route_color) +
      # Metro stations
      geom_point(data = stops_metro, aes(stop_lon, stop_lat),
                 shape = 21, colour = "white", alpha = .8) +
      coord_equal() +
      theme_map() +
      theme(plot.background = element_rect(fill = "black", colour = "black"),
            title = element_text(hjust = 1, colour = "white", size = 8),
            axis.title.x = element_text(hjust = 0, colour = "white", size = 7),
            legend.position = "none") +
      xlab(sprintf("Joshua Kunst | Jkunst.com %s", format(Sys.Date(), "%Y"))) +
      ggtitle("TRANSANTIAGO\nSantiago's public transport system")
    p2
Or we can plot only the Metro routes with the following code:
    p3 <- ggplot() +
      geom_path(data = shapes_metro,
                aes(shape_pt_lon, shape_pt_lat, group = shape_id, colour = shape_id),
                size = 2, alpha = .8) +
      scale_color_manual(values = shapes_colors_metro$route_color) +
      geom_point(data = stops_metro, aes(stop_lon, stop_lat),
                 shape = 21, colour = "white", alpha = .8, size = 3) +
      coord_equal() +
      theme_map() +
      theme(plot.background = element_rect(fill = "black", colour = "black"),
            title = element_text(hjust = 1, colour = "white", size = 8),
            legend.position = "none") +
      xlab(sprintf("Joshua Kunst | Jkunst.com %s", format(Sys.Date(), "%Y")))
    p3 + ggtitle("Santiago's METRO")
You can see the original image on Wikipedia here. As you can see, it's simple to make a good graphic with a few lines of code. Even better, GTFS is a standard, so you can reuse a big part of this code (and improve it!) to plot transport systems from other cities. If you do, let me know.