Visualizing Bus Stops with rCharts

[This article was first published on Stats and things, and kindly contributed to R-bloggers.]

I wanted to create a quick visualization of Bloomington, IL bus stops. The data is spread across multiple PDF files, so before any mapping can happen, those files have to be downloaded and parsed to get the bus stop locations and times.

First, I need a list of all of the files. This was complicated a little by the fact that the URL for the bus routes didn’t play nice with some of the usual R tools for fetching HTML (RCurl). The httr package turned out to be the solution. First get the HTML dump, then look for a table with the id fsvItemsTable and work your way down the tree to get the hrefs for all of the files. I imagine there’s a way to avoid the grep at the end of this snippet, but it works, so I stopped…
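library(XML)
library(httr)

# Page that lists the bus route PDFs
url1 <- "http://www.district87.org/pages/Bloomington_School_District_87/Parents_and_Students/Bus_Routes/Bloomington_High_School"
# Fetch the page with httr and parse the returned HTML
x <- GET(url1)
text <- content(x, as = "text")
doc <- htmlParse(text)
# Rows of the file table, then the hrefs of the anchors
a <- xpathSApply(doc, "//table[@id='fsvItemsTable']/tr")
links <- xpathSApply(a[[1]], "//a/@href")
# "//a" matches anchors anywhere in the document, so keep only the file links
links <- links[grepl("^/files", links)]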
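On the question of avoiding the grep: one option is to push the filter into the XPath expression itself. The sketch below is only an illustration on a made-up HTML fragment (the paths and link text are invented, and the real page may be structured differently). It uses XPath's starts-with() so that only hrefs beginning with /files come back.

```r
library(XML)

# Invented stand-in for the real page: one file table plus unrelated links
html <- paste0(
  '<html><body>',
  '<a href="/pages/home">Home</a>',
  '<table id="fsvItemsTable">',
  '<tr><td><a href="/files/1/BHS_11.pdf">Route 11</a></td></tr>',
  '<tr><td><a href="/files/2/BHS_13.pdf">Route 13</a></td></tr>',
  '<tr><td><a href="/pages/help">Help</a></td></tr>',
  '</table></body></html>'
)
doc <- htmlParse(html, asText = TRUE)

# Only anchors inside the table whose href starts with /files
links <- xpathSApply(
  doc,
  "//table[@id='fsvItemsTable']//a[starts-with(@href, '/files')]/@href"
)
# Drop the attribute names so a plain character vector remains
links <- as.character(links)
```

Searching with `//a` below the table, rather than stepping through `tr` explicitly, also keeps the expression working if the parser places rows inside a `tbody`.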


Next, use this list of links to download the files. Again, the normal way of doing things, download.file(), failed, but downloader::download() did work.
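library(downloader)

# Make sure the target directory exists before downloading into it
dir.create("data", showWarnings = FALSE)
for (link in links) {
  url1 <- paste0("http://www.district87.org", link)
  download(url1, file.path("data", basename(url1)))
}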


Now that we have a directory full of PDF files, what do we do with it? Well, there’s a function called readPDF() in the tm package that can be used to read the text of a PDF file. Using code ripped straight from Stack Overflow, it was pretty easy to get the data.
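library(tm)

# readPDF() returns a reader function; "-layout" asks pdftotext to keep the table layout
pdf <- readPDF(PdftotextOptions = "-layout")
# `file` is the name of one PDF in data/ (it is not defined in this snippet)
dat <- pdf(elem = list(uri = paste0("data/", file)), language = 'en', id = 'id1')
# Turn each run of spaces into a comma so read.csv() can split the rows
dat <- gsub(' +', ',', dat)
out <- read.csv(textConnection(dat), header = FALSE)
# Glue the fields of each row back together into one string
out <- apply(out, 1, function(x) paste(x, collapse = " "))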


This leaves you with a single string for each row of data in the PDF table. A little grepping will split each string into separate columns of a data frame. See the full code linked at the bottom of the page for these details.
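
As a rough idea of what that grepping can look like, here is a minimal sketch rather than the code from the linked repository. It assumes every row has the shape "pick-up time, stop description, drop-off time", as in the od column of the output further down; the pattern and the parsed data frame are illustrative names only.

```r
# Row strings in the shape of the od column shown in the output further down
rows <- c("6:54 AM RAINBOW AVE @ RIDGEPORT AVE 2:33 PM",
          "6:48 AM ON MERCER AT LINCOLN 2:44 PM")

# Three capture groups: pick-up time, stop description, drop-off time
pattern <- "^([0-9]{1,2}:[0-9]{2} [AP]M) (.*) ([0-9]{1,2}:[0-9]{2} [AP]M)$"

parsed <- data.frame(
  od      = rows,
  pu.time = sub(pattern, "\\1", rows),
  # Spell out "@" as "and", matching the stop column in the output
  stop    = gsub(" @ ", " and ", sub(pattern, "\\2", rows), fixed = TRUE),
  do.time = sub(pattern, "\\3", rows),
  stringsAsFactors = FALSE
)
```

Rows that do not match the pattern come back unchanged from sub(), so a real pipeline would want to check for those before trusting the columns.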

Now we must geocode the bus stop locations so we can plot them on a map. For this, the ggmap package has a simple function called geocode().

At the end of all of this, we finally have a data set to map. Here’s what it looks like…
> head(df)
   route       od                                              pu.time  do.time
3  BHS_11.pdf  6:54 AM RAINBOW AVE @ RIDGEPORT AVE 2:33 PM     6:54 AM  2:33 PM
4  BHS_11.pdf  7:02 AM RIDGEPORT AVE @ CLEARWATER AVE 2:36 PM  7:02 AM  2:36 PM
8  BHS_13.pdf  6:48 AM ON MERCER AT LINCOLN 2:44 PM            6:48 AM  2:44 PM
10 BHS_13.pdf  6:49 AM BENJAMIN LN @ SNYDER DR 2:46 PM         6:49 AM  2:46 PM
11 BHS_13.pdf  6:52 AM ON ARCADIA AT FAIRMONT 2:33 PM          6:52 AM  2:33 PM
13 BHS_13.pdf  6:55 AM BROADMOOR DR @ PHEASANT RUN 2:35 PM     6:55 AM  2:35 PM
   stop                              lat       lon
3  RAINBOW AVE and RIDGEPORT AVE     40.49903  -88.94384
4  RIDGEPORT AVE and CLEARWATER AVE  40.49615  -88.94381
8  ON MERCER AT LINCOLN              40.48420  -88.99369
10 BENJAMIN LN and SNYDER DR         40.46744  -88.96148
11 ON ARCADIA AT FAIRMONT            40.48420  -88.99369
13 BROADMOOR DR and PHEASANT RUN     40.46809  -88.94833


…well, not exactly. To use the toGeoJSON() function in the rCharts package, the data frame must first be converted to a list with one element per row. I also add a color for each route so we can tell them apart on the map, and format the tooltip text for each point.
library(RColorBrewer)
library(stringr)  # for str_extract()

# max of 12 cats in colorbrewer, gotta add 6 more
colors <- brewer.pal(12, "Paired")
colors <- c(colors, "#050505", "#FAF20F", "#FA28EC", "#24E3CD", "#DAFAD4", "#6B6C6E")
df2 <- df
routes <- unique(df2$route)
# One color per route, assigned in order of first appearance
df2$color <- colors[match(df2$route, routes)]
# HTML shown in the popup for each stop
df2$popup <- paste0("<p>Pick up time: ", df$pu.time,
                    "<br>Drop off time: ", df$do.time,
                    "<br>", df$stop,
                    "<br>Route Number: ", str_extract(df2$route, "[0-9]+"), "</p>")
# Convert each row to a list for toGeoJSON()
tmp.data <- apply(df2, 1, as.list)
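One detail of the row-to-list step is worth seeing on a small example. apply() converts the data frame to a character matrix before calling as.list, so numeric columns such as lat and lon arrive as strings. The sketch below uses two rows copied from the output above (the names d, via_apply and via_rows are illustrative) and shows a row-by-row alternative that keeps the column types. Whether string coordinates matter depends on what consumes the list; the apply() version evidently worked for this map.

```r
# Two rows copied from the output above
d <- data.frame(
  stop = c("RAINBOW AVE and RIDGEPORT AVE", "ON MERCER AT LINCOLN"),
  lat  = c(40.49903, 40.48420),
  lon  = c(-88.94384, -88.99369),
  stringsAsFactors = FALSE
)

# apply() goes through a character matrix, so every field becomes a string
via_apply <- apply(d, 1, as.list)

# Converting row by row keeps each column's original type
via_rows <- lapply(seq_len(nrow(d)), function(i) as.list(d[i, ]))

class(via_apply[[1]]$lat)  # character
class(via_rows[[1]]$lat)   # numeric
```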


Again, in keeping with using other people’s code, I reused some code that Ramnath Vaidyanathan wrote for the Foodborne Chicago map a while back to create a Leaflet map of the bus stops. He is the author of the rCharts package, is super helpful via Twitter and GitHub with random issues, and is doing a tutorial at useR! 2014 in LA. I can’t wait to meet him… The last part of this code snippet is meant to publish the map as a GitHub gist. I had some trouble using it on my network, so I commented it out, used the $save() method to create an HTML file, and copy-pasted that as a gist.
library(rCharts)

bus.map <- Leaflet$new()
bus.map$setView(c(40.4739, -88.9719), zoom = 13)
bus.map$tileLayer(provider = 'Stamen.TonerLite')
# Add Data as GeoJSON Layer and Specify Popup and FillColor
bus.map$geoJson(toGeoJSON(tmp.data, lat = 'lat', lon = 'lon'),
  onEachFeature = '#! function(feature, layer){
    layer.bindPopup(feature.properties.popup)
  } !#',
  pointToLayer = "#! function(feature, latlng){
    return L.circleMarker(latlng, {
      radius: 5,
      fillColor: feature.properties.color || 'red',
      color: '#000',
      weight: 1,
      fillOpacity: 0.8
    })
  } !#"
)
bus.map$set(width = 1600, height = 800)
bus.map$enablePopover(TRUE)
#bus.map$publish('Bloomington IL Bus Stops', host = 'gist')
bus.map$save("index.html", cdn = TRUE)


And here’s the result. There’s still some work to be done on the geocoding end of things. As you can see if you click on a dot on the map, the location doesn’t always line up with where the map tooltip says it should be.
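
One cheap way to find suspicious geocodes is to look for different stop names that landed on exactly the same coordinates, as rows 8 and 11 do in the output above. The following is a minimal sketch of that check, using those two rows plus one other from the same output; stops, key, n_names and suspect are illustrative names, and the check only flags candidates for review.

```r
# Three rows copied from the output above
stops <- data.frame(
  stop = c("ON MERCER AT LINCOLN", "ON ARCADIA AT FAIRMONT",
           "BENJAMIN LN and SNYDER DR"),
  lat  = c(40.48420, 40.48420, 40.46744),
  lon  = c(-88.99369, -88.99369, -88.96148),
  stringsAsFactors = FALSE
)

# Key each row by its coordinates, then count distinct stop names per key
key <- paste(stops$lat, stops$lon)
n_names <- tapply(stops$stop, key, function(s) length(unique(s)))

# Rows whose coordinates are shared by more than one stop name
suspect <- stops[key %in% names(n_names)[n_names > 1], ]
```

Here suspect holds the Mercer and Arcadia rows, which share one point even though they describe different intersections.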

All of the code can be found on GitHub.
