Site icon R-bloggers

Montreal FSA Scraping Part Dieux

[This article was first published on r - Brandon Bertelsen, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Although we were able to scrape from the web the FSA we wanted, it was unfortunately not a complete list. Instead, let’s try another route using some data that’s been crowdsourced, namely the geocoder.ca dataset or a subset provided by aggdata (as the geocoder.ca table is 50mbs and I don’t need that level of accuracy).

Let’s install some packages first. You may need to install some system files for this to work:

sudo apt-get install libgeos-dev libgdal1-dev libproj-dev  

Now we can install the appropriate packages in R, if they aren’t already:

install.packages("maptools","rgeos","rgdal")  

Now we can run a short script to find the FSA’s within the boundaries of our economic region.

library(ggplot2)  
library(maptools)  
library(rgeos)  
library(rgdal)

# Canadian shapefiles
# select your own (https://goo.gl/ztd9HY) or 
# economic regions (http://goo.gl/YiHMhY) direct download
shp <- file.path("path/to/ger_000b11a_e.shp")  
map <- readShapePoly(shp, proj4string = CRS("+init=epsg:25832"))  
sel <- map$ERNAME == "Montérégie"

# https://www.aggdata.com/download_sample.php?file=ca_postal_codes.csv
fsa_db <- read.csv("https://goo.gl/q97K3L", fileEncoding = "Windows-1252") setNames(fsa_db, c("fsa","place","province","lat","long"))

region <- map[sel,]  
points <- data.frame(long=as.numeric(fsa_db$long),  
                     lat =as.numeric(fsa_db$lat),
                     id  =fsa_db$fsa, stringsAsFactors=F)

# We know that Monteregie is in JXX FSAs
points$yes <- substr(points$id,0,1) == "J"  
points <- points[points$yes,]

# Identify if FSA Long/Lat is within Economic Region

listing <- list()  
for(i in 1:nrow(points)) {  
  p1 <- points[i,1:2]
  sp2   <- SpatialPoints(p1,proj4string=CRS(proj4string(region)))
  listing[[i]] <- gContains(region,sp2)
}

points <- points[listing %>% unlist,]

ggplot(region, aes(x=long,y=lat,group=group))+  
  geom_polygon(fill="lightgreen")+
  geom_path(colour="grey50") +
  geom_point(data=points,aes(x=long,y=lat,group=NULL, color=id), size=1) +
  coord_fixed() + theme(legend.position = "none")

To leave a comment for the author, please follow the link and comment on their blog: r - Brandon Bertelsen.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.