Montreal FSA Scraping Part Dieux
[This article was first published on r - Brandon Bertelsen, and kindly contributed to R-bloggers.]
Although we were able to scrape the FSAs (forward sortation areas, the first three characters of a Canadian postal code) we wanted from the web, the result was unfortunately not a complete list. Instead, let’s try another route using crowdsourced data: the geocoder.ca dataset, or a subset of it provided by aggdata. I use the subset here, as the geocoder.ca table is about 50 MB and I don’t need that level of accuracy.
Let’s install some packages first. On Debian or Ubuntu, you may need to install some system libraries for this to work:
sudo apt-get install libgeos-dev libgdal1-dev libproj-dev
Now we can install the appropriate packages in R, if they aren’t already installed:
install.packages(c("maptools", "rgeos", "rgdal"))
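The "if they aren’t already" check can be automated rather than done by eye. Here is a minimal sketch in base R, assuming the three package names above; `missing_packages` is a hypothetical helper introduced for illustration, not part of any package, and the `interactive()` guard is my own choice to avoid installing from non-interactive scripts.

```r
# Hypothetical helper: return the subset of `pkgs` that is not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(c("maptools", "rgeos", "rgdal"))

# Install only what is missing, and only when running interactively.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Running this a second time should leave `to_install` empty, so nothing is reinstalled.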
Now we can run a short script to find the FSAs within the boundaries of our economic region.
library(ggplot2)
library(maptools)
library(rgeos)
library(rgdal)

# Canadian shapefiles
# select your own (https://goo.gl/ztd9HY) or
# economic regions (http://goo.gl/YiHMhY) direct download
shp <- file.path("path/to/ger_000b11a_e.shp")
# The containment test below only needs the region and the points
# to share the same CRS, which they do
map <- readShapePoly(shp, proj4string = CRS("+init=epsg:25832"))
sel <- map$ERNAME == "Montérégie"

# https://www.aggdata.com/download_sample.php?file=ca_postal_codes.csv
fsa_db <- read.csv("https://goo.gl/q97K3L", fileEncoding = "Windows-1252")
# setNames() returns a renamed copy, so assign the result back
fsa_db <- setNames(fsa_db, c("fsa", "place", "province", "lat", "long"))

region <- map[sel, ]
points <- data.frame(long = as.numeric(fsa_db$long),
                     lat  = as.numeric(fsa_db$lat),
                     id   = fsa_db$fsa,
                     stringsAsFactors = FALSE)

# We know that Monteregie is in JXX FSAs
points$yes <- substr(points$id, 1, 1) == "J"
points <- points[points$yes, ]

# Identify if FSA Long/Lat is within Economic Region
listing <- list()
for (i in seq_len(nrow(points))) {
  p1 <- points[i, 1:2]
  sp2 <- SpatialPoints(p1, proj4string = CRS(proj4string(region)))
  listing[[i]] <- gContains(region, sp2)
}
# Keep only the points that fall inside the region
points <- points[unlist(listing), ]

# Plot the region with the matching FSA points on top
ggplot(region, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "lightgreen") +
  geom_path(colour = "grey50") +
  geom_point(data = points, aes(x = long, y = lat, group = NULL, color = id), size = 1) +
  coord_fixed() +
  theme(legend.position = "none")
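To make the containment step less of a black box, here is a rough base R sketch of the kind of test it performs: an even-odd (ray casting) point-in-polygon check. This is not how rgeos implements `gContains`; it is a simplified illustration that assumes a single ring with no holes and ignores points lying exactly on an edge. The helper `point_in_ring`, the toy square, and the toy FSA ids are all made up for the example.

```r
# Hypothetical helper: even-odd (ray casting) test for one point
# against one polygon ring given by its vertex coordinates.
point_in_ring <- function(px, py, ring_x, ring_y) {
  n <- length(ring_x)
  inside <- FALSE
  j <- n  # index of the previous vertex, starting with the last one
  for (i in seq_len(n)) {
    # Does the edge (j -> i) straddle the horizontal line y = py?
    if ((ring_y[i] > py) != (ring_y[j] > py)) {
      # x coordinate where the edge crosses that horizontal line
      x_cross <- ring_x[i] +
        (py - ring_y[i]) * (ring_x[j] - ring_x[i]) / (ring_y[j] - ring_y[i])
      # Each crossing to the right of the point flips the inside flag
      if (px < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Toy "region": the unit square (made-up data, not the Montérégie boundary)
ring_x <- c(0, 0, 1, 1)
ring_y <- c(0, 1, 1, 0)

# Toy points in the same long/lat/id layout as the script's data frame
toy_points <- data.frame(long = c(0.5, 2, 0.25),
                         lat  = c(0.5, 2, 0.75),
                         id   = c("J0A", "J9Z", "J1B"),
                         stringsAsFactors = FALSE)

# Test every point, then keep the ids that fall inside the ring
keep <- mapply(point_in_ring, toy_points$long, toy_points$lat,
               MoreArgs = list(ring_x = ring_x, ring_y = ring_y))
inside_ids <- toy_points$id[keep]
```

For real boundaries, with multiple rings, holes and edge cases, a spatial library remains the safer choice; the sketch only shows the idea behind filtering points by region.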