Mapping medical cannabis dispensaries: from PDF table to Google Map with R
[This article was first published on Revolutions, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
by Sheri Gilley, Microsoft Senior Software Engineer
In 2014, Illinois passed into law the creation of a medical cannabis pilot program. As my son has cancer and marijuana could greatly help with some of his symptoms, we eagerly applied for a card when registration was available early in 2015. The first dispensaries were not available until November 2015. At that time there were 9 dispensaries; the PDF file with a table of dispensary names and locations provided by the Illinois Department of Health was an adequate way to find a dispensary.
In the time that dispensaries have been available, my son has been in various hospitals and facilities in and around the city of Chicago. First we were in Park Ridge, then Hyde Park, then Hinsdale, then downtown Chicago and now finally back home in Oak Park. As we moved around the city, I would use that same PDF file to locate the dispensary closest to me. The list has grown from 9 names and addresses to 40 today. With 40 entries, the PDF table format is not at all useful for showing where the dispensaries are located. The entries are listed in the order of the license issue date, making it all the more difficult to see which dispensaries might be easiest for me to visit.
So one weekend I decided to create a map of all the current locations. Keeping in mind that more dispensaries will be available in the future, I wanted to create code that would read the official list of registered dispensaries, so that updates would be easy as more entries were added.
I knew I could read the text of the file in R using pdftools, and could put the locations onto a Google map using googleVis. The hardest part of the code was filtering out the noise in the extracted text and reliably getting the name, address, and phone number of each dispensary into a data frame. A few handy gsub statements worked their magic, and I was left with data ready for mapping.
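To make the cleanup idea concrete, here is a minimal base-R sketch that applies the same kind of gsub rules to a made-up string shaped like pdf_text() output. The sample text and the clean_fields() helper are hypothetical; the real PDF needs the additional, data-specific fixes in the full script below.

```r
# A made-up string shaped like pdf_text() output: columns padded with runs of
# spaces, lines ending in "\r\n" (hypothetical sample, not the real PDF).
raw <- "Example Dispensary    123 Main St\r\nSpringfield, IL 62701 (217) 555-0100\r\n"

# Hypothetical helper applying the same kind of gsub() rules as the script
clean_fields <- function(x) {
  x <- gsub("\r?\n", ";", x)                # line breaks become separators
  x <- gsub("\\s{2,}", ";", x)              # runs of 2+ whitespace chars become separators
  x <- gsub("(\\d{5})\\s+\\(", "\\1;(", x)  # split "62701 (217)" into zip and phone
  x <- gsub(";{2,}", ";", x)                # collapse repeated separators
  x <- sub(";$", "", x)                     # drop the trailing separator
  strsplit(x, ";", fixed = TRUE)[[1]]
}

fields <- clean_fields(raw)
print(fields)
```

The order matters: line breaks are converted first, so the whitespace rule only ever sees the padding between columns.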
I added some geocoding to get the longitude and latitude of each address, thanks to this tip on Stack Overflow.
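The geocoding step in the script indexes straight into the parsed response, which fails when an address comes back with no results. As a hedged sketch, the lookup can return NA coordinates instead. This assumes the Google-style results/geometry/location layout that the script's geo.dsk() function indexes into; extract_location() is a hypothetical helper, and the sample responses are hand-built with illustrative coordinates rather than real API output.

```r
# Pull latitude/longitude out of a Google-style geocoding response that has
# already been parsed into nested R lists (the shape rjson::fromJSON returns).
# Returns NA coordinates instead of failing when the address is not found.
extract_location <- function(parsed) {
  results <- parsed[["results"]]
  if (length(results) == 0) {
    return(c(lat = NA_real_, long = NA_real_))
  }
  loc <- results[[1]][["geometry"]][["location"]]
  c(lat = as.numeric(loc[["lat"]]), long = as.numeric(loc[["lng"]]))
}

# Hand-built stand-ins for parsed responses (coordinates are illustrative only)
found <- list(results = list(
  list(geometry = list(location = list(lat = 41.885, lng = -87.784)))
))
not_found <- list(results = list(), status = "ZERO_RESULTS")

coords <- rbind(extract_location(found), extract_location(not_found))

# googleVis gvisMap() accepts locations as "lat:long" strings;
# keep NA for addresses that could not be geocoded
lat_long <- ifelse(is.na(coords[, "lat"]), NA,
                   paste(coords[, "lat"], coords[, "long"], sep = ":"))
print(lat_long)
```

Returning a fixed-length vector either way keeps each row of the geocoding result aligned with its dispensary when the rows are bound together.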
Finally, after the data manipulation, the code to produce the map itself is rather straightforward:
# create id and LatLong for googleVis map
all$id <- paste(all$name, all$address, all$phone, sep='; ')
all$LatLong <- paste(result$lat, result$long, sep=":")
# Now plot with googleVis
require(googleVis)
g1 <- gvisMap(all, "LatLong", "id",
              options=list(showTip=TRUE,
                           showLine=TRUE,
                           enableScrollWheel=TRUE,
                           mapType='normal',
                           width=400, height=400))
# this opens a browser with the plot
plot(g1)
# write the code that will be used on my website
cat(g1$html$chart, file="dispensariesIL.html")
I thought the map might be useful for others in Illinois who are also looking for a dispensary, so I created a small website for it, where I also included a few other tips and tricks I have learned along the way as the procurer of medicinal cannabis in my household. Here is the resulting map, generated on August 12, 2016 (click to see the interactive version):
[Map of Illinois medical cannabis dispensaries]
You can create this map yourself with this R code:
library(pdftools)
# read the data from the pdf
url <- "http://www.idfpr.com/Forms/MC/ListofLicensedDispensaries.pdf"
text <- pdf_text(url)
t <- text
# strip out info about start date, end date, etc.; it is not consistently formatted
# e.g. 13 8/24/2015 8/24/2016 280.000
t <- gsub("\\d+\\s+\\d+/\\d+/\\d{4}\\s+\\d+/\\d+/\\d{4}\\s+\\d+\\.\\d+\\s+", '', t)
# turn line breaks (\r\n on Windows, \n elsewhere) into separators
t <- gsub("\r?\n", ";", t)
# turn 2 or more spaces into separators
t <- gsub("\\s{2,}", ';', t)
# get rid of multiple consecutive separators
t <- gsub(";{2,}", ';', t)
# also remove empty records
t <- gsub(";\\s+;", '', t)
# finally clean up some messiness in the data
t <- gsub(';Dispensary;', ';', t)
t <- gsub("Salveo Health & Wellness", "Salveo Health & Wellness Dispensary", t)
t <- gsub(";Illinois;", ';', t)
t <- gsub("Healthway Services of West", "Healthway Services of West Illinois", t)
t <- gsub(";Centers;", ';', t)
t <- gsub("Trinity Compassionate Care", "Trinity Compassionate Care Centers", t)
t <- gsub("Maribis of Chicago Chicago", "Maribis of Chicago; Chicago", t)
# some records need ";" in between zip and phone number as of 8/22/16
t <- gsub("(\\d{5})\\s+\\(", "\\1;(", t)
t <- unlist(t)
# Remove the text at the top of the pdf (updated on 8/22/16)
top <- ";Illinois Department of Financial and Professional Regulation;Division of Professional Regulation;BRUCE RAUNER;DANIEL KELBER;Governor;Acting Director;The Illinois Department of Financial and Professional Regulation, Division of;Professional Regulation has licensed the following medical cannabis dispensaries under the Illinois Compassion;Cannabis Pilot program Act, 410 ILCS 130/1 et seq., and the regulations adopted pursuant ther;PATIENTS: You must select a dispensary with the Illinois Department of Public Health. BEFORE YOUR FIRST VIS;and ask when it is open for business.;IDFPR - LICENSED MEDICAL CANNABIS DISPENSARIES;Medical;License;License;Cred;Name;Address & Phone Number;Cannabis;Expiration;Issue Date;Nu;District;Date"
# fixed = TRUE matches the header text literally rather than as a regular expression
t <- gsub(top, '', t, fixed = TRUE)
# now read the cleaned text as a data.frame
d <- read.delim(textConnection(t), header = FALSE, sep = ";",
                blank.lines.skip = TRUE, stringsAsFactors = FALSE)
# each set of 4 vars starting with V2 is a new record
all <- data.frame(addr = character(), name = character(),
                  citystatezip = character(), phone = character())
# records start at column 2, so only (ncol - 1) columns hold record fields;
# this avoids selecting a column past the end when ncol(d) is a multiple of 4
nloops <- floor((ncol(d) - 1) / 4)
for (i in seq_len(nloops)) {
  start <- 2 + (i - 1) * 4
  stop <- start + 3
  df <- d[, start:stop]
  names(df) <- c("addr", "name", "citystatezip", "phone")
  all <- rbind(all, df)
}
# get rid of all blank rows
all[all == ""] <- NA
all <- all[complete.cases(all), ]
# create address for geocoding
all$address <- paste(all$addr, all$citystatezip, sep = ', ')
# now get lat, lon values from the address
# thanks to http://stackoverflow.com/questions/22887833/r-how-to-geocode-a-simple-address-using-data-science-toolbox
library(httr)
library(rjson)
geo.dsk <- function(addr) { # single address geocode with data sciences toolkit
  url <- "http://www.datasciencetoolkit.org/maps/api/geocode/json"
  response <- GET(url, query = list(sensor = "FALSE", address = addr))
  # as = "text" returns the response body as a JSON string for rjson to parse
  json <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
  results <- json$results
  # keep rows aligned when an address cannot be geocoded
  if (length(results) == 0) return(c(address = addr, long = NA, lat = NA))
  loc <- results[[1]]$geometry$location
  return(c(address = addr, long = loc$lng, lat = loc$lat))
}
result <- do.call(rbind, lapply(all$address, geo.dsk))
result <- data.frame(result)
# now plot it on a map
# create id and LatLong for googleVis map
all$id <- paste(all$name, all$address, all$phone, sep = '; ')
all$LatLong <- paste(result$lat, result$long, sep = ":")
# Now plot with googleVis
require(googleVis)
g1 <- gvisMap(all, "LatLong", "id",
              options = list(showTip = TRUE,
                             showLine = TRUE,
                             enableScrollWheel = TRUE,
                             mapType = 'normal',
                             width = 400, height = 400))
# this opens a browser with the plot
plot(g1)
# write the code that will be used on my website
cat(g1$html$chart, file = "dispensariesIL.html")
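The loop in the script grows `all` with rbind() one record at a time, taking every 4 columns starting at V2. As an alternative technique (not the author's code), the same grouping of a flat run of fields into 4-field records can be sketched without a loop by filling a matrix row by row. The sample values are made up; with several PDF pages you would apply this to each row of `d` after dropping its leading V1 column.

```r
# Hypothetical fields in the addr/name/citystatezip/phone order the script uses;
# the values are made up, and "stray" stands in for an incomplete trailing record.
fields <- c("123 Main St", "Example Dispensary A", "Springfield, IL 62701", "(217) 555-0100",
            "456 Oak Ave", "Example Dispensary B", "Peoria, IL 61602", "(309) 555-0199",
            "stray")

# Keep only complete groups of 4 fields, as the script's floor() does
usable <- fields[seq_len(4 * (length(fields) %/% 4))]

# Each consecutive group of 4 fields is one record: fill a matrix row by row
records <- as.data.frame(
  matrix(usable, ncol = 4, byrow = TRUE,
         dimnames = list(NULL, c("addr", "name", "citystatezip", "phone"))),
  stringsAsFactors = FALSE
)

# Same address string the script builds for geocoding
records$address <- paste(records$addr, records$citystatezip, sep = ", ")
print(records)
```

Building the whole table in one step avoids repeatedly copying the growing data frame, though with only a few dozen dispensaries either approach is fast.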