NYC Motor Vehicle Collisions – Street-Level Heat Map
In this post I extend a previous analysis that built a borough-level heat map of NYC motor vehicle collisions, moving from borough-level to street-level collisions. The data is from NYC Open Data. The processing is very similar to the previous analysis, with a few additional functions that map streets to colors. Below, I load the ggmap package and the data, and keep only collisions that have longitude and latitude information.
library(ggmap)

d=read.csv('.../NYPD_Motor_Vehicle_Collisions.csv')
d_clean=d[which(regexpr(',',d$LOCATION)!=-1),]

#### 1. Clean Data ####
# get long and lat coordinates from concatenated "location" var
comm=regexpr(',',d_clean$LOCATION)
d_clean$loc=as.character(d_clean$LOCATION)
d_clean$lat=as.numeric(substr(d_clean$loc,2,comm-1))
d_clean$long=as.numeric(substr(d_clean$loc,comm+1,nchar(d_clean$loc)-1))

# create year variable
d_clean$year=substr(d_clean$DATE,7,10)
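To make the string handling concrete, here is a minimal sketch on made-up values. It assumes LOCATION has the form "(lat, long)" and DATE the form MM/DD/YYYY, which is what the substr() offsets above imply; the sample strings are illustrative, not taken from the data set.

```r
# Toy LOCATION strings in the assumed "(lat, long)" form; the empty one has no coordinates
loc <- c("(40.7128, -74.0060)", "(40.6782, -73.9442)", "")

# Keep only entries that contain a comma, i.e. that carry coordinates
loc <- loc[regexpr(",", loc) != -1]

# Position of the comma that separates latitude from longitude
comm <- regexpr(",", loc)

# Latitude: skip the opening "(" and read up to the comma
lat <- as.numeric(substr(loc, 2, comm - 1))

# Longitude: read from just after the comma up to, but not including, the closing ")"
long <- as.numeric(substr(loc, comm + 1, nchar(loc) - 1))

# Year: characters 7 to 10 of an assumed MM/DD/YYYY date string
year <- substr("07/04/2013", 7, 10)
```

as.numeric() tolerates the leading space left after the comma, so no extra trimming is needed.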
I use the three functions below to process my data. The boro() function subsets to collisions with street names in a specified borough, since some collisions with coordinate data do not have street name data, and then subsets to collisions in 2013. The accident_freq() function counts the collisions on each street, merges the counts back onto the collision-level data, and converts each count to a rate per 1,000 collisions in the borough (freqPerc, with a minimum of 1). Keeping the data at the collision level matters because the map draws one point per collision and looks up a color for each row. The assign_col() function takes a collision-level data set (created with accident_freq()) for a particular borough and builds a palette ranging from white to a specified color (e.g. green, red, etc.), with one shade per freqPerc value. Indexing that palette by freqPerc gives each street its color, so streets with more collisions are darker.
# functions

# boro() subsets to 2013 accidents in specified borough
boro=function(x){
  d_clean2=d_clean[which(d_clean$ON.STREET.NAME!='' & d_clean$BOROUGH==x),]
  d_2013_2=d_clean2[which(d_clean2$year=='2013'),c('long','lat','ON.STREET.NAME')]
  return(d_2013_2)
}

# accident_freq() gets frequency of accidents per street for specified borough
accident_freq=function(x){
  tab=data.frame(table(x$ON.STREET.NAME))
  d_merge=merge(x=x,y=tab,by.x=c('ON.STREET.NAME'),by.y=c('Var1'))
  d_merge$freqPerc=round((d_merge$Freq/length(x$ON.STREET.NAME))*1000,digits=0)
  d_merge$freqPerc=ifelse(d_merge$freqPerc==0,1,d_merge$freqPerc)
  return(d_merge)
}

# assign_col() assigns color shade to each street based on frequency
assign_col=function(x,c){
  pal=colorRampPalette(c('white',c))
  colors=pal(max(x$freqPerc))
  return(colors)
}

man=boro('MANHATTAN')
bronx=boro('BRONX')
brook=boro('BROOKLYN')
si=boro('STATEN ISLAND')
q=boro('QUEENS')

man_freq=accident_freq(man)
bronx_freq=accident_freq(bronx)
brook_freq=accident_freq(brook)
si_freq=accident_freq(si)
q_freq=accident_freq(q)

man_col=assign_col(man_freq,'dodgerblue')
bronx_col=assign_col(bronx_freq,'darkred')
brook_col=assign_col(brook_freq,'violet')
si_col=assign_col(si_freq,'darkgreen')
q_col=assign_col(q_freq,'darkgoldenrod4')
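The frequency and color steps can be traced by hand on a tiny data set. The sketch below repeats the same logic as accident_freq() and assign_col() inline; the street names and coordinates are made up for illustration.

```r
# Toy collision-level data: one row per collision (street names are made up)
toy <- data.frame(
  ON.STREET.NAME = c("A ST", "A ST", "A ST", "B AVE"),
  long = c(-74.00, -74.01, -74.02, -73.95),
  lat  = c(40.70, 40.71, 40.72, 40.80),
  stringsAsFactors = FALSE
)

# Collisions per street, merged back so every collision row carries its street's count
tab <- data.frame(table(toy$ON.STREET.NAME))
toy_freq <- merge(x = toy, y = tab, by.x = "ON.STREET.NAME", by.y = "Var1")

# Street's share of all collisions, per 1,000, with a floor of 1 so it can index a palette
toy_freq$freqPerc <- round(toy_freq$Freq / nrow(toy) * 1000, digits = 0)
toy_freq$freqPerc <- ifelse(toy_freq$freqPerc == 0, 1, toy_freq$freqPerc)

# One shade per possible freqPerc value, running from white to the borough color
pal <- colorRampPalette(c("white", "darkred"))
shades <- pal(max(toy_freq$freqPerc))

# Row-by-row color lookup: the busiest street gets the darkest shade
toy_freq$col <- shades[toy_freq$freqPerc]
```

Here "A ST" has 3 of the 4 collisions, so its freqPerc is 750 and it receives the last (darkest) shade, while "B AVE" with a freqPerc of 250 lands a third of the way along the palette.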
Finally, I use ggmap’s get_map() function to get a toner-style map of NYC and add geom_path() layers, one per borough. geom_path() connects all longitude and latitude points that are on the same street with a line or “path.” Essentially, it uses street as a grouping factor for the coordinates, and all coordinates in a group are connected in the order they appear in the data. Each line is then given a color from the palette built by assign_col(), passed through the col= parameter.
ny_plot=ggmap(get_map('New York, New York',zoom=11,maptype='toner'))

# merge() inside accident_freq() reorders rows, so each layer is drawn from its
# *_freq data frame to keep the color vector aligned with the plotted rows
plot3=ny_plot+
  geom_path(data=man_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=man_col[man_freq$freqPerc])+
  geom_path(data=bronx_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=bronx_col[bronx_freq$freqPerc])+
  geom_path(data=brook_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=brook_col[brook_freq$freqPerc])+
  geom_path(data=si_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=si_col[si_freq$freqPerc])+
  geom_path(data=q_freq,size=1,aes(x=long,y=lat,group=ON.STREET.NAME),col=q_col[q_freq$freqPerc])+
  ggtitle('Street-Level NYC Vehicle Accidents by Borough')+
  xlab(" ")+ylab(" ")
plot3
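Because the colors are looked up row by row from the frequency data, the rows being drawn need to be in that same order. The sketch below, on made-up street names, shows that merge() sorts on the key column by default, so the output of the merge step does not keep the row order of its input.

```r
# Toy collision rows in file order (street names are made up)
toy <- data.frame(
  ON.STREET.NAME = c("B AVE", "A ST", "B AVE"),
  long = c(-73.95, -74.00, -73.96),
  lat  = c(40.80, 40.70, 40.81),
  stringsAsFactors = FALSE
)

# Same merge step as in accident_freq()
tab <- data.frame(table(toy$ON.STREET.NAME))
toy_freq <- merge(x = toy, y = tab, by.x = "ON.STREET.NAME", by.y = "Var1")

# merge() sorts on the key by default, so the row order changes
as.character(toy$ON.STREET.NAME)       # "B AVE" "A ST"  "B AVE"
as.character(toy_freq$ON.STREET.NAME)  # "A ST"  "B AVE" "B AVE"
```

Taking the coordinates, the grouping variable, and the color index from the same merged data frame avoids attaching a street's shade to another street's points.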