
New York City Motor Vehicle Collision Data Visualization

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Everybody loves New York City. Nobody likes car accidents. So why bother looking at the motor vehicle collision data? Well, reality is reality. Road safety is a critical issue by any measure, and it is relevant to everybody’s daily life. Collisions are inevitable and, all too often, a life-or-death situation. It is therefore important to look at past collision data and see what we can learn to help prevent or avoid collisions in the future. It is also a challenging and interesting data science problem in its own right, which is the core motivation of this project.

The data set comes from the city government’s OpenData website, which hosts many useful data sets archived by the city government, including 311 and 911 call history, restaurant inspections, traffic volume, traffic violations, etc. The NYC motor vehicle collision data set contains up-to-date collision records going back to July 1st, 2012. Each record shows the date, time, location, and number of people injured and/or killed (both in total and broken down by pedestrian, cyclist, and motorist), along with the causes and the types of vehicles involved.

Questions

Looking at such a comprehensive data set, some interesting questions that immediately come to mind are:

Also, some questions of general interest are:

Objectives

To address these questions, the specific project objectives are:

  1. Develop an interactive map tool to easily check and explore collision case info on a real city map,
  2. Conduct some preliminary exploratory data analysis to get overview results on the questions of interest.

Interactive Map Tool

To make it easy to visualize and explore the spatial details of the collision data, a comprehensive and flexible interactive map tool was developed using the Leaflet package for R.

       

One particularly nice feature of the tool is that it can show a heat map (collision data point intensity), a cluster map (clustered collision data), and lethal collision markers (with a pop-up of detailed collision information) all at the same time. In addition, the user has flexible control over which portion of the data to display: the year, borough, and month range, the types of victims (pedestrians, cyclists, motorists), and the severity (no hurt, injured, lethal).
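As a rough illustration of how the layers and filters described above could fit together, here is a minimal sketch. It is not the app's actual source code. The toy records, the column names, and the helper names (`severity_of`, `filter_collisions`, `build_collision_map`) are all assumptions made for this example, the victim-type filter is left out for brevity, and the heat map layer assumes the add-on `leaflet.extras` package. The native pipe `|>` needs R 4.1 or later.

```r
# Toy records; the column names are hypothetical stand-ins for the real data set.
collisions <- data.frame(
  date = as.Date(c("2015-01-14", "2015-03-02", "2015-07-21", "2016-03-09")),
  borough = c("MANHATTAN", "BROOKLYN", "MANHATTAN", "MANHATTAN"),
  latitude = c(40.7580, 40.6782, 40.7306, 40.7831),
  longitude = c(-73.9855, -73.9442, -73.9866, -73.9712),
  persons_injured = c(0, 2, 1, 0),
  persons_killed = c(0, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Classify each record by its worst outcome.
severity_of <- function(df) {
  ifelse(df$persons_killed > 0, "lethal",
         ifelse(df$persons_injured > 0, "injured", "no hurt"))
}

# Keep only the records matching the user's selections.
filter_collisions <- function(df, year, boroughs, month_range, severities) {
  yr <- as.integer(format(df$date, "%Y"))
  mo <- as.integer(format(df$date, "%m"))
  keep <- yr == year &
    df$borough %in% boroughs &
    mo >= month_range[1] & mo <= month_range[2] &
    severity_of(df) %in% severities
  df[keep, , drop = FALSE]
}

# Build one map with three toggleable overlay groups.
build_collision_map <- function(df) {
  lethal <- df[df$persons_killed > 0, , drop = FALSE]
  m <- leaflet::leaflet(df) |>
    leaflet::addTiles() |>
    leaflet.extras::addHeatmap(lng = ~longitude, lat = ~latitude,
                               blur = 20, radius = 15, group = "Heat map") |>
    leaflet::addMarkers(lng = ~longitude, lat = ~latitude,
                        clusterOptions = leaflet::markerClusterOptions(),
                        group = "Clusters")
  if (nrow(lethal) > 0) {
    m <- leaflet::addCircleMarkers(
      m, data = lethal, lng = ~longitude, lat = ~latitude, color = "red",
      popup = ~paste0(date, " | killed: ", persons_killed),
      group = "Lethal collisions"
    )
  }
  leaflet::addLayersControl(
    m,
    overlayGroups = c("Heat map", "Clusters", "Lethal collisions"),
    options = leaflet::layersControlOptions(collapsed = FALSE)
  )
}

selected <- filter_collisions(
  collisions, year = 2015, boroughs = "MANHATTAN",
  month_range = c(1, 12), severities = c("no hurt", "injured", "lethal")
)

# Only build the widget when both mapping packages are installed.
if (requireNamespace("leaflet", quietly = TRUE) &&
    requireNamespace("leaflet.extras", quietly = TRUE)) {
  collision_map <- build_collision_map(selected)
}
```

In a Shiny app, the arguments of `filter_collisions` would typically come from input widgets, and the map would be rebuilt or updated whenever a selection changes.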

Preliminary Exploratory Data Analysis

Some preliminary analysis was done to get a high-level overview of the data set.

Time Factors

The figures below show the total number of collisions by year (and borough), month of the year, day of the week, and hour of the day, respectively.

   

   

Some major observations are as follows.

As for the significant drop in the number of collisions in 2016, it is mainly attributed to the successful Vision Zero campaign launched by the city government. What to predict for 2018, and how, is a challenging and interesting problem that deserves deeper investigation…
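For reference, counts by year, month, day of the week, and hour like those plotted above can be produced with base R alone. The sketch below uses four made-up timestamps, not real collision records, and parses them as UTC only to keep the example reproducible.

```r
# Made-up collision timestamps, for illustration only.
timestamps <- as.POSIXct(
  c("2015-01-14 08:15", "2015-01-17 17:40", "2015-06-05 17:05", "2016-06-06 23:30"),
  format = "%Y-%m-%d %H:%M", tz = "UTC"
)

# POSIXlt exposes the calendar components directly.
parts <- as.POSIXlt(timestamps)
year <- parts$year + 1900
month <- factor(month.abb[parts$mon + 1], levels = month.abb)
weekday <- factor(parts$wday, levels = 0:6,
                  labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
hour <- factor(parts$hour, levels = 0:23)

# Explicit factor levels keep empty months, days, and hours as zero counts.
by_year <- table(year)
by_month <- table(month)
by_weekday <- table(weekday)
by_hour <- table(hour)

print(by_year)
print(by_weekday)
```

Each of these tables can be passed straight to `barplot()` to get a figure of the kind shown in this section.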

Severity and Victims

The ratios of different severity levels and types of victims are shown below.

   

   

Some primary observations are:

Especially for Manhattan:

It seems that the high pedestrian victim ratio may be the main contributing factor to the much higher no-hurt ratio. To confirm this, we need to further investigate the no-hurt ratio of pedestrian victims in Manhattan…
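As a sketch of how such ratios can be derived, the snippet below computes per-borough severity shares and victim-type shares with `prop.table()`. The six records and the column names are invented for the example, and the three victim columns are assumed to count injured or killed victims of each type.

```r
# Invented records; column names are assumptions, not the data set's real headers.
records <- data.frame(
  borough = c("MANHATTAN", "MANHATTAN", "MANHATTAN", "MANHATTAN", "BROOKLYN", "BROOKLYN"),
  persons_injured = c(0, 0, 2, 0, 1, 0),
  persons_killed = c(0, 0, 0, 1, 0, 0),
  pedestrians = c(0, 0, 1, 1, 0, 0),
  cyclists = c(0, 0, 0, 0, 1, 0),
  motorists = c(0, 0, 1, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Worst outcome per collision, with a fixed level order for plotting.
severity <- factor(
  ifelse(records$persons_killed > 0, "lethal",
         ifelse(records$persons_injured > 0, "injured", "no hurt")),
  levels = c("no hurt", "injured", "lethal")
)

# Share of each severity level within a borough (each row sums to 1).
severity_ratio <- prop.table(table(records$borough, severity), margin = 1)

# Share of each victim type within a borough.
victim_totals <- rowsum(records[, c("pedestrians", "cyclists", "motorists")],
                        group = records$borough)
victim_share <- prop.table(as.matrix(victim_totals), margin = 1)

print(round(severity_ratio, 2))
print(round(victim_share, 2))
```

Restricting `records` to collisions with pedestrian victims before computing `severity_ratio` would be one way to check the Manhattan question raised above.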

Causes and Involved Vehicles

The recorded collision causes and the types of vehicles involved are highlighted in the following frequency bar graphs. Note that, to make the graphs more informative, we excluded the two most common causes, “Driver Inattention/Distraction” and “Failure to Yield Right-of-Way”, from the causes graph, and the two most common vehicle types, “Passenger Vehicle” and “Sport Utility / Station Wagon”, from the vehicle graph.
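The exclusion step can be sketched as follows: count every cause, drop the dominant categories by name, and keep the top 20 of what remains. The `causes` vector here is a small made-up sample, and the counts in it mean nothing.

```r
# Made-up sample of recorded causes, for illustration only.
causes <- c(
  rep("Driver Inattention/Distraction", 5),
  rep("Failure to Yield Right-of-Way", 4),
  rep("Fatigued/Drowsy", 3),
  rep("Backing Unsafely", 2),
  "Turning Improperly"
)

# Frequency of each cause, most common first.
counts <- sort(table(causes), decreasing = TRUE)

# Drop the two dominant causes so the rest of the distribution is readable.
excluded <- c("Driver Inattention/Distraction", "Failure to Yield Right-of-Way")
remaining <- counts[!names(counts) %in% excluded]

# Keep at most the 20 most frequent remaining causes.
top_causes <- head(remaining, 20)
print(top_causes)
```

A horizontal bar graph then follows from `barplot(rev(top_causes), horiz = TRUE, las = 1)`, and the same steps apply to the vehicle types.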

   

   

Some major observations are:

Taken together, the two observations above suggest there may be a reasonably strong association between the two leading factors, drowsiness and commercial vehicle drivers. This would be a good direction for further study…
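One simple way to start probing that association would be a two-way table comparing the share of drowsiness-related collisions between commercial and non-commercial vehicles. The sketch below uses ten invented records with hypothetical flags, so the numbers say nothing about the real data.

```r
# Invented flags for ten collisions; purely illustrative.
is_commercial <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
is_drowsy <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Contingency table: rows are vehicle class, columns are the drowsiness flag.
crosstab <- table(commercial = is_commercial, drowsy = is_drowsy)

# Share of drowsiness-related collisions within each vehicle class.
drowsy_share <- prop.table(crosstab, margin = 1)[, "TRUE"]

print(crosstab)
print(round(drowsy_share, 2))
```

On the real data, a clearly higher share in the commercial row would support the hunch, and a test such as `fisher.test(crosstab)` could indicate whether the difference is larger than chance alone would explain.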

Takeaways

To facilitate exploration of the collision data set, an interactive map tool was developed that renders comprehensive mapped information with flexible data controls. Some preliminary analysis was also done, yielding reasonable observations and some interesting findings and ideas for further study. The Shiny app is available online, and the source code is available on GitHub.

In summary, based on the results for the top 20 causes and types of vehicles involved, some good pieces of road safety advice are:

  1. Cautiously assume/take the right-of-way,
  2. Cautiously watch out for drowsy/commercial drivers.

As for the city government, some reasonable suggestions coming out of the results are to devise new and effective measures, and perhaps stronger regulations, to better prevent drowsy driving (which in effect may not be much different from drunk driving), unsafe backing (is there some smart, effective way to avoid this?), improper turning (and this?), etc. Hopefully, more thought and effort on these problems will help prevent these high-percentage causes of accidents in the future.

What Next

Based on the findings so far, some interesting and promising directions for pursuing the topic further include:

Finally, note that investigating this important data set has been a very hot topic over the past several years. There are quite a few solid analyses and results out there for reference. This study is merely another personal journey at its very beginning…

Thank you!

