
New York Traffic Collisions 2015-2017


Basic Stats - Machine Learning - Geoplotting - d3.js Visualization

Motivation


The following analysis is a final project created by two master's students during the spring semester of 2017 in the course (02806) Social data analysis and visualization, run by Sune Lehmann at the Technical University of Denmark.
Besides Python programming, the main topics of the course are machine learning and data visualization (D3.js). The project was therefore designed to apply the techniques we learned in those topics.

Traffic collisions have a significant impact on our lives. This analysis explores the collision data collected by the NYPD and published by NYC OpenData, covering 2015-2017. In addition, data from N.Y. yellow taxis is used to estimate the high-traffic areas and relate them to the accident sites. Our purpose is to find out why accidents happen and to help prevent them.

Basic Stats


Accident Factors

Traffic during the day

Accidents during the day

Injured

Killed

Scatter Plot


The following scatter plot shows the 10 most frequent contributing factors of car collisions. Click a circle to see the borough with the most accidents for that factor.
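The aggregation behind such a plot can be sketched in a few lines of Python. This is only a minimal illustration, not the code used in the project: the function name `top_factors_with_borough`, the `(factor, borough)` record layout and the sample rows are all assumptions made up for the example.

```python
from collections import Counter


def top_factors_with_borough(records, n=10):
    """Return [(factor, count, top_borough)] for the n most frequent factors.

    `records` is a list of (factor, borough) pairs. This layout is a
    hypothetical simplification, not the schema of the NYPD dataset.
    """
    factor_counts = Counter(factor for factor, _ in records)
    result = []
    for factor, count in factor_counts.most_common(n):
        # Count boroughs only among the collisions caused by this factor.
        boroughs = Counter(b for f, b in records if f == factor)
        top_borough = boroughs.most_common(1)[0][0]
        result.append((factor, count, top_borough))
    return result


# Made-up sample rows, for illustration only.
sample = [
    ("Driver Inattention/Distraction", "BROOKLYN"),
    ("Driver Inattention/Distraction", "BROOKLYN"),
    ("Driver Inattention/Distraction", "QUEENS"),
    ("Failure to Yield Right-of-Way", "MANHATTAN"),
    ("Failure to Yield Right-of-Way", "MANHATTAN"),
    ("Backing Unsafely", "QUEENS"),
]

top_two = top_factors_with_borough(sample, n=2)
```

Each resulting tuple corresponds to one circle: the factor, how often it occurs, and the borough shown when the circle is clicked.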

Heat Maps


Motorists Injured

Cyclists Injured

Pedestrians Injured

Motorists Killed

Cyclists Killed

Pedestrians Killed

K-means Clustering


Performing K-means clustering on all the collision incidents allowed us to define the centroids for 2, 3 and 4 clusters, shown below as red circles. The same procedure was applied to the yellow taxi pickup data for January, shown below in blue, in order to find out whether there is any correlation between the traffic areas and the accident areas. Hover over the buttons to preview or click to explore the results.

K=2 clusters K=3 clusters K=4 clusters
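For readers unfamiliar with the technique, here is a minimal sketch of K-means (Lloyd's algorithm) on latitude/longitude pairs. It is an illustration under stated assumptions, not the code from our notebook: the function name `kmeans`, its parameters and the six sample coordinates are made up, and plain Euclidean distance on raw coordinates is only a rough approximation of real distances.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Cluster 2-D points with Lloyd's algorithm; return (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance on raw lat/lon, an approximation).
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels


# Made-up (latitude, longitude) pairs forming two obvious groups.
collisions = [
    (40.70, -74.00), (40.71, -73.99), (40.72, -73.98),
    (40.80, -73.90), (40.81, -73.89), (40.82, -73.88),
]

centroids, labels = kmeans(collisions, k=2)
```

Calling the same function with `k=3` or `k=4`, or with taxi pickup coordinates instead of collision coordinates, gives the other centroid sets that can then be compared on the map.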

Discussion


We are thrilled that we met our expectations for the project. We managed to gain insight into N.Y. traffic collisions and to give the user the ability to see the basic stats, get an overview of the most frequent accident factors, see the boroughs where they occur, and see how the accidents are spread over the day. Taking the taxi pick-up and drop-off data into consideration, we made a D3.js geo plot presenting the clustering centroids for both accidents and traffic.
We also implemented another very interesting machine learning technique: decision tree classification, which can predict where an accident occurred given the factor, the time and the type of the vehicle. It can be found in the Jupyter notebook.
For a more detailed analysis, as well as our own conclusions, please check out our Jupyter Notebook on NBviewer below.
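To give a feel for the decision tree classification mentioned above, here is a small self-contained sketch. It is not the notebook's implementation: it assumes that "where" means the borough, that time is reduced to a coarse category, and that all features are categorical. The names `gini`, `build_tree` and `predict`, as well as the six training rows, are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def build_tree(rows, labels, features):
    """Build a tree over categorical features with multiway splits.

    Returns a label (leaf) or a tuple (feature, {value: subtree}, majority).
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def split_impurity(feature):
        # Weighted Gini impurity of the groups produced by this feature.
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    best = min(features, key=split_impurity)
    if split_impurity(best) >= gini(labels):
        return majority  # no available split reduces impurity
    remaining = [f for f in features if f != best]
    children = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining)
    return (best, children, majority)


def predict(tree, row):
    """Walk the tree; fall back to the node's majority for unseen values."""
    while isinstance(tree, tuple):
        feature, children, majority = tree
        tree = children.get(row[feature], majority)
    return tree


# Made-up training rows (hypothetical values, not taken from the dataset).
rows = [
    {"factor": "Inattention", "time": "morning", "vehicle": "TAXI"},
    {"factor": "Inattention", "time": "evening", "vehicle": "TAXI"},
    {"factor": "Yield", "time": "morning", "vehicle": "PASSENGER VEHICLE"},
    {"factor": "Yield", "time": "evening", "vehicle": "PASSENGER VEHICLE"},
    {"factor": "Inattention", "time": "morning", "vehicle": "PASSENGER VEHICLE"},
    {"factor": "Yield", "time": "evening", "vehicle": "TAXI"},
]
boroughs = ["MANHATTAN", "MANHATTAN", "BROOKLYN", "BROOKLYN", "QUEENS", "MANHATTAN"]

tree = build_tree(rows, boroughs, ["factor", "time", "vehicle"])
guess = predict(tree, {"factor": "Yield", "time": "morning",
                       "vehicle": "PASSENGER VEHICLE"})
```

On this toy data the tree first splits on the vehicle type and then on the factor. A real analysis would use far more rows and hold out a test set to measure accuracy.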