[This article was first published on everyday analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
I don’t know about you, but I really hate getting parking tickets. Sometimes I feel like it’s all just a giant cash grab. Really? I can’t park there between the hours of 11 and 3, but every other time is okay? Well, why the hell not?But ah, such is life. Rules must be in place to keep civil order, keep the engines of city life running and prevent total chaos in the downtown core. However knowing this does not make coming out to the street to find that bright yellow slip of paper under your windshield wiper any easier.
Like everything else in the universe, parking tickets are a source of data. The great people at Open Data Toronto (@Open_TO) have provided all the data from every parking ticket issued in Toronto from 2008 to the end of last year.
So, let us dive in and have a look. We might just discover why we keeping getting all these tickets, or at least ease the collective pain a little in realizing how many others are sharing in it.
Background
The data set is an anonymized record of every parking ticket issued in the city of Toronto from the period 01/01/2008 – 12/31/2011. The fields provided are: the anonymized ticket #, date of infraction, infraction code, description, fine amount, time of infraction, and location (address).The data set and more information can be found in Open Data Toronto’s data catalogue here.
Originally I had this brilliant idea to geocode every data point, and then create an awesome heat map of the geographical distribution of parking tickets issued. However, given the fact that there are ~11 million records and the Google Maps API has a daily limit of 2,500 geocoding requests per day, even if I was completely diligent and performed the task daily it would still take approximately 4400 days or about 12 years to complete. And no, I am not paying to use the API for Business (which at a limit of 100,000 requests per day would still take ~3.5 months).
If anyone knows a way around this, please drop me an email and fill me in.
Otherwise, you can check out prior art. Patrick Cain at Global News created an awesome interactive map of aggregated parking ticket data from 2010 for locations in the city where over 500 tickets were issued. This turns out to be mainly hospitals, and unsurprisingly, tickets are clustered in the downtown core. Mr. Cain did a similar analysis while at the Toronto Star back in 2009, using data from the previous year.
I just don’t like throwing out data points.
Analysis
Parking Infractions by TypeNext we consider the parking tickets for the period by infraction type. A simple bar chart outlines the most common parking ticket types:
We will consider those codes which stick out most on the bar chart (the top 10):
> sort(codeTable, decreasing=TRUE)[1:11]
005 029 210 003 207 009 002 008 006 015
2336433 1822690 1366945 1354671 933478 718692 496283 443706 369079 173078
005 029 210 003 207 009 002 008 006 015
2336433 1822690 1366945 1354671 933478 718692 496283 443706 369079 173078
Putting that into more human-readable format, the most commonly issued types of parking infractions were:
1. 005 – Park on Highway at Prohibited Time of Day
2. 029 – Park Prohibited Place/Time – No Permit
3. 210 – Park Fail to Display Receipt
4. 003 – Park on Private Property w/o Consent
5. 207 – Park w/o ticket from machine
6. 009 – Stop on Highway at Prohibited Time/Day
7. 002 – Park Longer than 3 Hours
8. 008 – Vehicle Standing Prohibited Time/Day
9. 006 – Park on Highway – Excess of Permitted Time
10. 015 – Park within 3M of Fire Hydrant
In case you were wondering, the most expensive tickets (in the range of 100’s of dollars, the max being $450 [!!] ) are all related to handicapped parking spaces.
Time Distribution of Parking Infractions
Let us now consider the parking ticket information with regards to time. First and foremost, we consider the ticket data as a simple tim
e series and plot the data for the exploratory purposes:
Cool. |
Quickly whipping up a box plot up for the data, we can see that a significantly less proportion of the tickets are issued on Sunday. Also for some reason plotting there are many outliers on the low end. I suspect these are in the aforementioned dips around the holiday season though I did not investigate this.
Conclusions
Performing a quick analysis of many different aspects of the data was not as easy as I had hoped, given the size of the set. Still, it is interesting to see the most common types of violations and the distribution of the majority of the parking tickets with respect to time.Interesting general points of note:
- The most common parking infractions are wrong place / wrong time, followed by various types of failing to display a permit / buy a ticket
- Significantly reduced number of parking violations during the Christmas holiday season
- More tickets issued during the work week
For Part II, I plan to create some heat maps / 2D histograms of the ticket data with respect to time, and I may yet create a geospatial representation of the data, albeit in aggregated form.
To leave a comment for the author, please follow the link and comment on their blog: everyday analytics.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.