
Open Data R Meetup: exploring the Distribution of Traffic Accidents in Belgrade, 2015 in R

[This article was first published on The Exactness of Mind, and kindly contributed to R-bloggers.]

The R code that accompanies this post is available on GitHub: you will find there the R, Rmd, and HTML files used during the first Open Data R Meetup, held in Belgrade on 31 January 2017 and organized by Data Science Serbia at Startit Center, Savska 5, Belgrade, Serbia. The Open Data initiative in Serbia is still young and our Open Data Portal is still under development. And guess what: we at Data Science Serbia will join the Working Group for Open Data of the Directorate for eGovernment to help open, standardize, structure, publish, and analyse the many forthcoming open data sets from our country, in R, of course 🙂

The data set explored here covers traffic accidents in Belgrade in 2015 (the December 2015 data are missing). The notebook focuses on an exploratory analysis of this test open data set, which the Republic of Serbia Ministry of Interior kindly provided to the Open Data Portal of the Republic of Serbia (the portal is currently under development). Many more open data sets will be indexed and uploaded in the forthcoming weeks and months.

The Distribution of Traffic Accidents 2015, Belgrade. Part of the city core is shown on the map, produced with ggmap and ggplot2 using geom_density2d() and stat_density2d().
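A density layer of this kind can be sketched with {ggplot2} alone. The example below is a minimal sketch under stated assumptions, not the code used at the meetup: the coordinates are simulated around an approximate location of central Belgrade, the data frame and column names (pts, lon, lat) are hypothetical, and the ggmap base map is omitted because fetching map tiles requires network access.

```r
# Minimal sketch: 2D kernel density of point locations with ggplot2.
# ASSUMPTION: the points are SIMULATED, not the Belgrade accident data;
# `pts`, `lon`, and `lat` are hypothetical names.
library(ggplot2)

set.seed(1)
pts <- data.frame(
  lon = rnorm(500, mean = 20.46, sd = 0.020),  # approximate longitude of central Belgrade
  lat = rnorm(500, mean = 44.81, sd = 0.015)   # approximate latitude of central Belgrade
)

p <- ggplot(pts, aes(x = lon, y = lat)) +
  # filled density bands: one polygon per contour level
  stat_density2d(aes(fill = after_stat(level)), geom = "polygon", alpha = 0.3) +
  # contour lines drawn on top of the filled bands
  geom_density2d(colour = "grey30") +
  # keep the longitude/latitude aspect ratio roughly correct
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", fill = "Density level")

print(p)
```

With a real base map, the same two layers would be added on top of the map object instead of a bare ggplot() call.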

Besides exploring and visualizing this test data set, we demonstrated the basic usage of three R packages: {weatherData} to fetch historical weather data, {wbstats} to access the rich World Bank time series, and {ISOcodes} for standardized ISO codes.

Some exploratory modeling (Negative Binomial Regression with glm.nb() from {MASS} and Ordinal Logistic Regression with clm() from {ordinal}) is exercised merely to assess the prima facie effects of the most influential factors.
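The negative binomial step can be sketched as follows. This is a minimal illustration on simulated daily counts, not the model from the notebook: the predictors (weekend, rain_mm), their coefficients, and the data frame name days are all assumptions made for the example.

```r
# Minimal sketch: negative binomial regression for overdispersed daily counts.
# ASSUMPTION: the data are SIMULATED; `days`, `weekend`, and `rain_mm` are
# hypothetical names, not variables from the Belgrade data set.
library(MASS)

set.seed(2015)
n_days <- 334  # January through November 2015 (365 days minus December's 31)

days <- data.frame(
  weekend = rep(c(0, 0, 0, 0, 0, 1, 1), length.out = n_days),  # illustrative weekly pattern
  rain_mm = rgamma(n_days, shape = 0.5, scale = 4)             # illustrative daily rainfall
)

# Simulate counts whose variance exceeds the mean (overdispersion)
mu <- exp(3.5 - 0.3 * days$weekend + 0.02 * days$rain_mm)
days$accidents <- rnbinom(n_days, size = 5, mu = mu)

# Variance well above the mean is what motivates glm.nb() over a Poisson GLM
print(c(mean = mean(days$accidents), variance = var(days$accidents)))

fit <- glm.nb(accidents ~ weekend + rain_mm, data = days)
print(summary(fit))

# Predicted vs. observed counts per day
days$predicted <- fitted(fit)
plot(days$predicted, days$accidents,
     xlab = "Predicted accidents per day",
     ylab = "Observed accidents per day")
```

The estimated dispersion parameter is stored in fit$theta; smaller values indicate stronger overdispersion relative to a Poisson model.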

Predicted vs. observed number of traffic accidents per day, Belgrade 2015. Negative binomial regression for overdispersed count data with glm.nb().

Hopefully, this is just the beginning of our exploratory analyses of open data in R; in the following months, Data Science Serbia will work hard to enable cross-country open data comparisons by elaborating on the forthcoming Serbian open data sets, and to promote R as the lingua franca of the discipline.

