Site icon R-bloggers

The Cycling Accident Map of Madrid City

[This article was first published on R – Fronkonstin, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Far away, this ship has taken me far away (Starlight, Muse)

Madrid City has an Open Data platform where can be found around 300 data sets about a number of topics. One of these sets is the one I used for this experiment. It contains information about cycling accidents  happened in the city from January to July 2017. I have done a map to locate where the accidents took place. This experiment shows how R makes very easy to create professional maps with Leaflet (in this case I use Carto basemaps).

To locate accidents the data set only contains the address where they happened so the first thing I did is to obtain their geographical coordinates using geocode function from ggmap package. There were 431 accidents during the first 7 months of 2017 (such a big number!) and I got coordinates of 407 so I can locate 94% of the accidents.

Obviously, the amount of accidents in some place depend on how many bikers circulate there as well as on its infrastructure. None of these things can be seen in the map: It only shows number of accidents.

The categorization of accidents is:

These categories are redundant (e.g. Double and Multiple collision), difficult to understand (e.g. Overturn) or both things at the same time (e.g. Motorcycle fall and Moped fall). This categorization also forgets human damages incurred by the accident.

Taking all these things in mind, this is the map:

< !-- iframe plugin v.4.3 wordpress.org/plugins/iframe/ -->

Here is a full-screen version of the map.

My suggestions to the city council of Madrid are:

  1. Add geographical coordinates to data (I guess many of the analysis will need them)
  2. Rethink the categorization to make it clearer and more informative
  3. Add more cycling data sets to the platform (detail of bikeways, traffic …) to understand accidents better
  4. Attending just to the number of accidents , put the focus around Parque del Retiro, specially on its west surroundings, from Plaza de Cibeles to Plaza de Carlos V: more warning signals, more  (or better) bikeways …

I add the code below to update the map (If someone ask it to me, I can do it myself regularly):

library(dplyr)
library(stringr)
library(ggmap)
library(leaflet)
# First, getting the data
download.file(paste0("http://datos.madrid.es/egob/catalogo/", file), 
              destfile="300110-0-accidentes-bicicleta.csv")

data=read.csv("300110-0-accidentes-bicicleta.csv", sep=";", skip=1)

# Prepare data for geolocation
data %>% 
  mutate(direccion=paste(str_trim(Lugar), str_trim(Numero), "MADRID, SPAIN", sep=", ") %>% 
           str_replace("NA, ", "") %>% 
           str_replace(" - ", " CON ")) -> data

# Geolocation (takes some time ...)
coords=c()
for (i in 1:nrow(data)) 
{
  coords %>% rbind(geocode(data[i,"direccion"])) -> coords
  Sys.sleep(0.5)
}
  
# Save data, just in case
data %>% cbind(coords) %>% saveRDS(file="bicicletas.RDS")

data=readRDS(file="bicicletas.RDS")

# Remove non-successfull geolocations
data %>% 
  filter(!is.na(lon)) %>% 
  droplevels()-> data

# Remove non-successfull geolocations
data %>% mutate(Fecha=paste0(as.Date(data$Fecha, "%d/%m/%Y"), " ", TRAMO.HORARIO),
                popup=paste0("<b>Dónde:</b>",
                             direccion,
                             "<b>Cuándo:</b>",
                             Fecha,
                             "<b>Qué pasó:</b>",
                             Tipo.Accidente)) -> data

# Do the map
data %>% split(data$Tipo.Accidente) -> data.df

l <- leaflet() %>% addProviderTiles(providers$CartoDB.Positron)

names(data.df) %>%
  purrr::walk( function(df) {
    l <<- l %>%
      addCircleMarkers(data=data.df[[df]],
                 lng=~lon, lat=~lat,
                 popup=~popup,
                 color="red",
                 stroke=FALSE,
                 fillOpacity = 0.8,
                 group = df,
                 clusterOptions = markerClusterOptions(removeOutsideVisibleBounds = F))
  })

l %>%
  addLayersControl(
    overlayGroups = names(data.df),
    options = layersControlOptions(collapsed = FALSE)
  )

To leave a comment for the author, please follow the link and comment on their blog: R – Fronkonstin.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.