Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This guest post is by Rodolfo Vanzini. Rodolfo is senior partner at eXponential.it — an asset management consultancy based in Italy — and advises clients on investment management issues. He taught at the University of Siena and is an analytics professional.
Four years ago, with an economist's education and expertise in financial markets, I thought I could offer little more than emotional support when my wife decided to partner with an internationally renowned franchise and open a school of English in the city where we live, Bologna, Italy. Two years later she realized that the premises she had rented would not be large enough to accommodate the foreseeable demand for courses in the near future, and she decided to plan a move. The first question that popped up was: how far do our customers have to travel to reach the school of English? How much of a role does proximity play in the decision to move?
Since we had advertised city-wide, we initially assumed that customers were spread more or less uniformly across the city, so location within the city would not be a significant issue. Nevertheless, I had learned what behavioral economics has taught us over the last thirty years: individuals make biased decisions, basing their assumptions and conclusions on a limited and approximate set of rules that often leads to sub-optimal outcomes. So, to avoid such a mistake, we definitely had to pinpoint our customers on a map and assess how important it was to stay close to our local customers in the neighboring area. At last, my analytical background came into play: with a bit of R, I decided to perform some basic analytics to check how near or far customers had to travel.
Having no previous experience with maps, I had to look for something that could help me with the mapping issues I was bound to come across. Pretty soon I found what I was looking for in the ggmap package.
I first created a character variable with our school address and then imported the addresses from our local database into a data frame (note that I had previously geocoded the addresses using `latlon <- geocode(as.character(addr$Address), output = 'latlon')`).
```r
bologna <- "Via Dagnini, 42, Bologna, Emilia Romagna, Italy"
cust <- read.table(file = "addr_cust.csv", header = TRUE, dec = ",", sep = ";")
cust.2012 <- subset(cust, Year == 2012)
head(cust)
##   ID
## 1  1 Name Last Name
## 2  2 Name Last Name
## 3  3 Name Last Name
## 4  4 Name Last Name
## 5  5 Name Last Name
## 6  6 Name Last Name
##   Full.address
## 1 Via Brizzi, 10, 40068 San Lazzaro di Savena, Emilia Romagna Italy
## 2 Via Ruggi, 14, 40137 Bologna, Emilia Romagna Italy
## 3 Via Stradelli Guelfi, 78/3, 40138 Bologna, Emilia Romagna Italy
## 4 Via Degli Scalini 9, 40136 Bologna, Emilia Romagna Italy
## 5 Via Cavazza, 8, 40137 Bologna, Emilia Romagna Italy
## 6 Via Delle Fragole, 26, 40137 Bologna, Emilia Romagna Italy
##   latitude longitude Year
## 1    44.46     11.38 2012
## 2    44.48     11.37 2012
## 3    44.48     11.42 2012
## 4    44.48     11.35 2012
## 5    44.48     11.37 2012
## 6    44.47     11.37 2012
```
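The `read.table()` call above implies a semicolon-separated file with decimal commas, the layout spreadsheets commonly export in locales that use the comma as decimal mark. As a hedged illustration, here is a hypothetical two-row sample in that layout (the name column is left out and the years are made up), read with `read.csv2()`, whose defaults are `header = TRUE`, `sep = ";"` and `dec = ","`:

```r
# Hypothetical sample in the layout implied by the read.table() call above:
# ';' separates fields and ',' is the decimal mark. Years are made up.
sample_csv <- 'ID;Full.address;latitude;longitude;Year
1;Via Ruggi, 14, 40137 Bologna, Emilia Romagna Italy;44,48;11,37;2012
2;Via Cavazza, 8, 40137 Bologna, Emilia Romagna Italy;44,48;11,37;2011'

# read.csv2() defaults to header = TRUE, sep = ";" and dec = ","
cust_demo <- read.csv2(text = sample_csv)
str(cust_demo)

# Same subsetting step as in the post
cust_demo_2012 <- subset(cust_demo, Year == 2012)
```

Because the field separator is `;`, the commas inside the addresses are kept as part of the text, while `44,48` is parsed as the number 44.48.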
The nicest part of ggmap, I found, is that you can choose among different map sources, each providing a different kind of geographic and graphic detail.
```r
require(ggmap)
## Loading required package: ggmap
## Loading required package: ggplot2
qmap(bologna, zoom = 12, source = "google", maptype = "roadmap")
## Map from URL :
## http://maps.googleapis.com/maps/api/staticmap?center=Via+Dagnini,+42,+Bologna,+Emilia+Romagna,+Italy&zoom=12&size=%20640x640&scale=%202&maptype=roadmap&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL :
## http://maps.googleapis.com/maps/api/geocode/json?address=Via+Dagnini,+42,+Bologna,+Emilia+Romagna,+Italy&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
```
After a few attempts with different map sources (google, osm and stamen), I concluded that the last one, stamen, offered the graphics best suited to the task. I then plotted the latitude and longitude coordinates to show our customers' addresses on the map.
```r
# Note: current ggmap releases require register_google() for Google sources
# and serve Stamen styles through Stadia Maps (see get_stadiamap()).
bologna.map <- get_map(bologna, zoom = 12, source = "stamen", maptype = "toner")
## Map from URL :
## http://maps.googleapis.com/maps/api/staticmap?center=Via+Dagnini,+42,+Bologna,+Emilia+Romagna,+Italy&zoom=12&size=%20640x640&maptype=terrain&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
## Information from URL :
## http://maps.googleapis.com/maps/api/geocode/json?address=Via+Dagnini,+42,+Bologna,+Emilia+Romagna,+Italy&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms

# Base map with the 2012 customers as the default data layer
Bologna.Map.2012 <- ggmap(bologna.map,
                          base_layer = ggplot(aes(x = longitude, y = latitude), data = cust.2012),
                          extent = "device")
Bologna.Map.2012 + geom_point(size = I(3), alpha = 1/3)
```
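The map answers the "how near or far" question visually. To put numbers on it, one option is the haversine great-circle distance between the school and each customer. This is a minimal sketch under stated assumptions: a spherical Earth with a mean radius of 6371 km, and hypothetical school coordinates used as a placeholder (in practice they would come from geocoding the `bologna` address). The customer coordinates are the rounded values printed by `head(cust)` above.

```r
# Great-circle distance in km via the haversine formula.
# Assumes a spherical Earth with mean radius 6371 km; inputs in decimal degrees.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical school coordinates, NOT the geocoded Via Dagnini address
school_lat <- 44.48
school_lon <- 11.37

# Rounded customer coordinates as printed by head(cust)
lat <- c(44.46, 44.48, 44.48, 44.48, 44.48, 44.47)
lon <- c(11.38, 11.37, 11.42, 11.35, 11.37, 11.37)

dist_km <- haversine_km(school_lat, school_lon, lat, lon)
summary(dist_km)
```

The function is vectorized, so on the full data set the same call with `cust$latitude` and `cust$longitude` would return one distance per customer. At city scale the spherical approximation is more than accurate enough.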
Though the first map wasn't as informative as expected, we were surprised to notice how clustered our customers appeared in the south-eastern part of the city. What I wanted was a way to show the density of customers in the surrounding area. I found an alternative graphic that uses faceting to check whether a pattern had developed over the years since start-up and discovered, partly unexpectedly, that customers had been clustering in that area from the very beginning.
After checking against the GDP per capita and population density data made available locally by city hall and the chamber of commerce, the answer to our initial question began to take shape. There had been a natural selection of customers: customer density decreases as we move away from the center of the cloud. Needless to say, we wanted to retain our customer relationships, and it was becoming apparent that the new location had to be as near as possible to the middle of that cloud.
```r
# Base map with all customers as the default data layer
Bologna.Map <- ggmap(bologna.map,
                     base_layer = ggplot(aes(x = longitude, y = latitude), data = cust),
                     extent = "device")

# Filled 2D density contours, one panel per year
Bologna.Map +
  stat_density2d(aes(x = longitude, y = latitude,
                     fill = after_stat(level), alpha = after_stat(level)),
                 size = 2, bins = 4, data = cust, geom = "polygon",
                 show.legend = FALSE) +
  facet_wrap(~Year)
```
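The faceted maps show the yearly pattern visually. A numeric companion, sketched here as an assumption rather than something from the original analysis, is the median customer position and head count per year: if the medians barely move from one year to the next, the cluster has been in place since start-up. The toy data frame is hypothetical and only mimics the `Year`, `latitude` and `longitude` columns of `cust`.

```r
# Median customer position and number of customers per year.
# Expects the Year, latitude and longitude columns used in the post.
yearly_centre <- function(df) {
  out <- aggregate(cbind(latitude, longitude) ~ Year, data = df, FUN = median)
  # table() is indexed by the years as character strings; align counts with 'out'
  out$n <- as.vector(table(df$Year)[as.character(out$Year)])
  out
}

# Hypothetical toy data standing in for 'cust'
toy <- data.frame(
  Year      = c(2010, 2010, 2011, 2011, 2012, 2012),
  latitude  = c(44.47, 44.48, 44.47, 44.49, 44.48, 44.48),
  longitude = c(11.37, 11.39, 11.38, 11.40, 11.37, 11.39)
)
yearly_centre(toy)
```

On the real data, `yearly_centre(cust)` would return one row per year. The median is used instead of the mean so that a few customers living far away do not drag the centre out of the cluster.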
The following map focuses on 2012 to recap what we had learned about our (then) current location and our customers. It shows how central that south-eastern area has been, and still is, to our local business.
```r
# Same density layer, 2012 customers only
Bologna.Map.2012 +
  stat_density2d(aes(x = longitude, y = latitude,
                     fill = after_stat(level), alpha = after_stat(level)),
                 size = 2, bins = 4, data = cust.2012, geom = "polygon",
                 show.legend = FALSE)
```
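To turn "the middle of that cloud" into a coordinate, one option is to take the peak of a two-dimensional kernel density estimate, computed with `MASS::kde2d()`, the same estimator that ggplot2's `stat_density2d()` relies on. This is a minimal sketch assuming `kde2d()`'s default bandwidth and a 100 x 100 grid; `density_peak()` is a hypothetical helper, and the points below are simulated around a hypothetical centre, not real customer data.

```r
library(MASS)  # provides kde2d(); MASS ships with standard R installations

# Coordinates of the highest point of a 2D kernel density estimate,
# a rough stand-in for "the middle of the customer cloud".
density_peak <- function(lon, lat, n = 100) {
  est <- kde2d(lon, lat, n = n)  # est$z[i, j] is the density at (est$x[i], est$y[j])
  idx <- which(est$z == max(est$z), arr.ind = TRUE)[1, ]
  c(longitude = est$x[idx[[1]]], latitude = est$y[idx[[2]]])
}

# Simulated customers around a hypothetical centre (not real data)
set.seed(42)
lon <- rnorm(500, mean = 11.39, sd = 0.005)
lat <- rnorm(500, mean = 44.475, sd = 0.004)
density_peak(lon, lat)
```

On the real data, `density_peak(cust.2012$longitude, cust.2012$latitude)` would give a point around which to search for new premises. The peak depends on the bandwidth, so it is worth checking that it stays put under a few different `h` values before relying on it.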
This little exercise has shown how relevant proximity is, probably not only to our local business, considering, for instance, the well-researched central place theory in urban economics. These findings confirm, to a certain extent, some general theories of small business management and have kept us from making a mistake driven by behavioral bias. I have always considered analytics and R to be tools of applied economics used by Wall Street firms in search of untapped arbitrage opportunities; now I might conclude that they are more useful for finding established patterns of relationships on Main Street.