Site icon R-bloggers

John Snow, and OpenStreetMap

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

While I was working for a training on data visualization, I wanted to get a nice visual for John Snow’s cholera dataset. This dataset can actually be found in a great package of famous historical datasets.

library(HistData)
data(Snow.deaths)
data(Snow.streets)

One can easily visualize the deaths, on a simplified map, with the streets (here simple grey segments, see Vincent Arel-Bundock’s post)

plot(Snow.deaths[,c("x","y")], col="red", pch=19, cex=.7,xlab="", ylab="", xlim=c(3,20), ylim=c(3,20))
slist <- split(Snow.streets[,c("x","y")],as.factor(Snow.streets[,"street"]))
invisible(lapply(slist, lines, col="grey"))

Of course, one might add isodensity curves (estimated using kernels)

require(KernSmooth)
kde2d <- bkde2D(Snow.deaths[,2:3], bandwidth=c(0.5,0.5))
contour(x=kde2d$x1, y=kde2d$x2,z=kde2d$fhat, add=TRUE)

Now, what if we want to visualize that dataset on a nice background, from Google Maps, or OpenStreetMaps? The problem here is that locations are in a weird coordinate representation system. So let us use a different dataset. For instance, on Robin Wilson’s blog, one can get datasets in a more traditional representation (here the epsg 27700). We can extract the dataset from

library(foreign)
deaths=read.dbf(".../Cholera_Deaths.dbf")

Then, we need our background,

library(OpenStreetMap)
map = openmap(c(lat= 51.516,   lon= -.141),
              c(lat= 51.511,   lon= -.133))
map=openproj(map, projection = "+init=epsg:27700") 
plot(map)
points(deaths@coords,col="red", pch=19, cex=.7 )

If we zoom in (the code above will be just fine), we get

And then, we can compute the density

X=deaths@coords
kde2d <- bkde2D(X, bandwidth=c(bw.ucv(X[,1]),bw.ucv(X[,2])))

based on the same function as before (here I use marginal cross-validation techniques to get optimal bandwidths). To get a nice gradient, we can use

clrs=colorRampPalette(c(rgb(0,0,1,0), rgb(0,0,1,1)), alpha = TRUE)(20)

and finally, we add it on the map

image(x=kde2d$x1, y=kde2d$x2,z=kde2d$fhat, add=TRUE,col=clrs)
contour(x=kde2d$x1, y=kde2d$x2,z=kde2d$fhat, add=TRUE)

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.