Mapping Hotspots with R: The GAM

[This article was first published on IDV User Experience, and kindly contributed to R-bloggers].
I’ve been getting a lot of questions about the method used to map the hotspots in the seasonal drunk-driving risk maps.  It uses the GAM (Geographical Analysis Machine), a way of detecting spatial clusters from two data inputs: the data of interest, and a control, or “underlying population at risk” (or at least your best substitute for that).

These four distinct hotspot maps were made in R (using a shorter radial distance than previously posted).  They indicate areas where instances of drunk driving fatalities are much higher than normal in winter, spring, summer, or autumn.

Four individual GAM hotspot maps made in R, each built from a baseline mesh of 10,000 points, with radii of 14 miles and 49 miles.

GAM
The Geographical Analysis Machine was whipped up by Stan Openshaw and his team in the late 1980s as a way of finding relative geographic clusters, or hotspots. It requires a point dataset of interest (the known events) and a background point dataset representing candidates for those events (some examples at the bottom of the post).

The mesh backdrop
The study area is covered with a mesh of backdrop points. A finer mesh gives a higher-resolution output, with more precisely drawn cluster zones, but it also takes longer to process. These points are the seeds from which your hotspot kernels may or may not grow (depending on what you consider significant).

Here’s my study area in R with a mesh backdrop of 10,000 points. The finer the mesh, the greater the resulting resolution, and the greater the amount of coverage overlap, depending on what you choose as a meaningful radial distance.
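
As a rough sketch of what laying down that mesh might look like (in Python here, not the R used for the maps above), the snippet below drops a regular grid over a bounding box. The `make_mesh` helper, the 100-by-100 layout, and the 500-by-300-mile box are all assumptions for illustration; the original maps may well have arranged their 10,000 points differently.

```python
def make_mesh(xmin, ymin, xmax, ymax, n_per_side):
    """Return a regular grid of (x, y) mesh points covering the bounding box."""
    if n_per_side < 2:
        raise ValueError("n_per_side must be at least 2")
    step_x = (xmax - xmin) / (n_per_side - 1)
    step_y = (ymax - ymin) / (n_per_side - 1)
    return [
        (xmin + i * step_x, ymin + j * step_y)
        for j in range(n_per_side)   # rows, bottom to top
        for i in range(n_per_side)   # columns, left to right
    ]


# Hypothetical study area: a 500-by-300-mile box with 100 x 100 = 10,000 points.
mesh = make_mesh(0.0, 0.0, 500.0, 300.0, 100)
print(len(mesh), "mesh points; spacing in x is about", round(mesh[1][0] - mesh[0][0], 2), "miles")
```

Doubling `n_per_side` quadruples the number of mesh points, which is where the extra processing time for a finer mesh comes from.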

Radial distance
From each point of the mesh backdrop, a radial distance is swiped out. The events and candidates falling inside that radius are counted up, and if the ratio of events to candidates is significantly (how significant is up to you) beyond what would be expected under a Poisson distribution, then that radius area is retained; if not, it gets nuked. These significant radii can be merged together for a discrete vector output of hotspots, or they can feed a kernel heatmap, which results in a bitmap illustrating varying magnitude at distance (like in the above maps).
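
To make the significance check a little more concrete, here is a minimal Python sketch of testing a single circle. It assumes the expected event count is the number of candidates in the circle times the overall events-per-candidate rate, and that "significant" means a Poisson upper-tail probability below a threshold `alpha`. The function names, the 0.01 threshold, and the toy data are all hypothetical, not taken from the maps above.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); fine for modest counts."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)        # P(X = 0)
    cdf = term
    for k in range(1, observed):      # accumulate P(X <= observed - 1)
        term *= expected / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def count_within(points, centre, radius):
    """Number of points no farther than `radius` from `centre`."""
    cx, cy = centre
    r2 = radius * radius
    return sum(1 for (x, y) in points if (x - cx) ** 2 + (y - cy) ** 2 <= r2)


def is_hotspot(centre, radius, events, candidates, alpha=0.01):
    """Retain the circle if it holds significantly more events than expected."""
    overall_rate = len(events) / len(candidates)
    n_events = count_within(events, centre, radius)
    n_candidates = count_within(candidates, centre, radius)
    if n_candidates == 0:
        return False                  # nothing at risk here, so nothing to test
    expected = n_candidates * overall_rate
    return poisson_upper_tail(n_events, expected) < alpha


# Toy data: candidates on a 20 x 20 grid, eight events bunched near (5, 5).
candidates = [(float(x), float(y)) for x in range(20) for y in range(20)]
events = [(5.0, 5.0), (5.3, 4.8), (4.6, 5.1), (5.5, 5.5), (4.4, 4.7),
          (5.1, 5.9), (6.0, 5.2), (4.9, 4.2), (15.0, 3.0), (2.0, 17.0)]

print(is_hotspot((5.0, 5.0), 2.0, events, candidates))    # circle over the bunch
print(is_hotspot((15.0, 15.0), 2.0, events, candidates))  # circle with no events
```

Running `is_hotspot` for every mesh point gives the set of retained circles; the `alpha` threshold is the "how significant is up to you" knob.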


Events are mapped along with ‘candidates’ in this illustration.


Overlay a mesh to serve as the starting points of your radii.
In real life you’d want a finer mesh than this, given the data density.


Swipe out a radial distance from around the mesh points.

Radii containing a significantly high event-to-candidate ratio are retained.
Wash, rinse, and repeat with varying radius distances, and you’ve got a bubbly indicator of clusters. Additionally, you can use the clusters as inputs to a kernel density map for a smooth heatmap version.
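
For the smooth version, one simple approach (assumed here, and not necessarily what produced the maps above) is to let every retained circle contribute a weight that is strongest at its centre and fades to zero at its edge, then add those weights up on an output grid. The Python sketch below uses an Epanechnikov-style kernel; the `retained` circles and the grid coordinates are made up for illustration.

```python
import math


def kernel_weight(distance, radius):
    """Epanechnikov-style weight: 1 at the centre, falling to 0 at the radius."""
    u = distance / radius
    return 1.0 - u * u if u < 1.0 else 0.0


def heat_surface(retained, xs, ys):
    """Sum the kernel weights of all retained circles at each grid cell.

    retained: list of (centre_x, centre_y, radius) tuples.
    Returns a list of rows, indexed as surface[row_for_y][column_for_x].
    """
    surface = []
    for y in ys:
        row = []
        for x in xs:
            total = 0.0
            for cx, cy, radius in retained:
                total += kernel_weight(math.hypot(x - cx, y - cy), radius)
            row.append(total)
        surface.append(row)
    return surface


# Hypothetical retained circles (in miles): two small ones and one large one.
retained = [(10.0, 10.0, 14.0), (12.0, 10.0, 14.0), (10.0, 10.0, 49.0)]
xs = [0.0, 10.0, 200.0]
ys = [0.0, 10.0, 200.0]

surface = heat_surface(retained, xs, ys)
for row in surface:
    print([round(value, 3) for value in row])
```

Cells where many significant circles of different radii overlap end up with the largest totals, which is what gives the heatmap its graded look.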

Why
The previous hotspot mapping post went into greater detail on why it’s important to isolate event intensity from its underlying phenomena. But it is such a cool and useful tool that I can’t help providing examples again…




We are really interested in what folks are up to in R and are doing our best to provide inroads to that work so it can be accessed by more folks in your organization.  Let us know if you have any ideas!
