Mapping Seattle Crime
Last week I published a data visualization of San Francisco crime.
This week, I’m mapping Seattle crime data.
The map above is moderately complicated to create, so I’ll start this tutorial with a simpler case: the dot distribution map.
Seattle crime map, simplified version
First, we’ll start by loading the data.
Note that I already “cleaned” this dataset (mostly removing extraneous variables, data prior to 2010, etc.).
library(ggmap)
library(dplyr)
library(ggplot2)

#########################
# GET SEATTLE CRIME DATA
#########################
download.file("http://www.sharpsightlabs.com/wp-content/uploads/2015/01/seattle_crime_2010_to_2014_REDUCED.txt.zip",
              destfile = "seattle_crime_2010_to_2014_REDUCED.txt.zip")

#-----------------------------------
# Unzip the Seattle crime data file
#-----------------------------------
unzip("seattle_crime_2010_to_2014_REDUCED.txt.zip")

#------------------------------------
# Read crime data into an R dataframe
#------------------------------------
df.seattle_crime <- read.csv("seattle_crime_2010_to_2014_REDUCED.txt")
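Before plotting, it can help to confirm that the coordinate columns loaded as expected. The sketch below is not from the original analysis: check_coords() is a hypothetical helper, and the small data frame is a made-up stand-in for df.seattle_crime. The only column names assumed are Longitude and Latitude, which the plotting code in this post relies on.

```r
# Hypothetical helper (an illustration, not part of the original post):
# keep only rows that have usable Longitude/Latitude values.
check_coords <- function(df) {
  # Fail early if the columns used by the plots are missing
  stopifnot(all(c("Longitude", "Latitude") %in% names(df)))
  ok <- !is.na(df$Longitude) & !is.na(df$Latitude)
  df[ok, , drop = FALSE]
}

# Made-up stand-in rows; swap in df.seattle_crime after read.csv()
df.example <- data.frame(
  Longitude = c(-122.33, NA, -122.30),
  Latitude  = c(47.61, 47.60, 47.65)
)

df.clean <- check_coords(df.example)
nrow(df.clean)
```

If rows are dropped here, geom_point() would otherwise have skipped them with a warning, so this step mainly makes the missing data visible up front.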
Get map of Seattle using ggmap package
Next, we’ll get a map of Seattle using the qmap() function from the ggmap package.
################
# SEATTLE GGMAP
################
map.seattle_city <- qmap("seattle", zoom = 11,
                         source = "stamen", maptype = "toner",
                         darken = c(.3, "#BBBBBB"))
map.seattle_city
Here, we’re using qmap(), the quick-map function from ggmap.
We’re calling it with “seattle” as the first argument. That does exactly what you think it does. It tells qmap() which location to map.
We’re also setting a “zoom” parameter. Again, play with that number and see what happens. Currently, we’re setting zoom to 11. To be clear, you can use zoom to zoom in or zoom out on the specified location. In this case, we’re zooming in on the center of Seattle, and if we zoom in too much, we’ll omit parts of the city. For our purposes, a zoom of 11 is ideal.
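To get a feel for what one step of zoom means, here is a rough sketch. It assumes the standard web-map tiling scheme, in which the world is 2^zoom tiles wide; tile_width_degrees() is a hypothetical helper written for this illustration, not a ggmap function.

```r
# Assumption: standard web-map tiling, where the world is 2^zoom tiles wide.
# Hypothetical helper: degrees of longitude covered by a single tile.
tile_width_degrees <- function(zoom) 360 / 2^zoom

zooms <- 10:12
data.frame(zoom = zooms,
           degrees_per_tile = tile_width_degrees(zooms))
```

Each increase of one zoom level halves the width of a tile, which is why going much past 11 starts to cut off the edges of the city.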
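The source and maptype arguments select which map tiles to use. Here, we’re requesting Stamen’s “toner” style.
On top of that, you’ll note that I’m using a parameter called “darken.” Effectively, I’m using darken to lay a partially transparent layer of color over the base map: .3 is the opacity of that layer, and "#BBBBBB" is its color (a light gray).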
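As a rough illustration of what darken = c(.3, "#BBBBBB") does to the base map, the sketch below applies ordinary alpha blending to the two extreme pixel colors of a black-and-white map. This is an approximation of the visual effect under the assumption that the overlay is a plain semi-transparent rectangle; blend() is a hypothetical helper, not a ggmap function.

```r
# Illustration only: standard alpha blending of an overlay color onto a
# base color. blend() is a hypothetical helper, not part of ggmap.
blend <- function(base, overlay, alpha) {
  mixed <- (1 - alpha) * col2rgb(base) + alpha * col2rgb(overlay)
  rgb(round(mixed[1]), round(mixed[2]), round(mixed[3]),
      maxColorValue = 255)
}

# A black pixel and a white pixel under a 30% layer of #BBBBBB
blend("#000000", "#BBBBBB", 0.3)
blend("#FFFFFF", "#BBBBBB", 0.3)
```

Black is pulled up to a dark gray and white is pulled down to a light gray, so the base map loses contrast and competes less with the points drawn on top of it.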
Next, we’ll plot.
Make basic dot distribution map
##########################
# CREATE BASIC MAP
# - dot distribution map
##########################
map.seattle_city +
  geom_point(data = df.seattle_crime, aes(x = Longitude, y = Latitude))
This map is a little ugly, but it’s instructive to examine what we’re doing in the code.
Notice that the syntax is almost the same as the syntax for the basic scatterplot. In some sense, this is a scatterplot.
As proof, let’s create a scatterplot using the same dataset. Simply replace the map.seattle_city object with a call to ggplot().
#####################
# CREATE SCATTERPLOT
#####################
ggplot() +
  geom_point(data = df.seattle_crime, aes(x = Longitude, y = Latitude))
This is the exact same data and the same variable mapping. We’ve just removed the map.seattle_city base map from underneath the points.
That’s part of the reason I wanted to write up this tutorial. I’ve emphasized earlier that you should master the basic charts like the scatterplot. One reason I emphasize the basics is that the basic charts serve as foundations for more complicated charts.
In this case, the scatterplot is the foundation for the dot distribution map.
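One way to make that relationship concrete is to define the point layer once and attach it to different bases. This is a sketch with made-up coordinates standing in for df.seattle_crime; the names df.example, layer.points and plot.scatter are introduced here for illustration only.

```r
library(ggplot2)

# Made-up coordinates standing in for df.seattle_crime
df.example <- data.frame(
  Longitude = c(-122.33, -122.31, -122.35),
  Latitude  = c(47.61, 47.62, 47.60)
)

# Define the point layer once
layer.points <- geom_point(data = df.example,
                           aes(x = Longitude, y = Latitude))

# Plain scatterplot: blank base + point layer
plot.scatter <- ggplot() + layer.points

# Dot distribution map: the same layer on the map base from earlier
# (commented out because it needs the downloaded map tiles)
# plot.map <- map.seattle_city + layer.points
```

The layer does not change between the two plots. Only the object on the left of the + differs.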
Ok. Now, let’s go back to our map. You might have noticed that the data is really “dense.” All of the points are on top of each other. We call this “overplotting.” We’re going to modify our point geoms to deal with this overplotting.
Adjust point transparency to deal with overplotting
#############################
# ADD TRANSPARENCY and COLOR
#############################
map.seattle_city +
  geom_point(data = df.seattle_crime, aes(x = Longitude, y = Latitude),
             color = "dark green", alpha = .03, size = 1.1)
Notice that we made some modifications within geom_point().
We added color to make it a little more interesting.
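But more importantly, we modified two parameters: alpha and size.
The alpha parameter sets the transparency of each point, and size sets how large each point is drawn.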
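To see why a very low alpha such as the .03 used above helps with overplotting, consider how opacity accumulates when points stack on top of each other. The sketch below uses the standard compositing formula for n overlapping layers of equal alpha; stacked_opacity() is a hypothetical helper, and it assumes the points overlap completely.

```r
# Opacity of n fully overlapping points, each drawn with the same alpha.
# Hypothetical helper for illustration; assumes simple alpha compositing.
stacked_opacity <- function(alpha, n) 1 - (1 - alpha)^n

n_points <- c(1, 10, 50, 100)
data.frame(n_points,
           opacity = round(stacked_opacity(0.03, n_points), 2))
```

A single point is nearly invisible, while areas with many incidents build up toward solid color, so the darkness of a region becomes a rough indicator of how many points fall there.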
To be clear, there are other solutions for dealing with overplotting. This isn’t necessarily the best solution, but early in learning data science, this will be one of the simplest to implement.
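As one example of such an alternative, you can count points in rectangular bins and color each bin by its count, using ggplot2’s geom_bin2d(). This is a sketch only: the data is randomly generated to stand in for df.seattle_crime, and the choice of 40 bins is arbitrary.

```r
library(ggplot2)

# Randomly generated stand-in for df.seattle_crime (not real crime data)
set.seed(1)
df.example <- data.frame(
  Longitude = rnorm(5000, mean = -122.33, sd = 0.03),
  Latitude  = rnorm(5000, mean = 47.61, sd = 0.03)
)

# Count points per rectangular bin; fill color encodes the count
plot.binned <- ggplot(df.example, aes(x = Longitude, y = Latitude)) +
  geom_bin2d(bins = 40)
plot.binned
```

Binning trades the individual points for an explicit count per cell, which reads more precisely than stacked transparency but loses the exact location of each incident.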
Wrapping up
The above tutorial shows you how to make a basic dot distribution map using R’s ggplot2 and ggmap.
Note a few things:
- We’re building on foundational techniques. In this case, we’ve made a dot distribution map, which is just a modified scatterplot.
- We built this plot iteratively. We started with the base map, then added points, and then modified those points.
It bears repeating that you should master the basics like the scatterplot, line, histogram, and bar chart. Also practice designing data visualizations iteratively. When you can do these things, you’ll be able to progress to more sophisticated visualization techniques.
Finally, if you want to replicate the map at the beginning of the post, here’s the code:
#################################
# TILED version
# tile border mapped to density
#################################
map.seattle_city +
  stat_density2d(data = df.seattle_crime,
                 aes(x = Longitude,
                     y = Latitude,
                     color = ..density..,
                     size = ifelse(..density.. <= 1, 0, ..density..),
                     alpha = ..density..),
                 geom = "tile", contour = FALSE) +
  scale_color_continuous(low = "orange", high = "red", guide = "none") +
  scale_size_continuous(range = c(0, 3), guide = "none") +
  scale_alpha(range = c(0, .5), guide = "none") +
  ggtitle("Seattle Crime") +
  theme(plot.title = element_text(family = "Trebuchet MS", size = 36,
                                  face = "bold", hjust = 0,
                                  color = "#777777"))
If you look carefully, you’ll notice that the code has quite a few similarities to the basic dot distribution map. (Again: master the basics, and you’ll start to understand what’s going on here.)
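The main new ingredient in that code is the density value that stat_density2d() computes for each tile. As far as I know, ggplot2 estimates it with MASS::kde2d(), so the sketch below calls that function directly to show what kind of object sits behind the plot. The coordinates are randomly generated stand-ins, and the 50 x 50 grid size is an arbitrary choice for illustration.

```r
library(MASS)

# Randomly generated stand-in coordinates (not real crime data)
set.seed(1)
lon <- rnorm(2000, mean = -122.33, sd = 0.03)
lat <- rnorm(2000, mean = 47.61, sd = 0.03)

# 2D kernel density estimate evaluated on a 50 x 50 grid
dens <- kde2d(lon, lat, n = 50)

# dens$x and dens$y hold the grid coordinates; dens$z holds the densities
dim(dens$z)
range(dens$z)
```

Each cell of dens$z corresponds to one tile in the map, and the tiled version simply maps that number to color, border size, and transparency.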
The post Mapping Seattle Crime appeared first on SHARP SIGHT LABS.