
Stop and Frisk: Spatial Analysis of Racial Discrepancies

[This article was first published on Stable Markets » R, and kindly contributed to R-bloggers.]

In my last post, I compiled and cleaned publicly available data on over 4.5 million stops over the past 11 years.

I also presented preliminary summary statistics showing that blacks were consistently stopped 3-6 times more often than whites in NYC over the last decade.

Since the last post, I have cleaned and reformatted the coordinates marking the location of each stop. Although I compiled data for 2003-2014, coordinates were available only for 2004 and 2007-2014. All the code can be found in my GitHub repository.

My goals were to:

Killing two birds with one stone, I made density plots to identify areas with high and low stop densities. Snapshots were taken at two-year intervals from 2007 to 2013. Stops of whites are shown as red contour lines and stops of blacks as blue shading.
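To make the idea of a stop density concrete, here is a minimal Python sketch of a two-dimensional Gaussian kernel density estimate evaluated on a grid. This is not the code behind the maps in this post (that code is in the GitHub repository); the function names, the bandwidth, and the coordinates are all made up for illustration, and the points are assumed to be already projected onto a flat plane.

```python
import math


def gaussian_kde_grid(points, xs, ys, bandwidth):
    """Evaluate a 2-D Gaussian kernel density estimate on a grid.

    points    : list of (x, y) coordinates (hypothetical, already projected)
    xs, ys    : grid coordinates along each axis
    bandwidth : kernel standard deviation, in the same units as the coordinates
    Returns a list of rows, one per y value, each holding one density per x value.
    """
    if not points:
        raise ValueError("need at least one point")
    # Normalizing constant of an isotropic 2-D Gaussian, averaged over points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    grid = []
    for gy in ys:
        row = []
        for gx in xs:
            total = 0.0
            for px, py in points:
                d2 = (gx - px) ** 2 + (gy - py) ** 2
                total += math.exp(-d2 / (2.0 * bandwidth ** 2))
            row.append(norm * total)
        grid.append(row)
    return grid


def peak_cell(grid, xs, ys):
    """Return the (x, y) grid location with the highest estimated density."""
    best = max(
        ((grid[j][i], xs[i], ys[j]) for j in range(len(ys)) for i in range(len(xs))),
        key=lambda t: t[0],
    )
    return best[1], best[2]


# Made-up coordinates for illustration only; these are not real stop locations.
group_a = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]
group_b = [(3.0, 3.1), (3.2, 2.9), (2.9, 3.0), (3.1, 3.2)]

axis = [i * 0.5 for i in range(9)]  # 0.0, 0.5, ..., 4.0
dens_a = gaussian_kde_grid(group_a, axis, axis, bandwidth=0.5)
dens_b = gaussian_kde_grid(group_b, axis, axis, bandwidth=0.5)

peak_a = peak_cell(dens_a, axis, axis)
peak_b = peak_cell(dens_b, axis, axis)
print(peak_a, peak_b)
```

Contour lines or shading are then just two ways of drawing the same grid of density values; overlaying one group's contours on another group's shading shows whether their high-density areas coincide. The bandwidth controls how smooth the surface looks, so it is worth trying a few values before reading much into any single hotspot.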

There are two things to note:

Here is the map of stops in 2014, the last year for which I have data:

Stops in 2014. Red lines indicate high white stop density areas and blue shades indicate high black stop density areas.
Notice that high white stop density areas are very different from high black stop density areas.
The star in Brooklyn marks where Officers Liu and Ramos were killed. The star on Staten Island marks where Eric Garner died.

In 2014, we see more concentrated stops of blacks along the coast of Staten Island. In fact, Eric Garner died in precisely one of these high-density areas. The location of his death is marked with a star.

Similarly, Officers Liu and Ramos also died in a high black stop density area (location marked with the star in Brooklyn).

Importance. It’s easy to see the importance of such spatial analyses. They add several layers of information on top of the basic summary statistics I presented in my previous post. As I’ve shown above, tragic events can happen in high-density areas.

Simultaneity. Let’s say we overlay this stop and frisk data with perfectly measured crime data (the potential mismeasurement of “crime” is discussed below) and find that high black stop density areas actually have low crime density. We cannot necessarily conclude that the NYPD is engaging in a racist expansion of stops in black areas despite low crime rates. What if crime rates are low because of the high number of stops? With the current data, it’s hard to say which way the causality runs.
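The simultaneity problem can be illustrated with a toy simulation. The sketch below is purely hypothetical: the two equations, the coefficients b and d, and the noise terms are assumptions chosen for illustration, not estimates from the stop and frisk data. In this made-up world, stops respond to crime (coefficient b) and crime responds to stops (coefficient -d) at the same time, and a naive regression of crime on stops fails to recover the true deterrent effect.

```python
import random


def simulate(n, b, d, seed=0):
    """Draw n areas from a toy simultaneous system (all values hypothetical).

    Structural equations, with independent standard normal shocks u and v:
        stops = b * crime + u      (police respond to crime)
        crime = -d * stops + v     (stops deter crime)
    Solving the two equations jointly gives stops = (b * v + u) / (1 + b * d).
    """
    rng = random.Random(seed)
    stops, crime = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        s = (b * v + u) / (1.0 + b * d)
        c = -d * s + v
        stops.append(s)
        crime.append(c)
    return stops, crime


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


true_effect = -0.5  # assumed deterrent effect of stops on crime (-d)
stops, crime = simulate(n=20000, b=0.5, d=0.5)
naive_slope = ols_slope(stops, crime)

print("true effect of stops on crime:", true_effect)
print("naive regression slope:       ", round(naive_slope, 3))
```

With these particular assumed coefficients, the two directions of causality roughly cancel, so the naive slope comes out close to zero even though stops reduce crime by construction. Different assumed coefficients would produce a different bias, which is the point: the observed association alone cannot tell us which structural story generated it.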

Unobserved Factors. Simultaneity aside, we also have unobserved factors to contend with. Are the spatial discrepancies visualized above due to racist police segmenting geographically to efficiently target blacks? Or are the spatial discrepancies simply due to the fact that blacks and whites, in general, live and/or hang out in very different places? Without additional data, it’s hard to say.

Difficulty Establishing Simple Claims. Even the relatively simple claim of “blacks commit crimes at higher than average rates” is difficult to establish. When most people speak of “crime rates”, they are actually referring to arrest rates. We usually don’t observe crimes because criminals aren’t generally upfront people who self-report their crimes. So, we use police arrest data as a proxy for crime. However, if we think that police are inherently racist, then the arrest data they record would also be biased upward. Arrest rates could be much higher than crime rates. My point is that even establishing simple claims requires great care (both in how we phrase the claims and how we attempt to answer them) and is often difficult.
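A small arithmetic sketch shows why arrest rates can be a poor proxy for crime rates. Every number below is invented for illustration, and the function and its parameters are hypothetical: two groups are given the same underlying offense rate, but one is assumed to face a higher chance of arrest both when an offense occurred and when it did not.

```python
def arrest_rate(offense_rate, p_arrest_if_offense, p_arrest_if_no_offense):
    """Share of a group that ends up arrested under assumed probabilities.

    Arrests come from two sources: people who offended and were arrested,
    and people who did not offend but were arrested anyway.
    """
    from_offenses = offense_rate * p_arrest_if_offense
    from_non_offenses = (1.0 - offense_rate) * p_arrest_if_no_offense
    return from_offenses + from_non_offenses


# Hypothetical inputs: both groups offend at the same 5% rate.
offense_rate = 0.05
rate_a = arrest_rate(offense_rate, p_arrest_if_offense=0.30, p_arrest_if_no_offense=0.01)
rate_b = arrest_rate(offense_rate, p_arrest_if_offense=0.60, p_arrest_if_no_offense=0.04)

print("group A arrest rate:", round(rate_a, 4))
print("group B arrest rate:", round(rate_b, 4))
print("ratio B / A:        ", round(rate_b / rate_a, 2))
```

Under these assumptions, group B's arrest rate is well above group A's, and even above the shared offense rate, although the two groups offend equally often. Anyone reading arrest rates as crime rates would draw the wrong conclusion in this toy setting.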

Racism. As I said above, issues such as simultaneity and unobserved factors make it very difficult to establish even simple relationships or claims. It is even harder to establish the inherent racism of an entire group of people, or the inherent criminality of an entire group of people. Much more information is needed.

I hope that making this data available and clean for public use will help researchers address some of these difficulties. Again, all of my code and datasets are available on GitHub. My hope is that other people will combine this data with their own data to reach more impactful conclusions. As always, please cite when sharing.

