Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In my last post, I compiled and cleaned publicly available data on over 4.5 million stops over the past 11 years.
I also presented preliminary summary statistics showing that blacks had been consistently stopped 3-6 times more often than whites over the last decade in NYC.
Since the last post, I managed to clean and reformat the coordinates marking the location of the stops. While I compiled data from 2003-2014, coordinates were available only for 2004 and for 2007-2014. All the code can be found in my GitHub repository.
My goals were to:
- See if blacks and whites were being stopped at the same locations
- Identify areas with especially high numbers of stops and see how these areas changed over time
Killing two birds with one stone, I made density plots to identify areas with high and low stop densities. Snapshots were taken at two-year intervals from 2007 to 2013. Stops of whites are indicated by red contour lines and stops of blacks by blue shading.
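For readers who want to try something similar, here is a minimal sketch of how such an overlay could be drawn. It is not the code from my repository: the column names (`lon`, `lat`, `race`), the simulated coordinates, and the choice of `MASS::kde2d` with base graphics are all assumptions made for illustration.

```r
library(MASS)  # kde2d: two-dimensional kernel density estimate

set.seed(1)

# Simulated stand-in for the cleaned stop data (hypothetical columns and values).
stops <- data.frame(
  lon  = c(rnorm(500, -73.95, 0.03), rnorm(500, -74.10, 0.03)),
  lat  = c(rnorm(500,  40.70, 0.03), rnorm(500,  40.60, 0.03)),
  race = rep(c("black", "white"), each = 500)
)

# Evaluate both densities on the same grid so the two layers line up.
lims <- c(range(stops$lon), range(stops$lat))

stop_density <- function(df, group, n = 100) {
  sub <- df[df$race == group, ]
  kde2d(sub$lon, sub$lat, n = n, lims = lims)
}

dens_black <- stop_density(stops, "black")
dens_white <- stop_density(stops, "white")

# Blue shading for one group, red contour lines for the other.
blues <- colorRampPalette(c("white", "steelblue4"))(20)
image(dens_black, col = blues, xlab = "Longitude", ylab = "Latitude")
contour(dens_white, add = TRUE, col = "red", drawlabels = FALSE)
```

Sharing one grid between the two groups is the important design choice here: it is what lets the contour lines and the shading be compared directly on the same map.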
There are a few things to note:
- The snapshots indicate that, in those years, blacks and whites were stopped at very different locations. Whites were being stopped predominantly in Staten Island, Brooklyn, and Manhattan. There is very little overlap with high black stop density areas.
- Blacks were stopped predominantly around the Brooklyn/Queens border and Manhattan/Bronx border.
- These spatial discrepancies are consistent across the years shown.
- The high-density areas get larger over time as the total number of stops declines (indicated by the range of the map legends).
Here is the map of stops in 2014, the last year for which I have data:
In 2014, we see more concentrated stops of blacks along the coast of Staten Island. In fact, Eric Garner died in precisely one of these high-density areas. The location of his death is marked with a star.
Officers Liu and Ramos also died in an area with a high density of stops of blacks (location marked with the star in Brooklyn).
Importance. It’s easy to see the importance of such spatial analyses. They add several layers of information on top of the basic summary statistics I presented in my previous post. As the 2014 map shows, tragic events can occur in areas with a high density of stops.
Simultaneity. Suppose we overlay this stop-and-frisk data with perfectly measured crime data (the potential mismeasurement of “crime” is discussed below) and find that areas with a high density of stops of blacks actually have low crime density. We cannot necessarily conclude that the NYPD is engaging in a racist expansion of stops in black areas despite low crime rates. What if crime rates are low because of the high number of stops? With the current data, it’s hard to say which way the causality runs.
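The simultaneity problem can be illustrated with a small simulation. This is only a sketch on made-up data: the two equations, the parameters `b` and `g`, and the standardized "crime" and "stops" indices are hypothetical and are not estimated from the stop-and-frisk data. In this toy world stops lower crime and crime raises stops at the same time, and a naive regression of crime on stops fails to recover the true effect.

```r
set.seed(42)
n <- 10000  # hypothetical number of areas

# Hypothetical structural parameters (assumptions, not estimates).
b <- 0.5  # each extra unit of stops lowers crime by b
g <- 0.5  # each extra unit of crime raises stops by g

u <- rnorm(n)  # unobserved shocks to stops
e <- rnorm(n)  # unobserved shocks to crime

# Solve the two equations jointly (reduced form):
#   stops = g * crime + u
#   crime = -b * stops + e
crime <- (e - b * u) / (1 + b * g)
stops <- (u + g * e) / (1 + b * g)

# Naive regression of crime on stops.
naive_slope <- unname(coef(lm(crime ~ stops))["stops"])

# Slope implied by the algebra when both shocks have variance 1.
implied_slope <- (g - b) / (1 + g^2)

cat("true effect of stops on crime:", -b, "\n")
cat("naive regression slope:       ", round(naive_slope, 3), "\n")
cat("slope implied by the model:   ", implied_slope, "\n")
```

With these particular parameters the two channels cancel, so the naive slope sits near zero even though stops reduce crime by construction. Other parameter choices would push the naive slope in either direction, which is why the observed correlation alone cannot settle the question.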
Unobserved Factors. Simultaneity aside, we also have unobserved factors to contend with. Are the spatial discrepancies visualized above due to racist police segmenting geographically to efficiently target blacks? Or are the spatial discrepancies simply due to the fact that blacks and whites, in general, live and/or hang out in very different places? Without additional data, it’s hard to say.
Difficulty Establishing Simple Claims. Even the relatively simple claim of “blacks commit crimes at higher than average rates” is difficult to establish. When most people speak of “crime rates”, they are actually referring to arrest rates. We usually don’t observe crimes directly because people who commit them generally don’t self-report. So, we use police arrest data as a proxy for crime. However, if we think that police are inherently racist, then the arrest data they record would also be biased upward for the groups they target. Arrest rates could be much higher than crime rates. My point is that even establishing simple claims requires great care (both in how we phrase the claims and how we attempt to answer them) and is often difficult.
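A bit of arithmetic shows how an arrest-based proxy can diverge from the underlying rate. Every number below is hypothetical and chosen only for illustration, as are the labels `group_a` and `group_b`. The two groups are given the same true offense rate, and only the assumed probabilities of being arrested differ.

```r
# All numbers below are hypothetical and chosen only for illustration.
population   <- 100000
offense_rate <- 0.02   # same true share of offenders in both groups

# Probability that an offender is arrested, by group (assumed).
p_arrest_offender <- c(group_a = 0.30, group_b = 0.60)
# Probability that a non-offender is arrested, by group (assumed).
p_arrest_innocent <- c(group_a = 0.001, group_b = 0.004)

offenders <- population * offense_rate
innocents <- population - offenders

# Arrests come from both offenders and non-offenders.
arrests     <- offenders * p_arrest_offender + innocents * p_arrest_innocent
arrest_rate <- arrests / population

print(arrest_rate)
print(arrest_rate["group_b"] / arrest_rate["group_a"])
```

Under these assumptions the arrest rate for `group_b` is more than double that of `group_a`, even though the offense rate is identical by construction. Anyone reading arrest rates as crime rates would see a difference that is produced entirely by policing.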
Racism. As I said above, issues such as simultaneity and unobserved factors make it very difficult to establish even simple relationships or claims. It is even harder to establish the inherent racism of an entire group of people, or the inherent criminality of an entire group of people. Much more information is needed.
I hope that making this data available and clean for public use will help researchers address some of these difficulties. Again, all of my code and datasets are available on GitHub. My hope is that other people will combine this data with their own data to reach more impactful conclusions. As always, please cite when sharing.