How a Mexican state ended up with more drug war homicides than total homicides

[This article was first published on Diego Valle's Blog, and kindly contributed to R-bloggers].

During 2007 and 2008 the Mexican state of Sinaloa had more drug war-related homicides than total homicides. This should in theory be impossible since drug war homicides are a subset of total homicides. How did this happen?

Here is a chart from my old post highlighting the monthly difference between the vital statistics data and drug war-related homicides (Dec 2006- Dec 2009):
Just two weeks ago 8 bodies were found in Sinaloa; I can only imagine how difficult it must be in some cases to determine the exact dates the homicides occurred. So it’s not a big deal if a month ends up with 5 or so more drug war homicides than total homicides. But when drug homicides exceed total homicides for a year and a half, as they did in Sinaloa, you know something is wrong.

Most homicides in Sinaloa are committed with firearms, and given the other errors in the mortality database, it seems worth checking whether the discrepancy in the numbers lies in an excess of firearm deaths.

Here’s a table of the yearly vital statistics data on homicides, firearm accidents and drug war-related homicides:

Year    All Homicides   Drug-Related Homicides   Accidental Firearm Deaths
2004              387                       NA                          61
2005              450                       NA                          47
2006              451                       NA                          26
2007*             387                      426                         204
2008*             869                    1,084                         198
2009            1,422                    1,059                          36
2010            2,277                    1,815                          67†
* Years in which drug-related homicides exceeded all homicides
† Preliminary estimates undercounted by about 6%


I highlighted the years where drug homicides exceeded all homicides. As you can see, the anomalous years also had an extremely high number of firearm accidents (I used the External Cause of Injury Mortality Matrix for ICD-10 to code accidents by firearm).
In fact, there’s a very high correlation between the unexplained drug homicides (the drug homicides left over when we subtract the total homicides from the drug war-related homicides) and firearm accidents, but only during the anomalous years:
I don’t think there can be any doubt that at least some of the deaths by firearm accident correspond to those in the drug war-related homicide database during 2007 and 2008. Now the question is whether they are really homicides or accidents.
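To make the size of the gap concrete, here is a small Python sketch that recomputes the yearly excess of drug war-related homicides over all homicides from the table above. The dictionary layout and the function name `excess_drug_homicides` are illustrative assumptions, not part of the original analysis code, and the yearly totals only show the size of the gap, not the monthly correlation.

```python
# Yearly figures for Sinaloa copied from the table above:
# (all homicides, drug war-related homicides, accidental firearm deaths)
sinaloa = {
    2007: (387, 426, 204),
    2008: (869, 1084, 198),
    2009: (1422, 1059, 36),
    2010: (2277, 1815, 67),
}

def excess_drug_homicides(all_homicides, drug_homicides):
    """Drug war-related homicides that cannot fit inside the homicide total."""
    return max(drug_homicides - all_homicides, 0)

excess = {
    year: excess_drug_homicides(total, drug)
    for year, (total, drug, _accidents) in sinaloa.items()
}

for year, (_total, _drug, accidents) in sinaloa.items():
    print(year, "excess drug homicides:", excess[year],
          "| firearm accidents:", accidents)
```

Only 2007 and 2008 show an excess, and those are also the two years with roughly 200 recorded firearm accidents each.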

It seems natural that firearm accidents would rise along with homicides as the drug war ratcheted up, since an escalation of violence would involve hiring newer, less experienced gunmen. As you’d expect, the newer gunmen were not the brightest bulbs in the box. But even for dumb cartel gunmen, I’m surprised by the sudden increase in accidents and the sudden drop after 2008.

There’s also the fact that the police and government authorities are under much more pressure to bring down the number of homicides than the number of accidents. For all these reasons I think it is much more likely that at least some homicides were misclassified as accidents, and that this explains part of the discrepancy during 2007 and 2008.


(Image caption) As funny as I find this, it probably didn’t happen in Sinaloa in 2007 and 2008.

To find out whether the deaths were accidents or homicides, I’ll first ignore the classification specified in the mortality database for all firearm accidents from January 2007 to December 2008. Then I’ll use machine learning to see if the characteristics of the original firearm accidents (sex, age, place of occurrence, etc.) are sufficient to separate them into homicides and accidents.

Homicides in Sinaloa tend to occur on public streets:
However, accidental deaths tend to occur all over the place, except in 2007 and 2008, when there were a lot of accidents on public streets (surprise, surprise):

Once we ignore the firearm accidents that occurred in 2007 and 2008, there are very few accidental deaths by firearm in Sinaloa. Since the more data the better, I used all 1,834 firearm accidents in the rest of Mexico (excluding Chiapas because its vital statistics are a mess) together with all homicides and accidents by firearm in Sinaloa (excluding 2007 and 2008, obviously) to train a classifier.

There are still some unresolved problems with this approach. The state of Baja California went from having 6 accidental deaths by firearm in 2006, to 100 in 2007, to 6 again in 2008, while homicides went from 303 in 2006 down to 243 in 2007. This could imply that Sinaloa is not the only state with mortality misclassifications, and it underlines just how difficult it is to get training data (when training the classifier I ignored any possible errors in the rest of the country).

I ended up using age, sex, place of occurrence of the injury, year of death, marital status and whether an autopsy was performed as the variables (including their interactions) in a penalized logistic regression to separate homicides from accidents. To check for overfitting I set aside 30% of the data as a test dataset.
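As a rough illustration of the technique, the sketch below fits an L2-penalized logistic regression by plain gradient descent and scores it on a 30% hold-out. Everything in it is an assumption made for illustration: the records are synthetic (not mortality data), the two features stand in for the much richer variable set described above, interactions are left out, and the penalty strength `lam` is arbitrary. It is not the model behind the results that follow.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_l2(X, y, lam=0.1, lr=0.5, n_iter=1000):
    """Gradient descent on mean log-loss + (lam / 2) * ||w||^2.

    The intercept b is left unpenalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, xi):
    # 1 = homicide, 0 = accident, using a 0.5 probability threshold.
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0

# Synthetic toy records, NOT real data. Hypothetical features:
# [occurred on a public street, autopsy performed]; label 1 = homicide.
rows = []
for i in range(200):
    street = 1 if i % 2 == 0 else 0
    autopsy = 1 if i % 3 == 0 else 0
    label = street if i % 7 else 1 - street  # every 7th record breaks the pattern
    rows.append(([street, autopsy], label))

# Hold out 30% of the records as a test set (deterministic split).
train = [r for i, r in enumerate(rows) if i % 10 < 7]
test = [r for i, r in enumerate(rows) if i % 10 >= 7]

w, b = fit_logistic_l2([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
print("weights:", w, "intercept:", b, "test accuracy:", round(accuracy, 3))
```

The penalty shrinks the coefficients toward zero, which matters more once many variables and their interactions are included and some combinations have only a handful of records.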

These are the results of evaluating the classifier on the held-out test dataset:
           Reference
Prediction Accident Homicide
  Accident      193       82
  Homicide      166      933
                                          
               Accuracy : 0.8195          
                 95% CI : (0.7981, 0.8395)
    No Information Rate : 0.7387          
    P-Value [Acc > NIR] : 8.951e-13       
                                          
                  Kappa : 0.4942          
 Mcnemar's Test P-Value : 1.360e-07       
                                          
            Sensitivity : 0.5376          
            Specificity : 0.9192          
         Pos Pred Value : 0.7018          
         Neg Pred Value : 0.8490          
             Prevalence : 0.2613          
         Detection Rate : 0.1405          
   Detection Prevalence : 0.2001          
                                          
       'Positive' Class : Accident  

Specificity measures the proportion of actual homicides that were correctly identified as homicides, which in this case matters more than the proportion of actual accidents that were correctly identified (sensitivity), since I’m trying to show that the accidents are really homicides. With a specificity of 92% the classifier misses few real homicides, and about 85% of the deaths it labels as homicides really are homicides (the negative predictive value, since accident is the positive class). But with a sensitivity of only 54%, almost half of the real accidents in the test data were labeled as homicides. The counts in the table below should therefore be read as rough estimates rather than exact figures.
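The summary statistics in the output can be reproduced directly from the four cells of the confusion matrix. The following Python sketch does that arithmetic; the variable names are illustrative assumptions, and "Accident" is treated as the positive class, as in the output above.

```python
# Counts copied from the confusion matrix above
# (rows = prediction, columns = reference).
pred_acc_ref_acc = 193  # predicted accident, actually accident
pred_acc_ref_hom = 82   # predicted accident, actually homicide
pred_hom_ref_acc = 166  # predicted homicide, actually accident
pred_hom_ref_hom = 933  # predicted homicide, actually homicide

total = (pred_acc_ref_acc + pred_acc_ref_hom
         + pred_hom_ref_acc + pred_hom_ref_hom)

# "Accident" is the positive class.
metrics = {
    # Share of all test deaths labeled correctly.
    "accuracy": (pred_acc_ref_acc + pred_hom_ref_hom) / total,
    # Share of actual accidents labeled as accidents.
    "sensitivity": pred_acc_ref_acc / (pred_acc_ref_acc + pred_hom_ref_acc),
    # Share of actual homicides labeled as homicides.
    "specificity": pred_hom_ref_hom / (pred_hom_ref_hom + pred_acc_ref_hom),
    # Share of predicted accidents that are actual accidents.
    "pos_pred_value": pred_acc_ref_acc / (pred_acc_ref_acc + pred_acc_ref_hom),
    # Share of predicted homicides that are actual homicides.
    "neg_pred_value": pred_hom_ref_hom / (pred_hom_ref_hom + pred_hom_ref_acc),
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Sensitivity and specificity condition on the true class, while the predictive values condition on the predicted label, so the predictive values are the ones that say how far a given label can be trusted. They also depend on the mix of accidents and homicides in the test data, which may differ from the mix among the 2007 and 2008 deaths.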

Year   Imputed Intent   Number of Deaths
2007   Accident                       61
2007   Homicide                      143
2008   Accident                       20
2008   Homicide                      178

The result of classifying the 204 and 198 firearm accidents in 2007 and 2008.

As you can see the data looks much more reasonable now, though in 2008 there were still more drug homicides than total homicides, which may have been because some homicides were registered as accidents by a cause other than firearm (in around 23% of accidents the cause is left unspecified). When it comes to violence in Mexico it seems we watch shadows projected on a cave’s wall.
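The effect of the reclassification on the yearly totals can be checked with a few lines of arithmetic. This Python sketch adds the imputed homicides to the vital statistics homicide counts and compares the result with the drug war-related homicides; the variable names are illustrative assumptions and all figures come from the two tables above.

```python
# Figures copied from the tables above.
all_homicides = {2007: 387, 2008: 869}
drug_homicides = {2007: 426, 2008: 1084}
# Firearm "accidents" that the classifier imputed as homicides.
imputed_homicides = {2007: 143, 2008: 178}

# Homicide totals after adding the reclassified firearm accidents.
adjusted = {year: all_homicides[year] + imputed_homicides[year]
            for year in all_homicides}

# Drug war-related homicides still not covered by the adjusted totals.
shortfall = {year: max(drug_homicides[year] - adjusted[year], 0)
             for year in adjusted}

for year in sorted(adjusted):
    print(year, "adjusted homicides:", adjusted[year],
          "| drug homicides:", drug_homicides[year],
          "| still unexplained:", shortfall[year])
```

With the adjustment, the 2007 total (530) comfortably exceeds the 426 drug war-related homicides, while 2008 is still 37 short (1,047 against 1,084).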

Given that the anomaly started at the beginning of one year and stopped at the end of another, there may have been some law or other legal mechanism that made it OK to classify a homicide as an accident, but that is just a matter of calling a tail a leg. At this point I’m just speculating about possible motives for recording a homicide as an accident, so I’ll stop.
“I can’t confirm anything. There’s something to what you ask, but I can’t confirm anything. We still have to perform forensic tests; you’ll have the data later. What we know is that it was an accident, but a very strange one, because the body was dumped at the public hospital, where he died.”
Original Spanish: “No te puedo confirmar nada. Algo hay de lo que preguntas, pero no te puedo confirmar nada. Faltan pruebas periciales por realizar, después tendrás los datos. Lo que sabemos es que es un accidente, pero muy raro porque el cuerpo fue arrojado en el Seguro Social y ahí murió.”
P.S. The data and code are available from my GitHub
