Getting More Granular
Where we’re going
Having noticed crime rates heading up over the last few years, I wanted to take a closer look. I'll look at “CRIME” first, before getting into “TRAFFIC” in the data set. It sounds more interesting, and I hope the results don’t keep me up at night.
What we’ll do in this post
- Load the csv, format the data
  - This is hidden here and can be found in the previous post (Part 1)
- Look into apparent growth in crime rates from 2012 – 2014
  - We’ll focus only on incidents that fit the “IS_CRIME” definition and not “IS_TRAFFIC”
Let’s dive in!
Exploration of Data
Data provided by https://www.denvergov.org/opendata/dataset/city-and-county-of-denver-crime
Data Import & Formatting – shown in prior post: Crime Analysis – Denver-Part 1
Looking at Crime Incidents by Year
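# Requires dplyr and ggplot2; year() is assumed to come from lubridate
library(dplyr)
library(ggplot2)
library(lubridate)

# Yearly crime counts, dropping the latest year in the data
df = data %>%
  filter(IS_CRIME==1) %>%
  filter(year!=max(year(date))) %>%
  group_by(year) %>%
  summarise(incidents=sum(IS_CRIME)) %>%
  arrange(year)

# Bar chart of incidents per year, labelled with the counts
# (geom_text takes fontface, not face)
p = ggplot(df,aes(x=year,y=incidents,label=incidents))
p + geom_bar(stat='identity') +
  geom_text(fontface='bold',size=6,col='white',vjust=1) +
  ggtitle('Crime Volume by Year') +
  xlab('Year') + ylab('Incidents') +
  theme(plot.title = element_text(hjust = 0.5))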
# Yearly crime counts with the year-over-year percentage change
# (the digits argument now sits inside round() instead of creating a stray column)
df = data %>%
  filter(IS_CRIME==1) %>%
  filter(year!=max(year(date))) %>%
  group_by(year) %>%
  summarise(incidents=sum(IS_CRIME)) %>%
  arrange(year) %>%
  mutate(YoYPercentageChange=round(100*(incidents-lag(incidents))/lag(incidents),0))

# The first year has no prior year to compare against
df = df[!is.na(df$YoYPercentageChange),]

p = ggplot(df,aes(x=year,y=YoYPercentageChange,label=YoYPercentageChange))
p + geom_bar(stat='identity') +
  geom_text(fontface='bold',size=6,col='white',vjust=1) +
  ggtitle('Crime Percentage Change Year-Over-Year') +
  xlab('Year') + ylab('YoY Incident % Change') +
  theme(plot.title = element_text(hjust = 0.5))
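To make the year-over-year arithmetic concrete, here is a minimal base R sketch of the same calculation on a toy series. The counts are made up for illustration and are not the Denver figures; the `counts` data frame and `prev` vector are names introduced only for this example.

```r
# Toy yearly incident counts (made-up numbers, not the Denver data)
counts <- data.frame(
  year      = 2011:2014,
  incidents = c(100, 120, 150, 165)
)

# Previous year's value; the first year has no predecessor, so it is NA
prev <- c(NA, head(counts$incidents, -1))

# Year-over-year percentage change, rounded to a whole percent
counts$YoYPercentageChange <- round(100 * (counts$incidents - prev) / prev, 0)

# Drop the first year, which has nothing to compare against
counts <- counts[!is.na(counts$YoYPercentageChange), ]
print(counts)
```

Shifting the series by one position plays the same role as `lag()` in the dplyr pipeline: each year is divided by the year before it, which is why the earliest year drops out of the chart.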
Observations
- Crime rose the most between 2012 and 2013 (39% increase)
- Crime increased each year after but at a decreasing rate
- Examine years 2012 – 2014 to see growth changes
Highest volume of “IS_CRIME” types
Identify the offense by OFFENSE_CATEGORY_ID and exclude months we have not seen so far this year.
# Isolate years 2012 - 2014
data = data[data$year <= 2014 & data$year >= 2012,]

# Latest year in the subset and the latest month observed in that year
maxYear = max(data$year)
maxMonthYTD = max(data$month[data$year==maxYear])

# Look into IS_CRIME only: incidents per year and offense category
df = data %>%
  filter(IS_CRIME==1) %>%
  group_by(year,OFFENSE_CATEGORY_ID) %>%
  summarise(incidents=sum(IS_CRIME)) %>%
  arrange(desc(incidents))

# One panel per offense category, one bar per year
p = ggplot(df,aes(x=factor(year),y=incidents,fill=year))
p + geom_bar(stat='identity') +
  ggtitle('Crime Incidents Reported by Year') +
  xlab('Year') + ylab('Incidents') +
  theme(plot.title = element_text(hjust = 0.5),legend.position = 'none') +
  guides(fill = guide_legend(title='Year')) +
  coord_flip() +
  facet_wrap(~OFFENSE_CATEGORY_ID,ncol=3)
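The text mentions excluding months not yet seen in the latest year, and the code computes `maxMonthYTD`, but the filter itself is not shown. One possible way to apply it is sketched below in base R on a toy table. The rows are made up, and `incidents`, `ytd` and `ytdCounts` are hypothetical names for this example only.

```r
# Toy incident table: one row per incident (made-up rows, not the Denver data)
incidents <- data.frame(
  year  = c(2013, 2013, 2013, 2013, 2014, 2014, 2014),
  month = c(1,    2,    6,    11,   1,    2,    3)
)

# Latest year in the data and the latest month observed in that year
maxYear     <- max(incidents$year)
maxMonthYTD <- max(incidents$month[incidents$year == maxYear])

# Keep only the months that every year has had a chance to report
ytd <- incidents[incidents$month <= maxMonthYTD, ]

# Incidents per year over the comparable January-to-maxMonthYTD window
ytdCounts <- table(ytd$year)
print(ytdCounts)
```

When every year in the subset is complete, `maxMonthYTD` is 12 and the filter removes nothing, so it only matters when a partial year is part of the comparison.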
Observations
It would appear that “all-other-crimes” has moved the needle the most between 2012 and 2014. This is not a very specific category. It’s also worth noting that “other-crimes-against-persons” has grown as well. Both of these lead to some speculation that these vague types of crimes started being reported during this period and hadn’t been documented before.
- Growth categories:
  - “larceny”
  - “drug-alcohol”
  - “public-disorder”
- Declining categories:
  - “theft-from-motor-vehicle”
  - “robbery”
  - “burglary”
Many of the other categories have a much lower volume of incidents. Growth is more difficult to see in visualizations for these cases.
Here’s a look at growth year-over-year:
# Absolute change vs the previous year, computed within each offense category
# (counts are integers, so no rounding is needed)
df2 = df %>%
  group_by(OFFENSE_CATEGORY_ID) %>%
  arrange(OFFENSE_CATEGORY_ID,year) %>%
  mutate(YoYchange=incidents-lag(incidents)) %>%
  filter(year != 2012)

p = ggplot(df2,aes(x=factor(year),y=YoYchange,fill=year,label=YoYchange))
p + geom_bar(stat='identity') +
  ggtitle('Change in Crime Incidents vs Previous Year') +
  xlab('Year') + ylab('YoY Change in Incidents') +
  theme(plot.title = element_text(hjust = 0.5),legend.position = 'none') +
  guides(fill = guide_legend(title='Year')) +
  coord_flip() +
  facet_wrap(~OFFENSE_CATEGORY_ID,ncol=3) +
  geom_text(hjust=0.5,size=5,col='red',fontface='bold')
Here’s a look at % growth year-over-year:
# Percentage change vs the previous year, computed within each offense category
df2 = df %>%
  group_by(OFFENSE_CATEGORY_ID) %>%
  arrange(OFFENSE_CATEGORY_ID,year) %>%
  mutate(YoYchange=round(100*(incidents-lag(incidents))/lag(incidents),0)) %>%
  filter(year != 2012)

p = ggplot(df2,aes(x=factor(year),y=YoYchange,fill=year,label=YoYchange))
p + geom_bar(stat='identity') +
  ggtitle('% Change in Crime Incidents vs Previous Year') +
  xlab('Year') + ylab('YoY % Change in Incidents') +
  theme(plot.title = element_text(hjust = 0.5),legend.position = 'none') +
  guides(fill = guide_legend(title='Year')) +
  coord_flip() +
  facet_wrap(~OFFENSE_CATEGORY_ID,ncol=3) +
  geom_text(hjust=1,col='red',size=5,fontface='bold')
Observations
- “all-other-crimes” is the outright leader in change in both volume and percentage growth year-over-year with an astonishing 380% increase between 2012 and 2013
- “drug-alcohol” grew by 173% between 2012 and 2013 but dropped down to only 27% growth the next year
- “murder” didn’t change much in volume compared to everything else (up 7, then down 10), but that still works out to 19% growth in 2013 and a 23% decline in 2014
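The “murder” numbers show how a small base turns modest changes in volume into large percentage swings. The sketch below uses hypothetical yearly counts, back-calculated only so that they agree with the changes quoted above (+7 and -10, 19% and -23%); they are an assumption for illustration and are not read from the data set.

```r
# Hypothetical yearly counts, chosen only to be consistent with the
# changes quoted above; they are not taken from the Denver data
murder <- c(`2012` = 37, `2013` = 44, `2014` = 34)

# Absolute change vs the previous year
absChange <- diff(murder)

# Percentage change vs the previous year, rounded to a whole percent
pctChange <- round(100 * absChange / head(murder, -1))

print(absChange)
print(pctChange)
```

A change of 7 or 10 incidents would barely register in a high-volume category, which is why it helps to read the volume and percentage charts side by side.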
Final Thoughts (for now)
Due to the vague nature of the types of crimes which grew the most, I can’t determine exactly what happened in Denver during 2013. Among the less vague crimes, “drug-alcohol” saw the largest increase, followed by “public-disorder”. There may be a relationship there; my guess is that one may drive the other…
I’m still curious about the seasonality and month-to-month effects. Certain types of crimes may be more common at certain times of year. I’m also very interested to see whether a new population was added to the mix in 2013. If a certain part of Denver was added to the data in 2013, that would go a long way toward explaining the jump.
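As a starting point for the seasonality question, incidents could be cross-tabulated by offense category and calendar month. The sketch below does this in base R on a made-up table; the column names mirror those used in this post, while `toy`, `byMonth` and `monthShare` are hypothetical names for this example.

```r
# Toy incidents (made-up rows); the column names mirror those used in the post
toy <- data.frame(
  month = c(1, 1, 7, 7, 7, 12),
  OFFENSE_CATEGORY_ID = c("larceny", "burglary", "larceny",
                          "larceny", "public-disorder", "burglary")
)

# Cross-tabulate offense category by calendar month
byMonth <- table(toy$OFFENSE_CATEGORY_ID, toy$month)

# Share of each category's incidents falling in each month (rows sum to 1)
monthShare <- prop.table(byMonth, margin = 1)
print(round(monthShare, 2))
```

Comparing shares rather than raw counts keeps high-volume categories from drowning out the low-volume ones when looking for a seasonal pattern.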
What I’ll do in the next crime posts
- Look for patterns by location
- Lay out some visualizations on maps
- Try to identify areas with high volumes of traffic incidents (maybe I can avoid a ticket)
- Answer the question: What types of crimes have grown the most in the last 5 years?
Code used in this post is on my GitHub