Normalized Frequency of Terrorism in the US
I’ve been using the Global Terrorism Database (GTD) a lot lately, so I decided to share an interesting plot I made with the data.
The GTD provides over 100,000 observations of terrorist incidents between 1970 and 2011. Of these, about 2,400 took place in the USA. While this is not a large number, the graph still provides some interesting and intuitive results.
```r
## Load libraries
library(ggplot2)
library(plyr)
library(maps)
library(stringr)

## Load terrorism data
gtd.data <- read.csv("gtd.csv", stringsAsFactors = FALSE)

##
## Begin USA heatmap plot
##

## Subset data to only include terrorist attacks in the USA
gtd.usa <- subset(gtd.data, country_txt == "United States")

## Clean provstate column
## fixed() matches the literal text; unescaped, "(" and "." are regex metacharacters
gtd.usa$provstate <- str_replace(gtd.usa$provstate, fixed("(U.S. State)"), "")
gtd.usa$provstate <- str_replace(gtd.usa$provstate, "[(]", "")
gtd.usa$provstate <- str_replace(gtd.usa$provstate, "[)]", "")

## Trim whitespaces
gtd.usa$provstate <- str_trim(gtd.usa$provstate)

## Load US state population data
populations <- read.csv("states.csv")

## Create counts of terrorist activity in each state
counts <- count(gtd.usa, "provstate")

## Merge the populations dataset with the counts dataset
gtd.pop.merge <- merge(counts, populations, by.x = "provstate", by.y = "Name")

## Create normalized terrorism frequency by dividing frequency
## by the population of the state
gtd.pop.merge <- mutate(gtd.pop.merge, normal = freq / CENSUS2010POP)
gtd.pop.merge$normal <- log10(gtd.pop.merge$normal)

## map_data("state") uses lowercase state names in a "region" column
gtd.pop.merge$provstate <- tolower(gtd.pop.merge$provstate)
names(gtd.pop.merge)[1] <- "region"

## Load US state data
states <- map_data("state")

## Merge the map data with our previous dataset
merged <- merge(states, gtd.pop.merge, sort = FALSE, by = "region")

## Plot the heatmap
g <- ggplot(merged) +
  geom_polygon(aes(x = long, y = lat, group = group, fill = normal))
g <- g + scale_fill_gradient(low = "lightgreen", high = "blue")
g <- g + theme_bw() +
  labs(fill = "Normalized Frequency of Terrorism") +
  theme(legend.position = "bottom")
g <- g + xlab(NULL) + ylab(NULL)
g <- g + theme(panel.grid.minor = element_blank(),
               panel.grid.major = element_blank())
g <- g + theme(axis.text.x = element_blank(),
               axis.text.y = element_blank())
g <- g + ggtitle("Normalized Frequency of Terrorism in the USA")
g <- g + scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL)
g
```
To obtain meaningful results, rather than simply plotting the number of terrorist incidents per state, I divided each state’s count by its 2010 population. I know this is not entirely correct, since the states’ populations have shifted relative to one another between 1970 and 2011, but it was fine for my purposes. The resulting frequencies were clustered together, so I took a log10 transform to spread the values out more smoothly across the color scale.
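As a rough illustration of that normalization step, here is a minimal sketch on a toy data frame. The region names, counts, and populations are made up for the example and are not GTD or census figures; the `per_million` column is an extra I added for readability and is not part of the plot above.

```r
## Toy data: hypothetical incident counts and populations (NOT real GTD figures)
toy <- data.frame(
  region     = c("a", "b", "c", "d"),
  freq       = c(5, 50, 500, 20),
  population = c(1e6, 5e6, 2e7, 5e5),
  stringsAsFactors = FALSE
)

## Incidents per resident, then log10 to spread out clustered values
toy$per_capita <- toy$freq / toy$population
toy$normal     <- log10(toy$per_capita)

## Same rate expressed per million residents (illustrative extra column)
toy$per_million <- toy$per_capita * 1e6

## Show regions from lowest to highest normalized frequency
print(toy[order(toy$normal), c("region", "per_million", "normal")])
```

Because log10 is monotonic, the ranking of regions is the same before and after the transform; only the spacing between values changes. Note that region "d" has far fewer incidents than "c" but ranks higher once population is taken into account, which is the reason for normalizing in the first place.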