Computing kook density in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Do you ever see strange lights in the sky? Do you wonder what really goes on in Area 51? Would you like to use your R hacking skills to get to the bottom of the whole UFO conspiracy? Of course, you would!
UFO data from Infochimps is the focus of a data munging exercise in Chapter 1 of Machine Learning for Hackers by Drew Conway and John Myles White, two social scientists with a penchant for statistical computing.
The exercise starts with slightly messy data and proceeds through cleaning up some dates. I think I slightly improved on the code given in the book; have a look (gist:3775873) and see if you agree.
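The gist's code isn't reproduced here. As a minimal sketch of the kind of date cleanup involved, assume the raw dates arrive as YYYYMMDD strings in a column called DateOccurred (an assumption about the file layout; the toy rows below are made up). Malformed entries are dropped by length, the rest are parsed with as.Date, and anything that still fails to parse is discarded:

```r
# Toy stand-in for the raw UFO data (hypothetical values, not the infochimps file)
ufo <- data.frame(
  DateOccurred = c("19951009", "0000", "20080215", "20081399"),
  Location     = c("Iowa City, IA", "Seattle, WA", "Portland, OR", "Tacoma, WA"),
  stringsAsFactors = FALSE
)

# Keep only rows whose date string has exactly 8 characters (YYYYMMDD)
ufo <- ufo[nchar(ufo$DateOccurred) == 8, ]

# Parse the remaining strings; impossible dates (e.g. month 13) become NA
ufo$DateOccurred <- as.Date(ufo$DateOccurred, format = "%Y%m%d")

# Drop rows whose dates could not be parsed
ufo <- ufo[!is.na(ufo$DateOccurred), ]
```

Filtering on string length first is a cheap guard; the NA check afterwards catches strings that have the right length but aren't real dates.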
Dividing the data up by state (for sightings in the US), I noticed something funny. My home state of Washington has a lot of UFO sightings. Normalizing by population, this becomes even more pronounced.
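Assuming locations look like "City, ST" (the toy vector below is hypothetical), one way to divide sightings up by state is to pull out the text after the last comma, keep only US states using base R's built-in state.abb vector, and tally with table():

```r
# Hypothetical locations in "City, ST" form
locations <- c("Seattle, WA", "Spokane, WA", "Portland, OR",
               "Toronto, ON", "Tacoma, WA")

# Take the text after the last comma and trim surrounding whitespace
state <- trimws(sub(".*,", "", locations))

# Keep only the 50 US state abbreviations that ship with base R
state <- state[state %in% state.abb]

# Count sightings per state and turn the counts into a data frame
sightings.by.state <- as.data.frame(table(state), stringsAsFactors = FALSE)
names(sightings.by.state) <- c("state", "sightings")
```

Note that state.abb has no entry for DC, so sightings there would be filtered out by this approach.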
I learned a neat trick from the chapter. The transform function helps to compute derived fields in a data.frame. I use transform to compute UFO sightings per capita, after merging in population data by state from the 2000 census.
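sightings.by.state <- transform(sightings.by.state,
                                state = state,                       # unchanged
                                state.name = name,                   # copy the full state name into a new column
                                sightings = sightings,               # unchanged
                                sightings.per.cap = sightings / pop) # sightings per person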
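To see merge and transform working together, here is a sketch with made-up toy counts and populations (hypothetical numbers, not the 2000 census figures the post used). The column names state, name, sightings and pop are assumed to match the line above:

```r
# Hypothetical sighting counts per state (toy values)
sightings.by.state <- data.frame(state = c("WA", "OR", "TX"),
                                 sightings = c(30, 12, 20),
                                 stringsAsFactors = FALSE)

# Hypothetical population table keyed by state abbreviation (toy values)
population <- data.frame(state = c("WA", "OR", "TX"),
                         name = c("Washington", "Oregon", "Texas"),
                         pop = c(1000, 800, 4000),
                         stringsAsFactors = FALSE)

# Join the two tables on their shared 'state' column
sightings.by.state <- merge(sightings.by.state, population, by = "state")

# transform() evaluates each expression inside the data frame,
# adding new columns or replacing existing ones of the same name
sightings.by.state <- transform(sightings.by.state,
                                state.name = name,
                                sightings.per.cap = sightings / pop)
```

Because transform evaluates its arguments in the context of the data frame, column names can be used bare, without repeating sightings.by.state$ on every term.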
The plot above, made with a pile of ggplot code, shows that Washington State really is off the deep end when it comes to UFO sightings. Our northwest neighbors in Oregon come in second. I asked a couple of fellow Washington residents what they thought. The first reasonably conjectured a relationship to the number of air bases. The second Washingtonian gave the explanation I favor: "High kook density".
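The post doesn't include its ggplot code. A minimal sketch of a similar chart, assuming a data frame with state.name and sightings.per.cap columns (the rates below are hypothetical toy values), ranks the states and uses reorder() so the highest-rate state lands at the top of a horizontal bar chart. The plotting step is skipped if ggplot2 isn't installed:

```r
# Hypothetical per-capita rates (toy values)
rates <- data.frame(state.name = c("Oregon", "Texas", "Washington"),
                    sightings.per.cap = c(0.015, 0.005, 0.03),
                    stringsAsFactors = FALSE)

# Rank states from highest to lowest sighting rate
rates <- rates[order(rates$sightings.per.cap, decreasing = TRUE), ]

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # reorder() sorts the bars by rate; coord_flip() makes them horizontal
  p <- ggplot(rates, aes(x = reorder(state.name, sightings.per.cap),
                         y = sightings.per.cap)) +
    geom_col() +
    coord_flip() +
    labs(x = "State", y = "UFO sightings per capita")
  print(p)
}
```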
If you'd like to play with the data yourself, it's from Chapter 1 of Machine Learning for Hackers. Data and code can be found in John Myles White's GitHub repo.