Fun with infochimps: Animated Blog Post Hit Map

[This article was first published on Zero Intelligence Agents » R, and kindly contributed to R-bloggers].

In a few weeks I will be visiting Chicago, and JD Long—the organizer of the local R users group—has graciously invited me to give a presentation. Ostensibly, the presentation will be on my recently released infochimps package, so I thought it was a good time to start actually putting together some examples and documentation for the package.

If you have not visited the site in a week or so, you will have missed my previous post on analyzing WikiLeaks data, which, judging from the traffic and comments, was at least somewhat controversial. Given this rare spotlight, I thought it would be fun to use the infochimps API to map out the geo-location of everyone who visited the blog post over the last few days. Unfortunately, after nearly two years with the same web hosting service, only today did I realize that I was not capturing daily log files for my domain.

While the issue has been resolved, all of that data has, tragically, been lost. In lieu of analyzing the logs from the last week, I am limited to visualizing only the traffic from today. Hopefully before my presentation in Chicago I will have another post that strikes a nerve deep within the Internet, but until then, I present an animated map of today's hits to my post “Why I will Not Analyze The New WikiLeaks Data.”

Animated Blog Post Hit Map from Drew Conway on Vimeo.

The timing of the hits is significantly sped up: each second of the animation represents roughly 9.5 minutes of blog post traffic. With the IP addresses of visitors to the blog post, I used the infochimps package to collect latitude and longitude coordinates for each hit, mapped these out over time in ggplot2, and created the animation with ffmpeg.
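
To make the time compression concrete, here is a minimal base R sketch of how hits could be assigned to animation frames. It is not the code behind the video: the timestamps are made up, and the names `hit_seconds`, `seconds_per_frame`, and `frame` are hypothetical, chosen only for illustration.

```r
# Hypothetical hit times, in seconds since the first hit in the log.
hit_seconds <- c(0, 120, 569, 570, 1200, 3000)

# Roughly 9.5 minutes of real traffic per second of animation (from the post).
seconds_per_frame <- 9.5 * 60  # 570 seconds

# 1-based index of the animation second that each hit falls into.
frame <- floor(hit_seconds / seconds_per_frame) + 1

print(frame)
```

Under this binning, every hit within the same 570-second window lands in the same second of the animation, which is why nearby hits appear to arrive at once.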

The size of each bubble represents the number of concurrent hits from the same coordinate at a given second of the animation. As such, you will notice sudden bursts in some locations. I am far from a DNS expert, so I am sure someone will tell me how I am overcounting certain IPs, but it is fun to watch the activity.
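
The bubble sizing described above amounts to counting hits that share both a time bin and a coordinate. A minimal sketch in base R, assuming the hits have already been geolocated and binned; the data frame `hits`, its columns, and the coordinates are all made up for illustration and are not taken from the actual logs:

```r
# Hypothetical geolocated hits: the animation second (frame) each hit falls in,
# plus the latitude and longitude looked up for its IP address.
hits <- data.frame(
  frame = c(1, 1, 1, 2, 2, 3),
  lat   = c(41.88, 41.88, 40.71, 41.88, 51.51, 51.51),
  lon   = c(-87.63, -87.63, -74.01, -87.63, -0.13, -0.13)
)

# One row per hit, so summing a column of ones counts the hits in each group.
hits$n <- 1

# Number of hits per frame and coordinate; this count would set the bubble size.
bubbles <- aggregate(n ~ frame + lat + lon, data = hits, FUN = sum)
bubbles <- bubbles[order(bubbles$frame, bubbles$lat), ]

print(bubbles)
```

In ggplot2, a count column like `n` could then be mapped to the `size` aesthetic of `geom_point()`, drawing one plot per frame.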

I will release the code and instructions for this as part of a general update to the infochimps documentation in the lead-up to my trip to Chicago. In the meantime, I am happy to answer any questions about this, or about the package more generally.
