[UPDATE 3/28/2011: Fixed an enormous bug in the R code.]
I’m trying to collect data sets that showcase how the classical statistical distributions appear in modern contexts. I’ve already got some data that shows how the gamma distribution appears in video game scores, and now I’m hoping to find an example where the exponential distribution shows up. I think that checkins for Foursquare might be a good place to start.
To test this intuition, I’m hoping to collect some pilot data. Below you’ll find some code that you can use to help me gather data.
First, there’s a shell script to gather your own checkin data from Foursquare. To use this script, substitute your e-mail address for EMAIL and your password for PASSWORD in the code below:
```shell
# The URL is quoted so the shell does not try to expand the '?' as a glob pattern.
curl -u 'EMAIL:PASSWORD' 'https://api.foursquare.com/v1/history?l=250' > checkin_history.xml
```
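Before moving on, it may be worth checking that the download actually contains checkins. The snippet below is only a rough sketch: the helper name `count_checkins` is made up for illustration, and it assumes each checkin in the XML is wrapped in a bare `<checkin>` tag with no attributes, which is what the R script in the next step appears to expect.

```shell
# Hypothetical helper: count <checkin> elements in an XML file.
# Assumes each checkin opens with a bare <checkin> tag (no attributes).
count_checkins() {
  # grep -o prints every match on its own line, so wc -l counts the matches;
  # tr strips the padding some wc implementations add.
  grep -o '<checkin>' "$1" | wc -l | tr -d ' '
}

# -s is true only if the file exists and is not empty.
if [ -s checkin_history.xml ]; then
  echo "checkins downloaded: $(count_checkins checkin_history.xml)"
else
  echo "checkin_history.xml is missing or empty" >&2
fi
```

If the count comes back as 0, the file may hold an error message rather than checkin data, so it is worth opening it in a text editor before running the R script.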
Second, there’s an R script you can use to preprocess the data from the last step into a nice format before sending it to me. If you’re not an R user, you can skip this step and send the data in its raw XML format.
```r
library('plyr')
library('XML')

filename <- 'checkin_history.xml'

# Parse the XML file and pull out the list of checkin nodes.
tree <- xmlTreeParse(filename, asTree = TRUE)
checkins <- tree$doc$children$checkins

venue.names <- c()
latitudes <- c()
longitudes <- c()

# seq_along() avoids the 1:0 trap when there are no checkins at all.
for (i in seq_along(checkins))
{
  # The sixth element of the text node's character form is used as the venue name.
  venue.names <- c(venue.names, as.character(checkins[i]$checkin[['venue']][['name']][['text']])[6])
  latitudes <- c(latitudes, as.numeric(unclass(checkins[i]$checkin[['venue']][['geolat']][['text']])$value))
  longitudes <- c(longitudes, as.numeric(unclass(checkins[i]$checkin[['venue']][['geolong']][['text']])$value))
}

checkin.data <- data.frame(Venue = factor(venue.names),
                           Latitude = as.numeric(latitudes),
                           Longitude = as.numeric(longitudes))

# Count the number of checkins per venue and write the result to disk.
count.data <- ddply(checkin.data, 'Venue', nrow)
names(count.data) <- c('Venue', 'TotalCheckins')
write.csv(count.data, file = 'count_data.csv', row.names = FALSE)
```
After running these two pieces of code, the output file, count_data.csv, should look like this:
Venue | TotalCheckins |
---|---|
“Brooklyn Boulders” | 13 |
… | … |
Once you’ve got data, you can send it to me by e-mail at jmw@johnmyleswhite.com.