Use Foursquare to locate a Twitter user using R
[This article was first published on Stats and things, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve been doing some work with Twitter data. Much of this work would be far easier if I could geographically locate the origin of the tweets. The Twitter APIs offer a couple of ways to do this. If a user has geolocation turned on, you can get the precise lat-lng for a specific tweet. Each user can also set a location in their profile. This free-form field gives you an idea of where the user is based, but it is a user attribute, not a property of each tweet, so a Chicago native traveling and tweeting in Florida will be a problem. Together these sources give you some information for some people, but coverage is incomplete, so I thought about trying other ways of locating a Twitter user.
My first crack at this was to check whether a given Twitter user is also a Foursquare user. Foursquare users check in to places for points or other purposes, and the Foursquare API lets you retrieve a lat-lng pair for a given check-in. My idea was to look through a user’s tweet history for Foursquare links, use the Foursquare API to turn those links into lat-lng pairs, then cluster the points and take the midpoint of the largest cluster (the one with the most points) as the guess for the Twitter user’s location.
First, we use the twitteR package to download the n most recent tweets for a user.
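The download step might look roughly like the sketch below. It assumes the twitteR package is installed and that OAuth credentials have already been registered with `setup_twitter_oauth()`. The wrapper name `fetch_recent_tweets` and the screen name in the usage comment are hypothetical, and this is not the original code from the post.

```r
# Sketch only: needs the twitteR package and working Twitter API credentials.
# fetch_recent_tweets is a hypothetical helper name, not part of twitteR.
fetch_recent_tweets <- function(screen_name, n = 200) {
  if (!requireNamespace("twitteR", quietly = TRUE)) {
    stop("Install the twitteR package to download tweets.")
  }
  # userTimeline() returns a list of status objects for the given user
  statuses <- twitteR::userTimeline(screen_name, n = n)
  # twListToDF() flattens that list into a data frame with a `text` column
  twitteR::twListToDF(statuses)
}

# Usage (requires network access and credentials):
# tweets <- fetch_recent_tweets("some_user", n = 200)
# head(tweets$text)
```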
Then I extract the links and resolve them using a link expander service. Once that is done, I take the Foursquare links and bounce them off the Foursquare API to get a lat-lng pair for each. I wrote a function to do this. Note that you will need a Foursquare API key saved in a file, as noted in the function.
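As a rough illustration of the link-handling part of this step, the sketch below pulls URLs out of tweet text with a regular expression and keeps the ones that point at Foursquare. The helper names `extract_links` and `is_foursquare_link`, the sample tweets, and the two host names matched (`4sq.com` and `foursquare.com`) are assumptions for illustration. Link expansion and the Foursquare API call itself are left out because they depend on an external service and an API key.

```r
# Hypothetical helpers; base R only.

# Pull every http/https URL out of a character vector of tweet texts.
extract_links <- function(text) {
  matches <- gregexpr("https?://[^[:space:]]+", text)
  unlist(regmatches(text, matches))
}

# TRUE for links whose host is 4sq.com or foursquare.com (assumed hosts).
is_foursquare_link <- function(links) {
  grepl("^https?://(www\\.)?(4sq\\.com|foursquare\\.com)/", links)
}

# Made-up tweets for illustration
tweets <- c(
  "I'm at Some Cafe http://4sq.com/abc123",
  "no links here",
  "lunch http://t.co/x then https://foursquare.com/u/checkin/1?s=2"
)

links <- extract_links(tweets)
foursquare_links <- links[is_foursquare_link(links)]
print(foursquare_links)
```

In practice the shortened links would be expanded first, since a shortener such as t.co hides the real host.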
Now you have a list of lat-lng pairs from Foursquare. I use Mclust (from the mclust package) to cluster them and find the largest cluster. Then I average the lat and the lng of the points in that cluster to get its center. That center is my guess for where the Twitter user of interest lives.
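The "largest cluster, then average" step can be sketched as follows. The function name `largest_cluster_center` and the coordinates are made up for illustration, and the cluster labels are assigned by hand here so the example runs without mclust. With mclust loaded, the labels would instead come from something like `Mclust(coords)$classification`.

```r
# Hypothetical helper: given coordinates and one cluster label per row,
# return the mean lat/lng of the cluster that contains the most points.
largest_cluster_center <- function(coords, labels) {
  counts <- table(labels)
  biggest <- names(counts)[which.max(counts)]
  in_cluster <- as.character(labels) == biggest
  colMeans(coords[in_cluster, , drop = FALSE])
}

# Illustrative check-in coordinates (not real data)
coords <- data.frame(
  lat = c(41.88, 41.89, 41.90, 27.95, 27.96),
  lng = c(-87.63, -87.62, -87.64, -82.46, -82.45)
)

# Hand-assigned labels standing in for Mclust(coords)$classification
labels <- c(1, 1, 1, 2, 2)

center <- largest_cluster_center(coords, labels)
print(center)
```

Averaging raw lat and lng is a reasonable shortcut when the points in a cluster are close together, as check-ins around one city usually are.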
Next, I create an OpenStreetMap map to display this. The red dots are the lat-lng pairs from Foursquare, and the blue dot is the cluster center.
The code can be found on GitHub.
To leave a comment for the author, please follow the link and comment on their blog: Stats and things.