Analyze Instagram with R
[This article was first published on ThinkToStart » R Tutorials, and kindly contributed to R-bloggers].
This tutorial shows you how to create an Instagram app, set up an authentication process in R, and get data via the Instagram API.
There is no R package for this yet, so we have to configure the authentication and data download ourselves. But Instagram offers a well-documented API and uses OAuth 2.0, which makes it easy to use from R, for example with the httr package.
Authentication
The place to start for everybody who wants to work with the Instagram API is http://instagram.com/developer/
Here you can find all the information you need and also manage your apps.
So click on "Register Your Application" and go through the login.
On the next screen you can set the parameters for your app. Choose an application name, write a short description of what your app is about, and add a website.
Then you have to enter an OAuth redirect URI. To find the right value, go to your R console and execute the following code:
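library(httr)
# httr's default OAuth callback URL
full_url <- oauth_callback()
# keep only the part up to the port; "\\1" refers to the first capture group
full_url <- gsub("(.*localhost:[0-9]{1,5}/).*", x = full_url, replacement = "\\1")
print(full_url)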
This will show you the preferred callback URI for httr. Copy this URL and paste it in your app settings.
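To see what the regular expression above does without touching httr, here is a minimal sketch in base R. The callback value is a made-up example; the URL httr reports on your machine may differ.

```r
# Hypothetical callback value, used only to illustrate the pattern.
example_callback <- "http://localhost:1410/custom/path"

# Keep everything up to and including the slash after the port number.
# The double backslash in "\\1" is needed so that R passes a
# backreference to the first capture group instead of an escape character.
redirect_uri <- gsub("(.*localhost:[0-9]{1,5}/).*", "\\1", example_callback)
print(redirect_uri)
```

Anything after the port is dropped, so the value you paste into the app settings ends right after the port's trailing slash.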
This is what my settings look like:
After clicking on "Register" you will be redirected to a page with your app's authentication details, which we will need for our analysis.
In R we have to define 4 variables:
app_name <- "ThinkToStartTest"
client_id <- "XXX"
client_secret <- "XXX"
scope <- "basic"
The first three you get from your app settings. The fourth one, scope, is the level of authorization you want to request. "basic" is enough to download data such as likes or comments. If you actually want to post something to Instagram, you need another scope. You can find more information about that on the Instagram developer page.
Then we create our Instagram app object in R for the httr package. This is the app we will use to connect to the API. To do so, we have to provide the OAuth endpoints.
instagram <- oauth_endpoint(
  authorize = "https://api.instagram.com/oauth/authorize",
  access = "https://api.instagram.com/oauth/access_token")
myapp <- oauth_app(app_name, client_id, client_secret)
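To make the role of the authorize endpoint more concrete, here is a rough sketch of the kind of URL the browser is sent to during the OAuth 2.0 authorization code flow. httr assembles this for you; the code below is only an illustration in base R. The parameter names follow the OAuth 2.0 standard, and the client ID and redirect URI are placeholders.

```r
# Placeholder values for illustration only.
client_id    <- "XXX"
redirect_uri <- "http://localhost:1410/"
scope        <- "basic"

authorize_endpoint <- "https://api.instagram.com/oauth/authorize"

# Standard OAuth 2.0 authorization request parameters.
query <- c(
  client_id     = client_id,
  redirect_uri  = redirect_uri,
  response_type = "code",
  scope         = scope
)

# reserved = TRUE also percent-encodes characters such as ":" and "/".
encoded <- vapply(query, utils::URLencode, character(1), reserved = TRUE)

authorize_url <- paste0(
  authorize_endpoint, "?",
  paste(names(encoded), encoded, sep = "=", collapse = "&")
)
print(authorize_url)
```

The redirect URI in this request has to match the one registered in the app settings, which is why we copied httr's callback URL there earlier.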
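In the next step we do the authentication:
ig_oauth <- oauth2.0_token(instagram, myapp, scope = "basic",
                           type = "application/x-www-form-urlencoded",
                           cache = FALSE)
# The token is read from the names of the credentials list:
# split that string on double quotes and take the fourth piece
tmp <- strsplit(toString(names(ig_oauth$credentials)), '"')
token <- tmp[[1]][4]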
Now your browser should open and ask you to give permission to the app. After you return to R, you should have your access token.
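The token extraction is easy to misread, so here is a small offline sketch of what it does. It assumes, as the tutorial code implies, that the JSON response ends up as the name of the credentials element. The mock list and the token value are made up.

```r
# Hypothetical credentials list: the JSON response is stored as the
# name of its single element. The token value is invented.
mock_credentials <- list("")
names(mock_credentials) <- '{"access_token":"1234.abcd","user":{"id":"1"}}'

# Splitting on double quotes yields "{", "access_token", ":", "1234.abcd", ...
pieces <- strsplit(toString(names(mock_credentials)), '"')[[1]]
token <- pieces[4]
print(token)

# A slightly less position-dependent variant: take the piece that
# follows the "access_token" key (two positions later, after the ":").
token_by_key <- pieces[which(pieces == "access_token")[1] + 2]
```

Both variants rely on the exact shape of the response, so if the token comes back empty it is worth printing `names(ig_oauth$credentials)` and checking where the token actually sits.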
Our analysis starts from a username. In my example I will use "therock", the account of actor Dwayne Johnson.
username <- "therock"
But most functions of the Instagram API work with the user ID, which we don't have yet.
So we use the search function to get information about the user with the username "therock".
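library(RCurl)  # getURL()
library(rjson)  # assumed JSON package: its fromJSON() accepts unexpected.escape
search_url <- paste0("https://api.instagram.com/v1/users/search?q=", username,
                     "&access_token=", token)
user_info <- fromJSON(getURL(search_url), unexpected.escape = "keep")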
This usually returns a list of around 50 users. We extract only the first one and check whether it is the user we were looking for: if the username exists, the first result is always the exact match.
received_profile <- user_info$data[[1]]
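The comparison mentioned above could look like the following sketch. It runs on a mock search result instead of a live API call; the IDs are made up, and the field names username and id are assumptions about the shape of the response.

```r
username <- "therock"

# Mock search result with invented IDs, shaped like the list used above.
user_info <- list(data = list(
  list(username = "therock", id = "111"),
  list(username = "therockfans", id = "222")
))

received_profile <- user_info$data[[1]]

# Stop early if the first hit is not an exact match for the username.
if (!identical(received_profile$username, username)) {
  stop("User '", username, "' not found")
}
user_id <- received_profile$id
print(user_id)
```

Failing loudly here avoids silently analyzing the wrong account when the username is misspelled.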
Now that we have the user id we can start getting the post data.
user_id <- received_profile$id
# Get recent media (20 pictures)
media <- fromJSON(getURL(paste0("https://api.instagram.com/v1/users/", user_id,
                                "/media/recent/?access_token=", token)))
This returns the 20 most recent pictures of the user, which we will use for our analysis. We go through them with a for loop to extract the number of likes and comments and the date and time each photo was posted.
Instagram stores dates as UNIX timestamps, so we have to convert them to make them readable.
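n <- length(media$data)
# one row per post, with the columns created up front
df <- data.frame(no = seq_len(n), comments = numeric(n), likes = numeric(n),
                 date = character(n), stringsAsFactors = FALSE)
for (i in seq_len(n)) {
  # comments
  df$comments[i] <- media$data[[i]]$comments$count
  # likes
  df$likes[i] <- media$data[[i]]$likes$count
  # date: convert the UNIX timestamp to a readable date-time string
  df$date[i] <- toString(as.POSIXct(as.numeric(media$data[[i]]$created_time),
                                    origin = "1970-01-01"))
}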
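If you prefer to avoid the loop, the same extraction can be sketched with vapply. The example below uses a mock response with two invented posts that mirrors only the fields read above, and it keeps the date as a proper date-time column in UTC instead of a string.

```r
# Mock response with two made-up posts, limited to the fields used above.
media <- list(data = list(
  list(comments = list(count = 12), likes = list(count = 340),
       created_time = "1400000000"),
  list(comments = list(count = 7), likes = list(count = 150),
       created_time = "1400086400")
))

posts <- media$data
df <- data.frame(
  no       = seq_along(posts),
  comments = vapply(posts, function(p) as.numeric(p$comments$count), numeric(1)),
  likes    = vapply(posts, function(p) as.numeric(p$likes$count), numeric(1)),
  # UNIX timestamps count seconds since 1970-01-01 00:00:00 UTC;
  # tz = "UTC" keeps the result independent of the local time zone.
  date     = as.POSIXct(
    vapply(posts, function(p) as.numeric(p$created_time), numeric(1)),
    origin = "1970-01-01", tz = "UTC"
  )
)
print(df)
```

Keeping the date as POSIXct rather than text makes it easier to sort posts or group them by day later on.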
Visualization
Now we can visualize the data. I will use the rCharts package to do so. Of course you can also use ggplot2 or whatever package you like.
library(rCharts)
m1 <- mPlot(x = "date", y = c("likes", "comments"), type = "Line", data = df)
# print the chart object to display it
m1
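If rCharts is not available, a similar line chart can be sketched with base R graphics alone. The data below is made up and only mimics the shape of the data frame built above.

```r
# Invented sample data with the same columns as the tutorial's data frame.
df <- data.frame(
  date     = as.POSIXct(c("2014-05-13 09:15:00", "2014-05-11 10:00:00",
                          "2014-05-12 18:30:00"), tz = "UTC"),
  likes    = c(275, 340, 150),
  comments = c(9, 12, 7)
)

# Sort by time so the lines run from left to right.
df <- df[order(df$date), ]

# Draw likes first, then add comments on the same axes.
plot(df$date, df$likes, type = "l", col = "steelblue",
     ylim = range(0, df$likes, df$comments),
     xlab = "Date", ylab = "Count", main = "Likes and comments per post")
lines(df$date, df$comments, col = "firebrick")
legend("topright", legend = c("likes", "comments"),
       col = c("steelblue", "firebrick"), lty = 1)
```

With the data frame from the tutorial, where the date column is a character string, convert it first with `as.POSIXct(df$date)`.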
You can find the complete code of the tutorial on my GitHub account:
The post Analyze Instagram with R appeared first on ThinkToStart.