Google Insights and RCurl
Google Insights is nifty. If you’re logged in to your Google account, you can download the results as a CSV file. This is straightforward if you’re using a browser; if you’re trying to retrieve the results of queries using R, however, things get more complicated.
The following code retrieves the results of a Google Insights search for “Sarah Palin” as a data.frame. It uses the RCurl package to do all of the hard work.
username <- "[email protected]"
password <- "password_here"

loginURL <- "https://accounts.google.com/accounts/ServiceLogin"
authenticateURL <- "https://accounts.google.com/accounts/ServiceLoginAuth"

require(RCurl)

ch <- getCurlHandle()

curlSetOpt(curl = ch,
           ssl.verifypeer = FALSE,
           useragent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13",
           timeout = 60,
           followlocation = TRUE,
           cookiejar = "./cookies",
           cookiefile = "./cookies")

## do Google Account login
loginPage <- getURL(loginURL, curl = ch)

require(stringr)

galx.match <- str_extract(string = loginPage,
                          pattern = ignore.case('name="GALX"\\s*value="([^"]+)"'))
galx <- str_replace(string = galx.match,
                    pattern = ignore.case('name="GALX"\\s*value="([^"]+)"'),
                    replacement = "\\1")

authenticatePage <- postForm(authenticateURL,
                             .params = list(Email = username, Passwd = password, GALX = galx),
                             curl = ch)

## get Google Insights results CSV
insightsURL <- "http://www.google.com/insights/search/overviewReport"

resultsText <- getForm(insightsURL,
                       .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1),
                       curl = ch)

if(isTRUE(unname(attr(resultsText, "Content-Type")[1] == "text/csv"))) {
    ## got CSV file
    ## create temporary connection from results
    tt <- textConnection(resultsText)
    resultsCSV <- read.csv(tt, header = FALSE)
    ## close connection
    close(tt)
} else {
    ## something went wrong
    ## probably need to log in again?
}
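The GALX step is the easiest part of the script to get wrong, and it can be checked without touching the network. The sketch below runs the same regular expression against a made-up HTML fragment using base R's regexec() and regmatches() rather than stringr. The sample markup, the token value, and the helper name extract_galx are all assumptions for illustration; the real login page may look different.

```r
# Made-up stand-in for the login page; the real markup is an assumption.
sample_page <- paste0(
  '<form id="gaia_loginform">',
  '<input type="hidden" name="GALX" value="AbC123_xyz">',
  '<input type="text" name="Email">',
  '</form>'
)

# Hypothetical helper: return the GALX token, or NA if the field is absent.
extract_galx <- function(html) {
  pattern <- 'name="GALX"\\s*value="([^"]+)"'
  # regexec() returns the full match plus each capture group
  m <- regmatches(html, regexec(pattern, html, ignore.case = TRUE))[[1]]
  if (length(m) < 2) NA_character_ else m[2]
}

galx <- extract_galx(sample_page)
print(galx)
```

Returning NA when the field is missing makes it possible to stop before the postForm() call instead of sending an empty token.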
I don’t have much else to say about this, but I hope that it will be helpful to someone.
You can restrict the query geographically, or in other ways, by adding the parameters that appear in the URL when you refine a search in the Google Insights web interface. For instance, a basic search for “QUERY” gives the URL http://www.google.com/insights/search/#q=QUERY&cmpt=q, whereas the same search restricted to the state of New York gives http://www.google.com/insights/search/#q=QUERY&geo=US-NY&cmpt=q; the added parameter is “geo=US-NY”. To incorporate this into the script, change
resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", content = 1, export = 1), curl = ch)
to have the additional parameter in the .params list:
resultsText <- getForm(insightsURL, .params = list(q = "Sarah Palin", cmpt = "q", geo = "US-NY", content = 1, export = 1), curl = ch)
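If you switch restrictions often, it may be easier to build the parameter list in one place. This is only a sketch: insights_params is a hypothetical helper of my own naming, not part of RCurl, and it covers only the parameters shown above.

```r
# Hypothetical helper: assemble the .params list for the export request.
# Only q, cmpt, geo, content and export are handled, as in the script above.
insights_params <- function(query, geo = NULL) {
  params <- list(q = query, cmpt = "q")
  # add the geographic restriction only when one is requested
  if (!is.null(geo)) params$geo <- geo
  c(params, list(content = 1, export = 1))
}

ny_params <- insights_params("Sarah Palin", geo = "US-NY")
str(ny_params)
```

The result can then be passed along as getForm(insightsURL, .params = ny_params, curl = ch).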
[Updated 2012-04-24]