Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
McDonald’s is a nostalgic component of America ???????? and a pioneer of fast food operations and real estate ventures, as depicted in the 2016 film, The Founder, about Ray Kroc. As a kid I traveled to different McDonald’s across the east coast and noticed a difference in the classic hamburger preparation for adding mustard (i.e. in Maryland and not in Upstate New York). After some Google research, I noticed others had documented the regional differences in the use of mustard and but no aggregated data set existed detailing which McDonald’s added mustard to their hamburgers.
I hypothesized that these deviations in food prep could be identified from yelp.com reviews. The process below explains the approaches I took to gather data from the web with the yelp API and the development of a shiny
web application which detects string patterns in reviews for the keyword ‘mustard’ for a specific McDonald’s.
API Process
This script highly references Jenny Bryan’s yelpr
example!
library(yelpr) # devtools::install_github("jennybc/ryelp") library(httr) library(stringr) library(purrr) # 1. Create an application on the [Yelp developers site](https://www.yelp.com/developers/v3/manage_app) and agree to the Terms and aggreements ## Set your credentials as environment variables. Sys.setenv(YELP_CLIENT_ID='**************') Sys.setenv(YELP_SECRET='*****************************') # 2. search for businesses by creating an app yelp_app <- oauth_app("yelp", key = Sys.getenv("YELP_CLIENT_ID"), secret = Sys.getenv("YELP_SECRET")) # authenticate an endpoint ## https://www.yelp.com/developers/documentation/v3/authentication yelp_endpoint <- oauth_endpoint(NULL, authorize = "https://api.yelp.com/oauth2/token", access = "https://api.yelp.com/oauth2/token") # 3. Get an access token: Just enter anything for the authorization code when prompted in the Console of RStudio token <- oauth2.0_token(yelp_endpoint, yelp_app, user_params = list(grant_type = "client_credentials"), use_oob = T) # make this arg TRUE when interactive # 4. Create a url to make calls to the business search endpoint: The parts of the url include the endpoint and the query search parameters after the **?** (url <- modify_url("https://api.yelp.com", path = c("v3", "businesses", "search"), query = list(term = "McDonalds", location = "Hartford, CT", limit = 10))) # 5. Retrieve info from the server with the `GET` verb: HTTP response verbs enable the client to send us back data on: status, headers, and body/content. Available verbs include **`GET`ting** data from the server, **`POST`ing** new data to the server, **`PUT`** new data to update a partial record and **`DELETE`ing** data. response1 <- GET(url, config(token = token)) # was this api request successful? ## HTTP status codes consist of 3 digit numeric codes for status (1xx is information, 2xx is success, 3xx is redirection, 4xx is client error, 5xx server error). http_status(response1) # what type of format does the data come back with? response1$headers$`content-type` # 6. Return some content with geolocation data, business info & categories ct2 <- content(response1) ## create an object with resturant name and id for further calls biz_info <- ct2$businesses %>% map_df(`[`, c("name", "id", "phone", "review_count")) biz_info %>% knitr::kable() # 7. Get business reviews: After getting a specific McDonald's `id` restructure the url as an individual value and secondly creating a function to create a data.frame with urls for each business from the search endpoint. url_id <- modify_url("https://api.yelp.com", path = c("v3", "businesses","mcdonalds-glastonbury", "reviews"), query = list( locale = "en_US")) # 8. Retrieve response data on up to 3 reviews for the specific McDonald's response2 <- GET(url_id, config(token = token)) content2 <- content(response2) # Detect for string of 'mustard' content2$reviews %>% map_df(`[`, c("text")) %>% str_detect("mustard")
The purrr
version to check multiple restaurant text reviews for the string ‘mustard’.
# create a function to structure the urls with the business id url_id_f <- function(id) { modify_url("https://api.yelp.com", path = c("v3", "businesses", id, "reviews"), query = list( locale = "en_US")) } # create a df which maps the url function of all the restaurants biz_reviews <- data.frame() biz_reviews <- map_chr(biz_info$id, url_id_f) %>% data.frame(url = .) biz_reviews$url <- as.character(biz_reviews$url) # Get each url for the request response3 <- map(biz_reviews$url, GET, config(token = token)) response3 %>% map_df(`[`, "status_code") == 200 # loop through each restaurant's 3 reviews and extract the text and detect the presence of the string 'mustard' for (idx in 1:length(response3)) { mcd <- response3[[idx]] ct <- content(mcd) print(ct) result <- ct$reviews %>% map_df(`[`, c("text")) %>% str_detect("mustard") print(result) }
Learnings & Gotchas
The non-premium API access only includes up to 3 reviews and only a sample of the full text, leaving obvious gaps when trying to detect the keyword ‘mustard’ and contingent on enough reviews which details ???? preparation.
In trying to create and publish a shiny
application that wraps this code, I came up with errors given that OAuth2.0 grants access to users ???? and not applications ????. However here is a screenshot of the script above developed into an interactive shiny
application to search for any [city, state] and the gist of the code if your interested in running a local version.
The name of this shiny
app is a nod to Silicon Valley’s Not Hotdog application.
Cover Image source: https://www.mcdonalds.com/us/en-us/about-us.html
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.