To Eat, or Not to Eat…WHERE is the question

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers].

According to NYC Health, “Each year, thousands of New York City residents become sick from consuming foods or drinks that are contaminated with harmful bacteria, viruses or parasites”[1]. Since eating out at insalubrious restaurants is a common source of food poisoning, I decided to create a Shiny app where users can easily search and access information on restaurants that were temporarily closed due to hazardous sanitary violations in the past 5 years. The code for this project can be found on GitHub, and my Shiny app can be found here.

I. Introduction

NYC’s Department of Health and Mental Hygiene (DOHMH) conducts unannounced inspections of restaurants at least once a year to check for a variety of issues, such as compliance with food handling, food temperature, personal hygiene, and vermin control regulations. Under its scoring system, each violation adds a set number of points, and these points are summed into an overall score at the end of the inspection. The higher the score, the worse a restaurant performed. Scores are converted to letter grades (A/B/C), which must be prominently posted at the restaurant's entrance.
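As a small illustration of the score-to-grade conversion, the R sketch below assumes DOHMH's cutoffs of 0 to 13 points for an A, 14 to 27 for a B, and 28 or more for a C. The function name `score_to_grade` is hypothetical, and the real grading process has extra rules (for example, around re-inspections) that this sketch ignores.

```r
# Hypothetical helper: convert inspection scores to letter grades.
# Assumed cutoffs: 0-13 -> A, 14-27 -> B, 28 or more -> C.
score_to_grade <- function(score) {
  # cut() builds the intervals (-Inf, 13], (13, 27], (27, Inf]
  cut(score, breaks = c(-Inf, 13, 27, Inf), labels = c("A", "B", "C"))
}

scores <- c(0, 12, 13, 14, 27, 28, 46)
data.frame(score = scores, grade = score_to_grade(scores))
```

Using `cut()` keeps the grade as an ordered set of factor levels, which is convenient for plotting grade distributions later.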

For my Shiny project, I was interested in investigating restaurant closures within NYC from 2013 to 2017. For example, some of my initial questions were:

 

II. Dataset & Cleanup

The Restaurant Inspection Results dataset is provided by the NYC DOHMH and can be found at NYC Open Data. It consists of about 400,000 entries from inspections conducted between 2012 and 2017 and includes, at a high level, information on each restaurant's location and cuisine, inspection dates, and individual violations.

A version of the dataset dated January 17, 2016 was used for this Shiny app. Some initial cleanup was performed on the data, including removing rows with missing or negative scores, changing the date format, fixing borough naming issues, and shortening text values.
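The first three cleanup steps can be sketched in base R as follows. This is a minimal illustration on a hypothetical toy extract: the column names (`camis`, `boro`, `inspection_date`, `score`) and the `mm/dd/yyyy` date format are assumptions, not the exact headers or code used for the app.

```r
# Hypothetical toy extract; column names are illustrative assumptions,
# not the exact headers of the NYC Open Data file.
inspections <- data.frame(
  camis = c(1, 1, 2, 3, 4),
  boro = c("MANHATTAN", "Manhattan", "BRONX", "QUEENS", "QUEENS"),
  inspection_date = c("01/15/2015", "06/02/2016", "03/20/2014",
                      "07/07/2015", "11/30/2016"),
  score = c(12, NA, 30, -1, 45),
  stringsAsFactors = FALSE
)

clean_inspections <- function(df) {
  # Drop rows with a missing or negative score
  df <- df[!is.na(df$score) & df$score >= 0, ]
  # Parse "mm/dd/yyyy" strings into Date objects
  df$inspection_date <- as.Date(df$inspection_date, format = "%m/%d/%Y")
  # Harmonise borough spelling, e.g. "MANHATTAN" -> "Manhattan"
  df$boro <- tools::toTitleCase(tolower(df$boro))
  rownames(df) <- NULL
  df
}

cleaned <- clean_inspections(inspections)
cleaned
```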

The code used to generate 3 new columns for the analysis of restaurant closures can be viewed on Gist.
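The Gist's exact code is not reproduced here. As a hypothetical sketch only, three closure-related columns (a closure flag, the next re-opening date, and the number of days closed) could be derived per restaurant as below. The phrases matched in `action` are assumed to resemble the dataset's ACTION text, and all column names are illustrative.

```r
# Hypothetical toy data; the wording in `action` is assumed to resemble
# the dataset's ACTION field, and all column names are illustrative.
visits <- data.frame(
  camis = c(1, 1, 1, 2, 2),
  inspection_date = as.Date(c("2015-03-01", "2015-03-04", "2015-09-10",
                              "2016-01-05", "2016-01-20")),
  action = c("Establishment Closed by DOHMH.",
             "Establishment re-opened by DOHMH",
             "Violations were cited in the following area(s).",
             "Establishment Closed by DOHMH.",
             "Establishment re-opened by DOHMH"),
  stringsAsFactors = FALSE
)

add_closure_columns <- function(df) {
  df <- df[order(df$camis, df$inspection_date), ]
  # Column 1: was this inspection a closure?
  df$closed <- grepl("Closed by DOHMH", df$action, fixed = TRUE)
  reopened <- grepl("re-opened", df$action, fixed = TRUE)
  # Column 2: date of the next re-opening for the same restaurant
  df$reopen_date <- as.Date(rep(NA, nrow(df)))
  for (i in which(df$closed)) {
    later <- which(df$camis == df$camis[i] & reopened &
                     df$inspection_date > df$inspection_date[i])
    if (length(later) > 0) {
      df$reopen_date[i] <- df$inspection_date[later[1]]
    }
  }
  # Column 3: closure length in days (NA if not a closure or never reopened)
  df$days_closed <- as.numeric(df$reopen_date - df$inspection_date)
  rownames(df) <- NULL
  df
}

add_closure_columns(visits)
```

Matching on the next re-opening *after* each closure, within the same restaurant ID, avoids pairing a closure with an earlier, unrelated re-opening.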

Latitude and longitude were also added to the dataset for the map feature of my Shiny app, using the geocode function from the ggmap package. For full details of the code used to clean up this dataset, please refer to my GitHub account.
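Before geocoding, each restaurant's address fields need to be combined into a single string. The sketch below assumes hypothetical column names and fictitious addresses. The `ggmap` calls are left as comments because they need network access and, for Google's service, a registered API key.

```r
# Hypothetical address columns and fictitious addresses; real names may differ.
restaurants <- data.frame(
  building = c("100", "200"),
  street = c("EXAMPLE STREET", "SAMPLE AVENUE"),
  boro = c("Queens", "Manhattan"),
  zipcode = c("11101", "10001"),
  stringsAsFactors = FALSE
)

# Build one geocodable string per restaurant
build_address <- function(df) {
  paste0(df$building, " ", df$street, ", ", df$boro, ", NY ", df$zipcode)
}

addresses <- build_address(restaurants)
addresses

# Geocoding itself (not run here):
# library(ggmap)
# register_google(key = "YOUR_KEY")
# coords <- geocode(unique(addresses), output = "latlon")
```

Geocoding only `unique(addresses)` and joining the coordinates back afterwards keeps the number of API requests down, since many inspection rows share one restaurant.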

 

III. Analysis

A. Overall Distribution of Grades over the years

Since letter grades are what most NYC residents are familiar with, I first wanted to visualize the breakdown of restaurant grades and how those proportions have changed over the years. Looking at all 5 Boroughs combined, a few observations can be made:

The distribution of grades for each Borough, available on my Shiny App, also shows that there is not much differentiation in grades across Boroughs, with most restaurants receiving an A.
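The yearly grade breakdown amounts to a row-normalised contingency table of year against grade. A minimal base-R sketch on hypothetical toy data (the column names are assumptions):

```r
# Hypothetical toy data: one graded inspection per row
graded <- data.frame(
  inspection_date = as.Date(c("2014-02-01", "2014-05-10", "2014-08-15",
                              "2015-01-20", "2015-06-30", "2015-11-11")),
  grade = c("A", "A", "B", "A", "C", "A"),
  stringsAsFactors = FALSE
)

grade_share_by_year <- function(df) {
  year <- format(df$inspection_date, "%Y")
  # margin = 1 gives row proportions, so each year's shares sum to 1
  prop.table(table(year = year, grade = df$grade), margin = 1)
}

shares <- grade_share_by_year(graded)
round(shares, 2)
```

Adding the borough as a third factor to `table()` would give the same breakdown per Borough.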

 

B. Proportion of closures

As most restaurants performed well in sanitary inspections, I decided to focus my analysis on restaurant closures: regardless of the score received, these were places that committed a sanitary violation posing a serious threat to customers' health.
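One way to define a "proportion of closures" is the share of distinct restaurants in each Borough that were closed at least once. That definition, and the column names below, are assumptions for illustration; the app may compute it differently (for example, per inspection rather than per restaurant).

```r
# Hypothetical toy data: one row per inspection, with a closure flag
closure_toy <- data.frame(
  camis = c(1, 1, 2, 3, 3, 4, 5),
  boro = c("Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens", "Queens"),
  closed = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

closure_share_by_boro <- function(df) {
  # Collapse to one row per restaurant: closed at least once?
  per_rest <- aggregate(closed ~ camis + boro, data = df, FUN = any)
  # The mean of a logical vector is the share of TRUE values
  aggregate(closed ~ boro, data = per_rest, FUN = mean)
}

closure_share_by_boro(closure_toy)
```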

B.1 Proportion of closures by Borough

The graph below provides some interesting insights:

 

B.2 Proportion of closures by Borough and Cuisine

Since not much differentiation could be seen at the Borough level, I took my analysis a level deeper and investigated the proportion of closures by Borough and Cuisine.
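Extending the same idea to Borough and Cuisine combinations, and ranking them by closure share, could look like the hypothetical sketch below. The `min_restaurants` filter is an added assumption, not something the app is known to use: it guards against tiny groups whose proportions swing to 0 or 1 on a single restaurant.

```r
# Hypothetical toy data: one row per restaurant, with a closure flag
combo_toy <- data.frame(
  camis = 1:6,
  boro = c("Bronx", "Bronx", "Bronx", "Queens", "Queens", "Queens"),
  cuisine = c("Chinese", "Chinese", "Pizza", "Chinese", "Pizza", "Pizza"),
  closed = c(TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

top_closure_combos <- function(df, n = 6, min_restaurants = 1) {
  # One row per restaurant: ever closed in the period?
  per_rest <- aggregate(closed ~ camis + boro + cuisine, data = df, FUN = any)
  # Closure share and restaurant count per Borough x Cuisine
  share <- aggregate(closed ~ boro + cuisine, data = per_rest, FUN = mean)
  count <- aggregate(camis ~ boro + cuisine, data = per_rest, FUN = length)
  out <- merge(share, count, by = c("boro", "cuisine"))
  names(out) <- c("boro", "cuisine", "closure_share", "n_restaurants")
  # Drop small groups, then rank by share (ties broken by group size)
  out <- out[out$n_restaurants >= min_restaurants, ]
  out <- out[order(-out$closure_share, -out$n_restaurants), ]
  rownames(out) <- NULL
  head(out, n)
}

top_closure_combos(combo_toy, n = 3)
```

The `n` argument plays the role of the "number of bars" a user can adjust.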

The graph below displays the results of my analysis for the 6 worst performers, i.e. the Borough and Cuisine combinations with the highest closure proportions. On my Shiny App, users can choose to view more or fewer bars. This segmentation by both Borough and Cuisine offered some interesting insights:

 

C. Length of closure vs Inspection Score

Another question I was interested in was whether inspection scores had any relationship with length of closure. Even though around 85% of closed restaurants reopened in less than a week, I still expected restaurants with higher scores to take a few more days to fix their problems and reopen.

The boxplot below shows the distribution of scores by closure length. However, there was no apparent relationship between the two variables: the mean score was around 46 across all closure-length categories.
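The comparison behind this boxplot can be sketched by binning closure lengths and summarising scores per bin. The bins and the toy numbers below are hypothetical and are not the app's categories or results.

```r
# Hypothetical toy data: one row per closure, with its inspection score
closure_toy <- data.frame(
  days_closed = c(1, 2, 3, 6, 9, 12, 20, 35),
  score = c(40, 52, 47, 44, 49, 41, 50, 45)
)

# Illustrative length categories; the app's own bins may differ
closure_toy$closure_length <- cut(
  closure_toy$days_closed,
  breaks = c(0, 7, 14, 30, Inf), right = FALSE,
  labels = c("<7 days", "7-13 days", "14-29 days", "30+ days")
)

# Share of closures resolved within a week
share_within_week <- mean(closure_toy$days_closed < 7)
# Mean score per closure-length category
mean_by_length <- tapply(closure_toy$score, closure_toy$closure_length, mean)
mean_by_length

# The plot itself: boxplot(score ~ closure_length, data = closure_toy)
```

Similar means across bins, as in the article's finding, would show up here as a flat `mean_by_length` profile.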

 

IV. Conclusion

Findings

Further Improvements/Analysis:

 

V. Other Features of the Restaurant Closures Shiny App

Aside from the graphs discussed in this blog post, my Shiny app also includes a map showing the location and further details of restaurants shut down after performing poorly in sanitary inspections within the past five years. There is also a tab labeled “Overview of All Restaurants”, where users can check the distribution of scores by Borough and find a heat map of scores by Borough and Cuisine.

The post To Eat, or Not to Eat…WHERE is the question appeared first on NYC Data Science Academy Blog.
