
Television Trends as a Social Indicator

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers.]

Contributed by Emil Parikh. He is currently in the NYC Data Science Academy 12-week, full-time Data Science Bootcamp program, which runs from January 9th to March 31st, 2017. This post is based on his second class project – Web Scraping.

Links:   GitHub   |   App

 

Introduction

Disciplines such as economics and politics have various indicators that measure the state of different aspects of their fields. Events around the country in the past few years have led people to question the state of the US and to express surprise about “who this country is.” Given that, I am surprised there is no indicator that can tell us who we are and where we are going socially as a country. There is a collection of indicators that describe the social environment in terms of such things as poverty, obesity and suicide rates, but these largely describe outcomes and consequences rather than preferences and personality.

Spoiler Alert! A full solution to such a complicated task is beyond the scope of this project; a full solution would require multiple scraping projects and continued feedback from professionals in social psychology. I will address this again in the next steps section. Instead, I used this time to take a first step in building a social indicator by scraping and visualizing information about television shows.

 

Data Collection

I used Scrapy and IMDbPY to gather television data from Wikipedia and IMDb, respectively. Some information was available only from Wikipedia and some only from IMDb.

While show titles could be found in both, I needed to scrape them off of Wikipedia in order to

 

Screenshots of two Wikipedia pages I scraped TV show titles and URLs from:

 

 

Screenshots of a Wikipedia show page from which I retrieved information:

 

 

For fields common to both Wikipedia and IMDb, such as genre and start/end date, I still retrieved the information from Wikipedia. Once the scraping was finished, I filled in any missing data by collecting the same information from IMDb, along with the IMDb rating and number of votes.
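A minimal sketch of this fill-in step, assuming each show is held as a plain dictionary. The function name `fill_missing` and the field names (`genre`, `start_year`, `rating`, `votes`) are hypothetical and are not taken from the project's code.

```python
def fill_missing(wiki_record, imdb_record, imdb_only_fields=("rating", "votes")):
    """Return a copy of the Wikipedia record with gaps filled from IMDb.

    Wikipedia values win whenever they are present. IMDb values are used
    only for fields Wikipedia left empty, plus the IMDb-only fields.
    All field names here are assumptions made for illustration.
    """
    merged = dict(wiki_record)
    for field, value in imdb_record.items():
        if value is None:
            continue  # nothing useful to copy over
        is_missing = merged.get(field) in (None, "", [])
        if field in imdb_only_fields or is_missing:
            merged[field] = value
    return merged


# Hypothetical records for one show
wiki = {"title": "Example Show", "genre": "Comedy", "start_year": None}
imdb = {"genre": "Drama", "start_year": 2001, "rating": 7.5, "votes": 1200}
merged = fill_missing(wiki, imdb)
```

In this example the Wikipedia genre is kept, the missing start year comes from IMDb, and the rating and vote count are added.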

 

A sample of my Wikipedia TV show scraper

View the code on Gist.
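The author's Scrapy spider lives in the linked Gist and is not reproduced here. As a rough illustration of the title-and-URL extraction step only, the sketch below uses Python's standard-library `html.parser` instead of Scrapy. The class name, the sample HTML, and the assumption that show links are `/wiki/...` anchors inside list items are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://en.wikipedia.org"


class ShowLinkParser(HTMLParser):
    """Collect (title, absolute URL) pairs for /wiki/ links inside <li> items."""

    def __init__(self):
        super().__init__()
        self.li_depth = 0          # how many <li> elements we are nested in
        self.current_href = None   # href of the link being read, if any
        self.current_text = []     # text fragments of that link
        self.shows = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.li_depth += 1
        elif tag == "a" and self.li_depth > 0:
            href = dict(attrs).get("href")
            if href and href.startswith("/wiki/"):
                self.current_href = href
                self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            if title:
                self.shows.append((title, urljoin(BASE_URL, self.current_href)))
            self.current_href = None
        elif tag == "li" and self.li_depth > 0:
            self.li_depth -= 1


def extract_shows(html):
    """Return the list of (title, url) pairs found in an HTML string."""
    parser = ShowLinkParser()
    parser.feed(html)
    return parser.shows


# Made-up markup standing in for a Wikipedia list page
SAMPLE_HTML = """
<ul>
  <li><i><a href="/wiki/Example_Show">Example Show</a></i> (2001)</li>
  <li><i><a href="/wiki/Another_Show">Another Show</a></i></li>
  <li><a href="https://example.org/offsite">Offsite link</a></li>
</ul>
"""
shows = extract_shows(SAMPLE_HTML)
```

The offsite link is skipped because its `href` does not start with `/wiki/`. A real list page would need more selective rules than this.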

 

Using IMDbPY to get information about TV shows using show titles gathered from Wikipedia as the search term:

View the code on Gist.
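The Gist itself is not reproduced here. A minimal sketch of a title-based lookup with IMDbPY might look like the following, assuming the first TV-series search result is the right match (in practice that would need checking). The helper names `make_client` and `lookup_show` are hypothetical. The client is passed in as an argument so the lookup logic can be exercised without network access.

```python
def make_client():
    """Create an IMDbPY client (imported lazily; requires the IMDbPY package)."""
    from imdb import IMDb
    return IMDb()


def lookup_show(client, title):
    """Search IMDb for `title` and return basic fields for the first
    TV-series result, or None if no series is found.

    Assumption: the first matching series is the intended show.
    """
    for result in client.search_movie(title):
        if result.get("kind") in ("tv series", "tv mini series"):
            client.update(result)  # fetch the full record for this result
            return {
                "title": result.get("title"),
                "genres": result.get("genres"),
                "rating": result.get("rating"),
                "votes": result.get("votes"),
            }
    return None
```

Usage would be along the lines of `lookup_show(make_client(), "Some Show Title")`, where the title is one gathered from Wikipedia.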

 

Visualization and Analysis

 In the app, I have visualizations on

This information is displayed for each year from the 1940s until 2016 by genre and by network.
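As an illustration of the aggregation behind per-year counts, here is a small sketch that tallies new shows by start year and by a grouping field such as genre or network. The record layout and the names `count_new_shows`, `start_year`, `genres`, and `network` are assumptions, not the app's actual code.

```python
from collections import Counter


def count_new_shows(shows, key):
    """Count shows per (start year, value of `key`).

    `key` may hold a single string (e.g. a network) or a list of strings
    (e.g. several genres). A show with several genres is counted once per
    genre. Shows without a start year are skipped.
    """
    counts = Counter()
    for show in shows:
        year = show.get("start_year")
        if year is None:
            continue
        values = show.get(key) or []
        if isinstance(values, str):
            values = [values]
        for value in values:
            counts[(year, value)] += 1
    return counts


# Made-up records for illustration
SAMPLE_SHOWS = [
    {"title": "A", "start_year": 1990, "genres": ["Comedy", "Drama"], "network": "NBC"},
    {"title": "B", "start_year": 1990, "genres": ["Comedy"], "network": "CBS"},
    {"title": "C", "start_year": 2005, "genres": "Reality", "network": "NBC"},
    {"title": "D", "start_year": None, "genres": ["Drama"], "network": "ABC"},
]
by_genre = count_new_shows(SAMPLE_SHOWS, "genres")
by_network = count_new_shows(SAMPLE_SHOWS, "network")
```

Counting a multi-genre show once per genre is a design choice. It means the genre totals for a year can exceed the number of shows that started that year.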

Screenshots of some of the visualizations:

Count of new shows by genre from 1940s to 2016:

Count of new shows by network from 1940s to 2016:

 

The genre plots suggest that networks and show creators believe audiences want more comedies and reality shows (shows that tend to require less thinking). Dramas have not risen as sharply. While the number of shows created in these genres has risen consistently, the number of shows created by the major networks has been declining since the mid-1980s. I will need to look into this further.

Next Steps

TV show data alone is not enough to answer “who are we as a society?”, especially without viewership data. Some future steps I would take to build upon this project are:

