TV Shows on the “Big 3” Streaming Services
2020 has been a tough year, and I’ve been doing my best to keep busy (and distracted from all the insanity – both at the personal and worldwide levels). Earlier this year, I took a course in machine learning techniques and have been working on applying those techniques to work datasets, as well as fun sets through Kaggle.com.
Today, I thought I’d share another dataset I discovered through Kaggle: TV shows available on one or more streaming services (Netflix, Hulu, Prime, and Disney+). There are lots of fun things we could do with this dataset. Let’s start with some basic visualization and summarization.
setwd("~/Dropbox")
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0     ✓ purrr   0.3.4
## ✓ tibble  3.0.0     ✓ dplyr   0.8.5
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Shows <- read_csv("tv_shows.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   Title = col_character(),
##   Year = col_double(),
##   Age = col_character(),
##   IMDb = col_double(),
##   `Rotten Tomatoes` = col_character(),
##   Netflix = col_double(),
##   Hulu = col_double(),
##   `Prime Video` = col_double(),
##   `Disney+` = col_double(),
##   type = col_double()
## )
First, we can do some basic summaries, such as how many shows in the dataset are on each of the streaming services.
# Total the 0/1 indicator columns, then reshape to one row per service
Counts <- Shows %>%
  summarise(Netflix = sum(Netflix),
            Hulu = sum(Hulu),
            Prime = sum(`Prime Video`),
            Disney = sum(`Disney+`)) %>%
  pivot_longer(cols = Netflix:Disney,
               names_to = "Service",
               values_to = "Count")

Counts %>%
  ggplot(aes(Service, Count)) +
  geom_col()
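The same kind of summary can also be reached by pivoting first and summarising second, which makes it easy to add a share alongside each count. Here is a minimal sketch on a small made-up tibble; the `toy` data, its values, and the `Share` column are invented for illustration and are not taken from the Kaggle file.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data: 1 = show is on the service, 0 = it is not
toy <- tibble(
  Title   = c("A", "B", "C", "D"),
  Netflix = c(1, 0, 1, 1),
  Hulu    = c(0, 1, 1, 0),
  Prime   = c(1, 1, 0, 0)
)

toy_counts <- toy %>%
  # One row per show/service pair
  pivot_longer(cols = Netflix:Prime,
               names_to = "Service",
               values_to = "On") %>%
  group_by(Service) %>%
  # Number of shows on each service, and the fraction of all shows that is
  summarise(Count = sum(On),
            Share = mean(On))

toy_counts
```

Because the indicators are coded 0/1, `mean()` gives the proportion of shows on each service directly. Note that shows can appear on more than one service, so the shares need not add up to 1.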
The biggest selling point of Disney+ is its movie catalog, though the few TV shows it offers can't really be viewed elsewhere (e.g., The Mandalorian). For the sake of simplicity, we'll drop Disney+ and focus on the big 3 services for TV shows.
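The post doesn't show the code for dropping Disney+, so here is one possible way to do it, sketched on a made-up stand-in for the data. The `toy` tibble and the `big3` name are assumptions for illustration; only the indicator column names match the file.

```r
library(tibble)
library(dplyr)

# Hypothetical stand-in for Shows, using the same indicator column names
toy <- tibble(
  Title         = c("A", "B", "C"),
  Netflix       = c(1, 0, 0),
  Hulu          = c(0, 1, 0),
  `Prime Video` = c(0, 1, 0),
  `Disney+`     = c(0, 0, 1)
)

big3 <- toy %>%
  # Keep shows available on at least one of the three remaining services
  filter(Netflix + Hulu + `Prime Video` > 0) %>%
  # Drop the Disney+ indicator column
  select(-`Disney+`)

big3
```

Filtering before dropping the column removes shows that are only on Disney+ (show "C" here), which would otherwise linger as rows with no service at all.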
The dataset also contains an indicator of recommended age, which we can plot.
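# `levels` lists the raw ratings in order from youngest to oldest
# (assumes the raw Age values are exactly these five strings)
Shows <- Shows %>%
  mutate(Age = factor(Age,
                      levels = c("all", "7+", "13+", "16+", "18+"),
                      ordered = TRUE))

Shows %>%
  ggplot(aes(Age)) +
  geom_bar()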
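A quick way to sanity-check an ordered factor like this is to cross-tabulate the raw values against the recoded ones. In `factor()`, `levels` matches the raw strings, whereas `labels` renames the levels in their sorted order, so it is worth confirming that each rating ended up where intended. A small sketch on made-up ratings (the `ages` vector is invented for illustration):

```r
# Hypothetical raw ratings, including a missing value
ages <- c("7+", "18+", "all", "13+", "16+", "7+", NA)

# `levels` gives the raw values in the desired order;
# anything not listed (and NA) becomes NA
age_f <- factor(ages,
                levels = c("all", "7+", "13+", "16+", "18+"),
                ordered = TRUE)

# Cross-tabulate raw against recoded values:
# every count should sit on a matching raw/recoded pair
table(raw = ages, recoded = age_f, useNA = "ifany")

# Ordered factors support comparisons, e.g. how many ratings are 16+ or older
n_16_plus <- sum(age_f >= "16+", na.rm = TRUE)
n_16_plus
```

If a count shows up off the diagonal of that table, the recoding has relabelled a rating rather than just reordered it.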