TV Shows on the “Big 3” Streaming Services
2020 has been a tough year, and I've been doing my best to keep busy (and distracted from all the insanity, both personal and worldwide). Earlier this year, I took a course in machine learning techniques, and I've been applying them to datasets at work as well as to fun datasets from Kaggle.com.
Today, I thought I'd share another dataset I discovered through Kaggle: TV shows available on one or more streaming services (Netflix, Hulu, Prime Video, and Disney+). There are lots of fun things we could do with this dataset; let's start with some basic visualization and summarization.
setwd("~/Dropbox")
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.4
## ✓ tibble 3.0.0 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ───────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Shows <- read_csv("tv_shows.csv")
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## Title = col_character(),
## Year = col_double(),
## Age = col_character(),
## IMDb = col_double(),
## `Rotten Tomatoes` = col_character(),
## Netflix = col_double(),
## Hulu = col_double(),
## `Prime Video` = col_double(),
## `Disney+` = col_double(),
## type = col_double()
## )
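The parse output shows two quirks worth handling before any analysis: an unnamed first column (filled in as `X1`, which looks like a leftover row index) and a `Rotten Tomatoes` column read as character. Below is a minimal sketch of tidying both on a made-up stand-in for a few rows. It assumes `X1` really is just an index and that scores are stored as strings like "96%"; both are assumptions, so check `head(Shows)` first.

```r
# Made-up stand-in for a few rows of tv_shows.csv (values are illustrative only)
shows_demo <- data.frame(
  X1 = c(0, 1, 2),
  Title = c("Show A", "Show B", "Show C"),
  `Rotten Tomatoes` = c("96%", "80%", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Drop X1, assumed here to be a leftover row index
shows_demo$X1 <- NULL

# Assumed "NN%" format: strip the percent sign, then convert; NA stays NA
shows_demo$RT_score <- as.numeric(sub("%", "", shows_demo$`Rotten Tomatoes`, fixed = TRUE))

print(shows_demo)
```

In the post's tidyverse style, the same idea would be `select(-X1)` plus `mutate()` with `readr::parse_number()`, which also drops the "%" sign.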
First, we can do some basic summaries, such as how many shows in the dataset are on each of the streaming services.
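# Each service column is a 0/1 flag, so its sum is the number of shows on that service
Counts <- Shows %>%
  summarise(Netflix = sum(Netflix),
            Hulu = sum(Hulu),
            Prime = sum(`Prime Video`),
            Disney = sum(`Disney+`)) %>%
  # Reshape to one row per service for plotting
  pivot_longer(cols = Netflix:Disney,
               names_to = "Service",
               values_to = "Count")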
Counts %>%
  ggplot(aes(Service, Count)) +
  geom_col()
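Because each service column is a 0/1 indicator, summing it counts the shows on that service, and summing across a row counts how many services carry that show. A base-R cross-check on a made-up four-show table (the flags are invented, and this assumes the indicators have no missing values; otherwise add `na.rm = TRUE`):

```r
# Made-up 0/1 availability flags: one row per show, one column per service
flags <- data.frame(
  Netflix = c(1, 0, 1, 1),
  Hulu = c(0, 1, 1, 0),
  `Prime Video` = c(0, 0, 1, 1),
  `Disney+` = c(0, 0, 0, 1),
  check.names = FALSE
)

# Column sums: number of shows per service
counts <- colSums(flags)
print(counts)

# Row sums: number of services per show; count shows on more than one service
multi <- sum(rowSums(flags) > 1)
print(multi)
```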
The biggest selling point of Disney+ is its movies, though the few TV shows it does offer can't really be viewed elsewhere (e.g., The Mandalorian). For the sake of simplicity, we'll drop Disney+ and focus on the big 3 services for TV shows.
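One way to drop Disney+ from the summary is to filter it out of the long-format counts. The sketch below uses a made-up stand-in for `Counts` (the numbers are placeholders, not results from the dataset). In the post's tidyverse style, the equivalent would be `Counts %>% filter(Service != "Disney")`.

```r
# Placeholder stand-in for Counts (long format: one row per service)
counts_demo <- data.frame(
  Service = c("Netflix", "Hulu", "Prime", "Disney"),
  Count = c(10, 20, 30, 5)
)

# Keep only the big 3 services
big3 <- subset(counts_demo, Service != "Disney")
print(big3)
```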
The dataset also contains an indicator of recommended age, which we can plot.
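# Use levels (not labels) so the ratings keep their intended order;
# labels alone would be matched against the alphabetically sorted values
Shows <- Shows %>%
  mutate(Age = factor(Age,
                      levels = c("all", "7+", "13+", "16+", "18+"),
                      ordered = TRUE))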
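A caution about `factor()`: if only `labels` is supplied, R takes the levels from `sort(unique(x))`, which orders these strings as "13+", "16+", "18+", "7+", "all". The labels are then attached position by position to the wrong ratings. A quick check on a made-up vector, comparing `labels =` with `levels =`:

```r
ages <- c("all", "7+", "13+", "16+", "18+", "7+")
wanted <- c("all", "7+", "13+", "16+", "18+")

# labels only: levels come from sort(unique(ages)), i.e. "13+", "16+", "18+", "7+", "all"
by_labels <- factor(ages, labels = wanted, ordered = TRUE)

# levels: the supplied order is used directly, so values keep their meaning
by_levels <- factor(ages, levels = wanted, ordered = TRUE)

print(data.frame(raw = ages, by_labels, by_levels))
```

With `labels`, the first row ("all") comes out as "18+". With `levels`, every value is unchanged. Missing ratings stay `NA` in both cases. Note that with `levels`, any value not listed in `wanted` would also silently become `NA`.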
Shows %>%
  ggplot(aes(Age)) +
  geom_bar()
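To see the numbers behind the bar chart, including shows with no age rating, you can tabulate the factor with `useNA = "ifany"`. Because the factor has fixed levels, empty categories still appear with a count of zero. The sketch below uses a made-up ordered factor; on the real data, `Shows %>% count(Age)` gives a similar table in tidyverse style.

```r
age_levels <- c("all", "7+", "13+", "16+", "18+")

# Made-up ratings, including two shows with no rating
ages <- factor(c("7+", NA, "18+", "7+", "all", NA),
               levels = age_levels, ordered = TRUE)

# Counts per level, with an extra NA cell for missing ratings
age_counts <- table(ages, useNA = "ifany")
print(age_counts)
```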



