Bechdel Test

[This article was first published on Ronan's #TidyTuesday blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Setup

Loading the R libraries and data set.

# Loading libraries
library(gganimate)
library(tidytuesdayR)
library(tidyverse)
library(tidytext)
library(forcats)

# Loading the Bechdel Test data set
tt <- tt_load("2021-03-09")

    Downloading file 1 of 2: `raw_bechdel.csv`
    Downloading file 2 of 2: `movies.csv`

Illustrating the change in Bechdel Test results over time

The first graph we want to create is an animation showing the change in Bechdel Test results over time. This animation shows the percentage of films each year (from 1940 to 2020) that meet different criteria of the Bechdel Test.

# Changing "rating" from a character to a factor variable
tt$raw_bechdel$rating <- as.factor(tt$raw_bechdel$rating)

# Levels of the "rating" variable
levels(tt$raw_bechdel$rating)
[1] "0" "1" "2" "3"
# Renaming the levels of the "rating" variable
levels(tt$raw_bechdel$rating) <- c("Unscored", "It has two women...",
                                   "...who talk to each other...",
                                   "...about something besides a man")

# Counting the number of films with each Bechdel test rating per year
ratings_by_year <- tt$raw_bechdel %>%
  group_by(year) %>%
  count(year, rating)

# Counting the total number of films in each year
film_count_by_year <- ratings_by_year %>%
  group_by(year) %>%
  summarise(total = sum(n))

# Adding the annual film count to the Bechdel test rating count per year
ratings_by_year <- left_join(ratings_by_year, film_count_by_year)
rmarkdown::paged_table(ratings_by_year)
# Changing "year" to an integer variable
ratings_by_year$year <- as.integer(ratings_by_year$year)

# Creating an animation summarising the Bechdel test results from 1940 to 2020
p <- ratings_by_year %>%
  ggplot(aes(x = fct_rev(rating), y = (n/total), group = rating, fill = rating)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::percent_format(scale = 100)) +
  coord_flip() +
  theme_bw() +
  theme(legend.position = "none") +
  transition_time(year, range = c(1940L, 2020L)) +
  labs(x = "Bechdel Test result", y = "Percentage of films",
       subtitle = "Year: {frame_time}",
       title = "Bechdel Test results over time") +
  ease_aes("cubic-in-out")

# Rendering the animation as a .gif
animate(p, nframes = 400, fps = 20, renderer = magick_renderer())
The percentage of films released each year with various Bechdel test results

(#fig:figure1)The percentage of films released each year with various Bechdel test results

Plotting the directors most likely to pass/fail the Bechdel Test

In this section, a plot is produced that shows the directors most likely to pass/fail the Bechdel Test. This is done by…

  • taking all the directors in tt$movies as a corpus
  • splitting that corpus into two documents: Bechdel Test passes and failures
  • calculating tf-idf to find significant directors in each document
# Selecting directors and their Bechdel pass/fail results
results_by_director <- tt$movies %>%
  select(director, binary) %>%
  filter(!is.na(director)) %>%
  separate_rows(director, sep = ", ")

# Changing "binary" to a factor variable
results_by_director$binary <- as.factor(results_by_director$binary)

# Renaming the levels of the "binary" factor
levels(results_by_director$binary) <- c("Bechdel Test Failed",
                                        "Bechdel Test Passed")

# Counting the number of times each director passes/fails the Bechdel test
results_by_director <- results_by_director %>%
  count(binary, director, sort = TRUE)

# Counting the number of times the Bechdel test has been passed/failed
total_results <- results_by_director %>%
  group_by(binary) %>%
  summarise(total = sum(n))

# Adding the total Bechdel result counts to "results_by_director"
results_by_director <- left_join(results_by_director, total_results)

# Adding tf-idf values to "results_by_director"
results_by_director <- results_by_director %>%
  bind_tf_idf(director, binary, n)

rmarkdown::paged_table(results_by_director)

To leave a comment for the author, please follow the link and comment on their blog: Ronan's #TidyTuesday blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)