The Bachelorette Ep. 1 – Every has its Thorn – Data Analysis in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We all love a good love story. In The Bachelor TV Franchise, millions gather to see who will be the lucky winner of the contestant’s heart. But what about those who don’t make it? What do they have in common? Please check out our dashboard (built with shiny) at https://stoltzmaniac.shinyapps.io/TheBacheloretteApp/.
Night 1 of The Bachelorette found 23 men continuing on to night 2, while 8 men were sent packing. Based on our highly scientific rating system here at Stoltzman Consulting, we found that Clare has a preferred type.
Let’s dig into the data.
Below, we see that Clare prefers men whose characteristics are Bro-y and Vanit-y while she is less interested in men who are there with characteristics of the “Right Reason-y” and “Kind Heart-y”. The “Drama-y” level does not appear to impact her decision making.
Clare also appears to prefer men who have a stronger social media presence. Those who have not been cut are shown in the “tall and slender” side versus those who were eliminated are depicted in the “short and stout” side of the chart below. Dale Moss had to be excluded from the chart due to the fact that his follower count was way higher than everyone else’s (over 180K) – he is still an active participant.
We are looking forward to seeing where this season goes. Apparently, it is the most dramatic season yet! Remember to checkout our live analytics dashboard at https://stoltzmaniac.shinyapps.io/TheBacheloretteApp/ to build models to predict the winner and see current Twitter and Instagram trends.
For those interested in the code (data available upon request, just visit the contact us section of this site).
library(tidyverse) GLOBAL_DATA = get_database_data() saveRDS(GLOBAL_DATA, 'GLOBAL_DATA.rds') # Contestant circular bar chart all_contestant_data = GLOBAL_DATA$contestant_data_raw suitor_data = all_contestant_data %>% filter(!instagram %in% c('tayshiaaa', 'chrisbharrison', 'clarecrawley')) %>% mutate(status = as.factor( case_when( end_episode > latest_episode ~ 'Active', TRUE ~ 'Eliminated' ) )) %>% select(status, everything()) coord_plot_data = suitor_data %>% select(status:`Right Reason-y`) %>% group_by(status) %>% pivot_longer(`Vanit-y`:`Right Reason-y`, names_to = "characteristic") %>% group_by(characteristic, status) %>% summarize(avg = mean(value), .groups = 'drop') coord_plot_data %>% ggplot(aes(x = characteristic)) + geom_col(aes( y = avg, col = status, fill = status), position = 'dodge') + geom_text(aes( y = avg, label = characteristic), data = coord_plot_data %>% filter(status == 'Active'), size = 4, position = position_stack(vjust = 1.4)) + coord_polar() + ggtitle('Preferred Suitor Characteristics') + theme_minimal() + scale_color_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + scale_fill_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + theme(legend.position = c(0.5, 0.95), legend.direction = "horizontal", legend.title = element_blank(), axis.title = element_blank(), axis.text.y = element_blank(), axis.text.x = element_blank(), plot.title = element_text( hjust = 0.5, vjust = -1)) # Contestant instagram insta_follower_data = GLOBAL_DATA$insta_followers_w_losers %>% drop_na() %>% mutate(relative_air_date = case_when( datetime <= '2020-10-12' ~ 'Pre Air', TRUE ~ 'Aired') ) %>% left_join( suitor_data, by = 'name' ) %>% drop_na() insta_follower_data %>% filter(relative_air_date == 'Pre Air', follower_count < 1e5) %>% ggplot(aes(x = status, y = follower_count, col = status, fill = status)) + geom_violin(alpha = 0.) + ggtitle('Pre-Air Instagram Followers', subtitle = "*Excludes Dale Moss") + theme_minimal() + scale_color_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + scale_fill_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + scale_y_continuous(label = scales::comma) + theme(legend.position = 'top', legend.direction = "horizontal", legend.title = element_blank(), axis.title = element_blank(), plot.title = element_text( hjust = 0.5), plot.subtitle = element_text(hjust = 0.5))
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.