Alone R package: Datasets from the survival TV series
I have been watching the survival TV series ‘Alone,’ where 10 survivalists are dropped in an extremely remote area and must fend for themselves. I am super impressed by their skills, endurance, and mental fortitude. To last 100 days in the Arctic winter living off the land is truly impressive.
True to form, I’ve collected the data and I am sharing it here in the {alone} R package.
It is a collection of datasets about the TV series in a tidy format. The package includes 4 datasets:
- survivalists
- loadouts
- episodes
- seasons
For non-Rstats users here is the link to the Google sheets doc.
Installation
Install from CRAN:
install.packages("alone")
Install from GitHub:
devtools::install_github("doehm/alone")
Datasets
survivalists
A data frame of survivalists across all 9 seasons detailing name and demographics, location and profession, result, days lasted, reasons for tapping out (detailed and categorised), and page URL.
Dataset features:
- season: The season number
- name: Name of the survivalist
- age: Age of survivalist
- gender: Gender
- city: City
- state: State
- country: Country
- result: Place the survivalist finished in the season
- days_lasted: The number of days lasted in the game before tapping out or winning
- medically_evacuated: Logical. If the survivalist was medically evacuated from the game
- reason_tapped_out: The reason the survivalist tapped out of the game. NA means they were the winner. Reason being that technically if they won they never tapped out.
- reason_category: A simplified category of the reason for tapping out
- team: The team they were associated with (only for season 4)
- day_linked_up: Day the team members linked up (only for season 4)
- profession: Profession
- url: URL of cast page on the history channel website. Prefix URL with https://www.history.com/shows/alone/cast
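Since the url column needs the prefix above to become a working link, here is a minimal sketch of that step. The helper name cast_url() is hypothetical and not part of the package, and it assumes the stored value is a page slug that may or may not start with a slash. The slug in the example is made up.

```r
# Hypothetical helper, not part of {alone}: builds a full cast page URL.
# Assumes `slug` is the value stored in the url column, with or without
# a leading slash.
cast_url <- function(slug) {
  base <- "https://www.history.com/shows/alone/cast"
  slug <- sub("^/+", "", slug)  # drop any leading slashes
  # keep missing values missing instead of pasting "NA" into the URL
  ifelse(is.na(slug), NA_character_, paste(base, slug, sep = "/"))
}

# Made-up slug, for illustration only
cast_url("example-survivalist")
```

With the package loaded, cast_url(survivalists$url) would return the full links as a character vector.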
As an example, use this dataset to compare the number of days survived by men and women.
library(tidyverse)
library(alone)

# Proportion of each gender still in the game on each day
df <- expand_grid(
  days_lasted = 0:max(survivalists$days_lasted),
  gender = unique(survivalists$gender)
) |>
  left_join(
    survivalists |> count(days_lasted, gender),
    by = c("days_lasted", "gender")
  ) |>
  left_join(
    survivalists |> count(gender, name = "N"),
    by = "gender"
  ) |>
  group_by(gender) |>
  mutate(
    n = replace_na(n, 0),
    n_lasted = N - cumsum(n),
    p = n_lasted / N
  )

# Kaplan-Meier survival curves
# code is simplified and plot won't match below
df |>
  ggplot(aes(days_lasted, p, colour = gender)) +
  geom_line()

# boxplots: gender on the x axis so the jittered points have both x and y
survivalists |>
  ggplot(aes(gender, days_lasted, fill = gender)) +
  geom_boxplot(alpha = 0.5) +
  geom_jitter(width = 0.2, pch = 1, size = 3) +
  theme_minimal()
While there is yet to be a female winner, there is some evidence to suggest that, on average, women survive longer than men. However, this deserves further investigation, since the first season had a lot of early taps and no women.
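One way to start that investigation is to repeat the comparison with season 1 left out. The sketch below uses base R and a hypothetical helper, median_days_by_gender(). The small data frame holds made-up rows that only mimic the column names of survivalists (including the assumed gender labels), so its numbers mean nothing.

```r
# Hypothetical helper: median days lasted by gender, optionally
# excluding some seasons (e.g. season 1, which had no women).
median_days_by_gender <- function(df, exclude_seasons = integer(0)) {
  kept <- df[!df$season %in% exclude_seasons, ]
  aggregate(days_lasted ~ gender, data = kept, FUN = median)
}

# Made-up rows with the same column names as `survivalists`
toy <- data.frame(
  season      = c(1, 1, 2, 2, 2, 2),
  gender      = c("Male", "Male", "Male", "Male", "Female", "Female"),
  days_lasted = c(1, 3, 20, 40, 30, 50)
)

median_days_by_gender(toy)                       # all seasons
median_days_by_gender(toy, exclude_seasons = 1)  # season 1 removed
```

With the package loaded, median_days_by_gender(survivalists, exclude_seasons = 1) runs the same comparison on the real data. A median is used here because a handful of very long or very short stays can pull a mean around.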
loadouts
The rules allow each survivalist to take 10 items with them. This dataset includes information on each survivalist’s loadout. It has detailed item descriptions and a simplified version for easier aggregation and analysis.
Dataset features:
- version: Country code for the version of the show
- season: The season number
- name: Name of the survivalist
- item_number: Item number
- item_detailed: Detailed loadout item description
- item: Loadout item. Simplified for aggregation
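library(forcats)

# ft and txt are assumed here: a font family and a text colour of your choice
ft <- "sans"
txt <- "grey20"

# Number of survivalists who packed each item, most common at the top
loadouts |>
  count(item) |>
  mutate(item = fct_reorder(item, n, max)) |>
  ggplot(aes(item, n)) +
  geom_col() +
  geom_text(aes(item, n + 3, label = n), family = ft, size = 12, colour = txt) +
  coord_flip()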
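Because the rules fix each loadout at 10 items, a quick sanity check is to count the rows per survivalist. This is a sketch only: items_per_survivalist() is a hypothetical helper, it assumes one row per item, and the toy rows (including the "US" version code and the names) are made up.

```r
# Hypothetical helper: number of loadout rows per survivalist.
# Assumes version + season + name identifies one survivalist.
items_per_survivalist <- function(df) {
  counts <- aggregate(item ~ version + season + name, data = df, FUN = length)
  names(counts)[names(counts) == "item"] <- "n_items"
  counts
}

# Made-up rows with the same column names as `loadouts`
toy_loadouts <- data.frame(
  version = "US",
  season  = 1,
  name    = rep(c("Survivalist A", "Survivalist B"), times = c(3, 2)),
  item    = c("Axe", "Saw", "Pot", "Axe", "Tarp")
)

counts <- items_per_survivalist(toy_loadouts)
counts
```

On the real data, subset(items_per_survivalist(loadouts), n_items != 10) would list any survivalist whose loadout does not have exactly 10 rows.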
episodes
This dataset contains details of each episode including the title, number of viewers, beginning quote, and IMDb rating. New episodes will be added after each future season ends.
Dataset features:
- version: Country code for the version of the show
- season: The season number
- episode_number_overall: Episode number across seasons
- episode: Episode number
- title: Episode title
- air_date: Date the episode originally aired
- viewers: Number of viewers in the US (millions)
- quote: The beginning quote
- author: Author of the beginning quote
- imdb_rating: IMDb rating of the episode
- n_ratings: Number of ratings given for the episode
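For season-level comparisons, the episode rows can be collapsed to one row per season. The sketch below uses a hypothetical helper, season_summary(), and made-up toy values. It groups by season only, which assumes a single version of the show; with several versions, version would need to be added to the grouping.

```r
# Hypothetical helper: mean viewers and mean IMDb rating per season.
# Note: the formula interface of aggregate() drops rows where either
# viewers or imdb_rating is NA.
season_summary <- function(df) {
  aggregate(cbind(viewers, imdb_rating) ~ season, data = df, FUN = mean)
}

# Made-up rows with the same column names as `episodes`
toy_episodes <- data.frame(
  season      = c(1, 1, 2, 2),
  viewers     = c(1.0, 2.0, 3.0, 5.0),
  imdb_rating = c(7, 8, 8, 9)
)

summary_by_season <- season_summary(toy_episodes)
summary_by_season
```

With the package loaded, season_summary(episodes) gives the same table for the real series.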
seasons
The season summary dataset includes location, latitude and longitude, and other season-level information. It includes the date of drop-off where the information exists.
Dataset features:
- version: Country code for the version of the show
- season: The season number
- location: Location
- country: Country
- n_survivors: Number of survivalists in the season. In season 4 there were 7 teams of 2.
- lat: Latitude
- lon: Longitude
- date_drop_off: The date the survivalists were dropped off
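Season-level fields such as location can be attached to the survivalist rows with a join on season. This is a sketch under assumptions: add_location() is a hypothetical helper, the toy rows and location labels are made up, and joining on season alone assumes a single version of the show.

```r
# Hypothetical helper: adds the season's location to each survivalist row.
# all.x = TRUE keeps survivalists even if their season has no match.
add_location <- function(survivalists_df, seasons_df) {
  merge(
    survivalists_df,
    seasons_df[, c("season", "location")],
    by = "season",
    all.x = TRUE
  )
}

# Made-up rows with the same column names as the package datasets
toy_survivalists <- data.frame(
  season      = c(1, 1, 2),
  name        = c("Survivalist A", "Survivalist B", "Survivalist C"),
  days_lasted = c(10, 20, 30)
)
toy_seasons <- data.frame(
  season   = c(1, 2),
  location = c("Location X", "Location Y")
)

joined <- add_location(toy_survivalists, toy_seasons)
joined
```

With the package loaded, add_location(survivalists, seasons) would allow comparisons such as days lasted by location.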
References
If there is any data you would like to include please get in touch.
- History: https://www.history.com/shows/alone/cast
- Wikipedia: https://en.wikipedia.org/wiki/Alone_(TV_series)
- Wikipedia (episodes): https://en.wikipedia.org/wiki/List_of_Alone_episodes#Season_1_(2015)_-_Vancouver_Island
The post Alone R package: Datasets from the survival TV series appeared first on Dan Oehm | Gradient Descending.