Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Abstract
In this post, I will analyse the data I scraped and put into an R package, which I called {nethack}.
NetHack is a roguelike game; for more context, read my previous blog post.
You can install the {nethack} package from GitHub and play around with the data yourself:
devtools::install_github("b-rodrigues/nethack")
And to use it:
library(nethack)
data("nethack")
The data contains information on games played from 2001 to 2018; 322485 rows and 14 columns. By the way, all the content from the public server I scrape is under the CC BY 4.0 license.
I built the package using the very useful {devtools} package.
Introduction
I want a few simple things from this first analysis: how many players manage to ascend (that is, win the game), which monster kills the most players, and finally some data extracted from the dumplog column. The dumplog column is a bit special; each element is a log file that contains a lot of information from the last turns of a player. I will leave this for a future blog post, though.
Let’s load some packages first:
library(nethack)
library(tidyverse)
library(lubridate)
library(magrittr)
library(brotools)
{brotools} is my own package that contains some functions that I use daily. If you want to install it, run the following line:
devtools::install_github("b-rodrigues/brotools")
The documentation is not up-to-date; I think I’ll update it and release the package on CRAN. Some day.
Now, let’s load the “nethack” data, included in the {nethack} package:
data("nethack")
head(nethack)
##   rank score     name time turns lev_max hp_max role race gender alignment
## 1    1   360      jkm <NA>    NA     2/2  -2/25  Sam  Hum    Mal       Law
## 2    2   172 yosemite <NA>    NA     1/1  -1/10  Tou  Hum    Fem       Neu
## 3    3  2092    dtype <NA>    NA     6/7  -2/47  Val  Hum    Fem       Neu
## 4    4    32   joorko <NA>    NA     1/1   0/15  Sam  Hum    Mal       Law
## 5    5   118    jorko <NA>    NA     1/1   0/11  Rog  Orc    Fem       Cha
## 6    6  1757   aaronl <NA>    NA     5/5   0/60  Bar  Hum    Mal       Neu
##                                                      death       date
## 1                                   killed by a brown mold 2001-10-24
## 2                                       killed by a jackal 2001-10-24
## 3                                     killed by a fire ant 2001-10-24
## 4                                       killed by a jackal 2001-10-24
## 5                                       killed by a jackal 2001-10-24
## 6 killed by a hallucinogen-distorted ghoul, while helpless 2001-10-24
##   dumplog
## 1      NA
## 2      NA
## 3      NA
## 4      NA
## 5      NA
## 6      NA
Let’s create some variables that might be helpful (or perhaps not, we’ll see):
nethack %<>%
  mutate(date = ymd(date),
         year = year(date),
         month = month(date),
         day = day(date))
This makes it easy to look at the data from, say, June 2017:
nethack %>%
  filter(year == 2017, month == 6) %>%
  brotools::describe()
## # A tibble: 15 x 13
##    variable type   nobs   mean     sd mode    min     max   q25 median
##    <chr>    <chr> <int>  <dbl>  <dbl> <chr> <dbl>   <dbl> <dbl>  <dbl>
##  1 day      Nume…  1451   17.4 9.00e0 1         1      30   10      19
##  2 month    Nume…  1451    6   0.     6         6       6    6       6
##  3 rank     Nume…  1451   47.1 2.95e1 1         1     100   20      47
##  4 score    Nume…  1451 38156. 3.39e5 488       0 5966425  402.    953
##  5 turns    Nume…  1451  4179. 1.23e4 812       1  291829  860.   1796
##  6 year     Nume…  1451  2017  0.     2017   2017    2017 2017    2017
##  7 alignme… Char…  1451    NA  NA     Law      NA      NA   NA      NA
##  8 death    Char…  1451    NA  NA     kill…    NA      NA   NA      NA
##  9 gender   Char…  1451    NA  NA     Mal      NA      NA   NA      NA
## 10 hp_max   Char…  1451    NA  NA     -1/16    NA      NA   NA      NA
## 11 lev_max  Char…  1451    NA  NA     4/4      NA      NA   NA      NA
## 12 name     Char…  1451    NA  NA     ohno…    NA      NA   NA      NA
## 13 race     Char…  1451    NA  NA     Hum      NA      NA   NA      NA
## 14 role     Char…  1451    NA  NA     Kni      NA      NA   NA      NA
## 15 time     Char…  1451    NA  NA     01:1…    NA      NA   NA      NA
## # ... with 3 more variables: q75 <dbl>, n_missing <int>, n_unique <int>
Let’s also take a look at a dumplog:
nethack %>% filter(year == 2018, month == 10) %>% slice(1) %>% pull(dumplog) ## [[1]] ## [1] "Unix NetHack Version 3.6.1 - last build Fri Apr 27 19:25:48 2018. (d4ebae12f1a709d1833cf466dd0c553fb97518d2)" ## [2] "" ## [3] "Game began 2018-09-30 22:27:18, ended 2018-10-01 00:01:12." ## [4] "" ## [5] "brothertrebius, neutral female gnomish Ranger" ## [6] "" ## [7] " -----" ## [8] " -------- |....# ----- --------" ## [9] " |/..%.=| #...^|######|...| ##.......|" ## [10] " |/[%..%| #|...| #|...| # |......|" ## [11] " |......| #----- #-...-######....<..|" ## [12] " -----|-- ### -|-.- # |......|" ## [13] " ## ## # # ----f---" ## [14] " #### # # ## f@Y" ## [15] " # # # #" ## [16] " -----.-------# # #" ## [17] " |........%..|# # #" ## [18] " |............# # #" ## [19] " |...........| 0## #" ## [20] " |...........| -.--- #" ## [21] " ------------- |^..|##" ## [22] " |...|#" ## [23] " |0>..#" ## [24] " -----" ## [25] "" ## [26] "Brothertre the Trailblazer St:15 Dx:12 Co:16 In:13 Wi:15 Ch:6 Neutral" ## [27] "Dlvl:6 $:59 HP:0(54) Pw:40(40) AC:0 Exp:8 T:7398 Satiated Burdened" ## [28] "" ## [29] "Latest messages:" ## [30] " In what direction? l" ## [31] " You shoot 2 arrows." ## [32] " The 1st arrow hits the ape." ## [33] " The 2nd arrow hits the ape!" ## [34] " The ape hits!" ## [35] " The ape hits!" ## [36] " The ape bites!" ## [37] " You ready: q - 9 uncursed arrows." ## [38] " In what direction? l" ## [39] " The arrow hits the ape." ## [40] " The ape hits!" ## [41] " The ape hits!" ## [42] " The ape bites!" ## [43] " The ape hits!" ## [44] " The ape hits!" ## [45] " The ape misses!" ## [46] " In what direction? l" ## [47] " You shoot 2 arrows." ## [48] " The 1st arrow hits the ape!" ## [49] " The 2nd arrow hits the ape." ## [50] " The ape misses!" ## [51] " The ape hits!" ## [52] " The ape misses!" ## [53] " In what direction? l" ## [54] " You shoot 2 arrows." ## [55] " The 1st arrow misses the ape." ## [56] " The 2nd arrow hits the ape." ## [57] " The ape misses!" 
## [58] " The ape hits!" ## [59] " The ape bites!" ## [60] " In what direction? l" ## [61] " The arrow hits the ape!" ## [62] " The ape hits!" ## [63] " The ape misses!" ## [64] " The ape bites!" ## [65] " You hear someone cursing shoplifters." ## [66] " The ape misses!" ## [67] " The ape hits!" ## [68] " The ape bites!" ## [69] " What do you want to write with? [- amnqsvBJM-OWZ or ?*] -" ## [70] " You write in the dust with your fingertip." ## [71] " What do you want to write in the dust here? Elbereth" ## [72] " The ape hits!" ## [73] " The ape hits!" ## [74] " You die..." ## [75] " Do you want your possessions identified? [ynq] (y) y" ## [76] " Do you want to see your attributes? [ynq] (y) n" ## [77] " Do you want an account of creatures vanquished? [ynaq] (y) n" ## [78] " Do you want to see your conduct? [ynq] (y) n" ## [79] " Do you want to see the dungeon overview? [ynq] (y) q" ## [80] "" ## [81] "Inventory:" ## [82] " Coins" ## [83] " $ - 59 gold pieces" ## [84] " Weapons" ## [85] " m - 17 blessed +1 arrows" ## [86] " n - a blessed +0 arrow" ## [87] " q - 3 +0 arrows (in quiver)" ## [88] " s - a +0 bow (weapon in hand)" ## [89] " B - 11 +1 darts" ## [90] " N - 11 +0 darts" ## [91] " a - a +1 dagger (alternate weapon; not wielded)" ## [92] " Armor" ## [93] " T - an uncursed +0 dwarvish iron helm (being worn)" ## [94] " z - an uncursed +0 pair of leather gloves (being worn)" ## [95] " U - a cursed -4 pair of iron shoes (being worn)" ## [96] " e - an uncursed +2 cloak of displacement (being worn)" ## [97] " h - a blessed +0 dwarvish mithril-coat (being worn)" ## [98] " Comestibles" ## [99] " f - 3 uncursed cram rations" ## [100] " j - 2 uncursed food rations" ## [101] " L - an uncursed food ration" ## [102] " P - an uncursed lembas wafer" ## [103] " I - an uncursed lizard corpse" ## [104] " o - an uncursed tin of spinach" ## [105] " Scrolls" ## [106] " G - 2 uncursed scrolls of blank paper" ## [107] " t - an uncursed scroll of confuse monster" ## [108] " V - an 
uncursed scroll of identify" ## [109] " Potions" ## [110] " x - an uncursed potion of gain ability" ## [111] " H - a blessed potion of sleeping" ## [112] " g - 3 uncursed potions of water" ## [113] " Rings" ## [114] " O - an uncursed ring of slow digestion (on left hand)" ## [115] " v - an uncursed ring of stealth (on right hand)" ## [116] " Tools" ## [117] " p - an uncursed magic lamp" ## [118] " k - an uncursed magic whistle" ## [119] " Q - an uncursed mirror" ## [120] " C - an uncursed saddle" ## [121] " D - an uncursed stethoscope" ## [122] " y - a +0 unicorn horn" ## [123] " i - 7 uncursed wax candles" ## [124] " Gems/Stones" ## [125] " W - an uncursed flint stone" ## [126] " M - an uncursed worthless piece of red glass" ## [127] " Z - an uncursed worthless piece of violet glass" ## [128] " J - an uncursed worthless piece of white glass" ## [129] "" ## [130] "Brothertrebius the Ranger's attributes:" ## [131] "" ## [132] "Background:" ## [133] " You were a Trailblazer, a level 8 female gnomish Ranger." ## [134] " You were neutral, on a mission for Venus" ## [135] " who was opposed by Mercury (lawful) and Mars (chaotic)." ## [136] "" ## [137] "Final Characteristics:" ## [138] " You had 0 hit points (max:54)." ## [139] " You had 40 magic power (max:40)." ## [140] " Your armor class was 0." ## [141] " You had 1552 experience points." ## [142] " You entered the dungeon 7398 turns ago." ## [143] " Your strength was 15 (limit:18/50)." ## [144] " Your dexterity was 12 (limit:18)." ## [145] " Your constitution was 16 (limit:18)." ## [146] " Your intelligence was 13 (limit:19)." ## [147] " Your wisdom was 15 (limit:18)." ## [148] " Your charisma was 6 (limit:18)." ## [149] "" ## [150] "Final Status:" ## [151] " You were satiated." ## [152] " You were burdened; movement was slightly slowed." ## [153] " You were wielding a bow." ## [154] "" ## [155] "Final Attributes:" ## [156] " You were piously aligned." ## [157] " You were telepathic." 
## [158] " You had automatic searching." ## [159] " You had infravision." ## [160] " You were displaced." ## [161] " You were stealthy." ## [162] " You had slower digestion." ## [163] " You were guarded." ## [164] " You are dead." ## [165] "" ## [166] "Vanquished creatures:" ## [167] " a warhorse" ## [168] " a tengu" ## [169] " a quivering blob" ## [170] " an iron piercer" ## [171] " 2 black lights" ## [172] " a gold golem" ## [173] " a werewolf" ## [174] " 3 lizards" ## [175] " 2 dingoes" ## [176] " a housecat" ## [177] " a white unicorn" ## [178] " 2 dust vortices" ## [179] " a plains centaur" ## [180] " an ape" ## [181] " a Woodland-elf" ## [182] " 2 soldier ants" ## [183] " a bugbear" ## [184] " an imp" ## [185] " a wood nymph" ## [186] " a water nymph" ## [187] " a rock piercer" ## [188] " a pony" ## [189] " 3 fog clouds" ## [190] " a yellow light" ## [191] " a violet fungus" ## [192] " 2 gnome lords" ## [193] " 2 gnomish wizards" ## [194] " 2 gray oozes" ## [195] " 2 elf zombies" ## [196] " a straw golem" ## [197] " a paper golem" ## [198] " 2 giant ants" ## [199] " 2 little dogs" ## [200] " 3 floating eyes" ## [201] " 8 dwarves" ## [202] " a homunculus" ## [203] " 3 kobold lords" ## [204] " 3 kobold shamans" ## [205] " 13 hill orcs" ## [206] " 4 rothes" ## [207] " 2 centipedes" ## [208] " 3 giant bats" ## [209] " 6 dwarf zombies" ## [210] " a werejackal" ## [211] " 3 iguanas" ## [212] " 23 killer bees" ## [213] " an acid blob" ## [214] " a coyote" ## [215] " 3 gas spores" ## [216] " 5 hobbits" ## [217] " 7 manes" ## [218] " 2 large kobolds" ## [219] " a hobgoblin" ## [220] " 2 giant rats" ## [221] " 2 cave spiders" ## [222] " a yellow mold" ## [223] " 6 gnomes" ## [224] " 8 garter snakes" ## [225] " 2 gnome zombies" ## [226] " 8 geckos" ## [227] " 11 jackals" ## [228] " 5 foxes" ## [229] " 2 kobolds" ## [230] " 2 goblins" ## [231] " a sewer rat" ## [232] " 6 grid bugs" ## [233] " 3 lichens" ## [234] " 2 kobold zombies" ## [235] " 5 newts" ## [236] "206 
creatures vanquished." ## [237] "" ## [238] "No species were genocided or became extinct." ## [239] "" ## [240] "Voluntary challenges:" ## [241] " You never genocided any monsters." ## [242] " You never polymorphed an object." ## [243] " You never changed form." ## [244] " You used no wishes." ## [245] "" ## [246] "The Dungeons of Doom: levels 1 to 6" ## [247] " Level 1:" ## [248] " A fountain." ## [249] " Level 2:" ## [250] " A sink." ## [251] " Level 3:" ## [252] " A general store, a fountain." ## [253] " Level 4:" ## [254] " A general store, a fountain." ## [255] " Stairs down to The Gnomish Mines." ## [256] " Level 5:" ## [257] " A fountain." ## [258] " Level 6: <- You were here." ## [259] " A general store." ## [260] " Final resting place for" ## [261] " you, killed by an ape." ## [262] "The Gnomish Mines: levels 5 to 8" ## [263] " Level 5:" ## [264] " Level 6:" ## [265] " Level 7:" ## [266] " Many shops, a temple, some fountains." ## [267] " Level 8:" ## [268] "" ## [269] "Game over:" ## [270] " ----------" ## [271] " / \\" ## [272] " / REST \\" ## [273] " / IN \\" ## [274] " / PEACE \\" ## [275] " / \\" ## [276] " | brothertrebius |" ## [277] " | 59 Au |" ## [278] " | killed by an ape |" ## [279] " | |" ## [280] " | |" ## [281] " | |" ## [282] " | 2018 |" ## [283] " *| * * * | *" ## [284] " _________)/\\\\_//(\\/(/\\)/\\//\\/|_)_______" ## [285] "" ## [286] "Goodbye brothertrebius the Ranger..." ## [287] "" ## [288] "You died in The Dungeons of Doom on dungeon level 6 with 6652 points," ## [289] "and 59 pieces of gold, after 7398 moves." ## [290] "You were level 8 with a maximum of 54 hit points when you died." ## [291] ""
Now, I am curious to see how many games are played per day:
runs_per_day <- nethack %>%
  group_by(date) %>%
  count() %>%
  ungroup()

ggplot(runs_per_day, aes(y = n, x = date)) +
  geom_point(colour = "#0f4150") +
  geom_smooth(colour = "#82518c") +
  theme_blog()
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
The number of games per day seems to have been stable since 2015, at around 50. What is interesting, though, is not only the number of games played but also how many of these games resulted in a win.
For this, let’s add a new column that tells us whether the player ascended (won the game) or not:
nethack %<>% mutate(Ascended = ifelse(death == "ascended", "Ascended", "Died an horrible death"))
I’m curious to see how many players managed to ascend… NetHack being as hard as diamonds, probably not a lot:
ascensions_per_day <- nethack %>%
  group_by(date, Ascended) %>%
  count() %>%
  rename(Total = n)

ggplot(ascensions_per_day) +
  geom_area(aes(y = Total, x = as.Date(date), fill = Ascended)) +
  theme_blog() +
  labs(y = "Number of runs", x = "Date") +
  scale_fill_blog() +
  theme(legend.title = element_blank())
Yeah, just as expected. Because there is so much data, it’s difficult to see clearly, though. Depending on the size of the screen you’re reading this on, it might seem that on some days there are a lot of ascensions. This is only an impression due to the resolution of the picture. Let’s look at the share of ascensions per year (and how many times the quests fail miserably), which makes this more apparent:
ascensions_per_day %>%
  mutate(Year = year(as.Date(date))) %>%
  group_by(Year, Ascended) %>%
  summarise(Total = sum(Total, na.rm = TRUE)) %>%
  group_by(Year) %>%
  mutate(denom = sum(Total, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(Share = Total/denom) %>%
  ggplot() +
  geom_col(aes(y = Share, x = Year, fill = Ascended)) +
  theme_blog() +
  scale_fill_blog() +
  theme(legend.title = element_blank())
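The share computation can be sketched more compactly on a toy data frame. The values below are invented for illustration and are not taken from {nethack}; the idea is simply to count runs per year and outcome, then divide by the yearly total:

```r
library(dplyr)

# Toy data, invented for illustration only
toy_runs <- tibble(
  year     = c(2016, 2016, 2016, 2016, 2017, 2017),
  Ascended = c("Ascended", "Died", "Died", "Died", "Ascended", "Died")
)

shares <- toy_runs %>%
  count(year, Ascended, name = "Total") %>%  # runs per year and outcome
  group_by(year) %>%
  mutate(Share = Total / sum(Total)) %>%     # share within each year
  ungroup()

shares
```

In this toy example, one run out of four ascends in 2016 and one out of two in 2017, so the shares within each year sum to 1.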
I will now convert the “time” column to seconds. I am not yet sure that this column is really useful, because NetHack is a turn-based game: when the player does not move, neither do the monsters. So the wall-clock seconds of a game might not be a good proxy for the time actually spent playing. But it makes for a good exercise:
convert_to_seconds <- function(time_string){
  time_numeric <- time_string %>%
    str_split(":", simplify = TRUE) %>%
    as.numeric

  time_in_seconds <- sum(time_numeric * c(3600, 60, 1))

  time_in_seconds
}
The strings I want to convert are of the form “01:34:43”, so I split at the “:” and then convert the result to numeric. I end up with an atomic vector (c(1, 34, 43)). Then I multiply each element by the right number of seconds, and sum that to get the total. Let’s apply it to my data:
nethack %<>% mutate(time_in_seconds = map_dbl(time, convert_to_seconds))
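As a quick sanity check, the same conversion can be sketched in base R and vectorised over several strings at once. The helper name `to_seconds` is hypothetical and only serves this illustration; it is not part of {nethack} or {brotools}:

```r
# Hypothetical base R helper: convert "HH:MM:SS" strings to seconds
to_seconds <- function(time_strings) {
  vapply(
    strsplit(time_strings, ":", fixed = TRUE),
    # hours * 3600 + minutes * 60 + seconds
    function(parts) sum(as.numeric(parts) * c(3600, 60, 1)),
    numeric(1)
  )
}

to_seconds(c("01:34:43", "00:01:01"))  # returns 5683 and 61
```

For “01:34:43” this is 1 * 3600 + 34 * 60 + 43 = 5683 seconds.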
What is the distribution of “time_in_seconds”?
nethack %>%
  describe(time_in_seconds)
## # A tibble: 1 x 13
##   variable type    nobs   mean     sd mode    min    max   q25 median   q75
##   <chr>    <chr>  <int>  <dbl>  <dbl> <chr> <dbl>  <dbl> <dbl>  <dbl> <dbl>
## 1 time_in… Nume… 322485 23529. 2.73e5 <NA>     61 2.72e7   622   1689  5486
## # ... with 2 more variables: n_missing <int>, n_unique <lgl>
We see that the minimum of time_in_seconds is 61 whereas the maximum is of the order of 27200000… This must be a mistake, because that is about 315 days, almost one year!
nethack %>%
  filter(time_in_seconds == max(time_in_seconds, na.rm = TRUE))
##   rank      score   name       time   turns lev_max  hp_max role race
## 1   28 3173960108 fisted 7553:41:49 6860357    4/47 362/362  Wiz  Elf
##   gender alignment                      death       date dumplog year
## 1    Mal       Neu drowned in a pool of water 2017-02-02      NA 2017
##   month day               Ascended time_in_seconds
## 1     2   2 Died an horrible death        27193309
Well… maybe “fisted” wanted to break the record of the longest NetHack game ever. Congratulations!
Let’s take a look at the density, but cut it at the 90th percentile:
nethack %>%
  filter(!is.na(time_in_seconds),
         time_in_seconds < quantile(time_in_seconds, 0.9, na.rm = TRUE)) %>%
  ggplot() +
  geom_density(aes(x = time_in_seconds), colour = "#82518c") +
  theme_blog()
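To see why trimming at the 90th percentile helps, here is a small base R sketch on made-up numbers (not from the data): a single extreme value dominates the range, and keeping only the values below the 90th percentile removes it.

```r
# Made-up durations in seconds, with one extreme outlier
durations <- c(100, 200, 300, 400, 500, 600, 700, 800, 900, 1e6)

cutoff  <- quantile(durations, 0.9, na.rm = TRUE)  # 90th percentile
trimmed <- durations[durations < cutoff]

range(durations)  # from 100 up to 1e6
range(trimmed)    # from 100 up to 900
```

The density of the trimmed values is no longer squeezed against the left edge of the plot by one record-breaking game.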
As expected, the distribution is right skewed. However, as explained above, NetHack is a turn-based game, meaning that if the player does not move, the monsters won’t move either. Perhaps it makes more sense to look at the turns column:
nethack %>%
  describe(turns)
## # A tibble: 1 x 13
##   variable type    nobs  mean     sd mode    min    max   q25 median   q75
##   <chr>    <chr>  <int> <dbl>  <dbl> <chr> <dbl>  <dbl> <dbl>  <dbl> <dbl>
## 1 turns    Nume… 322485 4495. 19853. <NA>      1 6.86e6   871   1818  3582
## # ... with 2 more variables: n_missing <int>, n_unique <lgl>
The maximum is quite large too. Just like before, let’s focus by cutting the variable at the 90th percentile:
nethack %>%
  filter(!is.na(turns),
         turns < quantile(turns, 0.9, na.rm = TRUE)) %>%
  ggplot() +
  geom_density(aes(x = turns), colour = "#82518c") +
  theme_blog()
I think that using turns makes more sense. In a future blog post, I will estimate a survival model to see how long players survive, and will use turns instead of time_in_seconds.
Analysis
What kills the players
To find out what kills players most often, some cleaning of the death column is in order. Death can occur from poisoning, starvation, accidents, drowning… and of course monsters can kill the player too.
Here are some values of the death variable:
burned by a tower of flame
choked on a lichen corpse
died of starvation
fell into a pit of iron spikes
killed by a gnome
killed by a gnome called Blabla
killed by a gnome called Blabla while sleeping
slipped while mounting a saddled pony
slipped while mounting a saddled pony called Jolly Jumper
zapped her/himself with a spell
To find the most frequent cause of death, I have to do some cleaning; otherwise “killed by a gnome” and “killed by a gnome called Blabla” would count as two different causes of death. In the end, what interests me is how many times players got killed by a gnome.
The following lines clean up the death variable:
nethack %<>%
  mutate(death2 = case_when(str_detect(death, "poisoned") ~ "poisoned",
                            str_detect(death, "slipped") ~ "accident",
                            str_detect(death, "petrified") ~ "petrified",
                            str_detect(death, "choked") ~ "accident",
                            str_detect(death, "caught.*self") ~ "accident",
                            str_detect(death, "starvation") ~ "starvation",
                            str_detect(death, "drowned") ~ "drowned",
                            str_detect(death, "fell") ~ "fell",
                            str_detect(death, "zapped") ~ "zapped",
                            str_detect(death, "killed") ~ "killed",
                            TRUE ~ death)) %>%
  # everything that comes after "by" or "while"
  mutate(death3 = str_extract(death, "(?<=by|while).*")) %>%
  # keep only the part up to the first "," or "called", whichever comes first
  mutate(death3 = case_when(str_detect(death3, ",|\\bcalled\\b") ~ str_extract(death3, "^.*?(,|\\bcalled\\b)"),
                            TRUE ~ death3)) %>%
  # str_remove_all() strips every match; str_remove() would stop after the leading article
  mutate(death3 = str_remove_all(death3, ",|\\bcalled\\b|\\ban?\\b"),
         death3 = str_trim(death3))
death2 is a new variable in which I broadly categorize causes of death. Using regular expressions, I detect causes of death and aggregate some categories, for instance “slipped” and “choked” into “accident”. Then, I extract everything that comes after the strings “by” or “while” and put the result into a new variable called death3. Next, I detect the string “,” or “called”; if one of these strings is present, I extract everything that comes before “,” or before “called”. Finally, I remove “,”, “called”, “a” and “an” from the string and trim the whitespace.
Let’s take a look at these new variables:
set.seed(123)
nethack %>%
  select(name, death, death2, death3) %>%
  sample_n(10)
##              name                     death   death2        death3
## 92740   DianaFury     killed by a death ray   killed     death ray
## 254216    Oddabit         killed by a tiger   killed         tiger
## 131889    shachaf      killed by a fire ant   killed      fire ant
## 284758        a43  poisoned by a killer bee poisoned    killer bee
## 303283      goast         killed by a gecko   killed         gecko
## 14692     liberty    killed by a gnome king   killed    gnome king
## 170303     arch18                  ascended ascended          <NA>
## 287786 foolishwtf           killed by a bat   killed           bat
## 177826    Renleve     killed by a giant bat   killed     giant bat
## 147248      TheOV killed by a black unicorn   killed black unicorn
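The cleaning logic can also be sketched as a small standalone function. This is a simplified variant rather than the exact pipeline used on the data: `extract_culprit` is a hypothetical helper, and the example strings are modeled on the death messages listed earlier.

```r
library(stringr)

# Hypothetical helper: reduce a death message to the bare culprit
extract_culprit <- function(death) {
  # Everything after the first "by" or "while" (NA if neither occurs)
  culprit <- str_extract(death, "(?<=by|while).*")
  # Drop the first "," or "called" and everything after it
  culprit <- str_remove(culprit, "(,|\\bcalled\\b).*")
  # Drop standalone articles, then tidy up the whitespace
  culprit <- str_remove_all(culprit, "\\ban?\\b")
  str_squish(culprit)
}

extract_culprit(c(
  "killed by a gnome",
  "killed by a gnome called Blabla",
  "killed by an ape",
  "poisoned by a killer bee",
  "ascended"
))
```

Both gnome messages collapse to “gnome”, while “ascended” contains neither “by” nor “while” and therefore yields NA. The word boundaries around the article pattern matter: without the trailing `\\b`, the leading “a” of “ape” would be stripped as well.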
Now, it is quite easy to know what monsters are the meanest buttholes; let’s focus on the top 15. Most likely, these are going to be early game monsters. Let’s see:
nethack %>%
  filter(!is.na(death3)) %>%
  count(death3) %>%
  top_n(15) %>%
  mutate(death3 = fct_reorder(death3, n, .desc = FALSE)) %>%
  ggplot() +
  geom_col(aes(y = n, x = death3)) +
  coord_flip() +
  theme_blog() +
  scale_fill_blog() +
  ylab("Number of deaths caused") +
  xlab("Monster")
## Selecting by n
Seems like soldier ants are the baddest, followed by jackals and dwarves. As expected, these are mostly early game monsters. Thus, it would be interesting to look at this distribution at different stages of the game. Let’s create a categorical variable that discretizes turns, and then create one plot per category:
nethack %>%
  filter(!is.na(death3)) %>%
  filter(!is.na(turns)) %>%
  mutate(turn_flag = case_when(between(turns, 1, 5000) ~ "Less than 5000",
                               between(turns, 5001, 10000) ~ "Between 5001 and 10000",
                               between(turns, 10001, 20000) ~ "Between 10001 and 20000",
                               between(turns, 20001, 40000) ~ "Between 20001 and 40000",
                               between(turns, 40001, 60000) ~ "Between 40001 and 60000",
                               turns > 60000 ~ "More than 60000")) %>%
  mutate(turn_flag = factor(turn_flag,
                            levels = c("Less than 5000",
                                       "Between 5001 and 10000",
                                       "Between 10001 and 20000",
                                       "Between 20001 and 40000",
                                       "Between 40001 and 60000",
                                       "More than 60000"),
                            ordered = TRUE)) %>%
  group_by(turn_flag) %>%
  count(death3) %>%
  top_n(15) %>%
  nest() %>%
  mutate(data = map(data, ~mutate(., death3 = fct_reorder(death3, n, .desc = TRUE)))) %>%
  mutate(plots = map2(.x = turn_flag,
                      .y = data,
                      ~ggplot(data = .y) +
                        geom_col(aes(y = n, x = death3)) +
                        coord_flip() +
                        theme_blog() +
                        scale_fill_blog() +
                        ylab("Number of deaths caused") +
                        xlab("Monster") +
                        ggtitle(.x))) %>%
  pull(plots)
## Selecting by n
[Six bar charts, one per turn category from “Less than 5000” to “More than 60000”, each showing the number of deaths caused by the top 15 monsters.]
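As an aside, the same discretization can be sketched with base R’s `cut()`, which returns an ordered factor in one step. The turn counts below are hypothetical; the breaks mirror the categories used above, with each interval closed on the right:

```r
# Hypothetical turn counts
turns <- c(1, 5000, 5001, 15000, 45000, 70000)

turn_flag <- cut(
  turns,
  breaks = c(0, 5000, 10000, 20000, 40000, 60000, Inf),
  labels = c("Less than 5000",
             "Between 5001 and 10000",
             "Between 10001 and 20000",
             "Between 20001 and 40000",
             "Between 40001 and 60000",
             "More than 60000"),
  ordered_result = TRUE  # keep the categories in their natural order
)

table(turn_flag)
```

With the default `right = TRUE`, a game of exactly 5000 turns lands in the first category, just as it does with `between(turns, 1, 5000)`.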
Finally, for this section, I want to know if there are levels, or floors, where players die more often than others. For this, we can take a look at the lev_max column. Observations in this column are of the form “8/10”. This means that the player died on level 8, but the deepest level that was explored was the 10th. Let’s do this for each year. Before anything, I have to explain the layout of the levels of the game. You can see a diagram here. The player starts on level 1 and goes down to level 53. Then, the player can ascend by going through levels -1 to -5. But there are more levels than these. Levels -6 and -9 are the sky, and the player can teleport there (but will fall to their death). A player who teleports to level -10 enters heaven (and dies too). Because these levels are special, I do not consider them here. I do not consider level 0 either, which is “Nowhere”. Let’s get the number of players who died on each floor, but also compute the cumulative death count:
died_on_level <- nethack %>%
  filter(Ascended == "Died an horrible death") %>%
  mutate(died_on = str_extract(lev_max, "-?\\d{1,}")) %>%
  mutate(died_on = as.numeric(died_on)) %>%
  group_by(year) %>%
  count(died_on) %>%
  filter(died_on >= -5, died_on != 0) %>%
  mutate(died_on = case_when(died_on == -1 ~ 54,
                             died_on == -2 ~ 55,
                             died_on == -3 ~ 56,
                             died_on == -4 ~ 57,
                             died_on == -5 ~ 58,
                             TRUE ~ died_on)) %>%
  arrange(desc(died_on)) %>%
  mutate(cumul_deaths = cumsum(n))
Let’s take a look:
head(died_on_level)
## # A tibble: 6 x 4
## # Groups:   year [6]
##    year died_on     n cumul_deaths
##   <dbl>   <dbl> <int>        <int>
## 1  2002      58     5            5
## 2  2003      58    11           11
## 3  2004      58    19           19
## 4  2005      58    28           28
## 5  2006      58    25           25
## 6  2007      58    22           22
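The parsing and renumbering of levels can be illustrated on a handful of made-up lev_max values (not taken from the data). This base R sketch assumes the “died on/deepest reached” format described above:

```r
# Made-up lev_max values in the "died on/deepest reached" format
lev_max <- c("8/10", "1/1", "-1/53", "-5/53", "0/12")

# The level the player died on is the (possibly negative) number before the "/"
died_on <- as.numeric(sub("/.*", "", lev_max))

# Keep levels -5 to -1 and the regular levels; drop 0 ("Nowhere") and anything below -5
died_on <- died_on[died_on >= -5 & died_on != 0]

# Map -1..-5 onto 54..58 so that they sort after level 53
died_on <- ifelse(died_on < 0, 53 - died_on, died_on)

died_on  # returns 8, 1, 54 and 58
```

The expression `53 - died_on` is only a shorthand for the explicit mapping: -1 becomes 54 and -5 becomes 58.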
Now, let’s compute the number of players who ascended and add this to the cumulative count:
ascended_yearly <- nethack %>%
  filter(Ascended == "Ascended") %>%
  group_by(year) %>%
  count(Ascended)
Let’s take a look:
head(ascended_yearly)
## # A tibble: 6 x 3
## # Groups:   year [6]
##    year Ascended     n
##   <dbl> <chr>    <int>
## 1  2001 Ascended     4
## 2  2002 Ascended    38
## 3  2003 Ascended   132
## 4  2004 Ascended   343
## 5  2005 Ascended   329
## 6  2006 Ascended   459
I will modify the dataset a little bit and merge it with the previous one:
ascended_yearly %<>%
  rename(ascended_players = n) %>%
  select(-Ascended)
Let’s add this to the data frame from before by merging both, and then we can compute the surviving players:
died_on_level %<>%
  full_join(ascended_yearly, by = "year") %>%
  mutate(surviving_players = cumul_deaths + ascended_players)
Now we can compute the share of players who died on each level:
died_on_level %>%
  mutate(death_rate = n/surviving_players) %>%
  ggplot(aes(y = death_rate, x = as.factor(died_on))) +
  geom_line(aes(group = year, alpha = year), colour = "#82518c") +
  theme_blog() +
  ylab("Death rate") +
  xlab("Level") +
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none") +
  scale_y_continuous(labels = scales::percent)
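To make the death rate concrete, here is a base R sketch on made-up counts for a single year (three levels only, numbers invented). The assumption is that everyone who died on a deeper level, or who ascended, first had to get past the level in question:

```r
# Made-up counts for a single year: deaths per level and number of ascensions
deaths <- data.frame(died_on = c(1, 2, 3), n = c(50, 30, 10))
ascended_players <- 10

# Sort from the deepest level upwards, then accumulate the deaths
deaths <- deaths[order(-deaths$died_on), ]
deaths$cumul_deaths <- cumsum(deaths$n)

# Players who reached a level = those who died there or deeper, plus those who ascended
deaths$surviving_players <- deaths$cumul_deaths + ascended_players
deaths$death_rate <- deaths$n / deaths$surviving_players

deaths
```

In this toy example, 100 players reach level 1 and 50 of them die there (50%); 50 reach level 2 and 30 die there (60%); 20 reach level 3 and 10 die there (50%).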
Looks like level 7 is consistently the most dangerous! The death rate there is more than 35%!
That’s it for this blog post. In the next one, I will focus on what players kill!
If you found this blog post useful, you might want to follow me on twitter for blog post updates.