My R Take on Advent of Code – Day 4
After some wonderful Christmas and New Year’s distractions, now it’s time to continue with my Advent of Code challenges in R (before the summer comes…).
To avoid waffling: the 4th puzzle offers a record of guards’ shifts, listing various activities and the time each one started. We need to gather two things from this dataset:
- Which guard sleeps the most (in minutes), and
- What minute does that guard spend asleep the most?
Then, we have to multiply the guard number by the minute he/she is most often asleep to get the final solution. Let’s get down to work, then!
First, let’s have a look at the data:
    library(tidyverse)

    raw_input <- read.delim('day4-raw-input.txt', header = FALSE)
    head(raw_input)
    ##                                           V1
    ## 1                [1518-09-28 00:56] wakes up
    ## 2            [1518-10-15 00:05] falls asleep
    ## 3                [1518-02-15 00:58] wakes up
    ## 4                [1518-08-26 00:51] wakes up
    ## 5                [1518-03-23 00:32] wakes up
    ## 6 [1518-05-04 23:56] Guard #523 begins shift
This data needs some serious cleaning! Let’s brush up on our regex knowledge a bit and separate timestamps from guard activities:
    # clean the input
    clean_input <- raw_input %>%
      rename(value = V1) %>%
      mutate(
        timestamp = lubridate::ymd_hm(
          str_extract(value, '[:digit:]+-[:digit:]+-[:digit:]+..[:digit:]+:[:digit:]+')
        ),
        action = str_extract(value, '[:alpha:]+..[:alpha:]+'),
        guard_num = ifelse(str_detect(value, '#'),
                           str_extract(value, '#[:digit:]+'),
                           NA),
        date = lubridate::date(timestamp),
        minute = lubridate::minute(timestamp)  # minute that the activity started
      ) %>%
      arrange(timestamp) %>%  # sort in chronological order
      fill(guard_num)         # fill in missing guard numbers

    head(clean_input)
    ##                                         value           timestamp
    ## 1  [1518-01-28 00:00] Guard #151 begins shift 1518-01-28 00:00:00
    ## 2             [1518-01-28 00:40] falls asleep 1518-01-28 00:40:00
    ## 3                 [1518-01-28 00:49] wakes up 1518-01-28 00:49:00
    ## 4             [1518-01-28 00:57] falls asleep 1518-01-28 00:57:00
    ## 5                 [1518-01-28 00:58] wakes up 1518-01-28 00:58:00
    ## 6 [1518-01-29 00:04] Guard #2017 begins shift 1518-01-29 00:04:00
    ##         action guard_num       date minute
    ## 1        Guard      #151 1518-01-28      0
    ## 2 falls asleep      #151 1518-01-28     40
    ## 3     wakes up      #151 1518-01-28     49
    ## 4 falls asleep      #151 1518-01-28     57
    ## 5     wakes up      #151 1518-01-28     58
    ## 6        Guard     #2017 1518-01-29      4
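To see what the patterns above pick out, here is a minimal sketch on a single log line. It uses base R regex functions rather than stringr, purely so that it runs without extra packages, and the variable names (`line`, `stamp`, `guard`, `action`) are made up for illustration.

```r
# One shift line, copied from the cleaned output above
line <- "[1518-01-28 00:00] Guard #151 begins shift"

# Text between the square brackets
stamp <- sub("^\\[(.*?)\\].*$", "\\1", line, perl = TRUE)

# Guard id, including the leading '#'
guard <- regmatches(line, regexpr("#[0-9]+", line))

# Minute part of the timestamp, as an integer
minute <- as.integer(sub("^.*:([0-9]{2})\\].*$", "\\1", line))

# Same shape as the 'action' pattern: letters, any two characters, letters
action <- regmatches(line, regexpr("[[:alpha:]]+..[[:alpha:]]+", line, perl = TRUE))

print(c(stamp = stamp, guard = guard, minute = minute, action = action))
```

On shift lines the action pattern ends up matching just "Guard", because the "#" stops it from going any further. That is why the later steps can drop those rows with action != 'Guard'.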
Now that we have a clean dataset, we can determine who sleeps most:
    # who sleeps most
    clean_input %>%
      filter(action != 'Guard') %>%  # we don't need this anymore
      group_by(guard_num, date) %>%
      # calculate time asleep
      mutate(time_asleep = ifelse(action == 'wakes up',
                                  minute - lag(minute),
                                  NA)) %>%
      group_by(guard_num) %>%
      na.omit() %>%
      summarise(total_asleep = sum(time_asleep)) %>%  # sum it
      arrange(desc(total_asleep)) %>%                 # sort it
      slice(1)                                        # pick the guard that sleeps most
    ## # A tibble: 1 x 2
    ##   guard_num total_asleep
    ##   <chr>            <int>
    ## 1 #409               544
There you go, shame on guard number #409! Now, how can we find the minute during which he (she?) is most often asleep? This will require some data re-arranging:
    # what's the most common time to sleep for guard #409
    guard_data <- clean_input %>%
      filter(action != 'Guard') %>%    # we don't need it anymore!
      filter(guard_num == '#409') %>%  # pick the winner
      arrange(timestamp) %>%
      spread(action, minute) %>%       # prep the data for sequences
      rename(falls_asleep = `falls asleep`, wakes_up = `wakes up`) %>%
      mutate(falls_asleep = ifelse(!is.na(wakes_up),
                                   lag(falls_asleep),
                                   falls_asleep)) %>%
      na.omit()

    head(guard_data)
    ##                          value           timestamp guard_num       date
    ## 2  [1518-02-25 00:40] wakes up 1518-02-25 00:40:00      #409 1518-02-25
    ## 4  [1518-04-02 00:24] wakes up 1518-04-02 00:24:00      #409 1518-04-02
    ## 6  [1518-04-02 00:52] wakes up 1518-04-02 00:52:00      #409 1518-04-02
    ## 8  [1518-04-12 00:59] wakes up 1518-04-12 00:59:00      #409 1518-04-12
    ## 10 [1518-04-15 00:14] wakes up 1518-04-15 00:14:00      #409 1518-04-15
    ## 12 [1518-04-15 00:50] wakes up 1518-04-15 00:50:00      #409 1518-04-15
    ##    falls_asleep wakes_up
    ## 2             3       40
    ## 4             1       24
    ## 6            49       52
    ## 8            57       59
    ## 10            4       14
    ## 12           25       50
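The reshaping step pairs every "wakes up" row with the "falls asleep" row just before it. Here is a minimal base R sketch of the same pairing on a hand-made log. The data and the names `toy`, `is_wake` and `naps` are invented for illustration, and the sketch assumes the rows are already in chronological order and strictly alternate between falling asleep and waking up.

```r
# Hand-made log for one guard, already sorted by time
toy <- data.frame(
  action = c("falls asleep", "wakes up", "falls asleep", "wakes up"),
  minute = c(3L, 40L, 49L, 52L),
  stringsAsFactors = FALSE
)

# Each wake-up row is paired with the row directly before it
is_wake <- toy$action == "wakes up"
naps <- data.frame(
  falls_asleep = toy$minute[which(is_wake) - 1L],
  wakes_up     = toy$minute[is_wake]
)

# Nap length in minutes (the wake-up minute itself is not counted)
naps$length <- naps$wakes_up - naps$falls_asleep

print(naps)
```

One row per nap, with both endpoints side by side, is the shape needed for building minute sequences.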
My idea is to create sequences of minutes between the minute the guard falls asleep and the minute he/she wakes up using seq(). Check out this simple example:
    seq(3, 40)
    ##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
    ## [24] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Easy-peasy! We can apply it to our data using map2():
    # apply the function to the #409 guard's data
    map2(guard_data$falls_asleep, guard_data$wakes_up, seq) %>%
      unlist() %>%  # turn a list into a vector
      table() %>%   # get a frequency table
      sort()        # sort it in ascending order. There are three potential answers! the middle one is correct (?!?!?!)
    ## .
    ##  1  2 59  3 58  4  5  6 15 56 57  7  8  9 10 11 12 13 14 16 17 18 19 20 21
    ##  1  3  3  5  5  6  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  8  8  8
    ## 22 23 24 25 26 55 27 47 28 29 33 34 46 54 30 31 32 35 45 48 43 44 49 53 36
    ##  8  8  8  8  9  9 10 10 11 11 11 11 11 11 12 12 12 12 12 12 13 13 13 13 14
    ## 37 38 39 40 41 42 50 51 52
    ## 14 14 14 14 14 14 15 15 15
Now, that’s interesting. In each pair of rows, the top number is the minute and the bottom number is the number of times the guard was asleep during that minute. So, when you look at the last (most common) minutes in the vector, you’ll notice that there are THREE (not one!) minutes with the same highest frequency: 50, 51 and 52. I’m not sure if this means there’s a flaw in my solution, but after trying all three numbers it’s clear that the middle one (51) is correct:
    # final solution
    # guard number multiplied by the most "commonly slept" minute number
    409*51
    ## [1] 20859
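A possible explanation for the three-way tie, offered as an assumption rather than something checked against the puzzle input above: seq(falls_asleep, wakes_up) also counts the wake-up minute, whereas the puzzle treats the guard as awake during that minute. The sketch below compares the two variants on small illustrative intervals (not the real data), using base R's Map() in place of map2() so that it runs without extra packages. `count_minutes` is a hypothetical helper.

```r
# Illustrative nap intervals (not the real puzzle input)
falls_asleep <- c(5L, 30L, 24L)
wakes_up     <- c(25L, 55L, 29L)

# Hypothetical helper: frequency table of every minute covered by the intervals
count_minutes <- function(from, to) {
  table(unlist(Map(seq, from, to)))
}

# seq() includes both endpoints, so the wake-up minute is counted here
inclusive <- count_minutes(falls_asleep, wakes_up)

# Stopping one minute earlier leaves the wake-up minute out
exclusive <- count_minutes(falls_asleep, wakes_up - 1L)

# All minutes that share the highest count in each variant
top_inclusive <- names(inclusive)[as.vector(inclusive == max(inclusive))]
top_exclusive <- names(exclusive)[as.vector(exclusive == max(exclusive))]

print(top_inclusive)
print(top_exclusive)
```

With these intervals the inclusive version reports a tie between minutes 24 and 25, while the exclusive version leaves only minute 24. If the same thing is happening with the real input, passing wakes_up - 1 as the end of each sequence may leave a single most common minute.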