Advent of Code is a series of programming puzzles you can tackle to hone your coding skills each day in the run-up to Christmas.
This year I am attempting it using R, which can make some challenges easier or harder depending on whether they are more ‘computer sciencey’ or more ‘data sciencey’. Generally it makes parsing datasets easier but low-level string manipulation more fiddly.
Here are my solutions so far. Where possible, I've tried to strike a balance between efficiency and readability, and to avoid using the packages I might usually use (e.g. dplyr) if I think they would make the puzzle too easy.
The input data are different for each participant, so your numerical results may differ from mine.
- Report repair
- Password philosophy
- Toboggan trajectory
- Passport processing
- Binary boarding
- Custom customs
- Handy haversacks
- Handheld halting
- Encoding error
- Adapter array
- Seating system
- Rain risk
- Shuttle search
- Docking data
- Rambunctious recitation
Day 1 – Report repair
Two numbers
Find the two entries that sum to 2020, then multiply those two numbers together.
This can be a one-liner:
input <- as.integer(readLines('input01.txt'))
prod(input[(2020 - input) %in% input])

[1] 468051
Three numbers
Find the three entries that sum to 2020, then multiply them together.
It might be tempting to go for a naïve solution like this:
prod(combn(input, 3)[, combn(input, 3, sum) == 2020])

[1] 272611658
It gives the right answer but involves a fair amount of unnecessary computation. It takes more than a second to run. If we assume all the inputs are non-negative, we can take advantage of this to reduce the number of operations.
. <- expand.grid(input, input[(2020 - input) > min(input)])
. <- transform(., Var3 = 2020 - Var1 - Var2)
. <- subset(., Var3 > min(input))
prod(.[which.max(.$Var3 %in% input), ])

[1] 272611658
This is approximately 2000 times faster than the one-liner, and works by successively discarding values that could only add up to more than 2020. The . notation is just so I can write this without using dplyr/magrittr.
Day 2 – Password philosophy
Number of letters
How many passwords are valid according to the policies?
1-3 a: abcde
1-3 b: cdefg
2-9 c: ccccccccc
First read in the data. I like data frames and so should you.
input <- readLines('input02.txt')
passwords <- do.call(rbind, strsplit(input, '[- ]|\\: '))
passwords <- setNames(as.data.frame(passwords),
                      c('min', 'max', 'letter', 'password'))
passwords <- transform(passwords, min = as.integer(min), max = as.integer(max))
head(passwords)

  min max letter          password
1  14  15      v  vdvvvvvsvvvvvfpv
2   3  11      k  kkqkkfkkvkgfknkx
3   6  10      j        jjjjjjjjjj
4   5  10      s nskdmzwrmpmhsrzts
5  13  15      v   vvvvvvkvvvvjzvv
6  11  13      h    hhhhhbhhhhdhhh
String operations are a bit of a pain in base R, so it's easier just to use a package like stringi or stringr for this.
with(passwords, {
  n <- stringr::str_count(password, letter)
  sum(n >= min & n <= max)
})

[1] 625
You could also split each password with strsplit and count the letters with an sapply-type loop, as sketched below.
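For instance, a minimal sketch of that approach (assuming the passwords data frame built above), which should give the same count:

with(passwords, {
  # count occurrences of each letter by splitting the password into characters
  n <- mapply(function(p, l) sum(strsplit(p, '')[[1]] == l), password, letter)
  sum(n >= min & n <= max)
})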
Position of letters
Now the two digits describe two indices in the password, exactly one of which must match the given letter.
with(passwords,
  sum(xor(substr(password, min, min) == letter,
          substr(password, max, max) == letter))
)

[1] 391
Initially I got caught out here, by misreading the question as 'at least one' and then wondering why an inclusive or (|) was returning the incorrect answer.
Day 3 – Toboggan trajectory
The input looks a bit like this:
..##.........##.........##.........##.........##.........##.......  --->
#...#...#..#...#...#..#...#...#..#...#...#..#...#...#..#...#...#..
.#....#..#..#....#..#..#....#..#..#....#..#..#....#..#..#....#..#.
..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#..#.#...#.#
.#...##..#..#...##..#..#...##..#..#...##..#..#...##..#..#...##..#.
..#.##.......#.##.......#.##.......#.##.......#.##.......#.##.....  --->
.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#.#.#.#....#
.#........#.#........#.#........#.#........#.#........#.#........#
#.##...#...#.##...#...#.##...#...#.##...#...#.##...#...#.##...#...
#...##....##...##....##...##....##...##....##...##....##...##....#
.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#.#..#...#.#  --->
Encountering trees
Starting at the top-left corner of your map and following a slope of right 3 and down 1, how many trees would you encounter?
input <- readLines('input03.txt')
A complicated-sounding problem but the solution is mainly mathematical.
positions <- (3 * (seq_along(input) - 1)) %% nchar(input) + 1
sum(substr(input, positions, positions) == '#')

[1] 268
The sequence of positions goes 1, 4, 7, …, and when it reaches the edge of the map, loops back round to the beginning. Using the modulo operator we can use the sequence modulo the width of the input map, then add one because R indexes from one rather than from zero.
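For instance, taking a map 11 characters wide (the width of the example pattern before it starts to repeat), the first ten positions would wrap around like this:

(3 * (0:9)) %% 11 + 1

 [1]  1  4  7 10  2  5  8 11  3  6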
Different slopes
Simply wrap the above into a function.
trees <- function(right, down = 1) {
  vertical <- seq(0, length(input) - 1, by = down) + 1
  horizontal <- (right * (seq_along(input) - 1)) %% nchar(input) + 1
  horizontal <- head(horizontal, length(vertical))
  as.double(
    sum(substr(input[vertical], horizontal, horizontal) == '#')
  )
}
trees(1) * trees(3) * trees(5) * trees(7) * trees(1, 2)

[1] 3093068400
The as.double bit is necessary only because multiplying large integer outputs together can cause an overflow when the product is larger than $10^9$.
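A quick demonstration of the overflow at the console:

.Machine$integer.max

[1] 2147483647

.Machine$integer.max + 1L

[1] NA
Warning message:
In .Machine$integer.max + 1L : NAs produced by integer overflow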
Day 4 – Passport processing
The example input is in this ragged format, where keys and values are separated by colons and records are separated by double newlines. The first step is to parse this unusual data format.
ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm

iyr:2013 ecl:amb cid:350 eyr:2023
pid:028048884 hcl:#cfa07d byr:1929

hcl:#ae17e1 iyr:2013
eyr:2024
ecl:brn pid:760753108 byr:1931
hgt:179cm

hcl:#cfa07d eyr:2025 pid:166559648
iyr:2011 ecl:brn hgt:59in

input <- strsplit(readLines('input04.txt'), ' ')
ids <- cumsum(!lengths(input))
pairs <- lapply(strsplit(unlist(input), ':'), setNames, c('key', 'value'))
passports <- data.frame(id = rep(ids, lengths(input)),
                        do.call(rbind, pairs))
Missing fields
Now that the data are in a standard format, this is a simple split-apply-combine operation. I am using the base aggregate function, but this could be done equally well using dplyr or data.table.
required <- c('byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid')
valid <- aggregate(key ~ id, passports,
                   function(x) !length(setdiff(required, x)))
head(valid, 10)

   id   key
1   0  TRUE
2   1  TRUE
3   2  TRUE
4   3  TRUE
5   4  TRUE
6   5  TRUE
7   6  TRUE
8   7 FALSE
9   8 FALSE
10  9  TRUE
Then the answer is simply
sum(valid$key)

[1] 190
Field validation
Thanks to the way we imported the data, this is quite straightforward. The rules are:
- byr (Birth Year) – four digits; at least 1920 and at most 2002.
- iyr (Issue Year) – four digits; at least 2010 and at most 2020.
- eyr (Expiration Year) – four digits; at least 2020 and at most 2030.
- hgt (Height) – a number followed by either cm or in:
  - If cm, the number must be at least 150 and at most 193.
  - If in, the number must be at least 59 and at most 76.
- hcl (Hair Color) – a # followed by exactly six characters 0-9 or a-f.
- ecl (Eye Color) – exactly one of: amb, blu, brn, gry, grn, hzl, oth.
- pid (Passport ID) – a nine-digit number, including leading zeroes.
- cid (Country ID) – ignored, missing or not.
The data are all different types (integer, double and categorical) so the first step will be to spread the table to a wider format, with one row per passport, and one column for each field.
Here is a dplyr + tidyr solution.
library(dplyr)
library(tidyr)
passports_wide <- passports %>%
  pivot_wider(names_from = key, values_from = value) %>%
  mutate(byr = as.integer(byr),
         iyr = as.integer(iyr),
         eyr = as.integer(eyr),
         hgt_value = as.numeric(gsub('cm|in$', '', hgt)),
         hgt_unit = gsub('\\d*', '', hgt))
head(passports_wide)

# A tibble: 6 x 11
     id   iyr cid   pid        eyr hcl     ecl      byr hgt   hgt_value hgt_unit
  <int> <int> <chr> <chr>    <int> <chr>   <chr>  <int> <chr>     <dbl> <chr>
1     0  1928 150   4761132~  2039 a5ac0f  #25f8~  2027 190         190 ""
2     1  2013 169   9200769~  2026 #fffffd hzl     1929 168cm       168 "cm"
3     2  2011 <NA>  3284128~  2023 #6b5442 brn     1948 156cm       156 "cm"
4     3  2019 279   6749079~  2020 #602927 amb     1950 189cm       189 "cm"
5     4  2015 <NA>  4736300~  2022 #341e13 hzl     1976 178cm       178 "cm"
6     5  2020 <NA>  6281139~  2023 #866857 blu     1984 163cm       163 "cm"
From here, we can filter out the invalid entries, using filter or subset.
passports_wide %>%
  filter(byr >= 1920, byr <= 2002,
         iyr >= 2010, iyr <= 2020,
         eyr >= 2020, eyr <= 2030,
         hgt_value >= 150 & hgt_value <= 193 & hgt_unit == 'cm' |
           hgt_value >= 59 & hgt_value <= 76 & hgt_unit == 'in',
         grepl('^#[0-9a-f]{6}$', hcl),
         ecl %in% c('amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'),
         grepl('^\\d{9}$', pid)) -> valid_passports
nrow(valid_passports)

[1] 121
You could also use a filtering join, though since most of the fields are ranges of integer values, you would want to use a data.table non-equi join rather than a simple semi_join.
Day 5 – Binary boarding
Highest seat ID
This task is easy, as soon as you recognise that it is just converting numbers from binary to decimal, where B and R denote ones, and F and L denote zeros. The distinction between rows and columns is a red herring, because you can parse the whole sequence at once.
input <- readLines('input05.txt')
binary <- lapply(strsplit(input, ''), grepl, pattern = '[BR]')
seat_ids <- sapply(binary, function(x) sum(x * 2^(rev(seq_along(x)) - 1)))
max(seat_ids)

[1] 874
Finding an empty seat
Get the missing value, which isn’t the minimum or the maximum in the list.
setdiff(seq(min(seat_ids), max(seat_ids)), seat_ids)

[1] 594
Day 6 – Custom customs
Questions with any ‘yes’
Count the number of unique letters in each group, where a 'group' is a series of strings separated from others by a blank line. This is a union set operation.
input <- readLines('input06.txt')
group <- cumsum(!nchar(input))
library(dplyr)
responses <- data.frame(group = group[nchar(input) > 0],
                        questions = input[nchar(input) > 0])
union <- aggregate(questions ~ group, responses,
                   function(x) length(unique(unlist(strsplit(x, '')))))
sum(union$questions)

[1] 6551
Questions with all ‘yes’
Similar, but now an intersection set operation.
intersection <- aggregate(questions ~ group, responses,
                          function(x) length(Reduce(intersect, strsplit(x, ''))))
sum(intersection$questions)

[1] 3358
The solution to the first part could have used Reduce(union, ...), which would achieve the same result as unique(unlist(...)).
Both of these could be made a bit more readable using dplyr or data.table instead. In particular, the base function aggregate doesn't like list-columns as inputs, so the strsplit can't be done before the aggregation. This is not a problem with dplyr::summarise or data.table:
library(dplyr)
responses %>%
  mutate(questions = strsplit(questions, '')) %>%
  group_by(group) %>%
  summarise(count = Reduce(intersect, questions) %>% length) %>%
  pull(count) %>%
  sum

[1] 3358

library(data.table)
setDT(responses)[, questions := strsplit(questions, '')]
responses[, .(count = length(Reduce(intersect, questions))),
          by = group][, sum(count)]

[1] 3358
Day 7 – Handy haversacks
Number of bag colours
Given an input list of rules, how many different colours of bags may contain at least one shiny gold bag?
The first step will be to parse the natural language input, which looks like this:
input <- readLines('input07.txt')
head(input)

[1] "mirrored silver bags contain 4 wavy gray bags."
[2] "clear tan bags contain 5 bright purple bags, 1 pale black bag, 5 muted lime bags."
[3] "dim crimson bags contain 5 vibrant salmon bags, 2 clear cyan bags, 2 striped lime bags, 5 vibrant violet bags."
[4] "mirrored beige bags contain 4 pale gold bags, 1 pale aqua bag."
[5] "pale maroon bags contain 2 dotted orange bags."
[6] "dim tan bags contain no other bags."
For this first exercise, the numbers of bags within each one are irrelevant (but we will need them later for part 2). For now, we just want to reduce it to which colours can contain which others.
To start, I tidied up the data into a flat data frame. This isn’t strictly necessary—a named list would work too—but it’s easy to keep track of everything in a flat data structure.
library(tidyr)
rules <- strsplit(input, ' contain ') %>%
  lapply(gsub, pattern = '\\.| bags?', replacement = '') %>%
  do.call(rbind, .) %>%
  as.data.frame %>%
  setNames(c('container', 'content')) %>%
  transform(content = strsplit(content, ', ')) %>%
  unnest_longer(content) %>%
  extract(content, c('number', 'content'), '(\\d+) (.+)') %>%
  transform(number = as.numeric(number)) %>%
  transform(number = replace(number, is.na(number), 0))
head(rules)

        container number        content
1 mirrored silver      4      wavy gray
2       clear tan      5  bright purple
3       clear tan      1     pale black
4       clear tan      5     muted lime
5     dim crimson      5 vibrant salmon
6     dim crimson      2     clear cyan
The algorithm is a queue, which works as follows.
1. Look up which bags can directly contain shiny gold.
2. Look up which bags can directly contain the results of step 1.
3. Repeat until no more bags can contain the result.
Here’s the loop:
bag <- 'shiny gold'
containers <- NULL
repeat {
  contained_in <- subset(rules, content %in% bag)
  if ( !nrow(contained_in) ) break
  bag <- setdiff(contained_in$container, containers)
  containers <- union(containers, bag)
}
length(containers)

[1] 259
Number of individual bags
We ignored the numbers of bags in part 1, but we need them now. How many individual bags fit inside a single shiny gold bag?
To understand recursion, you must first understand recursion. My function, count_bag, calls itself. In more loop-friendly languages you might use a queue for this second part, but I can't really think of a concise way to do it using R.
count_bag <- function(colour, factor = 1) {
  stopifnot(length(colour) == 1)
  rule <- subset(rules, container == colour)
  if (nrow(rule) == 1 & rule$number[1] == 0) {
    out <- 0
  } else {
    # need to work row-wise or you'll come unstuck:
    out <- mapply(count_bag, rule$content, rule$number)
  }
  factor * (1 + sum(out))
}
We remove 1 at the end so as not to include the shiny gold bag itself:
count_bag('shiny gold', 1) - 1

[1] 45018
Day 8 – Handheld halting
Infinite loop
Just a simple loop that keeps track of all the places it has been so far, and terminates the moment it visits a location for the second time.
input <- read.table('input08.txt', col.names = c('instr', 'value'))
acc <- input$visited <- 0
i <- 1
repeat {
  input$visited[i] <- input$visited[i] + 1
  if ( any(input$visited > 1) ) break
  acc <- acc + input$value[i] * (input$instr[i] == 'acc')
  i <- i + (input$instr[i] == 'jmp') * (input$value[i] - 1) + 1
}
acc

[1] 1600
Originally I wrote this with nested if statements, then changed it to binary multiplication, for fewer lines of code at the expense of readability.
This puzzle is set up to catch you out. From seeing nop +0 in the example data you might be tempted to assume that adding the value on nop instructions won't affect the accumulator. But the test input data have some non-zero nop values thrown in, which you will surely encounter:
head(subset(input, instr == 'nop' & value != 0 & visited > 0))

    instr value visited
2     nop   631       1
11    nop    83       2
71    nop   168       1
73    nop   151       1
96    nop   -25       1
123   nop    -9       1
Thus you must only jump or add to the accumulator on instructions that are explicitly jmp or acc, respectively.
Corrupted code
From part 1, we already have an algorithm for finding the first instruction that will lead into an infinite loop. Instead of terminating at this point, we can assume that last instruction was corrupted, swap it for the other type, then continue until we find another such corruption, all the way until the program is able to terminate on its own.
That was the idea, anyway. Then I got fed up and decided to brute force it, instead. Maybe there is a subtler way, but this appears to work quickly enough. One thing worth noting is that you only need to look at those indices already visited in part 1.
nops_and_jmps <- which(input$instr != 'acc' & input$visited)
brute_force <- function() {
  for (nj in nops_and_jmps) {
    modified <- input
    modified$instr[nj] <- setdiff(c('nop', 'jmp'), input$instr[nj])
    acc <- modified$visited <- 0
    i <- 1
    repeat {
      if (i == nrow(input) + 1) return(acc)
      modified$visited[i] <- modified$visited[i] + 1
      if ( any(modified$visited > 1) ) break
      acc <- acc + modified$value[i] * (modified$instr[i] == 'acc')
      i <- i + (modified$instr[i] == 'jmp') * (modified$value[i] - 1) + 1
    }
  }
}
brute_force()

[1] 1543
Day 9 – Encoding error
Adding pairs
Here the question is how to calculate the sums of pairs of values in a sliding window, ideally without redundantly computing the same sums more than once.
find_error <- function(series, N = 25) {
  preamble <- head(series, N)
  t(combn(preamble, 2)) ->.; addmargins(., 2) ->.; as.data.frame(.) -> pairs
  for (x in tail(series, -N)) {
    if (!x %in% pairs$Sum) return(x)
    pairs <- subset(pairs, V1 != preamble[1] & V2 != preamble[1])
    pairs <- rbind(pairs,
                   data.frame(V1 = x, V2 = preamble[-1], Sum = x + preamble[-1]))
    preamble <- c(preamble[-1], x)
  }
}
John Mount recently pointed out that there is already a 'pipe' of sorts in base R, which you can construct using an operator of the form ->.;. I use it on the second line of this function just because converting combn output into a long data frame format is a bit verbose.
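If you haven't come across it before, here is the same trick on a built-in dataset:

mtcars ->.; subset(., cyl == 4) ->.; nrow(.)

[1] 11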
To check our working, run on the example dataset:
example <- c(35, 20, 15, 25, 47, 40, 62, 55, 65, 95,
             102, 117, 150, 182, 127, 219, 299, 277, 309, 576)
find_error(example, 5)

[1] 127
And now with the real input data. As in earlier exercises, we need floating point numbers, rather than integers, because the large numbers in the real input can cause an integer overflow.
input <- as.double(readLines('input09.txt'))
(invalid <- find_error(input))

[1] 542529149
Contiguous set
To find the longest contiguous set of numbers that add up to the value above, we first recognise that the values are non-negative, so we can immediately exclude any elements that are after our target invalid element.
My procedure will then go as follows. We first initialize an empty set. Then, iterating backwards through the series:
1. Add up all the values in the current set.
2. If the sum is greater than the target, delete the last element.
3. If the sum is equal to the target, and the current set is larger than our current best set (initially empty), save this as our best so far.
4. Prepend the current set with the next element in the sequence.
5. Repeat 1–4 until you reach the beginning of the series.
In R code form:
contiguous_set <- function(series, target) {
  series <- head(series, which.max(series == target) - 1)
  best_set <- set <- c()
  for (n in rev(series)) {
    if (sum(set) == target & length(set) >= length(best_set))
      best_set <- set
    if (sum(set) > target)
      set <- head(set, -1)
    set <- c(n, set)
  }
  sum(range(best_set))
}
On our example dataset we get:
contiguous_set(example, 127)

[1] 62
And on the test dataset, using the value stored from part 1:
contiguous_set(input, invalid)

[1] 75678618
Day 10 – Adapter array
Lagged differences
This is pretty trivial. Read in the data, append a zero, sort the numbers, compute the lagged differences (appending a 3), tabulate them and multiply the result.
input <- as.integer(readLines('input10.txt'))
jolts <- c(0, sort(input), max(input) + 3)
prod(table(diff(jolts)))

[1] 2170
I could equally have appended the 3 in the second line as max(input) + 3.
Counting combinations
Again we will be working on the lagged differences. Let’s look at a few values from this sequence.
head(diff(jolts), 20)

 [1] 1 1 1 3 3 1 1 1 3 1 1 1 1 3 3 1 1 1 1 3
Which adapters can we remove?
We are interested in the lengths of the sub-sequences of 1s in this series. The R function rle will give the run-length encoding, i.e. the lengths of subsequences of consecutive equal values in our vector.
For a length-n subsequence of differences equal to 1:

- if n = 1 the adapter can't be removed because the gap would then be 4: ${0 \choose 0} = 1$
- if n = 2 then you can only remove the first adapter (or not): ${1 \choose 0} + {1 \choose 1} = 2$
- if n = 3 you can keep them all, remove 1 or both of the first 2: ${2 \choose 0} + {2 \choose 1} + {2 \choose 2} = 4$
- if n = 4 you can keep them all, or remove up to 2 of the first 3: ${3 \choose 0} + {3 \choose 1} + {3 \choose 2} = 7$
- and so on (though actually there aren't any subsequences longer than 4)
Then multiply all these numbers of combinations together for every subsequence.
sequences <- as.data.frame(unclass(rle(diff(jolts))))
sequences <- subset(sequences, values == 1)
count_combos <- function(n) sum( choose(n - 1, 0:2) )
sequences <- transform(sequences, combos = sapply(lengths, count_combos))
prod(sequences$combos)

[1] 2.480359e+13
We probably don’t want scientific notation, so reformat the result:
format(prod(sequences$combos), scientific = FALSE)

[1] "24803586664192"
Day 11 – Seating system
Convoluted solution
This puzzle is effectively applying a convolution matrix (the set of rules) to a 2-dimensional image (the seating plan).
We can import the data as a logical (binary) matrix, where zero or FALSE means a seat is empty, and one or TRUE means it is occupied. Floor space is set to NA.
input <- do.call(rbind, strsplit(readLines('input11.txt'), ''))
input <- input != 'L'
input[input > 0] <- NA
The rules are:
- If a seat is empty (L) and there are no occupied seats adjacent to it, the seat becomes occupied.
- If a seat is occupied (#) and four or more seats adjacent to it are also occupied, the seat becomes empty.
- Otherwise, the seat's state does not change.
A convolution kernel matrix can therefore be of the form:
$$\begin{bmatrix}-1 & -1 & -1 \\-1 & 3 & -1 \\-1 & -1 & -1\end{bmatrix}$$
Which is followed by the filter:
- If the result is zero or more, then occupy the seat (set equal to TRUE or 1).
- Otherwise, empty the seat (set equal to FALSE or 0).
- If a cell is meant to be floor space, reset it to zero (because OpenImageR is not currently written to handle NAs).
kernel <- matrix(c(rep(-1, 4), 3, rep(-1, 4)), 3, 3)
convoluted_seating <- function(input, kernel) {
  seats <- replace(input, is.na(input), 0)
  for (i in 1:100) {
    convolved <- OpenImageR::convolution(seats, kernel, mode = 'same')
    new_seats <- replace(convolved >= 0, is.na(input), 0)
    if ( all(seats == new_seats) ) return(sum(new_seats))
    seats <- new_seats
  }
  stop('Failed to converge after 100 iterations')
}
convoluted_seating(input, kernel)

[1] 2194
This took 95 iterations.
Line of sight
In the first part, floor space was just treated like a seat that nobody sits in. Now, we have to change our convolution matrix for each pixel such that, if there is no seat in one direction, we cast our gaze further and borrow the state of a more distant seat.
Unfortunately most image analysis packages only accept a constant matrix as the kernel argument, rather than a function, so we shall have to roll our own.
Firstly, we run an algorithm to determine which seats are visible. This only needs to be run once.
For each seat:
1. Set radius equal to 1.
2. Look in each of the eight directions for a seat. Is a seat visible?
3. For any direction where this is not true, increase radius by 1.
4. Repeat 2–3 until a visible seat is recorded for every direction.
Today I discovered that which() has an extra argument, arr.ind, that, if TRUE, returns matrix indices. Handy for quickly converting a matrix into a long (possibly sparse) representation.
seat_ids <- which(!is.na(input), arr.ind = TRUE)
dirs <- subset(expand.grid(down = -1:1, right = -1:1), down | right)
radial_search <- function(seat, directions, radius = 1) {
  if (!nrow(directions)) return(NULL)
  i <- seat[1] + radius * directions[['down']]
  j <- seat[2] + radius * directions[['right']]
  in_bounds <- i > 0 & i <= nrow(input) & j > 0 & j <= ncol(input)
  ij <- cbind(i, j)[in_bounds, , drop = FALSE]
  seat_exists <- !is.na( input[ij] )
  remaining_dirs <- directions[in_bounds, ][!seat_exists, ]
  visible <- unname(ij[seat_exists, , drop = FALSE])
  rbind(visible, radial_search(seat, remaining_dirs, radius + 1))
}
line_of_sight <- apply(seat_ids, 1, radial_search, directions = dirs)
Next we run the seat changing algorithm itself. For each seat:
- Add up the number of occupied seats visible from this one.
- If sum is zero and seat is unoccupied, occupy the seat.
- Else if sum is ≥ 5 and seat is occupied, empty the seat.
Repeat until seating allocation does not change.
change_places <- function(visible, input) {
  seating_plan <- input
  floor <- is.na(seating_plan)
  for (iter in 1:100) {
    new_seating_plan <- seating_plan
    for (seat in seq_along(visible)) {
      current <- seating_plan[!floor][seat]
      neighbours <- sum(seating_plan[visible[[seat]]])
      if (current & neighbours >= 5) {
        new_seating_plan[!floor][seat] <- 0
      } else if (!current & !neighbours) {
        new_seating_plan[!floor][seat] <- 1
      }
    }
    if (all(seating_plan == new_seating_plan, na.rm = TRUE)) {
      return(sum(seating_plan, na.rm = TRUE))
    }
    seating_plan <- new_seating_plan
  }
  stop('Failed to converge after 100 iterations')
}
change_places(line_of_sight, input)

[1] 1944
This was pretty slow, which is to be expected in R. To speed it up, we can rewrite the guts in a lower-level programming language. There may also be some scope for vectorisation.
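For instance, the inner loop over seats could become a single vapply. Here is a minimal, untested sketch of one update step (step_once is a hypothetical helper, reusing the visible list and the floor mask from above):

step_once <- function(seating_plan, visible, floor) {
  occupied <- seating_plan[!floor]
  # count the occupied seats visible from every seat, all at once
  counts <- vapply(visible, function(ij) sum(seating_plan[ij]), numeric(1))
  seating_plan[!floor] <- ifelse(occupied & counts >= 5, 0,
                                 ifelse(!occupied & counts == 0, 1, occupied))
  seating_plan
}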
Day 12 – Rain risk
Complex directions
I was actually expecting this to be more complicated, with turns in arbitrary numbers of degrees. But it turns out that they are all multiples of 90°, so all F instructions can be simply converted into N, E, S or W without invoking trigonometry.
library(tidyr)
library(dplyr)
instructions <- tibble(input = readLines('input12.txt')) %>%
  extract(input, c('direction', 'value'), '(\\w)(\\d+)', convert = TRUE) %>%
  mutate(bearing = cumsum(- value * (direction == 'L')
                          + value * (direction == 'R')),
         bearing = (90 + bearing) %% 360,
         cardinal = ifelse(direction == 'F',
                           c('N', 'E', 'S', 'W')[1 + bearing / 90],
                           direction))
head(instructions)

# A tibble: 6 x 4
  direction value bearing cardinal
  <chr>     <int>   <dbl> <chr>
1 F             8      90 E
2 N             2      90 N
3 F            32      90 E
4 F            17      90 E
5 E             4      90 E
6 N             4      90 N
Now let’s work out where we are. No need to store the latitude and longitude in separate columns; we can add them up as complex numbers and then sum the real and imaginary parts.
instructions %>%
  mutate(east = (cardinal == 'E') - (cardinal == 'W'),
         north = (cardinal == 'N') - (cardinal == 'S')) %>%
  summarise(position = sum(value * (east + north * 1i)),
            distance = abs(Re(position)) + abs(Im(position)))

# A tibble: 1 x 2
  position  distance
  <cpl>        <dbl>
1 -127-752i      879
Euler’s Bermuda Triangle
Now the N, E, S and W directions store up instructions, which are performed by the ship every time F is invoked.
The value of these instructions is rotated in the complex plane for every L or R turn. Euler's formula states: $$e^{ix} = \cos x + i\sin x,$$ and we can use this to work out how to transform the relative coordinates of the waypoint to the ship every time there is a turn.
In the complex plane:
- turning 90° to the right is equivalent to multiplying by $0-i$,
- turning 90° to the left is equivalent to multiplying by $0+i$,
- turning 180° is equivalent to multiplying by $-1+0i$,
- turning 270° is equivalent to turning by -90° so use the rule above.
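A quick sanity check of these rules at the console, starting from a point one unit east of the origin:

(1 + 0i) * -1i  # 90° right turn: east becomes south

[1] 0-1i

(1 + 0i) * 1i   # 90° left turn: east becomes north

[1] 0+1i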
instructions %>%
  mutate(
    east = value * ((direction == 'E') - (direction == 'W')),
    north = value * ((direction == 'N') - (direction == 'S')),
    radians = value * 2 * pi / 360 * ((direction == 'L') - (direction == 'R')),
    rotate = exp(1i * radians),
    waypoint = (10 + 1i + cumsum((east + north * 1i) * cumprod(1 / rotate))),
  ) %>%
  summarise(position = sum(value * (direction == 'F') * waypoint * cumprod(rotate)),
            distance = abs(Re(position)) + abs(Im(position)))

# A tibble: 1 x 2
  position   distance
  <cpl>         <dbl>
1 17936-171i   18107.
So that we can take advantage of cumsum and cumprod vectorisation, we rotate the ship rather than the waypoint, then reverse the rotation at the end to get the final position of the ship relative to its starting point.
Day 13 – Shuttle search
Earliest bus
A bus’s ID indicates the interval between departures, starting at time 0.
This is simple modular arithmetic: find the remainder when the timestamp is divided by each bus’s ID/interval, then multiply the smallest such remainder with the corresponding bus’s ID.
But we want the bus to arrive after we start waiting at the bus stop, not before. So we negate the timestamp.
input <- readLines('input13.txt')
timestamp <- as.integer(input[1])
buses <- as.integer(strsplit(input[2], ',')[[1]])
inservice <- buses[!is.na(buses)]
inservice[which.min(-timestamp %% inservice)] * min(-timestamp %% inservice)

[1] 222
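To see why the negation gives the waiting time, take the example from the puzzle text, where you arrive at timestamp 939 and bus 59 next departs at 944:

-939 %% 59

[1] 5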
Bus cluster
We seek a timepoint $t$ at which our first listed bus arrives, the second bus arrives at time $t+1$, and so on. Thus it must have the following properties:
- $t \equiv 0 \mod b_0$
- $t \equiv -1 \mod b_1$
- …
- $t \equiv -n \mod b_n$
First looking at the examples, we can take this naïve approach:
find_timetable <- function(buses, maxit = 1e5) {
  offsets <- seq_along(buses) - 1
  for (i in seq_len(maxit)) {
    t <- buses[1] * i
    if ( all((-t %% buses) == offsets, na.rm = TRUE) ) return(t)
  }
  stop('Failed to find a valid t')
}
find_timetable(c(17, NA, 13, 19))

[1] 3417

find_timetable(c(67, 7, 59, 61))

[1] 754018
But this probably isn’t going to scale well. Time to dust off a bit of number theory. By the Chinese remainder theorem, for any $a$, $b$ and coprime $m$, $n$, there exists a unique $x (\mod mn)$ such that $x \equiv a \mod m$ and $x \equiv b \mod n$.
Here $a,b$ are offsets (the position of the bus in the list), $n$ represents a bus ID and $x$ is the solution we seek.
The algorithm will be as follows:
- Test values in the sequence $a_1, a_1 + n_1, a_1 + 2n_1, \dots$ to find the first time $x_1$ at which a bus arrives and the second bus arrives 1 minute later.
- Test values in the sequence $x_1, x_1 + n_1n_2, x_1 + 2n_1n_2, \dots$ to get a valid time for the first three buses.
- Repeat.
sieve <- function(a1, a2, n1, n2, maxit = 1e5) {
  x <- a1 + n1 * (0:maxit)
  x[which.max(x %% n2 == a2 %% n2)]
}
find_timetable2 <- function(buses) {
  offsets <- -(seq_along(buses) - 1)[!is.na(buses)]  # a
  buses <- buses[!is.na(buses)]                      # n
  x <- offsets[1]
  for (i in 2:length(buses))
    x <- sieve(x, offsets[i], prod(head(buses, i - 1)), buses[i])
  x
}
format(find_timetable2(buses), sci = FALSE)

[1] "408270049879073"
Day 14 – Docking data
Bitmask
The trickiest bit(!) in the first part is reading in the data. I wanted a data frame that 'remembered' the value of the last mask set. The other part is converting to and from binary. To help you along the way, R has a function called intToBits, but be careful because it converts to 32 bits and the puzzle is 36-bit.
library(dplyr)
library(tidyr)
intTo36Bits <- function(n) {
  bit32 <- rev(as.character(intToBits(n)))
  c(rep(0, 4), as.integer(bit32))
}
binaryToInt <- function(b) {
  b <- as.integer(strsplit(b, '')[[1]])
  sum(b * 2^rev(seq_along(b) - 1))
}
mask <- function(mask, x) {
  mask <- suppressWarnings(as.integer(strsplit(mask, '')[[1]]))
  x <- as.integer(strsplit(x, '')[[1]])
  x[!is.na(mask)] <- mask[!is.na(mask)]
  paste(x, collapse = '')
}
program <- read.table('input14.txt', sep = '=', strip.white = TRUE,
                      col.names = c('key', 'value')) %>%
  extract(key, c('dest', 'address'), '(mem|mask)\\[?(\\d*)\\]?', convert = TRUE) %>%
  mutate(mask = value[which(dest == 'mask')[cumsum(dest == 'mask')]]) %>%
  filter(dest == 'mem') %>%
  select(-dest) %>%
  mutate(value = as.integer(value),
         value_binary = lapply(value, intTo36Bits),
         value_binary = sapply(value_binary, paste, collapse = ''),
         value_masked = mapply(mask, mask, value_binary))
As an example, here is the first value, 51331021, being masked to become 62069628301:
value:  000000000011000011110011111111001101
mask:   1110X1110XXX101X0011010X110X10X0110X
result: 111001110011101000110101110110001101
What is the sum of the values in memory? Well, since we are just setting values, the only value we care about is the last one for each address. Whatever values they took before the end are unimportant.
program %>%
  group_by(address) %>%
  summarise(last_integer = binaryToInt(last(value_masked))) %>%
  pull(last_integer) %>%
  sum %>%
  format(scientific = FALSE)

[1] "14862056079561"
Memory address decoder
In part 2, the mask applies to the memory address, not to the value. Thus the same value gets applied to possibly many addresses.
It also helps to read the question properly. I got stuck on this for ages until I eventually noticed the part that says
If the bitmask bit is 0, the corresponding memory address bit is unchanged.
which meant my mask was doing the wrong thing, even before the floating bits.
decode <- function(mask, x) {
  mask <- suppressWarnings(as.integer(strsplit(mask, '')[[1]]))
  x <- as.integer(strsplit(x, '')[[1]])
  x[!is.na(mask) & mask] <- mask[!is.na(mask) & mask]  # no change if mask is 0!
  n_floating <- sum(is.na(mask))
  decoded <- c()
  for (i in seq_len(2^n_floating) - 1) {
    x[is.na(mask)] <- tail(intTo36Bits(i), n_floating)
    decoded <- c(decoded, paste(x, collapse = ''))
  }
  decoded
}
program %>%
  select(-value_masked) %>%
  mutate(address_binary = sapply(lapply(address, intTo36Bits), paste, collapse = ''),
         address_decoded = mapply(decode, mask, address_binary)) %>%
  select(address_decoded, value_binary) %>%
  tidyr::unnest_longer(address_decoded) %>%
  group_by(address_decoded) %>%
  summarise(last_integer = binaryToInt(last(value_binary))) %>%
  pull(last_integer) %>%
  sum %>%
  format(scientific = FALSE)

[1] "3296185383161"
There are many ways I could have improved this solution. In particular, there wasn’t any actual reason why I needed to compress the binary digits into a string representation between operations—other than making the tables of values easier to read during debugging. I could have stored them as vectors or matrices in list-columns, instead.
Day 15 – Rambunctious recitation
Memory game
The first part is straightforward even with not particularly optimal code:
memory_game <- function(n, start) {
  nstart <- length(start)
  spoken <- integer(10)
  spoken[1:nstart] <- start
  for (i in nstart:(n - 1)) {
    before <- which(spoken[1:(i - 1)] == spoken[i])
    if (!length(before)) {
      spoken[i + 1] <- 0
    } else spoken[i + 1] <- i - tail(before, 1)
  }
  spoken[n]
}
memory_game(2020, c(7, 12, 1, 0, 16, 2))

[1] 410
Long-term memory
For the 30 millionth number spoken, it’s probably not very efficient to carry the whole vector with us. How can we make it more efficient? This is ten times faster:
memory_game2 <- function(n, start) {
  nstart <- length(start)
  spoken <- start[-nstart]
  when <- seq_along(spoken)
  current <- start[nstart]
  for (i in nstart:(n - 1)) {
    if (!current %in% spoken) {
      next_number <- 0
      spoken <- c(spoken, current)
      when <- c(when, i)
    } else {
      next_number <- i - when[spoken == current]
      when[spoken == current] <- i
    }
    current <- next_number
  }
  current
}
memory_game2(2020, c(7, 12, 1, 0, 16, 2))

[1] 410
Unfortunately, that still just isn’t fast enough for the problem we have, partly because it involves growing the size of a large vector instead of fixing its length in advance. We can use direct lookup from a vector instead. Treat the indices of a vector (-1, because R indexes from 1) as the possible spoken numbers, and the values at those indices as the last time that number was spoken, or zero if it has not been said so far.
This vector needs to be of length equal to the number of rounds in the game, i.e. 30 million elements long, which is not very big in the grand scheme of things. (When testing, you should also make sure it is at least as long as the size of the maximum value in the starting numbers.)
memory_game3 <- function(n, start) {
  nstart <- length(start)
  spoken <- numeric(max(n, start) + 1)
  spoken[start[-nstart] + 1] <- seq_len(nstart - 1)
  current <- start[nstart]
  for (i in nstart:(n - 1)) {
    next_number <- (spoken[current + 1] > 0) * i - spoken[current + 1]
    spoken[current + 1] <- i
    current <- next_number
  }
  current
}
memory_game3(3e7, c(7, 12, 1, 0, 16, 2))

[1] 238
According to microbenchmark, the second implementation is about 5 times faster than the first, and my final implementation is 10 times faster than that. It takes just a second or two to run. A lower-level implementation in C++ might be even faster, but I already have the star now…
I will update this post as I complete future puzzles.