This is my second blog post from the series My R take on Advent of Code. If you’d like to know more about Advent of Code, check out the first post from the series or simply go to their website. Below you’ll find the challenge from Day 2 and the solution that worked for me. As always, feel free to leave comments if you have different ideas on how this could have been solved!
Day 2 Puzzle
(…) you scan the likely candidate boxes again, counting the number that have an ID containing exactly two of any letter and then separately counting those with exactly three of any letter. You can multiply those two counts together to get a rudimentary checksum and compare it to what your device predicts. For example, if you see the following box IDs:
- abcdef contains no letters that appear exactly two or three times.
- bababc contains two a and three b, so it counts for both.
- abbcde contains two b, but no letter appears exactly three times.
- abcccd contains three c, but no letter appears exactly two times.
- aabcdd contains two a and two d, but it only counts once.
- abcdee contains two e.
- ababab contains three a and three b, but it only counts once.
Of these box IDs, four of them contain a letter which appears exactly twice, and three of them contain a letter which appears exactly three times. Multiplying these together produces a checksum of 4 * 3 = 12. What is the checksum for your list of box IDs?
So what is it all about? As complicated as it may sound, essentially we need to:
- understand which string contains letters that appear exactly 2 times
- understand which string contains letters that appear exactly 3 times
- count the number of each type of string
- multiply them together
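Before touching the real input, the four steps can be tried on the seven example IDs from the puzzle text. This is only a minimal base R sketch, not the solution used in the rest of the post; the helper name `has_exactly` is made up for illustration.

```r
# Example box IDs from the puzzle statement
ids <- c("abcdef", "bababc", "abbcde", "abcccd", "aabcdd", "abcdee", "ababab")

# Hypothetical helper: does any letter in `id` occur exactly `n` times?
has_exactly <- function(id, n) {
  letter_counts <- table(strsplit(id, "")[[1]])  # occurrences of each letter
  any(letter_counts == n)
}

# One TRUE/FALSE per ID for "exactly two" and "exactly three"
has_two   <- vapply(ids, has_exactly, logical(1), n = 2)
has_three <- vapply(ids, has_exactly, logical(1), n = 3)

# Count each type of string and multiply the counts together
checksum <- sum(has_two) * sum(has_three)
checksum  # 4 * 3 = 12, the value given in the puzzle example
```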
Doesn’t sound so bad anymore, ey? This is how we can go about it:
First load your key packages…
library(dplyr)
library(stringr)
library(tibble)
library(purrr)
… and have a look at what the raw input looks like.
# check raw input
glimpse(input)
## chr "xrecqmdonskvzupalfkwhjctdb\nxrlgqmavnskvzupalfiwhjctdb\nxregqmyonskvzupalfiwhjpmdj\nareyqmyonskvzupalfiwhjcidb\"| __truncated__
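The post doesn’t show how `input` was created. One possible way, assuming the puzzle text has been saved to a local file, is to read it into a single string. The temporary file and its three lines below are stand-ins so that the sketch runs on its own.

```r
# Hypothetical stand-in for a saved puzzle input file
path <- tempfile(fileext = ".txt")
writeLines(c("abcdef", "bababc", "abbcde"), path)

# Read all lines and glue them back into one newline-separated string
input <- paste(readLines(path), collapse = "\n")
input
```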
Right, Advent of Code will never give you nice and clean data to work with, that’s for sure. But it doesn’t look like things are too bad this time – let’s just split it on the newline character and keep it as a vector for now. Does it look reasonably good?
# clean it
clean_input <- strsplit(input, '\n') %>% unlist() # split by newline
glimpse(clean_input)
## chr [1:250] "xrecqmdonskvzupalfkwhjctdb" "xrlgqmavnskvzupalfiwhjctdb" ...
Much better! Let’s put it all in a data frame, as we’ll need it very soon.
# put it in the data.frame
df2 <- tibble(input = str_trim(clean_input))
head(df2)
## # A tibble: 6 x 1
##   input
##   <chr>
## 1 xrecqmdonskvzupalfkwhjctdb
## 2 xrlgqmavnskvzupalfiwhjctdb
## 3 xregqmyonskvzupalfiwhjpmdj
## 4 areyqmyonskvzupalfiwhjcidb
## 5 xregqpyonskvzuaalfiwhjctdy
## 6 xwegumyonskvzuphlfiwhjctdb
Now, the way I approached this was to split each word into letters and then count how many times each letter occurred. Then, to identify words with a letter that occurs exactly twice, I kept only the letters that occur twice; if the resulting table has any rows, the word counts as a yes. Take the first example:
strsplit(input, '\n') %>%
  unlist() %>%
  .[[1]] # get the first example
## [1] "xrecqmdonskvzupalfkwhjctdb"
Let’s split it into letters, put them in a tibble and count each letter’s occurrences:
strsplit(input, '\n') %>%
  unlist() %>%
  .[[1]] %>%                # get the first example
  strsplit('') %>%          # split letters
  unlist() %>%              # get a vector
  tibble(letters = .) %>%   # transform vector to tibble with a `letters` column
  count(letters)
## # A tibble: 23 x 2
##    letters     n
##    <chr>   <int>
##  1 a           1
##  2 b           1
##  3 c           2
##  4 d           2
##  5 e           1
##  6 f           1
##  7 h           1
##  8 j           1
##  9 k           2
## 10 l           1
## # ... with 13 more rows
Now, do we have any double occurrences there?
# test: counting double letter occurrences
strsplit(input, '\n') %>%
  unlist() %>%
  .[[1]] %>%                # get the first example
  strsplit('') %>%          # split letters
  unlist() %>%              # get a vector
  tibble(letters = .) %>%   # transform vector to tibble with a `letters` column
  count(letters) %>%        # count letter occurrences
  filter(n == 2) %>%        # get only those with double occurrences
  nrow()                    # how many are there?
## [1] 3
Definitely yes. Let’s repeat the process for triple occurrences:
# test: counting triple letter occurrences
strsplit(input, '\n') %>%
  unlist() %>%
  .[[1]] %>%                # get the first example
  strsplit('') %>%          # split letters
  unlist() %>%
  tibble(letters = .) %>%   # transform vector to tibble with a `letters` column
  count(letters) %>%
  filter(n == 3) %>%
  nrow()
## [1] 0
Not much luck with those in this case. To make our life easier, let’s wrap both calculations in functions…
### wrap-up in functions
# count letters that occur exactly twice
count2 <- function(x) {
  result2 <- as.character(x) %>%
    strsplit('') %>%          # split by letters
    unlist() %>%
    tibble(letters = .) %>%   # transform vector to tibble with a `letters` column
    count(letters) %>%        # count letter occurrences
    filter(n == 2) %>%
    nrow()
  return(result2)
}

# count letters that occur exactly three times
count3 <- function(x) {
  result3 <- as.character(x) %>%
    strsplit('') %>%
    unlist() %>%
    tibble(letters = .) %>%   # transform vector to tibble with a `letters` column
    count(letters) %>%
    filter(n == 3) %>%
    nrow()
  return(result3)
}
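Since `count2` and `count3` differ only in the number they filter on, the two could also be folded into a single helper. The sketch below is one possible variant rather than the code used for the solution; the name `count_exact` and its `times` argument are hypothetical.

```r
library(dplyr)
library(tibble)

# Hypothetical helper: how many distinct letters in `x` occur exactly `times` times?
count_exact <- function(x, times) {
  chars <- strsplit(as.character(x), "")[[1]]  # split the string into letters
  tibble(letters = chars) %>%
    count(letters) %>%                         # adds column `n` with the letter counts
    filter(n == times) %>%                     # keep letters with the requested count
    nrow()
}

count_exact("bababc", 2)  # "a" occurs twice
count_exact("bababc", 3)  # "b" occurs three times
```

With this in place, the later `map_int()` calls would pass the extra argument along, for example `map_int(df2$input, count_exact, times = 2)`.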
…and apply them to the whole dataset:
### apply functions to input
occurs2 <- map_int(df2$input, count2)
occurs3 <- map_int(df2$input, count3)
str(occurs2)
## int [1:250] 3 3 3 3 2 3 3 2 2 2 ...
Now, all we need to do is check how many non-zero elements we have in each vector and multiply those two counts together:
# solution
length(occurs2[occurs2 != 0]) * length(occurs3[occurs3 != 0])
## [1] 5976
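The same count of non-zero elements can be written a little more compactly with `sum()` over a logical vector. As a small check, here is that form applied to the per-ID counts worked out by hand for the seven example IDs in the puzzle text (the vector names `ex2` and `ex3` are just for this sketch):

```r
# Letters occurring exactly twice / three times in each of the seven example IDs
ex2 <- c(0L, 1L, 1L, 0L, 2L, 1L, 0L)
ex3 <- c(0L, 1L, 0L, 1L, 0L, 0L, 2L)

# sum() over a logical vector counts the TRUE values,
# so this is equivalent to length(ex2[ex2 != 0]) * length(ex3[ex3 != 0])
example_checksum <- sum(ex2 != 0) * sum(ex3 != 0)
example_checksum  # 12, as in the puzzle example
```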
Voila!