PowerQuery Puzzle solved with R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
#153–156
Puzzles
Author: ExcelBI
All files (xlsx with puzzle and R with solution) for each and every puzzle are available on my Github. Enjoy.
Similarly to the Excel Puzzles we have doubled episode today because of my winter holidays. Do not worry. Puzzles are still hot, although two weeks old. Lets go then.
Puzzle #153
This time we have four riddles to solve and first two of them are about pilots life. Balance between flight and free time. We all agree it is important to have fresh mind when having lives of about hundred people in your hand.
As you probably noticed in my previous episode usually code I am providing could be shorter, but I make one goal for me. Everytime I can I would like to not only get proper numbers, but also exactly the same shape of data structure. This time I used transforming to factors to get the same order of pilots in my result.
So today basing on flight starts and ends we need to calculate not only flight times, but also time pilot used to relax between flights. Let’s go.\
Loading libraries and data
library(tidyverse) library(readxl) input = read_excel("Power Query/PQ_Challenge_153.xlsx", range = "A1:C13") %>% janitor::clean_names() test = read_excel("Power Query/PQ_Challenge_153.xlsx", range = "E1:G5") %>% janitor::clean_names()
Transformation
input$pilot = factor(input$pilot, levels = unique(input$pilot), ordered = TRUE) test$pilot = factor(test$pilot, levels = unique(test$pilot), ordered = TRUE) result = input %>% group_by(pilot) %>% mutate(prev_landing = lag(flight_end, default = NA_POSIXct_), flight_time = flight_end - flight_start, rest_time = flight_start - prev_landing) %>% summarise(fly_time = sum(flight_time, na.rm = TRUE), rest_time = sum(rest_time, na.rm = TRUE)) %>% mutate(fly_time = as.numeric(fly_time, units = "hours") %>% round(2), rest_time = as.numeric(rest_time, units = "hours") %>% round(2)) %>% arrange(pilot) %>% ungroup() %>% mutate(fly_time = ifelse(fly_time == 0, NA, fly_time), rest_time = ifelse(rest_time == 0, NA, rest_time))
Validation
identical(result, test) # [1] TRUE
Puzzle #154
We meet our pilots again, but this time we need much more detailed report of their resting time. It should be done by year and month for both flying and resting.
I must admit that my code produced some discrepancies in comparison to given solution. I haven’t find the reason yet, but if you would like to investigate it, I’m encouraging you to do it.
Loading libraries and data
library(tidyverse) library(readxl) input = read_excel("Power Query/PQ_Challenge_154.xlsx", range = "A1:C10") %>% janitor::clean_names() test = read_excel("Power Query/PQ_Challenge_154.xlsx", range = "E1:I23") %>% janitor::clean_names()
Transformation and validation
input$pilot = factor(input$pilot, levels = unique(input$pilot), ordered = TRUE) test$pilot = factor(test$pilot, levels = unique(test$pilot), ordered = TRUE) fly = input rest = input %>% group_by(pilot) %>% mutate(prev_end = lag(flight_end, default = NA_POSIXct_)) %>% ungroup() %>% select(pilot, rest_start = prev_end, rest_end = flight_start) %>% na.omit() get_months = function(start, end) { seq = seq(floor_date(start, "month"), ceiling_date(end, "month"), by = "month") seq[1] <- start seq[length(seq)] <- end df = tibble(start = seq[1:(length(seq)-1)], end = seq[2:length(seq)]) return(df) } a = fly %>% mutate(df = map2(flight_start, flight_end, get_months)) %>% unnest(df) %>% select(pilot, start, end) %>% mutate(mode = "fly") b = rest %>% mutate(df = map2(rest_start, rest_end, get_months)) %>% unnest(df) %>% select(pilot, start, end) %>% mutate(mode = "rest") result = bind_rows(a, b) %>% mutate(month = month(start), year = year(start), duration = difftime(end, start, "hours")) %>% group_by(pilot, mode, month, year) %>% summarise(duration = sum(duration, na.rm = TRUE) %>% as.numeric() %>% round(2)) %>% ungroup() %>% pivot_wider(names_from = mode, values_from = duration, values_fill = list(duration = 0), names_glue = "{mode}_time") %>% left_join(test, by = c("pilot","year", "month"), suffix = c("", "_test")) %>% mutate(check_fly = fly_time == fly_time_test, check_rest = rest_time == rest_time_test)
Puzzle #155
As puzzle #154 was pretty hard, last two of them were just simple workout. This one were about extracting proper (24h system) electronic watch indications from given strings. Let’s find out how easy is it.
Loading libraries and data
library(tidyverse) library(readxl) input = read_excel("Power Query/PQ_Challenge_155.xlsx", range = "A1:A10") test = read_excel("Power Query/PQ_Challenge_155.xlsx", range = "D1:D10")
Transformation
extract = function(string) { subs = string %>% str_extract_all("\\d{1,2}\\:\\d{2}") %>% unlist() subs = purrr::map_chr(subs, function(x) { ifelse( as.numeric(strsplit(x, ":")[[1]][1]) %in% 0:23 & as.numeric(strsplit(x, ":")[[1]][2]) %in% 0:59, x, NA_character_ ) }) %>% na.omit() %>% str_c(collapse = ", ") return(subs) } result = input %>% mutate(extracted = map_chr(String, extract), extracted = ifelse(extracted == "", NA_character_, extracted))
Validation
identical(result$extracted, test$`Expected Answer`) # [1] TRUE
Puzzle #156
I’m not sure if it is difficult in any way in Power Query, but for R it is one of the easiest puzzles ever. We have two tables with common column and need to combine them to be like crosstab. So be it.
Loading libraries and data
library(tidyverse) library(readxl) input1 = read_excel("Power Query/PQ_Challenge_156.xlsx", range = "A1:B10") input2 = read_excel("Power Query/PQ_Challenge_156.xlsx", range = "D1:E5") test = read_excel("Power Query/PQ_Challenge_156.xlsx", range = "G1:K6")
Transformation
result = input1 %>% left_join(input2, by = "Subjects") %>% pivot_wider(names_from = "Subjects", values_from = "Teacher", values_fill = NA_character_) %>% select(Name, Biology, Chemistry, Geology, Physics)
Validation
identical(result, test) # [1] TRUE
Feel free to comment, share and contact me with advices, questions and your ideas how to improve anything. Contact me on Linkedin if you wish as well.
PowerQuery Puzzle solved with R was originally published in Numbers around us on Medium, where people are continuing the conversation by highlighting and responding to this story.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.