PowerQuery Puzzle solved with R

[This article was first published on Numbers around us - Medium, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

#153–156

Puzzles

Author: ExcelBI

All files (xlsx with puzzle and R with solution) for each and every puzzle are available on my Github. Enjoy.

Similarly to the Excel Puzzles we have doubled episode today because of my winter holidays. Do not worry. Puzzles are still hot, although two weeks old. Lets go then.

Puzzle #153

This time we have four riddles to solve and first two of them are about pilots life. Balance between flight and free time. We all agree it is important to have fresh mind when having lives of about hundred people in your hand.

As you probably noticed in my previous episode usually code I am providing could be shorter, but I make one goal for me. Everytime I can I would like to not only get proper numbers, but also exactly the same shape of data structure. This time I used transforming to factors to get the same order of pilots in my result.

So today basing on flight starts and ends we need to calculate not only flight times, but also time pilot used to relax between flights. Let’s go.\

Loading libraries and data

library(tidyverse)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_153.xlsx", range = "A1:C13") %>% 
  janitor::clean_names()
test  = read_excel("Power Query/PQ_Challenge_153.xlsx", range = "E1:G5") %>%
  janitor::clean_names()

Transformation

input$pilot = factor(input$pilot, levels = unique(input$pilot), ordered = TRUE)
test$pilot = factor(test$pilot, levels = unique(test$pilot), ordered = TRUE)

result = input %>%
  group_by(pilot) %>%
  mutate(prev_landing = lag(flight_end, default = NA_POSIXct_),
         flight_time = flight_end - flight_start,
         rest_time = flight_start - prev_landing) %>%
  summarise(fly_time = sum(flight_time, na.rm = TRUE),
            rest_time = sum(rest_time, na.rm = TRUE)) %>%
  mutate(fly_time = as.numeric(fly_time, units = "hours") %>% round(2),
         rest_time = as.numeric(rest_time, units = "hours") %>% round(2)) %>%
  arrange(pilot) %>%
  ungroup() %>%
  mutate(fly_time = ifelse(fly_time == 0, NA, fly_time),
         rest_time = ifelse(rest_time == 0, NA, rest_time))

Validation

identical(result, test)
# [1] TRUE

Puzzle #154

We meet our pilots again, but this time we need much more detailed report of their resting time. It should be done by year and month for both flying and resting.

I must admit that my code produced some discrepancies in comparison to given solution. I haven’t find the reason yet, but if you would like to investigate it, I’m encouraging you to do it.

Loading libraries and data

library(tidyverse)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_154.xlsx", range = "A1:C10") %>% 
  janitor::clean_names()
test  = read_excel("Power Query/PQ_Challenge_154.xlsx", range = "E1:I23") %>%
  janitor::clean_names()

Transformation and validation

input$pilot = factor(input$pilot, levels = unique(input$pilot), ordered = TRUE)
test$pilot = factor(test$pilot, levels = unique(test$pilot), ordered = TRUE)

fly = input 
rest = input %>%
  group_by(pilot) %>%
  mutate(prev_end = lag(flight_end, default = NA_POSIXct_)) %>%
  ungroup() %>%
  select(pilot, rest_start = prev_end, rest_end = flight_start) %>%
  na.omit()

get_months = function(start, end) {
  seq = seq(floor_date(start, "month"), ceiling_date(end, "month"), by = "month")
  seq[1] <- start
  seq[length(seq)] <- end
  df = tibble(start = seq[1:(length(seq)-1)], end = seq[2:length(seq)])
  return(df)
}

a = fly %>%
  mutate(df = map2(flight_start, flight_end, get_months)) %>%
  unnest(df) %>%
  select(pilot, start, end) %>%
  mutate(mode = "fly")
b = rest %>%
  mutate(df = map2(rest_start, rest_end, get_months)) %>%
  unnest(df) %>%
  select(pilot, start, end) %>%
  mutate(mode = "rest")

result = bind_rows(a, b) %>%
  mutate(month = month(start),
         year = year(start),
         duration = difftime(end, start, "hours")) %>%
  group_by(pilot, mode, month, year) %>%
  summarise(duration = sum(duration, na.rm = TRUE) %>% as.numeric() %>% round(2)) %>%
  ungroup() %>%
  pivot_wider(names_from = mode, values_from = duration, 
              values_fill = list(duration = 0), names_glue = "{mode}_time") %>%
  left_join(test, by = c("pilot","year", "month"), suffix = c("", "_test")) %>%
  mutate(check_fly = fly_time == fly_time_test,
         check_rest = rest_time == rest_time_test)

Puzzle #155

As puzzle #154 was pretty hard, last two of them were just simple workout. This one were about extracting proper (24h system) electronic watch indications from given strings. Let’s find out how easy is it.

Loading libraries and data

library(tidyverse)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_155.xlsx", range = "A1:A10")
test  = read_excel("Power Query/PQ_Challenge_155.xlsx", range = "D1:D10")

Transformation

extract = function(string) {
  subs = string %>% 
    str_extract_all("\\d{1,2}\\:\\d{2}") %>%
    unlist()

  subs = purrr::map_chr(subs, function(x) {
    ifelse(
      as.numeric(strsplit(x, ":")[[1]][1]) %in% 0:23 & as.numeric(strsplit(x, ":")[[1]][2]) %in% 0:59,
      x,
      NA_character_
    )
  }) %>%
    na.omit() %>%
    str_c(collapse = ", ")

  return(subs)
}

result = input %>% 
  mutate(extracted = map_chr(String, extract),
         extracted = ifelse(extracted == "", NA_character_, extracted))

Validation

identical(result$extracted, test$`Expected Answer`)
# [1] TRUE

Puzzle #156

I’m not sure if it is difficult in any way in Power Query, but for R it is one of the easiest puzzles ever. We have two tables with common column and need to combine them to be like crosstab. So be it.

Loading libraries and data

library(tidyverse)
library(readxl)

input1 = read_excel("Power Query/PQ_Challenge_156.xlsx", range = "A1:B10")
input2 = read_excel("Power Query/PQ_Challenge_156.xlsx", range = "D1:E5")

test = read_excel("Power Query/PQ_Challenge_156.xlsx", range = "G1:K6")

Transformation

result = input1 %>% 
  left_join(input2, by = "Subjects") %>%
  pivot_wider(names_from = "Subjects", values_from = "Teacher", values_fill =  NA_character_) %>%
  select(Name, Biology, Chemistry, Geology, Physics)

Validation

identical(result, test)
# [1] TRUE

Feel free to comment, share and contact me with advices, questions and your ideas how to improve anything. Contact me on Linkedin if you wish as well.


PowerQuery Puzzle solved with R was originally published in Numbers around us on Medium, where people are continuing the conversation by highlighting and responding to this story.

To leave a comment for the author, please follow the link and comment on their blog: Numbers around us - Medium.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)