PowerQuery Puzzle solved with R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
# 141–142
Puzzles:
PQ_141: content file
PQ_142: content file
Last time I noted that usually PQ Puzzles are focused on times and dates. And… it confirms this week. Lets count it.
PQ_141
Rolling averages and other rolling calculations are very useful concepts because it can show us main trend cleaned from lower time granulation anomalies. In R we can play with comparison of dates to dates with lagged measurements. It is not hard but can complex task. Every complex task has much bigger chance for mistakes. Thats why I use dedicated package for out today puzzle. We have measurements for months and all we need is to calculate mean average from last 3 and 5 months.
Load libraries and data
library(tidyverse) library(slider) library(readxl) input = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "A1:C35") test = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "E1:I35")
Transformation
result = input %>% group_by(Month) %>% mutate( `3 Year MV` = slide_dbl(Defects, mean, .after = -1, .before = 3, .complete = TRUE) %>% round(0), `5 Year MV` = slide_dbl(Defects, mean, .after = -1, .before = 5, .complete = TRUE) %>% round(0) ) %>% ungroup()
Validation
identical(result, test) #> [1] TRUE
PQ_142
In this puzzle we calculate something that HR called “head count”, which means number of people hired or working in certain period of time. In our task we have to create head count for each quarter of hour from 9AM to 9PM, having start and end time for 3 people. It can be tricky in some ways but I will explain it later.
Load data and libraries
library(tidyverse) library(readxl) library(lubridate) input = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "A1:C4") %>% janitor::clean_names() test = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "E1:F49")
Transformation
input <- input %>% mutate(interval = interval(ymd_hms(start_time), ymd_hms(end_time))) quarter_table <- tibble( interval = interval( seq(ymd_hms("1899-12-31 09:00:00"), ymd_hms("1899-12-31 20:45:00"), by = "15 mins"), seq(ymd_hms("1899-12-31 09:14:59"), ymd_hms("1899-12-31 20:59:59"), by = "15 mins") ) ) head_count <- quarter_table %>% mutate( Count = map_dbl(interval, ~sum(int_overlaps(.x, input$interval))), Time = paste(format(int_start(interval), "%I:%M:%S %p"), format(int_end(interval), "%I:%M:%S %p"), sep = " - ") ) %>% select(Time, Count)
Validation
You can validate it by your eyes, because there is one issue. If we want to have disjoint time intervals we need to finish previous one second before the next started. But if we do it that way we have problem with person who ends its job at 19:00:00 because it would look like working only one second in next interval. That is why I cannot validate it with given results, but except one line values are exactly the same.
Thanks for your engagement, and let me know if you have any comments. Stay tuned, because this Thursday article about memoization will be online.
PowerQuery Puzzle solved with R was originally published in Numbers around us on Medium, where people are continuing the conversation by highlighting and responding to this story.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.