PowerQuery Puzzle solved with R

Numbers around us

3 hours ago

[This article was first published on Numbers around us - Medium, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# 141–142

Puzzles:

PQ_141: content file
PQ_142: content file

Last time I noted that usually PQ Puzzles are focused on times and dates. And… it confirms this week. Lets count it.

PQ_141

Rolling averages and other rolling calculations are very useful concepts because it can show us main trend cleaned from lower time granulation anomalies. In R we can play with comparison of dates to dates with lagged measurements. It is not hard but can complex task. Every complex task has much bigger chance for mistakes. Thats why I use dedicated package for out today puzzle. We have measurements for months and all we need is to calculate mean average from last 3 and 5 months.

Load libraries and data

library(tidyverse)
library(slider)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "A1:C35")
test  = read_excel("Power Query/PQ_Challenge_141.xlsx", range = "E1:I35")

Transformation

result = input %>%
  group_by(Month) %>%
  mutate(
    `3 Year MV` = slide_dbl(Defects, mean,  
                            .after = -1, 
                            .before = 3, 
                            .complete = TRUE) %>% round(0),
    `5 Year MV` = slide_dbl(Defects, mean, 
                            .after = -1, 
                            .before = 5, 
                            .complete = TRUE) %>% round(0)
  ) %>% 
  ungroup()

Validation

identical(result, test)
#> [1] TRUE

PQ_142

In this puzzle we calculate something that HR called “head count”, which means number of people hired or working in certain period of time. In our task we have to create head count for each quarter of hour from 9AM to 9PM, having start and end time for 3 people. It can be tricky in some ways but I will explain it later.

Load data and libraries

library(tidyverse)
library(readxl)
library(lubridate)

input = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "A1:C4") 
  %>% janitor::clean_names()
test = read_excel("Power Query/PQ_Challenge_142.xlsx", range = "E1:F49")

Transformation

input <- input %>%
  mutate(interval = interval(ymd_hms(start_time), ymd_hms(end_time)))

quarter_table <- tibble(
  interval = interval(
    seq(ymd_hms("1899-12-31 09:00:00"), 
        ymd_hms("1899-12-31 20:45:00"), 
        by = "15 mins"),
    seq(ymd_hms("1899-12-31 09:14:59"), 
        ymd_hms("1899-12-31 20:59:59"), 
        by = "15 mins")
  )
)
head_count <- quarter_table %>%
  mutate(
    Count = map_dbl(interval, ~sum(int_overlaps(.x, input$interval))),
    Time = paste(format(int_start(interval), "%I:%M:%S %p"), 
                 format(int_end(interval), "%I:%M:%S %p"), 
                 sep = " - ")
  ) %>%
  select(Time, Count)

Validation

You can validate it by your eyes, because there is one issue. If we want to have disjoint time intervals we need to finish previous one second before the next started. But if we do it that way we have problem with person who ends its job at 19:00:00 because it would look like working only one second in next interval. That is why I cannot validate it with given results, but except one line values are exactly the same.

Thanks for your engagement, and let me know if you have any comments. Stay tuned, because this Thursday article about memoization will be online.

PowerQuery Puzzle solved with R was originally published in Numbers around us on Medium, where people are continuing the conversation by highlighting and responding to this story.

To leave a comment for the author, please follow the link and comment on their blog: Numbers around us - Medium.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Puzzles:

PQ_141

Load libraries and data

Transformation

Validation

PQ_142

Load data and libraries

Transformation

Validation

Related