
Portmanteau Words

[This article was first published on Numbers around us - Medium, and kindly contributed to R-bloggers].

Excel BI’s Excel Challenge #308 — solved in R

Defining the Puzzle:

The newest puzzle from Excel BI stays in the world of words. We need to determine whether each word is a portmanteau, that is, a word built from parts of two root words that takes on a new meaning of its own, like breakfast + lunch = brunch.

List Portmanteau words.
A Portmanteau word is made from the first few letters of Word1 and the first or last few letters of Word2.
Ex. Biopic is made from Biography and picture.
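
To make the rule concrete, here is a minimal base R sketch that lists every way to cut "Biopic" into a head and a tail and checks each cut against the two source words. The `splits` data frame and its column names are my own illustration, not part of the challenge.

```r
# Enumerate every cut of the word into a non-empty head and tail
word  <- "Biopic"
word1 <- "Biography"
word2 <- "picture"

cuts <- seq_len(nchar(word) - 1)
splits <- data.frame(
  head = substring(word, 1, cuts),
  tail = substring(word, cuts + 1, nchar(word)),
  stringsAsFactors = FALSE
)

# The head must start Word1; the tail must start or end Word2 (case-insensitive)
splits$head_ok <- startsWith(tolower(word1), tolower(splits$head))
splits$tail_ok <- startsWith(tolower(word2), tolower(splits$tail)) |
  endsWith(tolower(word2), tolower(splits$tail))
splits$valid <- splits$head_ok & splits$tail_ok

print(splits)
```

Only the cut "Bio" + "pic" passes both checks, which is enough to call the word a portmanteau.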

Loading Data from Excel:

Today we have 3 source columns and 1 solution column. The first column holds the compound word, and the next two hold the words it may have been built from. We have to check whether column 1 can be constructed from columns 2 and 3. Let’s read the data.

library(tidyverse)
library(readxl)
library(data.table)
library(stringi)

input = read_excel("Portmanteau Words.xlsx", range = "A1:C10")
test = read_excel("Portmanteau Words.xlsx", range = "D1:D6")

Approach 1: Tidyverse with purrr

detect_portmanteau <- function(portmanteau, word1, word2) {
  # Try every split point that leaves a non-empty head and tail
  indices <- seq(1, str_length(portmanteau) - 1)
  portmanteau_checks <- map_lgl(indices, function(i) {
    # The head must match the start of word1
    pattern1 <- str_c('^', str_sub(portmanteau, 1, i))
    # The tail must match the start or the end of word2
    pattern2 <- str_c('^', str_sub(portmanteau, i + 1, -1), '|', str_sub(portmanteau, i + 1, -1), '$')

    match_word1 <- str_detect(word1, regex(pattern1, ignore_case = TRUE))
    match_word2 <- str_detect(word2, regex(pattern2, ignore_case = TRUE))

    return(match_word1 && match_word2)
  })

  # One valid split is enough
  is_portmanteau <- any(portmanteau_checks)

  return(is_portmanteau)
}

result <- input %>% 
 mutate(is_portmanteau = pmap_lgl(list(Word, Word1, Word2), detect_portmanteau)) %>%
 filter(is_portmanteau) %>%
 select(Word)
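
The second pattern is the less obvious one, so here is a small base R sketch of what it looks like for a single split point. The tail "unch" and the test words are just illustrative values I picked.

```r
# Build the "starts with or ends with" pattern for one tail
tail_part <- "unch"
pattern2 <- paste0("^", tail_part, "|", tail_part, "$")
print(pattern2)

# "lunch" ends with the tail, so the second alternative matches
ends_ok <- grepl(pattern2, "lunch", ignore.case = TRUE)

# In "luncheon" the tail sits in the middle, so neither anchor matches
middle_ok <- grepl(pattern2, "luncheon", ignore.case = TRUE)

print(c(ends_ok, middle_ok))
```

The anchors are what keep a fragment buried in the middle of Word2 from counting as a match.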

Approach 2: Base R

detect_portmanteau_base <- function(portmanteau, word1, word2) {
  # Try every split point that leaves a non-empty head and tail
  indices <- seq(1, nchar(portmanteau) - 1)
  is_portmanteau <- FALSE

  for (i in indices) {
    # The head must match the start of word1
    pattern1 <- paste0('^', substring(portmanteau, 1, i))
    # The tail must match the start or the end of word2
    pattern2 <- paste0('^', substring(portmanteau, i + 1, nchar(portmanteau)), '|', substring(portmanteau, i + 1, nchar(portmanteau)), '$')

    match_word1 <- grepl(pattern1, word1, ignore.case = TRUE)
    match_word2 <- grepl(pattern2, word2, ignore.case = TRUE)

    # Stop at the first split that works
    if (match_word1 && match_word2) {
      is_portmanteau <- TRUE
      break
    }
  }
  return(is_portmanteau)
}

result_base <- data.frame(Word = character(), stringsAsFactors = FALSE)

for (row in 1:nrow(input)) {
  is_portmanteau <- with(input[row, ], detect_portmanteau_base(Word, Word1, Word2))
  if (is_portmanteau) {
    result_base <- rbind(result_base, input[row, "Word", drop = FALSE])
  }
}
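
Both functions above paste pieces of the word straight into a regular expression, which is fine for plain letters but would misbehave if a word contained characters such as "." or "(". As a hedged alternative, here is a sketch of the same check with fixed-string comparisons. The helper `is_portmanteau_fixed` and the `toy` rows are hypothetical, made up for illustration, and not taken from the challenge workbook.

```r
# Hypothetical helper: same rule as above, without regular expressions
is_portmanteau_fixed <- function(portmanteau, word1, word2) {
  n <- nchar(portmanteau)
  if (n < 2) return(FALSE)  # no way to split a one-letter word

  p  <- tolower(portmanteau)
  w1 <- tolower(word1)
  w2 <- tolower(word2)

  cuts  <- seq_len(n - 1)
  heads <- substring(p, 1, cuts)
  tails <- substring(p, cuts + 1, n)

  # Head starts word1, and tail starts or ends word2, for at least one cut
  any(startsWith(w1, heads) & (startsWith(w2, tails) | endsWith(w2, tails)))
}

# Made-up rows for illustration only
toy <- data.frame(
  Word  = c("Brunch", "Biopic", "Sunday"),
  Word1 = c("Breakfast", "Biography", "Moon"),
  Word2 = c("Lunch", "Picture", "Day"),
  stringsAsFactors = FALSE
)

toy$is_portmanteau <- mapply(
  is_portmanteau_fixed, toy$Word, toy$Word1, toy$Word2,
  USE.NAMES = FALSE
)
found <- toy$Word[toy$is_portmanteau]
print(found)
```

The vectorised `startsWith()` and `endsWith()` calls test all split points at once, so no explicit loop over indices is needed.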

Approach 3: Data.table

data.table syntax doesn’t really change how the function itself is written, so here I simply call the tidyverse function from Approach 1 using data.table syntax.

# as.data.table() returns a copy, so input itself is left unchanged
input_dt <- as.data.table(input)

input_dt[, is_portmanteau := mapply(detect_portmanteau, Word, Word1, Word2)]
result_dt <- input_dt[is_portmanteau == TRUE, .(Word)]

Validating Our Solutions:

identical(test$`Answer Expected`, result$Word)
#> [1] TRUE

identical(test$`Answer Expected`, result_base$Word)
#> [1] TRUE

identical(test$`Answer Expected`, result_dt$Word)
#> [1] TRUE
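
`identical()` only answers yes or no. If one of the checks came back FALSE, a sketch like the following could show which words differ. The `expected` and `actual` vectors are made-up values, not the real answer column.

```r
# Made-up vectors standing in for the expected answers and a result
expected <- c("Brunch", "Biopic", "Motel")
actual   <- c("Brunch", "Motel")

same <- identical(expected, actual)

# Words we should have found but did not
missing_words <- setdiff(expected, actual)

# Words we reported that are not in the expected answers
extra_words <- setdiff(actual, expected)

print(same)
print(missing_words)
print(extra_words)
```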

If you like my publications or have your own ways to solve those puzzles in R, Python or whatever tool you choose, let me know.


Portmanteau Words was originally published in Numbers around us on Medium, where people are continuing the conversation by highlighting and responding to this story.
