Keyword Searches from Comma Separated Terms

[This article was first published on RLang.io | R Language Programming, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Long story short, I need to convert a pretty simple OR search into a non-directional AND keyword search, where every keyword must appear but in any order. A directional (ordered) AND search is straightforward: just put .*? between the words (or, in SQL, use LIKE '%keyword_1%keyword_2%'). Anyhow, I came up with this little snippet and thought I would share.
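To make the distinction concrete, here is a minimal sketch contrasting the original OR search with a directional AND search. The sample strings in docs and the variable names are made up for illustration; they are not from the original post.

```r
# Hypothetical sample data for illustration
docs <- c("keyword_1 then keyword_2",
          "keyword_2 then keyword_1",
          "only keyword_1 here")

terms <- c("keyword_1", "keyword_2")

# OR search: a string matches if any term appears
or_pattern <- paste(terms, collapse = "|")
grepl(or_pattern, docs)                             # -> TRUE TRUE TRUE

# Directional AND: terms must appear in exactly this order
directional_pattern <- paste(terms, collapse = ".*?")
grepl(directional_pattern, docs, perl = TRUE)       # -> TRUE FALSE FALSE
```

The directional pattern misses the second string because keyword_2 comes first there, which is exactly the case the non-directional search has to handle.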

# Split the comma-separated terms and wrap each one in its own lookahead
keyword_search <- paste0(sapply(unlist(strsplit("keyword_1,keyword_2", ",")), function(x) {
  paste0("(?=.*?(", x, "))")
}), collapse = "")
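If you need this more than once, the snippet can be wrapped in a function. The sketch below is a hypothetical helper (build_keyword_regex is not from the original post) that makes two assumptions of its own: surrounding whitespace around each term should be ignored, and terms should match literally. For the latter it wraps each term in PCRE's \Q...\E quoting, so characters such as . or + are not treated as regex syntax (a term that itself contains \E would still break this).

```r
# Hypothetical helper: build an order-independent AND pattern
# from a comma-separated string of keywords.
build_keyword_regex <- function(keywords, sep = ",") {
  terms <- trimws(unlist(strsplit(keywords, sep, fixed = TRUE)))
  terms <- terms[nzchar(terms)]   # drop empty terms, e.g. from "a,,b"
  # \\Q...\\E tells PCRE to treat each term as literal text;
  # paste0 is vectorised, so no sapply() is needed here
  lookaheads <- paste0("(?=.*?(\\Q", terms, "\\E))")
  paste0(lookaheads, collapse = "")
}

pattern <- build_keyword_regex("keyword_1, keyword_2")
grepl(pattern, "keyword_2 comes before keyword_1", perl = TRUE)   # -> TRUE
```

Because of the \Q...\E quoting, the result is only usable with perl = TRUE, the same requirement as the original lookahead pattern.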

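This sets keyword_search to the following regular expression, which matches a string only if every keyword appears somewhere in it, in any order, and can be used with grep() or grepl():

(?=.*?(keyword_1))(?=.*?(keyword_2))

NOTE: You will need to pass perl = TRUE (lowercase) when using the generated regular expression, because lookahead assertions such as (?=...) are supported by R's PCRE engine but not by its default regex engine.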
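A quick usage sketch with grepl() and grep(). The sample strings are hypothetical; the pattern is the one generated above, hard-coded here so the example runs on its own.

```r
# Pattern produced by the snippet in the post
keyword_search <- "(?=.*?(keyword_1))(?=.*?(keyword_2))"

# Hypothetical sample strings
docs <- c("keyword_1 then keyword_2",
          "keyword_2 then keyword_1",
          "only keyword_1 here",
          "neither term")

# perl = TRUE is needed for the (?=...) lookaheads
grepl(keyword_search, docs, perl = TRUE)   # -> TRUE TRUE FALSE FALSE

# With value = TRUE, grep() returns the matching strings themselves
grep(keyword_search, docs, perl = TRUE, value = TRUE)
```

Both orderings of the two keywords match, while strings missing either keyword are rejected.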

For the curious, here is the breakdown regex101 gives for this pattern:

Positive Lookahead (?=.*?(keyword_1))
  • Assert that the regex below matches
  • .*? matches any character (except for line terminators)
  • *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
  • 1st Capturing Group (keyword_1)
      • keyword_1 matches the characters keyword_1 literally (case sensitive)
Positive Lookahead (?=.*?(keyword_2))
  • Assert that the regex below matches
  • .*? matches any character (except for line terminators)
  • *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
  • 2nd Capturing Group (keyword_2)
      • keyword_2 matches the characters keyword_2 literally (case sensitive)
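Because a lookahead is zero-width, both assertions are checked from the same starting position, so the combined pattern behaves like AND-ing separate searches for each keyword. The sketch below is a hypothetical alternative (match_all_terms is not from the post) that makes this explicit, assuming the keywords are plain literal strings so fixed = TRUE can be used.

```r
# Hypothetical alternative: test each term on its own, then AND the results
match_all_terms <- function(x, terms) {
  hits <- lapply(terms, function(term) grepl(term, x, fixed = TRUE))
  # Start from all-TRUE so the result always has length(x) elements
  Reduce(`&`, hits, init = rep(TRUE, length(x)))
}

docs  <- c("keyword_1 then keyword_2",
           "keyword_2 then keyword_1",
           "only keyword_1 here")
terms <- c("keyword_1", "keyword_2")

match_all_terms(docs, terms)   # -> TRUE TRUE FALSE
```

This version needs neither perl = TRUE nor any regex escaping, but it cannot be handed around as a single pattern the way the lookahead regex can, which is the main convenience of the original approach.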
