Using Rcpp with Boost.Regex for regular expressions
[This article was first published on Rcpp Gallery, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Gabor asked about using Rcpp with regular expression libraries. This post shows a very simple example based on one of the Boost.Regex examples.
We need to set linker options so that the Boost.Regex library gets linked in. This can be as simple as
Sys.setenv("PKG_LIBS"="-lboost_regex")
With that, the following example can be built:
// cf www.boost.org/doc/libs/1_53_0/libs/regex/example/snippets/credit_card_example.cpp
#include <Rcpp.h>
#include <string>
#include <vector>
#include <boost/regex.hpp>

// Valid format: four groups of four digits, separated by '-' or ' '
bool validate_card_format(const std::string& s) {
    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    return boost::regex_match(s, e);
}

// Reformatting pattern: separators optional, first group has three or four digits
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");   // digits only
const std::string human_format("\\1-\\2-\\3-\\4");  // dash-separated groups

std::string machine_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, machine_format,
                                boost::match_default | boost::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return boost::regex_replace(s, e, human_format,
                                boost::match_default | boost::format_sed);
}

// [[Rcpp::export]]
Rcpp::DataFrame regexDemo(std::vector<std::string> s) {
    int n = s.size();
    std::vector<bool> valid(n);
    std::vector<std::string> machine(n);
    std::vector<std::string> human(n);
    for (int i = 0; i < n; i++) {
        valid[i] = validate_card_format(s[i]);
        machine[i] = machine_readable_card_number(s[i]);
        human[i] = human_readable_card_number(s[i]);
    }
    return Rcpp::DataFrame::create(Rcpp::Named("input") = s,
                                   Rcpp::Named("valid") = valid,
                                   Rcpp::Named("machine") = machine,
                                   Rcpp::Named("human") = human);
}
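For experimenting outside of R, the same helpers can be sketched with std::regex from the C++11 standard library, whose regex_match and regex_replace closely mirror the Boost.Regex calls and need no extra linker flag. This is a swapped-in variant, not the Boost code from the post: the default ECMAScript grammar of std::regex has no \A or \z, so ^ and $ serve as anchors here, and the name card_pattern is our own.

```cpp
// Sketch only: std::regex (C++11) stands in for Boost.Regex,
// so no -lboost_regex is needed.
#include <regex>
#include <string>

// Valid format: three groups of four digits, each followed by '-' or ' ',
// then a final group of four digits. regex_match requires a full-string match.
bool validate_card_format(const std::string& s) {
    static const std::regex e("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, e);
}

// ECMAScript grammar lacks \A and \z, so ^ and $ anchor the whole string.
const std::regex card_pattern(
    "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");

// sed-style format strings as in the Boost example: \1 .. \4 are the groups.
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");

std::string machine_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card_pattern, machine_format,
                              std::regex_constants::format_sed);
}

std::string human_readable_card_number(const std::string& s) {
    return std::regex_replace(s, card_pattern, human_format,
                              std::regex_constants::format_sed);
}
```

With these definitions, human_readable_card_number("0000111122223333") should return "0000-1111-2222-3333", and an input that does not match card_pattern is returned unchanged, since regex_replace copies unmatched text through.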
We can test the function using the same input as the Boost example:
s <- c("0000111122223333", "0000 1111 2222 3333",
       "0000-1111-2222-3333", "000-1111-2222-3333")
regexDemo(s)

                input valid          machine               human
1    0000111122223333 FALSE 0000111122223333 0000-1111-2222-3333
2 0000 1111 2222 3333  TRUE 0000111122223333 0000-1111-2222-3333
3 0000-1111-2222-3333  TRUE 0000111122223333 0000-1111-2222-3333
4  000-1111-2222-3333 FALSE  000111122223333  000-1111-2222-3333
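The output reflects two different rules. validate_card_format() insists on a '-' or ' ' after each of the first three four-digit groups, whereas the reformatting pattern makes the separators optional and accepts a three- or four-digit first group, so rows 1 and 4 are reported as invalid yet still reformatted. A small sketch that checks each pattern on its own makes this visible; it assumes std::regex in place of Boost.Regex (with ^ and $ replacing \A and \z), and strict_match and lenient_match are hypothetical names.

```cpp
// Sketch only: std::regex (C++11) stands in for Boost.Regex.
#include <regex>
#include <string>

// Pattern from validate_card_format(): separators mandatory, four digits per group.
bool strict_match(const std::string& s) {
    static const std::regex strict("(\\d{4}[- ]){3}\\d{4}");
    return std::regex_match(s, strict);
}

// Pattern used by the two reformatting helpers: separators optional,
// first group may have three or four digits.
bool lenient_match(const std::string& s) {
    static const std::regex lenient(
        "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
    return std::regex_match(s, lenient);
}
```

On the four test inputs, strict_match should succeed only for rows 2 and 3, while lenient_match succeeds for all four, which is exactly the combination of valid flags and reformatted columns seen above.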
To leave a comment for the author, please follow the link and comment on their blog: Rcpp Gallery.