stringi 0.4-1 released – fast, portable, consistent character string processing
A new release of the stringi package is available on CRAN (please wait a few days for Windows and OS X binary builds).
# install.packages("stringi") or update.packages()
library("stringi")
Here’s a list of changes in version 0.4-1. In the current release, we particularly focused on making the package’s interface more consistent with that of the well-known stringr package. For a general overview of stringi’s facilities and base R string processing issues, see e.g. here.
- (IMPORTANT CHANGE) The n_max argument in stri_split_*() has been renamed n.
- (IMPORTANT CHANGE) simplify=TRUE in stri_extract_all_*() and stri_split_*() now calls stri_list2matrix() with fill="". fill=NA_character_ may be obtained by using simplify=NA.
- (IMPORTANT CHANGE, NEW FUNCTIONS) #120: stri_extract_words has been renamed stri_extract_all_words, stri_locate_boundaries has been renamed stri_locate_all_boundaries, and stri_locate_words has been renamed stri_locate_all_words. New functions are now available: stri_locate_first_boundaries, stri_locate_last_boundaries, stri_locate_first_words, stri_locate_last_words, stri_extract_first_words, stri_extract_last_words.
# uses ICU's locale-dependent word break iterator
stri_extract_all_words("stringi: THE string processing package for R")
## [[1]]
## [1] "stringi"    "THE"        "string"     "processing" "package"
## [6] "for"        "R"
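The new first/last variants can be tried on the same sentence. This is only a minimal sketch, assuming the installed stringi exports these functions under the names listed above; the variable names are ours.

```r
library("stringi")

x <- "stringi: THE string processing package for R"

# First and last word of each input string (a character vector, not a list)
first_word <- stri_extract_first_words(x)
last_word  <- stri_extract_last_words(x)

# Start/end positions of the first word, as a two-column integer matrix
first_pos <- stri_locate_first_words(x)

print(first_word)  # "stringi"
print(last_word)   # "R"
print(first_pos)
```

Unlike stri_extract_all_words(), which returns a list with one element per input string, the first/last variants return one value per string.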
- (IMPORTANT CHANGE) #111: opts_regex, opts_collator, opts_fixed, and opts_brkiter can now be supplied individually via the ... argument. In other words, you may now simply call e.g.
stri_detect_regex(c("stringi", "STRINGI"), "stringi", case_insensitive=TRUE)
## [1] TRUE TRUE
instead of:
stri_detect_regex(c("stringi", "STRINGI"), "stringi", opts_regex=stri_opts_regex(case_insensitive=TRUE))
## [1] TRUE TRUE
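The same shortcut should apply to collator settings. The sketch below is an illustration under assumptions: stri_cmp_equiv() and the strength setting are our choice of example, and strength = 2 is expected to make the ICU collator ignore case differences.

```r
library("stringi")

# Collator option passed directly through `...`
short_form <- stri_cmp_equiv("stringi", "STRINGI", strength = 2)

# Equivalent call with an explicit options list
long_form <- stri_cmp_equiv("stringi", "STRINGI",
                            opts_collator = stri_opts_collator(strength = 2))

print(short_form)
print(identical(short_form, long_form))
```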
- (NEW FEATURE) #110: The fixed pattern search engine’s settings can now be supplied via the opts_fixed argument in stri_*_fixed(), see stri_opts_fixed(). A simple (not suitable for natural language processing) yet very fast case_insensitive pattern matching can now be performed. stri_extract_*_fixed is again available.
- (NEW FEATURE) #23: stri_extract_all_fixed, stri_count, and stri_locate_all_fixed may now also look for overlapping pattern matches, see ?stri_opts_fixed.
stri_extract_all_fixed("abaBAaba", "ABA", case_insensitive=TRUE, overlap=TRUE)
## [[1]]
## [1] "aba" "aBA" "aba"
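To see what the overlap setting changes, compare counts and match positions with and without it. A small sketch with a made-up input; overlap is passed via ... as in the call above.

```r
library("stringi")

# Default: matches may not overlap, so "aa" is found at 1-2 and 3-4
n_plain <- stri_count_fixed("aaaa", "aa")

# overlap = TRUE: a new match may start inside the previous one
n_overlap <- stri_count_fixed("aaaa", "aa", overlap = TRUE)

# Start positions of the overlapping matches
starts <- stri_locate_all_fixed("aaaa", "aa", overlap = TRUE)[[1]][, "start"]

print(n_plain)    # 2
print(n_overlap)  # 3
print(starts)     # 1 2 3
```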
- (NEW FEATURE) #129: stri_match_*_regex gained a cg_missing argument.
- (NEW FEATURE) #117: stri_extract_all_*(), stri_locate_all_*(), stri_match_all_*() gained a new argument: omit_no_match. Setting it to TRUE makes these functions compatible with their stringr equivalents.
- (NEW FEATURE) #118: stri_wrap() gained indent, exdent, initial, and prefix arguments. Moreover, Knuth’s dynamic word wrapping algorithm now assumes that the cost of printing the last line is zero, see #128.
cat(stri_wrap(stri_rand_lipsum(1), 40, 2.0), sep="\n")
## Lorem ipsum dolor sit amet, et et diam
## vitae est ut. At tristique, tincidunt
## taciti, ac egestas vestibulum magna.
## Volutpat nisl non sed ultricies nisl
## nibh magna. Nullam rhoncus ut phasellus
## sed. Congue enim libero congue massa
## eget. Ligula, quis est amet velit.
## Accumsan amet nunc ad. Porttitor,
## sed vestibulum diam vestibulum quis
## sed gravida ultrices. Per urna enim.
## Scelerisque interdum sed vestibulum
## rhoncus quis imperdiet pharetra. Sapien
## iaculis, lacinia ac cras ante, sed
## vitae inceptos dis tristique dignissim.
## Venenatis volutpat lectus sodales,
## hac feugiat molestie mollis. A, urna
## pellentesque ante himenaeos ante at
## potenti in.
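The example above does not use the new layout arguments, so here is a rough sketch of them. The sample text and the chosen values are ours, and we assume that, as in base R's strwrap(), the prefix is written before the indentation on every line.

```r
library("stringi")

txt <- paste("stringi is THE string processing package for R.",
             "It is fast, portable, and consistent.")

# indent: extra spaces on the first line of a paragraph
# exdent: extra spaces on all following lines
# prefix: string put in front of every line (initial defaults to prefix)
wrapped <- stri_wrap(txt, width = 30, indent = 2, exdent = 4, prefix = "> ")

cat(wrapped, sep = "\n")
```

This kind of call is handy for quoting text in e-mail style or for formatting help messages.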
- (NEW FEATURE) #122: stri_subset() gained an omit_na argument.
stri_subset_fixed(c("abc", NA, "def"), "a")
## [1] "abc" NA
stri_subset_fixed(c("abc", NA, "def"), "a", omit_na=TRUE)
## [1] "abc"
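Two related settings from earlier in this list, omit_no_match and cg_missing, also control how "nothing found" is reported. The following is a small sketch with made-up inputs, not an excerpt from the package documentation.

```r
library("stringi")

x <- c("a1b22", "abc")

# Default: a string without any match yields NA
default_res <- stri_extract_all_regex(x, "[0-9]+")

# omit_no_match = TRUE: a string without any match yields character(0)
stringr_like <- stri_extract_all_regex(x, "[0-9]+", omit_no_match = TRUE)

# cg_missing sets the value reported for a capture group that did not take part
# in the match; here the second group "(b)" is unused
m_default <- stri_match_first_regex("a", "(a)|(b)")
m_empty   <- stri_match_first_regex("a", "(a)|(b)", cg_missing = "")

print(default_res[[2]])   # NA
print(stringr_like[[2]])  # character(0)
print(m_default)
print(m_empty)
```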
- (NEW FEATURE) stri_list2matrix() gained an n_min argument.
- (NEW FEATURE) #126: stri_split() is now also able to act just like stringr::str_split_fixed().
stri_split_regex(c("bab", "babab"), "a", n = 3, simplify=TRUE)
##      [,1] [,2] [,3]
## [1,] "b"  "b"  ""
## [2,] "b"  "b"  "b"
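For comparison, the same call with simplify=NA pads short rows with NA instead of empty strings, and the n_min argument of stri_list2matrix() can be used to request a minimal number of columns directly. A minimal sketch; the list passed to stri_list2matrix() is invented for illustration.

```r
library("stringi")

x <- c("bab", "babab")

# simplify = NA: missing cells are filled with NA_character_
m_na <- stri_split_regex(x, "a", n = 3, simplify = NA)

# n_min: with byrow = TRUE, the result has at least n_min columns
m_pad <- stri_list2matrix(list("b", c("b", "b")),
                          byrow = TRUE, fill = "", n_min = 3)

print(m_na)
print(dim(m_pad))  # 2 3
```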
- (NEW FEATURE) #119: stri_split_boundaries() now has n, tokens_only, and simplify arguments. Additionally, stri_extract_all_words() is now equipped with a simplify argument.
- (NEW FEATURE) #116: stri_paste() gained a new argument: ignore_null. Setting it to TRUE makes this function more compatible with paste().
for (test in c(TRUE, FALSE))
print(stri_paste("a", if (test) 1:9, ignore_null=TRUE))
## [1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8" "a9"
## [1] "a"
Enjoy! Any comments and suggestions are welcome.