stringi 0.4-1 released – fast, portable, consistent character string processing
A new release of the stringi package is available on CRAN (please allow a few days for the Windows and OS X binary builds to appear).
# install.packages("stringi") or update.packages()
library("stringi")
Here’s a list of changes in version 0.4-1. In this release, we focused in particular on making the package’s interface more consistent with that of the well-known stringr package. For a general overview of stringi’s facilities and of string processing issues in base R, see e.g. here.
- (IMPORTANT CHANGE) The `n_max` argument in `stri_split_*()` has been renamed `n`.
- (IMPORTANT CHANGE) `simplify=TRUE` in `stri_extract_all_*()` and `stri_split_*()` now calls `stri_list2matrix()` with `fill=""`. `fill=NA_character_` may be obtained by using `simplify=NA`.
- (IMPORTANT CHANGE, NEW FUNCTIONS) #120: `stri_extract_words` has been renamed `stri_extract_all_words`, `stri_locate_boundaries` has been renamed `stri_locate_all_boundaries`, and `stri_locate_words` has been renamed `stri_locate_all_words`. The following new functions are now available: `stri_locate_first_boundaries`, `stri_locate_last_boundaries`, `stri_locate_first_words`, `stri_locate_last_words`, `stri_extract_first_words`, and `stri_extract_last_words`.
# uses ICU's locale-dependent word break iterator
stri_extract_all_words("stringi: THE string processing package for R")
## [[1]]
## [1] "stringi" "THE" "string" "processing" "package"
## [6] "for" "R"
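To illustrate the renamed `n` argument and the two `simplify` modes described in the first two items, here is a small sketch. It assumes a current stringi release from CRAN; the input strings are arbitrary examples.

```r
library("stringi")

x <- c("a,b,c", "d,e")

# `n` (formerly `n_max`) caps the number of pieces; the remainder stays unsplit
pieces <- stri_split_fixed("a,b,c", ",", n = 2)
print(pieces)

# simplify=TRUE returns a matrix (one row per input string), padded with ""
m_empty <- stri_split_fixed(x, ",", simplify = TRUE)
print(m_empty)

# simplify=NA returns the same shape, padded with NA instead
m_na <- stri_split_fixed(x, ",", simplify = NA)
print(m_na)
```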
- (IMPORTANT CHANGE) #111: The individual settings normally passed via the `opts_regex`, `opts_collator`, `opts_fixed`, and `opts_brkiter` arguments can now be supplied directly via `...`. In other words, you may now simply call e.g.
stri_detect_regex(c("stringi", "STRINGI"), "stringi", case_insensitive=TRUE)
## [1] TRUE TRUE
instead of:
stri_detect_regex(c("stringi", "STRINGI"), "stringi",
   opts_regex=stri_opts_regex(case_insensitive=TRUE))
## [1] TRUE TRUE
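Both calling styles should be equivalent, since the option ends up in the same regex settings either way. A small sketch checking this with `stri_count_regex()`, assuming a current stringi release; the input vector is made up for illustration:

```r
library("stringi")

x <- c("stringi", "STRINGI", "stringr")

# option passed directly through `...`
short_form <- stri_count_regex(x, "i", case_insensitive = TRUE)

# the same option wrapped in an explicit opts_regex list
long_form <- stri_count_regex(x, "i",
  opts_regex = stri_opts_regex(case_insensitive = TRUE))

print(short_form)                # number of "i"/"I" in each string
print(identical(short_form, long_form))
```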
- (NEW FEATURE) #110: The fixed pattern search engine’s settings can now be supplied via the `opts_fixed` argument in `stri_*_fixed()`, see `stri_opts_fixed()`. Simple (not suitable for natural language processing) yet very fast `case_insensitive` pattern matching is now possible. `stri_extract_*_fixed` is available again.
- (NEW FEATURE) #23: `stri_extract_all_fixed`, `stri_count`, and `stri_locate_all_fixed` may now also look for overlapping pattern matches, see `?stri_opts_fixed`.
stri_extract_all_fixed("abaBAaba", "ABA", case_insensitive=TRUE, overlap=TRUE)
## [[1]]
## [1] "aba" "aBA" "aba"
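A sketch contrasting non-overlapping and overlapping counts with `stri_count_fixed()` on the same input as the example above, plus the fast case-insensitive detection from the fixed engine. It assumes a current stringi release; the second input vector is arbitrary.

```r
library("stringi")

s <- "abaBAaba"

# non-overlapping: after a match, the search resumes past its end
n_plain <- stri_count_fixed(s, "ABA", case_insensitive = TRUE)

# overlapping: every starting position is considered
n_overlap <- stri_count_fixed(s, "ABA", case_insensitive = TRUE, overlap = TRUE)

# simple case-insensitive detection with the fixed engine
found <- stri_detect_fixed(c("stringi", "STRINGI", "stringr"), "stringi",
  case_insensitive = TRUE)

print(c(n_plain, n_overlap))
print(found)
```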
- (NEW FEATURE) #129: `stri_match_*_regex` gained a `cg_missing` argument (the string used for capture groups that did not take part in a match).
- (NEW FEATURE) #117: `stri_extract_all_*()`, `stri_locate_all_*()`, and `stri_match_all_*()` gained a new argument: `omit_no_match`. Setting it to `TRUE` makes these functions compatible with their stringr equivalents.
- (NEW FEATURE) #118: `stri_wrap()` gained the `indent`, `exdent`, `initial`, and `prefix` arguments. Moreover, Knuth’s dynamic word wrapping algorithm now assumes that the cost of printing the last line is zero, see #128.
cat(stri_wrap(stri_rand_lipsum(1), 40, 2.0), sep="\n")
## Lorem ipsum dolor sit amet, et et diam
## vitae est ut. At tristique, tincidunt
## taciti, ac egestas vestibulum magna.
## Volutpat nisl non sed ultricies nisl
## nibh magna. Nullam rhoncus ut phasellus
## sed. Congue enim libero congue massa
## eget. Ligula, quis est amet velit.
## Accumsan amet nunc ad. Porttitor,
## sed vestibulum diam vestibulum quis
## sed gravida ultrices. Per urna enim.
## Scelerisque interdum sed vestibulum
## rhoncus quis imperdiet pharetra. Sapien
## iaculis, lacinia ac cras ante, sed
## vitae inceptos dis tristique dignissim.
## Venenatis volutpat lectus sodales,
## hac feugiat molestie mollis. A, urna
## pellentesque ante himenaeos ante at
## potenti in.
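The `cg_missing` and `omit_no_match` arguments from the items above have no example of their own, so here is a minimal sketch, assuming a current stringi release (the inputs are arbitrary). By default, a capture group that does not participate in a match and a string with no match at all both yield `NA`:

```r
library("stringi")

# group 2, "(b)?", does not take part in the match against "a"
m_default <- stri_match_first_regex("a", "(a)(b)?")
m_empty   <- stri_match_first_regex("a", "(a)(b)?", cg_missing = "")

# "xyz" contains no digits, so it has no match
x <- c("a1b22", "xyz")
r_default <- stri_extract_all_regex(x, "[0-9]+")
r_omit    <- stri_extract_all_regex(x, "[0-9]+", omit_no_match = TRUE)

print(m_default)  # third column is NA
print(m_empty)    # third column is ""
str(r_default)    # second element is NA
str(r_omit)       # second element is character(0)
```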
- (NEW FEATURE) #122: `stri_subset()` gained an `omit_na` argument.
stri_subset_fixed(c("abc", NA, "def"), "a")
## [1] "abc" NA
stri_subset_fixed(c("abc", NA, "def"), "a", omit_na=TRUE)
## [1] "abc"
- (NEW FEATURE) `stri_list2matrix()` gained an `n_min` argument.
- (NEW FEATURE) #126: `stri_split()` is now also able to act just like `stringr::str_split_fixed()`.
stri_split_regex(c("bab", "babab"), "a", n = 3, simplify=TRUE)
##      [,1] [,2] [,3]
## [1,] "b"  "b"  ""
## [2,] "b"  "b"  "b"
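The new `n_min` argument of `stri_list2matrix()` guarantees a minimum matrix width regardless of the input. A small sketch, assuming a current stringi release; the list contents are arbitrary, and `byrow=TRUE` puts each list element in its own row, matching the layout above:

```r
library("stringi")

x <- list(c("a", "b"), "c")

# each list element becomes a row; at least 4 columns,
# padded with NA (the default fill value)
m <- stri_list2matrix(x, byrow = TRUE, n_min = 4)
print(m)
```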
- (NEW FEATURE) #119: `stri_split_boundaries()` now has `n`, `tokens_only`, and `simplify` arguments. Additionally, `stri_extract_all_words()` is now equipped with a `simplify` argument.
- (NEW FEATURE) #116: `stri_paste()` gained a new argument: `ignore_null`. Setting it to `TRUE` makes this function more compatible with `paste()`.
for (test in c(TRUE, FALSE))
   print(stri_paste("a", if (test) 1:9, ignore_null=TRUE))
## [1] "a1" "a2" "a3" "a4" "a5" "a6" "a7" "a8" "a9"
## [1] "a"
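For comparison, without `ignore_null=TRUE` a `NULL` argument makes `stri_paste()` return a zero-length vector. The sketch below also shows the new `simplify` argument of `stri_extract_all_words()`. It assumes a current stringi release; the inputs are arbitrary.

```r
library("stringi")

# default: a NULL (zero-length) argument yields a zero-length result
p_default <- stri_paste("a", NULL)
# ignore_null=TRUE drops the NULL argument instead
p_ignore  <- stri_paste("a", NULL, ignore_null = TRUE)

# simplify=TRUE: one row per input string, padded with ""
w <- stri_extract_all_words(c("one two", "three"), simplify = TRUE)

print(p_default)
print(p_ignore)
print(w)
```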
Enjoy! Any comments and suggestions are welcome.