stringfix : new R package for string manipulation in a %>% way

[This article was first published on Guillaume Pressiat, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I usually write around here in french and mainly report on French Hospitals data managment and the statistical tasks they imply. As today’s post is about a new package I have created, I’ll be writing in english. The package is called stringfix because it uses infix operators to manipulate character strings.

This post is an actualisation on December 2018 post.

Introduction

In Python, the operator + is used to paste two character strings together. For example: 'Hello ' + 'world' gives 'Hello World'. For that matter, building sentences with words and arithmetic symbols seems a very nice way to write. In R, the paste function requires parenthesis in order to be computed. Therefore the use of consecutive functions can make it hard to understand.

+ is a nice operator, and we can use it in R almost as it is used in Python by creating an infix operator.

`%+%` <- function(x,y){
  paste0(x, y)
}

While a ggplot function has already the same name, it is used to override data in a ggplot call and not for pasting character strings, see here. When loading tidyverse, the same ggplot function is called, preventing us from using paste0’s %+%. Otherwise, you can find a hint of character string pasting in the Advanced R book.

In order to create a toolbox around paste0’s %+%, I started collecting some other infix functions for character strings manipulation. The main question was: which functions with a right to left call that I use really often could be reordered in a %>% code. Here is the little family I have since build on : paste, grepl, substring, count, padding. The goal of this package is to use stringr or base functions in backend as a start for an alternative character string manipulation in R.

This package is still at its early begining (kind of a draft for me!) but I thought some other people would enjoy it and may even wish to contribute.

Presentation

"In a manner of coding, I just want to say..." % % "Nothing."
#>[1] "In a manner of coding, I just want to say... Nothing."

Examples

paste

'Hello ' %+% 'world'
#> [1] "Hello world"
'Your pastas taste like ' %+% '%>%'
#> [1] "Your pastas taste like %>%"
'coco' %+% 'bolo'
#> [1] "cocobolo"

'Hello' % % 'world'
#> [1] "Hello world"
'Your pastas taste like' % % '%>%'
#> [1] "Your pastas taste like %>%"
'Hello' %,% 'world...'
#> [1] "Hello, world..."
'Your pastas taste like ' %+% '%>%...' %,% 'or %>>%...'
#> [1] "Your pastas taste like %>%..., or %>>%..."

grepl

Case sensitive
'pig' %g% 'The pig is in the cornfield'
#> [1] TRUE
'Pig' %g% 'The pig is in the cornfield'
#> [1] FALSE
Case insensitive (ignore.case)
'pig' %gic% 'The pig is in the cornfield'
#> [1] TRUE
'PIG' %gic% 'The PiG is in the cornfield'
#> [1] TRUE

substring

'NFKA008' %s% '1.4'
#> [1] "NFKA"
'NFKA008' %s% .4
#> [1] "NFKA"
'where is' % % ('the pig is in the cornfield' %s% '1.7') %+% '?'
#> [1] "where is the pig?"

count

fruit <- c("apple", "banana", "pear", "pineapple")
"a" %count% fruit
#> [1] 1 3 1 1
c("a", "b", "p", "p") %count% fruit
#> [1] 1 1 1 3

pad, lpad and rpad

5 %lpad% '0.5'
#> [1] "00005"
5 %lpad%   .5
#> [1] "00005"
5 %lpad%  '.5'
#> [1] "    5"
5 %lpad% '2.5'
#> [1] "22225"
'é' %lpad% 'é.5'
#> [1] "ééééé"

names of tibbles : tolower and toupper

I have added two functions that I use really often.

library(magrittr)
iris %>% toupper_names %>% head
#>   SEPAL.LENGTH SEPAL.WIDTH PETAL.LENGTH PETAL.WIDTH SPECIES
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

Finally, I also wanted to outline that the function from the rmngb package : %out% : negation of %in% can be very useful to avoid typing ! x %in% y (you can just type x %out% y instead). This is why I have included it in this package !

More functions and information here : https://github.com/GuillaumePressiat/stringfix


To leave a comment for the author, please follow the link and comment on their blog: Guillaume Pressiat.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)