Basic text string functions in R

[This article was first published on lukemiller.org » R-project, and kindly contributed to R-bloggers.]
  To get the length of a text string (i.e. the number of characters in the string):
nchar(x)
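As a minimal sketch of how nchar() behaves on a single string and on a vector of strings, with length() shown for contrast. The values of s and v are made up for illustration and are not from the original post.

```r
# Hypothetical example values, chosen only for illustration
s = "hello world"
v = c("a", "bb", "ccc")

nchar(s)   # 11: number of characters in the single string
length(s)  # 1: s is a vector holding one string
nchar(v)   # 1 2 3: one character count per element
length(v)  # 3: number of strings in the vector
```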
Using length() would only give you the length of the vector containing the string, which is 1 for a single string.
To get the position of one or more regular expression matches in a text string x:
pos = regexpr('pattern', x) # Returns position of 1st match in a string
pos = gregexpr('pattern', x) # Returns positions of every match in a string
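A short sketch of what these two calls return, assuming a made-up input string "banana" and the pattern "an". Both functions return integer positions with a "match.length" attribute attached, and regexpr() returns -1 when nothing matches.

```r
# Hypothetical input string for illustration
x = "banana"

pos = regexpr("an", x)
first_pos = as.integer(pos)              # 2: start of the first match
first_len = attr(pos, "match.length")    # 2: the match is two characters long

# gregexpr() returns a list with one element per input string
all_pos = as.integer(gregexpr("an", x)[[1]])   # 2 4

# No match is reported as -1
none = as.integer(regexpr("z", x))       # -1
```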
To find which elements of a vector x of text strings match a regular expression (this returns the indices of the matching strings in the vector, not the position of the match within each string):
pos = grep('pattern', x)
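The following sketch shows grep() on a small made-up vector. The value = TRUE argument and the related grepl() function are not mentioned in the original text; they are included here as commonly useful variants.

```r
# Hypothetical vector of strings for illustration
x = c("apple pie", "banana split", "cherry tart", "pineapple")

idx = grep("apple", x)                  # 1 4: indices of matching elements
hits = grep("apple", x, value = TRUE)   # "apple pie" "pineapple"
flags = grepl("apple", x)               # TRUE FALSE FALSE TRUE
```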
To extract part of a text string based on position, where first and last are the start and end character positions in the string, usually found with the regexpr() function:
keep = substr(x, first, last)
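One way to combine regexpr() and substr() is sketched below, assuming a made-up file name from which the first run of digits should be extracted. The end position is computed from the start position and the "match.length" attribute.

```r
# Hypothetical input string for illustration
x = "sample_0042_run.csv"

m = regexpr("[0-9]+", x)                      # first run of digits
first = as.integer(m)                         # 8
last = first + attr(m, "match.length") - 1    # 11
keep = substr(x, first, last)                 # "0042"
```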
To replace part of a text string with some other text:
sub('pattern', replacement, input) # Changes only the 1st pattern match per string
gsub('pattern', replacement, input) # Changes every occurrence of a pattern match
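The difference between the two functions is easiest to see side by side. This sketch uses a made-up input vector and replaces hyphens with plus signs.

```r
# Hypothetical input vector for illustration
input = c("a-b-c", "x-y")

first_only = sub("-", "+", input)    # "a+b-c" "x+y"
every_one = gsub("-", "+", input)    # "a+b+c" "x+y"
```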
The pattern argument in the various regular expression functions can include character classes enclosed in square brackets. See ?regex for the explanation of regular expressions. For example, to make a pattern that matches any numerical digit, you could use '[0-9]' as the pattern argument. You may also use several predefined classes such as [:digit:], which must itself be placed inside square brackets: the pattern '[[:digit:]]' finds any numerical digit in the string, same as the '[0-9]' pattern. Written with single brackets, '[:digit:]' would instead match any one of the characters ':', 'd', 'i', 'g' and 't'.
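A brief sketch of the two digit patterns in use, on a made-up string. In R the predefined POSIX class is written inside a bracket expression, as '[[:digit:]]', and the two calls below should give the same result.

```r
# Hypothetical input string for illustration
x = "Room 101, floor 3"

no_digits_range = gsub("[0-9]", "", x)        # "Room , floor "
no_digits_class = gsub("[[:digit:]]", "", x)  # same result as above
```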

File name stuff

To get a list of file names (and paths) in a directory:
fnames = dir("./path/to/my/data", full.names=TRUE)
To extract just the filename from a full path:
fname = basename(path)
To extract the directory path from a file path:
directory = dirname(path)
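The three file name functions can be tried together as in the sketch below. The path string and the scratch directory name "string_demo" are hypothetical; the scratch directory is created under tempdir() so the example does not depend on any real data.

```r
# Hypothetical path, used only to show basename() and dirname()
path = "/home/user/data/run1.csv"
fname = basename(path)       # "run1.csv"
directory = dirname(path)    # "/home/user/data"

# Hypothetical scratch directory with two empty files for dir() to list
data_dir = file.path(tempdir(), "string_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, c("run1.csv", "run2.csv"))))

fnames = dir(data_dir, full.names = TRUE)   # full paths, sorted alphabetically
listed = basename(fnames)                   # "run1.csv" "run2.csv"
```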
If you have a text string assigned to a variable in the R workspace, and you want to parse it using various other functions, you can use the textConnection() function to feed your string to the other function.
mydataframe = read.csv(textConnection(myString)) # If myString contained comma-separated-values, this would convert them to a data frame.
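A small worked sketch of the textConnection() approach, assuming a made-up string of comma-separated values. Keeping the connection in a variable and closing it explicitly is a choice made here, not something the original post requires.

```r
# Hypothetical CSV content held in a single string
myString = "id,score\n1,90\n2,85"

con = textConnection(myString)   # treat the string as if it were a file
mydataframe = read.csv(con)      # header row becomes the column names
close(con)                       # release the connection when done

names(mydataframe)   # "id" "score"
nrow(mydataframe)    # 2
```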
