To get the length of a text string (i.e. the number of characters in the string):
nchar(x)
Using length() would just give you the length of the vector containing the string, which is 1 if x holds a single string.
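A quick sketch of the difference, using a made-up vector x (the values are only for illustration):

```r
# Hypothetical example data: a character vector with three elements
x = c("apple", "kiwi", "")

nchar(x)    # number of characters in each element: 5 4 0
length(x)   # number of elements in the vector: 3

# Spaces count as characters too
n_chars = nchar("hello world")   # 11
```

Note that nchar() is vectorized, so it returns one count per element of x.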
To get the position of one or more regular expression matches in a text string x:
pos = regexpr('pattern', x) # Returns position of 1st match in a string
pos = gregexpr('pattern', x) # Returns positions of every match in a string
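As a small illustration (the string "banana" and the pattern "an" are just assumed sample inputs), here is what the two functions return:

```r
x = "banana"   # hypothetical sample string

first = regexpr("an", x)           # position of the first match, plus attributes
as.integer(first)                  # 2
attr(first, "match.length")        # 2 (the match is two characters long)

all_matches = gregexpr("an", x)    # a list with one element per input string
as.integer(all_matches[[1]])       # 2 4

no_match = regexpr("z", x)         # -1 signals that nothing matched
```

Both functions attach a "match.length" attribute to the result, which is handy when you later want to cut the match out of the string.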
To get the position of a regular expression match in a vector x of text strings (this returns the index of the matching string in the vector, not the position of the match in the text string itself):
pos = grep('pattern', x)
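Base R's grep() does this kind of vector-level search. A minimal sketch, with an invented vector of strings:

```r
# Hypothetical example data
x = c("apple pie", "banana split", "cherry tart", "pineapple")

idx = grep("apple", x)               # indices of matching elements: 1 4
hits = grep("apple", x, value = TRUE) # the matching strings themselves
flags = grepl("apple", x)            # logical vector: TRUE FALSE FALSE TRUE
```

grepl() is often the more convenient form when you want to subset a data frame by a text match.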
To extract part of a text string based on position in the text string, where first and last are the locations in the text string, usually found by the regexpr() function:
keep = substr(x, first, last)
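Putting regexpr() and substr() together might look like the following sketch. The file name is invented, and the code assumes the pattern actually matches (regexpr() returns -1 otherwise):

```r
x = "sample_0042_run.csv"   # hypothetical input string

m = regexpr("[0-9]+", x)                     # first run of digits
first = as.integer(m)                        # start position: 8
last = first + attr(m, "match.length") - 1   # end position: 11

keep = substr(x, first, last)                # "0042"

# regmatches() does the same extraction in one step
keep2 = regmatches(x, m)                     # "0042"
```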
To replace part of a text string with some other text:
sub('pattern', replacement, input) # Changes only the 1st pattern match per string
gsub('pattern', replacement, input) # Changes every occurrence of a pattern match
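A short example of the difference, using made-up input strings:

```r
input = c("a-b-c", "no dashes")   # hypothetical example data

once = sub("-", "+", input)    # "a+b-c" "no dashes"
every = gsub("-", "+", input)  # "a+b+c" "no dashes"

# With fixed = TRUE the pattern is treated as literal text, not a regular
# expression, which is useful for characters like "." that are special in regex
literal = gsub(".", "_", "a.b.c", fixed = TRUE)   # "a_b_c"
```

Strings with no match are returned unchanged.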
The pattern argument in the various regular expression functions can include bracket expressions (sets of characters enclosed in square brackets). See ?regex for the explanation of regular expressions. For example, to make a pattern that matches any numerical digit, you could use '[0-9]' as the pattern argument. You may also use several predefined character classes such as '[[:digit:]]', which also matches any numerical digit, same as the [0-9] pattern. Note that the class name [:digit:] must itself sit inside a bracket expression, hence the doubled brackets.
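To see the bracket expressions in action, here is a small sketch on an invented string:

```r
x = "Order 66, aisle 7"   # hypothetical example string

masked1 = gsub("[0-9]", "#", x)         # "Order ##, aisle #"
masked2 = gsub("[[:digit:]]", "#", x)   # same result, using the POSIX class

# A leading ^ inside the brackets negates the set: drop everything but digits
digits_only = gsub("[^[:digit:]]", "", x)   # "667"
```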
File name stuff
To get a list of file names (and paths) in a directory:
fnames = dir("./path/to/my/data", full.names = TRUE)
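dir() (an alias for list.files()) also accepts a pattern argument, so you can filter by a regular expression while listing. The sketch below builds a throwaway folder inside tempdir() so it can run anywhere; the folder and file names are assumptions for the demo:

```r
# Hypothetical scratch folder with three empty files
data_dir = file.path(tempdir(), "string_demo")
dir.create(data_dir, showWarnings = FALSE)
file.create(file.path(data_dir, c("a.csv", "b.csv", "notes.txt")))

fnames = dir(data_dir, full.names = TRUE)   # all files, with paths

# Only the files whose names end in .csv
csv_files = dir(data_dir, pattern = "\\.csv$", full.names = TRUE)
csv_names = basename(csv_files)             # "a.csv" "b.csv"
```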
To extract just the filename from a full path:
filename = basename(path)
To extract the directory path from a file path:
directory = dirname(path)
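A sketch of splitting a path into its parts. The path is invented and does not need to exist, since these functions only manipulate the text:

```r
path = "./path/to/my/data/results_2010.csv"   # hypothetical file path

filename = basename(path)     # "results_2010.csv"
directory = dirname(path)     # "./path/to/my/data"

# The tools package (shipped with R) can split off the extension
stem = tools::file_path_sans_ext(filename)   # "results_2010"
ext = tools::file_ext(filename)              # "csv"
```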
If you have a text string assigned to a variable in the R workspace, and you want to parse it using various other functions, you can use the textConnection() function to feed your string to the other function.
mydataframe = read.csv(textConnection(myString)) # If myString contained comma-separated values, this would convert them to a data frame.
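Here is a runnable sketch with an invented myString. It keeps the connection in a variable so it can be closed afterwards, and also shows the text argument of read.csv(), which is an alternative route to the same result:

```r
# Hypothetical comma-separated data held in a string
myString = "id,score\n1,3.5\n2,4.0"

con = textConnection(myString)
mydataframe = read.csv(con)   # two rows, columns "id" and "score"
close(con)                    # text connections are not closed automatically here

# Alternative: pass the string directly through the text argument
mydataframe2 = read.csv(text = myString)
```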