To get the length of a text string (i.e. the number of characters in the string):
nchar(x)
Using length() would just give you the length of the vector containing the string, which is 1 if x holds a single string.
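As a small sketch of the difference between counting characters and counting vector elements (the variable names x and words are made up for illustration, and nchar() is the base R function that counts characters):

```r
x <- "hello world"
nchar(x)    # number of characters in the string: 11
length(x)   # number of elements in the vector: 1

# nchar() is vectorized, so it returns one count per string
words <- c("a", "bc", "def")
nchar(words)    # 1 2 3
length(words)   # 3
```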
To get the position(s) of regular expression matches in a text string x:
pos = regexpr('pattern', x)   # Returns the position of the 1st match in a string (-1 if there is no match)
pos = gregexpr('pattern', x)  # Returns the positions of every match in a string (as a list)
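A short illustration of what these two functions return, using a made-up string. Both results carry a "match.length" attribute alongside the start position, and gregexpr() wraps its result in a list with one element per input string:

```r
x <- "banana"

# First match only: start position plus a "match.length" attribute
first_match <- regexpr("an", x)
as.integer(first_match)              # 2
attr(first_match, "match.length")    # 2

# Every (non-overlapping) match: a list with one element per input string
all_matches <- gregexpr("an", x)
as.integer(all_matches[[1]])         # 2 4

# No match is reported as -1
as.integer(regexpr("z", x))          # -1
```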
To get the position of a regular expression match in a vector x of text strings (this returns the index of the matching string in the vector, not the position of the match in the text string itself):
pos = grep('pattern', x)
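A minimal sketch of searching a vector of strings with the base R function grep(), using a made-up vector named fruits. The value = TRUE argument and the grepl() variant are shown as related options:

```r
fruits <- c("apple", "banana", "cherry")

idx <- grep("an", fruits)                  # 2, the index of "banana"
hits <- grep("an", fruits, value = TRUE)   # "banana", the matching strings themselves
flags <- grepl("an", fruits)               # FALSE TRUE FALSE, one logical per element
```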
To extract part of a text string based on position in the text string, where first and last are the locations in the text string, usually found by the regexpr() function:
keep = substr(x, first, last)
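One way to combine regexpr() and substr(), sketched with a made-up file name: the start position comes from the match itself, and the end position is derived from the "match.length" attribute.

```r
x <- "sample_2024_final.csv"

m <- regexpr("[0-9]+", x)                      # first run of digits
first <- as.integer(m)                         # start position: 8
last <- first + attr(m, "match.length") - 1    # end position: 11

keep <- substr(x, first, last)                 # "2024"

# regmatches() does the same extraction in one step
same <- regmatches(x, m)                       # "2024"
```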
To replace part of a text string with some other text:
sub('pattern', replacement, input)   # Changes only the 1st pattern match per string
gsub('pattern', replacement, input)  # Changes every occurrence of a pattern match
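The difference between the two is easiest to see on a string with more than one match. A small sketch with made-up input:

```r
input <- c("a-b-c", "x-y")

once <- sub("-", "+", input)    # "a+b-c" "x+y"  (first match in each string)
all  <- gsub("-", "+", input)   # "a+b+c" "x+y"  (every match in each string)
```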
The pattern argument in the various regular expression functions can include bracket expressions (sets of characters enclosed in square brackets). See ?regex for the explanation of regular expressions. For example, to make a pattern that matches any numerical digit, you could use '[0-9]' as the pattern argument. You may also use several predefined character classes such as '[[:digit:]]', which also matches any numerical digit, same as the [0-9] pattern. Note the double brackets: the class name [:digit:] must itself appear inside a bracket expression.
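A brief sketch of a digit range and the predefined POSIX digit class used as patterns, on a made-up string. In R the predefined class is written with double brackets, as in "[[:digit:]]", because the class name goes inside a bracket expression:

```r
x <- "Room 101, floor 3"

masked_range <- gsub("[0-9]", "#", x)        # "Room ###, floor #"
masked_class <- gsub("[[:digit:]]", "#", x)  # "Room ###, floor #"

# A class can be combined with other characters in the same bracket expression
no_digits_or_commas <- gsub("[[:digit:],]", "", x)   # "Room  floor "
```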
File name stuff
To get a list of file names (and paths) in a directory:
fnames = dir("./path/to/my/data", full.names = TRUE)
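A self-contained sketch of listing files, using a throwaway temporary directory and made-up file names so it can be run anywhere. The pattern argument of dir() takes a regular expression and is shown here as one way to keep only certain files:

```r
# Build a scratch directory with a few empty files (names are illustrative)
d <- tempfile("dir_demo")
dir.create(d)
file.create(file.path(d, c("a.csv", "b.csv", "notes.txt")))

fnames <- dir(d, full.names = TRUE)                       # all three files, with paths
csvs <- dir(d, pattern = "\\.csv$", full.names = TRUE)    # only names ending in .csv

basename(csvs)   # "a.csv" "b.csv"
```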
To extract just the filename from a full path:
filename = basename(path)
To extract the directory path from a file path:
directory = dirname(path)
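A quick sketch of splitting a path into its parts with base R, using a made-up path. The call to tools::file_path_sans_ext() is an extra not mentioned above; it drops the file extension:

```r
path <- "/home/user/data/results.csv"

filename <- basename(path)    # "results.csv"
directory <- dirname(path)    # "/home/user/data"

# Optional: strip the extension (the tools package ships with R)
stem <- tools::file_path_sans_ext(filename)   # "results"
```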
If you have a text string assigned to a variable in the R workspace, and you want to parse it using various other functions, you can use the textConnection() function to feed your string to the other function.
mydataframe = read.csv(textConnection(myString))  # If myString contained comma-separated values, this would convert them to a data frame.
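A runnable sketch of parsing an in-memory string as CSV, with made-up data. Keeping the connection in a variable (con here) makes it possible to close it explicitly afterwards; read.csv() also accepts the string directly through its text argument, which avoids managing a connection at all:

```r
myString <- "name,score\nann,90\nbob,85"

# Wrap the string in a connection and read from it like a file
con <- textConnection(myString)
mydataframe <- read.csv(con)
close(con)   # release the connection once done

mydataframe$score   # 90 85

# Equivalent shortcut using the text argument
same <- read.csv(text = myString)
```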