[This article was first published on The Data Monkey, and kindly contributed to R-bloggers.]
A regular expression lets you do a moderately fancy search (and a replace, if you want). Say you wanted to replace “Dennis” with “Awesome” in a string variable, but only when “Dennis” sits at the very end of the string. You could try:
-replace PBFnamevar = regexr(PBFnamevar,"Dennis$","Awesome")-
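To see what the `$` anchor does, here is a small sketch on made-up data (the three values of PBFnamevar are hypothetical). Note that -regexr()- replaces only the first substring that matches, and with the `$` anchor the only possible match is a “Dennis” at the end of the string.

```stata
clear
// Hypothetical example values for PBFnamevar
input str20 PBFnamevar
"Dennis Smith"
"Smith Dennis"
"Dennis Dennis"
end

// "Dennis$" only matches "Dennis" at the end of the string
replace PBFnamevar = regexr(PBFnamevar, "Dennis$", "Awesome")
list
```

The first row is left alone, the second becomes “Smith Awesome”, and in the third only the trailing “Dennis” is replaced.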
You can also match any character, only capital letters, or only digits; there are lots of possibilities, covered in the Stata FAQ:
http://www.stata.com/support/faqs/data/regex.html
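A minimal sketch of those character classes, assuming a hypothetical string variable called code: `.` matches any single character, `[A-Z]` any capital letter, and `[0-9]` any digit. -regexm()- returns 1 when the pattern is found and 0 otherwise.

```stata
clear
// Hypothetical data
input str20 code
"ab12cd"
"XYZ"
"no caps"
end

// 1 if the string contains at least one digit, 0 otherwise
gen byte has_digit = regexm(code, "[0-9]")
// 1 if the string contains at least one capital letter
gen byte has_cap = regexm(code, "[A-Z]")
// Replace the first run of one or more digits with "#"
gen str20 masked = regexr(code, "[0-9]+", "#")
// "^." matches whatever the first character is
gen str20 first_blank = regexr(code, "^.", "_")
list
```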
You can also use -regexr()- to define local macros:
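-local strata = regexr("agecat","age","")-
-regexr()- needs the replacement string as its third argument; with "" it simply deletes the match, so strata ends up as "cat".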
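The same idea works inside a loop. This is a sketch that assumes hypothetical variable names sharing a “cat” suffix; -regexr()- strips the suffix so only the stem is stored in the local.

```stata
// Hypothetical variable names that end in "cat"
foreach v in agecat sexcat racecat {
    // Remove the trailing "cat", keeping only the stem
    local stem = regexr("`v'", "cat$", "")
    display "`v' -> `stem'"
}
```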
Or use -regexm()-, which returns 1 for a match and 0 otherwise, in -if- commands:
if regexm("`strata'","age") {
    // commands to run when the local contains "age"
}
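As a sketch of where this pattern is handy, the loop below picks out the age-related names from a hypothetical list of stratification variables, collecting every name that contains “age” into another local.

```stata
// Hypothetical list of stratification variables
local stratvars "agecat sex agegrp region"
local agevars ""
foreach strata of local stratvars {
    // regexm() is 1 if "age" appears anywhere in the name
    if regexm("`strata'", "age") {
        local agevars "`agevars' `strata'"
    }
}
display "Age-related strata:`agevars'"
```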
On a related note (although this isn’t actually a regular expression), say you have a string variable that packs what should be several separate variables into one, with the values separated by semicolons (e.g. a row might look like “1;15.2;89;hi;21”). Try -split-:
-split textvar, gen(newtextvars) parse(";")-
This creates newtextvars1, newtextvars2, and so on, one string variable per piece.
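Here is a sketch using the example row above plus a second, made-up row. The pieces that come out of -split- are strings, so the numeric ones are converted with -destring- afterwards.

```stata
clear
// Hypothetical data: each row packs five values into one string
input str40 textvar
"1;15.2;89;hi;21"
"2;9.8;77;bye;18"
end

// Break textvar at each semicolon into newtextvars1 ... newtextvars5
split textvar, gen(newtextvars) parse(";")

// The new variables are strings; convert the numeric ones in place
destring newtextvars1 newtextvars2 newtextvars3 newtextvars5, replace
list
```

The fourth piece (“hi”, “bye”) is left as a string because it is not numeric.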
I should note that Stata’s regular expressions are wimpy compared to what other languages support. R supports Perl-compatible regular expressions (e.g. via perl = TRUE in grepl() and gsub()), which can do so many things it’s scary.