[This article was first published on R – The Hack-R Blog, and kindly contributed to R-bloggers].
I have (sometimes incomplete) data on addresses that looks like this:
data <- c("1600 Pennsylvania Avenue, Washington DC",
",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
where I need to remove the first and/or last character if either one of them is a comma.
Avinash Raj was able to help me with this on Stack Overflow, and the question turned out to be a popular one, so I’ll show the solution here:
> data <- c("1600 Pennsylvania Avenue, Washington DC",
+ ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")
> gsub("(?<=^),|,(?=$)", "", data, perl=TRUE)
[1] "1600 Pennsylvania Avenue, Washington DC"
[2] "Siem Reap,FC"
[3] "11 Wall Street, New York, NY"
[4] "Addis Ababa,FC"
Pattern explanation:
(?<=^),   In regex, (?<=...) is called a positive lookbehind. In our case it asserts that what precedes the comma must be the line start (^), so it matches the starting comma.
|   The logical OR operator, used to combine (i.e., OR) two regexes.
,(?=$)   A positive lookahead asserting that what follows the comma must be the line end ($), so it matches the comma at the end of the line.
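Because ^ and $ are zero-width assertions themselves, the lookarounds here only wrap the anchors, so a plain anchored pattern should behave the same way on this data. The following is a minimal sketch of that comparison, not part of the original answer; the variable names with_lookarounds and with_anchors are made up for illustration.

```r
# Same sample data as above
data <- c("1600 Pennsylvania Avenue, Washington DC",
          ",Siem Reap,FC,", "11 Wall Street, New York, NY", ",Addis Ababa,FC,")

# Original solution: lookbehind/lookahead around the anchors (needs perl = TRUE)
with_lookarounds <- gsub("(?<=^),|,(?=$)", "", data, perl = TRUE)

# Assumed equivalent: anchor the comma directly at the start or end of the string
with_anchors <- gsub("^,|,$", "", data)

# Compare the two results element by element
identical(with_lookarounds, with_anchors)
```

The anchored form does not need perl = TRUE, since it uses no lookarounds. Both versions remove at most one comma from each end; a string with several leading or trailing commas would need a quantifier such as ^,+|,+$.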
