How to Extract String After a Specific Character in R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Welcome back, R Programmers! Today, we’ll explore a common task: extracting a substring after a specific character in R. Whether you’re cleaning data or transforming strings, this skill is quite handy. We’ll look at three approaches: using base R, stringr, and stringi. Let’s dive in!

Examples

Using Base R

Base R provides several functions to manipulate strings. Here, we’ll use sub and strsplit to extract a substring after a specific character.

Example 1: Using sub

The sub function allows us to replace parts of a string based on a pattern. Here’s how to extract the part after a specific character, say a hyphen (-).

# Example string
string <- "data-science"

# Extract substring after the hyphen
result <- sub(".*-", "", string)
print(result)  # Output: "science"
[1] "science"

Explanation:

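  • .*- is a regular expression where .* matches any run of characters (zero or more) and - matches a literal hyphen. Because .* is greedy, the match runs through the last hyphen in the string, so with several hyphens only the text after the final one is kept.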
  • "" is the replacement, effectively removing everything up to and including the hyphen.
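It can help to check how the pattern behaves when a string contains several hyphens, or none at all. The sketch below uses made-up input strings (they are not from the original post) and contrasts the greedy pattern with an anchored alternative, ^[^-]*-, which stops at the first hyphen.

```r
# Hypothetical inputs chosen to show the edge cases
x <- c("data-science", "a-b-c", "nohyphen")

# Greedy pattern: removes everything through the LAST hyphen
after_last <- sub(".*-", "", x)
# after_last is c("science", "c", "nohyphen")

# Anchored pattern: removes everything through the FIRST hyphen
after_first <- sub("^[^-]*-", "", x)
# after_first is c("science", "b-c", "nohyphen")
```

Note that sub returns the input unchanged when the pattern does not match, so a string without a hyphen comes back as-is rather than as NA.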

Example 2: Using strsplit

The strsplit function splits a string into substrings based on a delimiter.

# Example string
string <- "hello-world"

# Split the string at the hyphen
parts <- strsplit(string, "-")[[1]]

# Extract the part after the hyphen
result <- parts[2]
print(result)  # Output: "world"
[1] "world"

Explanation:

  • strsplit(string, "-") splits the string at every hyphen and returns a list with one character vector per input string.
  • [[1]] extracts the first element of the list.
  • [2] extracts the second part of the split string.
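Since strsplit returns one list element per input string, the same idea extends to a whole vector. The following is a minimal sketch with made-up inputs; it also passes fixed = TRUE, which treats the delimiter as a literal string rather than a regular expression (this matters for delimiters such as ".").

```r
# Hypothetical inputs, including one string with no hyphen
x <- c("hello-world", "nohyphen")

# One character vector of pieces per input string
parts <- strsplit(x, "-", fixed = TRUE)

# Take the second piece of each; vapply guarantees a character result
second <- vapply(parts, function(p) p[2], character(1))
# second is c("world", NA): indexing past the end of a vector gives NA

# With fixed = TRUE, "." is a literal dot, not "any character"
ext <- strsplit("report.csv", ".", fixed = TRUE)[[1]][2]
# ext is "csv"
```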

Using stringr

The stringr package, part of the tidyverse, provides consistent and easy-to-use string functions.

Example 1: Using str_extract

The str_extract function extracts matching patterns from a string.

library(stringr)

# Example string
string <- "apple-pie"

# Extract substring after the hyphen
result <- str_extract(string, "(?<=-).*")
print(result)  # Output: "pie"
[1] "pie"

Explanation:

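  • (?<=-) is a lookbehind assertion: it requires a hyphen immediately before the match without including the hyphen in the result.
  • .* matches any character zero or more times, so the result is everything after the first hyphen.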
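str_extract is vectorised and returns NA where the pattern does not match. The sketch below, using made-up inputs, shows that behaviour and adds a second pattern (an assumption for illustration, not from the original post) that keeps only the text after the last hyphen.

```r
library(stringr)

# Hypothetical inputs: one hyphen, several hyphens, no hyphen
x <- c("apple-pie", "a-b-c", "nohyphen")

# Everything after the FIRST hyphen
after_first <- str_extract(x, "(?<=-).*")
# after_first is c("pie", "b-c", NA)

# Only the text after the LAST hyphen: no further hyphens up to the end
after_last <- str_extract(x, "(?<=-)[^-]*$")
# after_last is c("pie", "c", NA)
```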

Example 2: Using str_split

Similar to strsplit in base R, str_split splits a string based on a pattern.

# Example string
string <- "open-source"

# Split the string at the hyphen
parts <- str_split(string, "-")[[1]]

# Extract the part after the hyphen
result <- parts[2]
print(result)  # Output: "source"
[1] "source"

Explanation:

  • str_split(string, "-") splits the string into parts at the hyphen, returning a list.
  • [[1]] extracts the first element of the list.
  • [2] extracts the second part of the split string.
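When a string has more than one hyphen, parts[2] holds only the text between the first and second hyphens. One way around this, sketched here with made-up inputs, is to cap the number of pieces with the n argument, or to use str_split_fixed, which returns a character matrix instead of a list.

```r
library(stringr)

# Hypothetical inputs
x <- c("open-source", "a-b-c")

# n = 2 stops splitting after the first hyphen
parts <- str_split(x, "-", n = 2)
# parts[[2]] is c("a", "b-c")

# str_split_fixed returns a matrix with one row per input string
m <- str_split_fixed(x, "-", n = 2)
after_first <- m[, 2]
# after_first is c("source", "b-c")
```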

Using stringi

The stringi package, which stringr is built on, offers a large set of high-performance string functions.

Example 1: Using stri_extract

The stri_extract function extracts substrings based on patterns.

library(stringi)

# Example string
string <- "front-end"

# Extract substring after the hyphen
result <- stri_extract(string, regex = "(?<=-).*")
print(result)  # Output: "end"
[1] "end"

Explanation:

  • regex = "(?<=-).*" uses a regular expression where (?<=-) is a lookbehind assertion ensuring the match occurs after a hyphen, and .* matches any character zero or more times.
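stringi also has explicit variants that name both the match position and the pattern type. As a rough sketch with made-up inputs, stri_extract_first_regex and stri_extract_last_regex can pick out the text after the first hyphen or the last hyphen-free piece; the second pattern is an illustrative assumption, not part of the original post.

```r
library(stringi)

# Hypothetical inputs: one hyphen, several hyphens, no hyphen
x <- c("front-end", "a-b-c", "nohyphen")

# First match: everything after the first hyphen
after_first <- stri_extract_first_regex(x, "(?<=-).*")
# after_first is c("end", "b-c", NA)

# Last match of "a run of non-hyphens preceded by a hyphen"
last_piece <- stri_extract_last_regex(x, "(?<=-)[^-]+")
# last_piece is c("end", "c", NA)
```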

Example 2: Using stri_split

Similar to strsplit and str_split, stri_split splits a string based on a pattern.

# Example string
string <- "full-stack"

# Split the string at the hyphen
parts <- stri_split(string, regex = "-")[[1]]

# Extract the part after the hyphen
result <- parts[2]
print(result)  # Output: "stack"
[1] "stack"

Explanation:

  • stri_split(string, regex = "-") splits the string into parts at the hyphen, returning a list.
  • [[1]] extracts the first element of the list.
  • [2] extracts the second part of the split string.
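For a literal delimiter such as a hyphen, a regular expression is not strictly needed. The sketch below, with made-up inputs, uses stri_split_fixed with n = 2 so that everything after the first hyphen stays in one piece, and simplify = TRUE to get a matrix back instead of a list.

```r
library(stringi)

# Hypothetical inputs
x <- c("full-stack", "a-b-c")

# Split on a literal hyphen, at most 2 pieces per string, as a matrix
m <- stri_split_fixed(x, "-", n = 2, simplify = TRUE)

# Second column holds the text after the first hyphen
after_first <- m[, 2]
# after_first is c("stack", "b-c")
```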

Conclusion

There you have it: three approaches (base R, stringr, and stringi), each with a pattern-based and a split-based way to extract a substring after a specific character in R. Pattern-based extraction is compact, while splitting is handy when you also need the other pieces. Give these examples a try and see which one works best for your data!


Happy coding!
