How to Check if a Column Contains a String in R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers].

Introduction

Whether you’re cleaning data or exploring a dataset, checking whether a column contains a specific string is a common task. Today, I’ll show you how to do this with str_detect() from the stringr package and with base R functions. We’ll also cover matching partial strings and counting occurrences. Let’s dive right in!

Using str_detect from stringr

First, we’ll use the str_detect() function from the stringr package. stringr is part of the tidyverse and provides a consistent, user-friendly set of functions for text manipulation. If you don’t have it yet, install it (this only needs to be done once):

install.packages("stringr")

Now, let’s load the package and create a sample dataset:

library(stringr)
# Sample data
data <- data.frame(
  name = c("Alice", "Bob", "Carol", "Dave", "Eve"),
  description = c("Software developer", "Data analyst", "UX designer", "Project manager", "Data scientist")
)
data
   name        description
1 Alice Software developer
2   Bob       Data analyst
3 Carol        UX designer
4  Dave    Project manager
5   Eve     Data scientist

Examples

Using stringr

Check for Full String

Suppose we want to check which entries in the description column contain “Data analyst”:

# Detect if 'description' contains 'Data analyst'
data$has_data_analyst <- str_detect(data$description, "Data analyst")
print(data)
   name        description has_data_analyst
1 Alice Software developer            FALSE
2   Bob       Data analyst             TRUE
3 Carol        UX designer            FALSE
4  Dave    Project manager            FALSE
5   Eve     Data scientist            FALSE

In the output, has_data_analyst is TRUE only in Bob’s row and FALSE in all the others.
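Despite the heading, str_detect() does not test for an exact, whole-value match: it returns TRUE whenever the pattern appears anywhere in the text, and it is case-sensitive. The minimal sketch below uses a made-up jobs vector (not part of our data frame) to contrast a substring match with two ways of requiring the entire value to match: regex anchors (^ and $) and a plain == comparison.

```r
library(stringr)

# Hypothetical example values, not from the sample data frame
jobs <- c("Data analyst", "Senior Data analyst", "data analyst")

# Substring match: TRUE wherever "Data analyst" appears inside the text
str_detect(jobs, "Data analyst")
# TRUE TRUE FALSE

# Anchored regex: ^ and $ require the whole value to be "Data analyst"
str_detect(jobs, "^Data analyst$")
# TRUE FALSE FALSE

# Plain equality is the simplest exact check
jobs == "Data analyst"
# TRUE FALSE FALSE
```

In both exact checks, “data analyst” is still FALSE because the comparison is case-sensitive.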

Check for Partial String

Let’s expand our search to any string containing “Data”:

# Detect if 'description' contains the substring 'Data' (case-sensitive)
data$has_data <- str_detect(data$description, "Data")
print(data)
   name        description has_data_analyst has_data
1 Alice Software developer            FALSE    FALSE
2   Bob       Data analyst             TRUE     TRUE
3 Carol        UX designer            FALSE    FALSE
4  Dave    Project manager            FALSE    FALSE
5   Eve     Data scientist            FALSE     TRUE

This returns TRUE for Bob and Eve, because both “Data analyst” and “Data scientist” contain “Data”.
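Two things are worth keeping in mind with partial matches: the pattern is a case-sensitive regular expression, and it also matches inside longer words. Here’s a small sketch on a hypothetical desc vector showing how stringr’s regex() helper with ignore_case = TRUE and \\b word boundaries change the result, and how the resulting logical vector can be used to keep only the matching values.

```r
library(stringr)

# Hypothetical example values
desc <- c("Data analyst", "Big data engineer", "Database admin")

# Default: case-sensitive, and "Data" also matches inside "Database"
str_detect(desc, "Data")
# TRUE FALSE TRUE

# regex(..., ignore_case = TRUE) makes the match case-insensitive
str_detect(desc, regex("data", ignore_case = TRUE))
# TRUE TRUE TRUE

# \\b word boundaries restrict the match to "data" as a standalone word
is_data_word <- str_detect(desc, regex("\\bdata\\b", ignore_case = TRUE))
is_data_word
# TRUE TRUE FALSE

# Use the logical vector to keep only the matching values
desc[is_data_word]
# "Data analyst" "Big data engineer"
```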

Count Occurrences

If you need to count how many times “Data” appears in each description, use str_count():

# Count occurrences of 'Data'
data$data_count <- str_count(data$description, "Data")
print(data)
   name        description has_data_analyst has_data data_count
1 Alice Software developer            FALSE    FALSE          0
2   Bob       Data analyst             TRUE     TRUE          1
3 Carol        UX designer            FALSE    FALSE          0
4  Dave    Project manager            FALSE    FALSE          0
5   Eve     Data scientist            FALSE     TRUE          1

This adds a data_count column holding the number of matches in each row.
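Our sample data never contains “Data” more than once per row, so every count above is 0 or 1. This sketch uses a hypothetical desc vector with a repeated word to show per-element counts, a total for the whole column, and the difference between counting matches and counting rows that match.

```r
library(stringr)

# Hypothetical example values; the first element contains "Data" twice
desc <- c("Data engineer on the Data platform team", "Data analyst", "UX designer")

# Number of matches in each element
counts <- str_count(desc, "Data")
counts
# 2 1 0

# Total occurrences across the whole vector
sum(counts)
# 3

# Number of elements that contain the pattern at least once
sum(str_detect(desc, "Data"))
# 2

# Matches do not overlap: "aa" is found only once in "aaa"
str_count("aaa", "aa")
# 1
```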

Using Base R

If you prefer base R, grepl() and gregexpr() do the same jobs without any extra packages.

Check for Full or Partial String

grepl() returns TRUE or FALSE for each element, depending on whether the pattern is present:

# Using grepl for full/partial string detection
data$has_data_grepl <- grepl("Data", data$description)
print(data)
   name        description has_data_analyst has_data data_count has_data_grepl
1 Alice Software developer            FALSE    FALSE          0          FALSE
2   Bob       Data analyst             TRUE     TRUE          1           TRUE
3 Carol        UX designer            FALSE    FALSE          0          FALSE
4  Dave    Project manager            FALSE    FALSE          0          FALSE
5   Eve     Data scientist            FALSE     TRUE          1           TRUE

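The has_data_grepl column is identical to the has_data column produced by str_detect(). Note the argument order: grepl() takes the pattern first and the text second, the reverse of str_detect().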
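grepl() also has a couple of arguments worth knowing. The sketch below, on an illustrative desc vector, shows ignore.case = TRUE for case-insensitive matching and fixed = TRUE for treating the pattern as literal text instead of a regular expression (in a regex, “.” matches any character).

```r
# Hypothetical example values
desc <- c("Data analyst", "Big data engineer", "Release 132", "Release 1.2")

# Case-sensitive by default
grepl("Data", desc)
# TRUE FALSE FALSE FALSE

# ignore.case = TRUE for a case-insensitive check
grepl("data", desc, ignore.case = TRUE)
# TRUE TRUE FALSE FALSE

# As a regex, "1.2" also matches "132" because "." matches any character
grepl("1.2", desc)
# FALSE FALSE TRUE TRUE

# fixed = TRUE treats the pattern as a literal string
grepl("1.2", desc, fixed = TRUE)
# FALSE FALSE FALSE TRUE
```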

Count Occurrences

To count occurrences, use gregexpr(), which returns the positions of every match in each string (or -1 when there is no match):

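# Count occurrences using gregexpr
# gregexpr() returns the match positions for each string, or -1 when there is no match
matches <- gregexpr("Data", data$description)
data$data_count_base <- vapply(
  matches,
  function(x) if (x[1] == -1) 0L else length(x),  # -1 means zero matches
  integer(1)
)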
print(data)
   name        description has_data_analyst has_data data_count has_data_grepl
1 Alice Software developer            FALSE    FALSE          0          FALSE
2   Bob       Data analyst             TRUE     TRUE          1           TRUE
3 Carol        UX designer            FALSE    FALSE          0          FALSE
4  Dave    Project manager            FALSE    FALSE          0          FALSE
5   Eve     Data scientist            FALSE     TRUE          1           TRUE
  data_count_base
1               0
2               1
3               0
4               0
5               1

This will add a new data_count_base column containing the count of “Data” in each row.
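A more compact base R alternative is to extract the matched text with regmatches() and count it with lengths(). Here’s a sketch of that idea on a hypothetical desc vector; elements without a match come back as character(0), so they count as 0 without any special check for -1.

```r
# Hypothetical example values; the first element contains "Data" twice
desc <- c("Data engineer on the Data platform team", "Data analyst", "UX designer")

# gregexpr() finds every match; regmatches() pulls out the matched text,
# returning character(0) for elements with no match
matched <- regmatches(desc, gregexpr("Data", desc))

# lengths() gives the number of matches per element
counts <- lengths(matched)
counts
# 2 1 0
```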

Give It a Try!

The best way to master string detection in R is to experiment with different patterns and datasets. Whether you use str_detect, grepl, or any other approach, you’ll find plenty of ways to customize the search. Try it out with your own datasets, and soon you’ll be searching like a pro!
