Site icon R-bloggers

Stringr in r 10 data manipulation Tips and Tricks

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Stringr in r data manipulation Tips and Tricks, In this tutorial we are going to discuss useful functions and expressions in stringr package.

Stringr in r

Variety of functions available in stringr package but we are going cover only important functions in our day-to-day data analysis.

library(stringr)

1. Word Length

statement<-c("R", "is powerful", "tool", "for data", "analysis")

Suppose if you want to find the length of each word, you can use str_length

statement
"R"           "is powerful" "tool"        "for data"    "analysis"
str_length(statement)
1 11  4  8  8

2. Concatnate

If you want to join the string str_c will be useful.

Market Basket Analysis in R » What Goes With What »

str_c(statement,collapse=" ")

“R is powerful tool for data analysis”

str_c("test",1:10, sep="-")[1] "test-1"  "test-2"  "test-3"  "test-4"  "test-5"  "test-6"  "test-7"  "test-8"  "test-9"
[10] "test-10"
str_c("test",1:10, sep=",")
[1] "test,1"  "test,2"  "test,3"  "test,4"  "test,5"  "test,6"  "test,7"  "test,8"  "test,9"
[10] "test,10"

3. NA Replace

Now will see how to handle missing data’s

str_c(c("My Name", NA, "Jhon"),".")
"My Name." NA         "Jhon."

So you can see missing value is not concatenated. This we can overcome based on str_replace_na()

replace NA with . or character

str_replace_na(c("My Name", NA, "Jhon"),".")
"My Name" "."       "Jhon"

4. String Extraction

If you want to extract the substring then str_sub will be handy.

str_sub(statement,1,5)
"R"     "is po" "tool"  "for d" "analy"

Now you can see the first 5 characters extracted from the string vector.

Decision Trees in R » Classification & Regression »

If you know the length of the string you can update your string also.

str_sub(statement, 4,-1)<-"Wow"
statement
"RWow"   "is Wow" "tooWow" "forWow" "anaWow"

5. Split

If you want to split the string based on pattern, str_split will be useful.

str_split(statement,pattern=" ")
[[1]]
[1] "RWow"
[[2]]
[1] "is"  "Wow"
[[3]]
[1] "tooWow"
[[4]]
[1] "forWow"
[[5]]
[1] "anaWow"

6. Subset

Suppose if you want to subset word in the particular pattern you can make use of str_subset

Handling missing values in R Programming »

str_subset(colors(),pattern="green")
[1] "darkgreen"         "darkolivegreen"    "darkolivegreen1"   "darkolivegreen2" 
 [5] "darkolivegreen3"   "darkolivegreen4"   "darkseagreen"      "darkseagreen1"   
 [9] "darkseagreen2"     "darkseagreen3"     "darkseagreen4"     "forestgreen"     
[13] "green"             "green1"            "green2"            "green3"          
[17] "green4"            "greenyellow"       "lawngreen"         "lightgreen"      
[21] "lightseagreen"     "limegreen"         "mediumseagreen"    "mediumspringgreen"
[25] "palegreen"         "palegreen1"        "palegreen2"        "palegreen3"      
[29] "palegreen4"        "seagreen"          "seagreen1"         "seagreen2"       
[33] "seagreen3"         "seagreen4"         "springgreen"       "springgreen1"    
[37] "springgreen2"      "springgreen3"      "springgreen4"      "yellowgreen"
If  you want to extract colors start with orange or end with red then ^$ will be helpful
str_subset(colors(),pattern="^orange|red$")
1] "darkred"         "indianred"       "mediumvioletred" "orange"          "orange1"      
 [6] "orange2"         "orange3"         "orange4"         "orangered"       "orangered1"    
[11] "orangered2"      "orangered3"      "orangered4"      "palevioletred"   "red"           
[16] "violetred"

^ indicate the starting of the string and $ indicate string ending with

If you want to extract characters or numbers from string str_extract will be useful

list<-c("Hai1", "my 10", "Name 20")
str_extract(list,pattern="[a-z]")

“a” “m” “a”

If you want full word then you can use

str_extract(list,pattern="[a-z]+")

“ai”  “my”  “ame”

7. html view

If you want to see the html vie output then you can use str_view

Stock Prediction-Intraday Trading » With High Accuracy »

str_view(statement,"a.")

Return first match in first string

9. Count

str_count for counting the character

str_count(statement,"[ae]")
0 0 0 0 2

9. Location

str_locate(statement,"[ae]")
start end
[1,]    NA  NA
[2,]    NA  NA
[3,]    NA  NA
[4,]    NA  NA
[5,]     1   1

str_locate will display the first match

Filtering Data in R 10 Tips -tidyverse package »

10. Lower/Upper case

For lower case letters

str_to_lower(statement)

“rwow”   “is wow” “toowow” “forwow” “anawow”

str_to_upper(statement)

“RWOW”   “IS WOW” “TOOWOW” “FORWOW” “ANAWOW”

For case, the sensitive first letter in upper case and rest will be lower case

apply family in r apply(), lapply(), sapply(), mapply() and tapply()

str_to_title(statement)

“Rwow”   “Is Wow” “Toowow” “Forwow” “Anawow”

?stringr and go to index you will get all stringr functions.

Conclusion

Many other functions are available in stringr package. If you feel any other important function we missed out please mention it in the comment box.

Latest Data Science Job Vacancies

The post Stringr in r 10 data manipulation Tips and Tricks appeared first on finnstats.

To leave a comment for the author, please follow the link and comment on their blog: Methods – finnstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.