R-bloggers

A guide to working with character data in R

[This article was first published on Revolutions, and kindly contributed to R-bloggers]. (You can report issues about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R is primarily a language for working with numbers, but we often need to work with text as well. Whether it’s formatting text for reports, or analyzing natural language data, R provides a number of facilities for working with character data. Handling Strings with R, a free (CC-BY-NC-SA) e-book by UC Berkeley’s Gaston Sanchez, provides an overview of the ways you can manipulate characters and strings with R. 
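As a rough illustration of the kind of facilities meant here, the sketch below uses only base R functions. The example strings and variable names are made up for illustration and are not taken from the book.

```r
# Hypothetical example data (not from the book)
states <- c("california", "new york", "texas")

# Count the characters in each string
n_chars <- nchar(states)

# Convert to upper case
upper <- toupper(states)

# Extract the first three characters of each string
abbrev <- substr(states, 1, 3)

# Combine strings, e.g. to build labels
labels <- paste("State:", states)

# Format text for a report: left-align the name in a 12-character field
report <- sprintf("%-12s has %2d characters", states, n_chars)

# Replace every space with an underscore (fixed = TRUE: no regex)
ids <- gsub(" ", "_", states, fixed = TRUE)

print(report)
```

All of these functions are vectorized, so each call handles the whole character vector at once without an explicit loop.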

There are many useful sections in the book, but a few selections include:

Note that the book does not cover analysis of natural language data; for that, you might want to check out the CRAN Task View on Natural Language Processing or the book Text Mining with R: A Tidy Approach. It is also sadly silent on character encoding in R, which often causes problems when dealing with text data, especially from international sources. Nonetheless, the book is a really useful overview of working with text in R, and has been updated extensively since it was last published in 2014. You can read Handling Strings with R at the link below.
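Since encoding is the gap mentioned above, here is a minimal base R sketch of the sort of issue that tends to come up. It is not from the book; the Latin-1 byte string is constructed by hand purely for illustration.

```r
# Illustrative input: the bytes of "café" in Latin-1, where é is the single byte 0xE9
x <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9)))

# Declare how those bytes should be interpreted
Encoding(x) <- "latin1"

# Convert to UTF-8, where é takes two bytes (0xC3 0xA9)
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

Encoding(x_utf8)               # "UTF-8"
validUTF8(x_utf8)              # TRUE
nchar(x_utf8, type = "chars")  # 4 characters
nchar(x_utf8, type = "bytes")  # 5 bytes
charToRaw(x_utf8)              # 63 61 66 c3 a9
```

The difference between the character count and the byte count is a quick way to spot non-ASCII text, and `iconv()` is one way to normalize input to UTF-8 before further processing.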

Gaston Sanchez: Handling Strings with R

To leave a comment for the author, please follow the link and comment on their blog: Revolutions.
