If you’ve worked in a spreadsheet application before, you’re likely familiar with the “Text to Columns” tool, which splits one column of data into multiple columns based on a delimiter. The same functionality is available in R through functions such as the “separate” function from the “tidyr” package.
To try this function out, let’s first load the “tidyr” package and then create a test data frame to work with.
library(tidyr)  # loads separate() and also makes the %>% pipe available
df <- data.frame(person = c("John_Doe", "Jane_Doe"))  # one column of "First_Last" names
We now have a data frame with a single column, “person”, in which each value is a first name and a last name joined by an underscore. Let’s split the two names into their own columns.
# Split "person" at the underscore into "first_name" and "last_name"
df <- df %>% separate(person, c("first_name", "last_name"), "_")
Let’s break down what just happened. Typing “df <-” assigned the output of the expression that followed back to “df”. Typing “df %>%” used the pipe operator to pass the existing data frame “df” to the separate function as its first argument. Note that separate does not change “df” in place; it returns a new data frame, and the assignment is what stores that result under the name “df”.
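Because the pipe only hands “df” to separate as its first argument, the piped call can be written as an ordinary function call. The minimal sketch below, which reuses the example data from above, checks that both forms give the same result and that the original data frame is untouched until a result is assigned back to it. The variable names “piped” and “direct” are just illustrative.

```r
library(tidyr)

df <- data.frame(person = c("John_Doe", "Jane_Doe"))

# Piped form: %>% passes df as the first (data) argument of separate()
piped <- df %>% separate(person, c("first_name", "last_name"), "_")

# The same call written without the pipe
direct <- separate(df, person, c("first_name", "last_name"), "_")

# Both forms return the same new data frame
print(identical(piped, direct))

# df still has only its original column, because the results were stored
# in new variables instead of being assigned back to df
print(names(df))
```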
Besides the piped-in data frame, we gave the separate function three arguments. The first was the column to split, “person”. The second was a vector holding the names of the two new columns, “first_name” and “last_name”. The third was the delimiter, “_”. By default, separate also drops the original “person” column from its result.
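The same call can be written with each argument named (“col”, “into”, and “sep” in separate’s signature), which makes the roles described above explicit. One detail worth knowing: separate treats a character “sep” as a regular expression, so a delimiter such as a period has to be escaped. The sketch below shows both points, and, assuming tidyr 1.3.0 or later is installed, also shows separate_wider_delim, the newer function the tidyr documentation recommends in place of separate; its “delim” is matched as a fixed string rather than a regular expression. The “dotted” data frame is a hypothetical example.

```r
library(tidyr)

df <- data.frame(person = c("John_Doe", "Jane_Doe"))

# The call from above with each argument named:
#   col  = the column to split
#   into = the names of the new columns
#   sep  = the delimiter (a regular expression in separate())
split_df <- df %>%
  separate(col = person, into = c("first_name", "last_name"), sep = "_")

# Because sep is a regular expression, a literal "." must be escaped as "\\."
dotted <- data.frame(person = c("John.Doe", "Jane.Doe"))
dotted_split <- dotted %>%
  separate(col = person, into = c("first_name", "last_name"), sep = "\\.")

# Assumes tidyr >= 1.3.0: delim is matched as a fixed string by default
wider <- df %>%
  separate_wider_delim(person, delim = "_", names = c("first_name", "last_name"))

print(split_df)
print(dotted_split)
print(wider)
```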
Splitting Text in R was originally published in Trevor French on Medium.