
How to split vector and data frame in R

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers.]

R’s split() function divides a vector or data frame into groups according to the levels of a factor.

split() is a built-in R function. It takes the vector or data frame to divide, together with a grouping factor, and returns the data as a list with one element per group.

The syntax for this function is as follows:

split(x, f, drop = FALSE, ...)
split(x, f, drop = FALSE, sep = ".", lex.order = FALSE, ...)

where:

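x: The vector or data frame to be divided into groups

f: A factor (or a list of factors) that defines the grouping; other vectors are coerced to a factor

drop: Whether factor levels that do not occur in the data should be dropped

sep: The string used to join level names when f is a list of factors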

The split() function returns a list with one element per group: a vector when x is a vector, and a data frame when x is a data frame. The unsplit() function reverses split().
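
To make the relationship between the two functions concrete, here is a minimal sketch on a small vector. The names values, key, parts and restored are illustrative only and do not come from the examples in this post.

```r
# Illustrative values and grouping labels (assumed for this sketch)
values <- c(10, 20, 30, 40, 50)
key <- c("a", "b", "a", "b", "a")

# split() returns a named list with one element per group
parts <- split(values, key)
print(parts)

# unsplit() puts every value back in its original position
restored <- unsplit(parts, key)
print(restored)
```

Because the groups are interleaved here, the sketch also shows that unsplit() restores the original order rather than simply concatenating the groups.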

The examples below demonstrate how to divide vectors and data frames into groups using this method.

Example 1: To divide a vector into groups, use the split() function.

The code below demonstrates how to divide a vector of data values into groups using a vector of factor levels.

Let’s create a vector of data values for illustration:

data <- c(5, 6, 8, 2, 1, 2, 18, 19)

Now we can define a vector of groupings

groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')

Now we are ready to split the vector of data values into groups:

split(x = data, f = groups)
$A
[1] 5 6 8
$B
[1] 2
$C
[1]  1  2 18 19

The vector has been split into three groups.
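
A common reason to split a vector is to summarise each group separately. The following is a sketch of one way to do that, assuming the same data and groups vectors as in Example 1; the use of sapply() and mean() is an illustration and not part of the original example.

```r
# Same values and groupings as in Example 1
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')

# Split into a list of groups, then apply mean() to each element
group_means <- sapply(split(x = data, f = groups), mean)
print(group_means)
```

The result is a named numeric vector with one mean per group, so the group labels are carried along from the list returned by split().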

It’s worth noting that indexing can also be used to retrieve certain groups.

Split the data values vector into groups and show only the second group:

split(x = data, f = groups)[2]

$B
[1] 2
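
Since the result of split() is an ordinary list, the usual list indexing rules apply. The sketch below, which again assumes the data and groups vectors from Example 1, contrasts single brackets, double brackets and the $ operator; the variable names are illustrative.

```r
# Same values and groupings as in Example 1
data <- c(5, 6, 8, 2, 1, 2, 18, 19)
groups <- c('A', 'A', 'A', 'B', 'C', 'C', 'C', 'C')

parts <- split(x = data, f = groups)

# Single brackets keep the list wrapper: a list of length one
second_as_list <- parts[2]

# Double brackets (or $) extract the group's values themselves
b_values <- parts[["B"]]
c_values <- parts$C

print(second_as_list)
print(b_values)
print(c_values)
```

Indexing by name rather than by position is usually the safer choice, because it does not depend on the order of the factor levels.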

Example 2: Split a Data Frame Into Groups with split().

Let’s create the following data frame, df, for illustration:

df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

Let’s view the data frame

df
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27
3       Y         F   341     224
4       Y         F   319     228
5       Y         T    54      32
6       Z         F    74      41

To divide the data frame into groups based on the Product variable, we can use the following code:

split(df, f = df$Product)
$X
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27

$Y
  Product Condition Score Quality
3       Y         F   341     224
4       Y         F   319     228
5       Y         T    54      32

$Z
  Product Condition Score Quality
6       Z         F    74      41

As a result, there are three groups. The first contains only the rows where Product equals X, the second only the rows where Product equals Y, and the third only the rows where Product equals Z.
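
Each element of the resulting list is itself a data frame, so it can be processed like any other data frame. As a sketch, assuming the df from Example 2, the code below counts the rows and averages the Score column within each product; by_product, rows_per_product and mean_score are illustrative names.

```r
# Same data frame as in Example 2
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

by_product <- split(df, f = df$Product)

# Number of rows in each group
rows_per_product <- sapply(by_product, nrow)

# Mean Score within each group
mean_score <- sapply(by_product, function(g) mean(g$Score))

print(rows_per_product)
print(mean_score)
```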

It’s worth mentioning that the data can also be divided into groups using more than one factor variable.

For example, the following code divides the data frame into groups based on both the Product and Condition variables:

split(df, f = list(df$Product, df$Condition))
$X.F
[1] Product   Condition Score     Quality 
<0 rows> (or 0-length row.names)

$Y.F
  Product Condition Score Quality
3       Y         F   341     224
4       Y         F   319     228

$Z.F
  Product Condition Score Quality
6       Z         F    74      41

$X.T
  Product Condition Score Quality
1       X         T   303      38
2       X         T   128      27

$Y.T
  Product Condition Score Quality
5       Y         T    54      32

$Z.T
[1] Product   Condition Score     Quality 
<0 rows> (or 0-length row.names)
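
Notice that split() creates a group for every combination of the two variables, including combinations such as X.F and Z.T that have no rows. If the empty groups are not wanted, one option is the drop argument from the syntax above. The sketch below assumes the df from Example 2 and also shows the sep argument, which changes the separator used in the group names; non_empty and underscored are illustrative names.

```r
# Same data frame as in Example 2
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

# drop = TRUE discards combinations that do not occur in the data
non_empty <- split(df, f = list(df$Product, df$Condition), drop = TRUE)
print(names(non_empty))

# sep controls how the level names are joined in the group names
underscored <- split(df, f = list(df$Product, df$Condition),
                     drop = TRUE, sep = "_")
print(names(underscored))
```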

Use the unsplit() function to restore the original data frame from the output of split(). unsplit() takes the list returned by split(), not the original data frame, together with the same grouping factor:

unsplit(split(df, f = df$Product), f = df$Product)
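
As a minimal sketch of the full round trip, assuming the df from Example 2, the list produced by split() can be stored and then passed to unsplit() with the same grouping factor. The names by_product and restored are illustrative, and the comparisons at the end are just one way to confirm that the rows come back in their original order.

```r
# Same data frame as in Example 2
df <- data.frame(Product=c('X', 'X', 'Y', 'Y', 'Y', 'Z'),
                 Condition=c('T', 'T', 'F', 'F', 'T', 'F'),
                 Score=c(303, 128, 341, 319, 54, 74),
                 Quality=c(38, 27, 224, 228, 32, 41))

# Split by Product, then recombine using the same factor
by_product <- split(df, f = df$Product)
restored <- unsplit(by_product, f = df$Product)

print(restored)

# Compare the restored columns with the originals
same_product <- all(restored$Product == df$Product)
same_score <- isTRUE(all.equal(restored$Score, df$Score))
print(c(same_product, same_score))
```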

Conclusion

Use the split() function in R to split a vector or data frame into groups, and the unsplit() function to recombine the groups into the original vector or data frame.
