The post Create groups based on the lowest and highest values in R? appeared first on finnstats.
We encourage you to read this article on finnstats to stay up to date.
To create groups based on the lowest and highest values in R, you can use the ntile() function from the dplyr package, which divides an input vector into n buckets.
The basic syntax used by this function is as follows.
ntile(x, n)
where:
x: Input vector
n: Number of buckets
Note: The bucket sizes might vary by up to one.
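To see where the buckets come from, here is a minimal base-R sketch of the idea behind ntile(). It assumes a simplified rule: rank the values, breaking ties by position as dplyr's row_number() does, then map rank r out of len values to bucket floor(n * (r - 1) / len) + 1. This is not dplyr's actual source, and recent dplyr versions can spread the leftover elements across buckets differently for some input lengths, so use ntile() itself in real code. The helper name my_ntile() is hypothetical.

```r
# Hypothetical helper for illustration only; use dplyr::ntile() in practice.
# Simplified rule (assumption): rank values with ties broken by position,
# then map rank r of len non-missing values to floor(n * (r - 1) / len) + 1.
my_ntile <- function(x, n) {
  r <- rank(x, ties.method = "first", na.last = "keep")  # NA stays NA
  len <- sum(!is.na(x))
  as.integer(floor(n * (r - 1) / len) + 1)
}

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)
my_ntile(x, 5)
# [1] 1 2 3 5 5 4 1 1 3 4 2

# Count how many values fall in each bucket:
# bucket 1 gets 3 values, buckets 2 to 5 get 2 each
sizes <- table(my_ntile(x, 5))
sizes
```

For the 11-element vector from Example 1, this sketch gives the same buckets as ntile(x, 5), and the size table shows the sizes differing by at most one.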
Create groups based on the lowest and highest values in R
The practical application of this function is demonstrated in the examples that follow.
Example 1: Use ntile() with a Vector
The following code shows how to use ntile() to divide a vector of 11 elements into 5 groups.
library(dplyr)
Let’s create a vector
x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)
x
[1] 10 13 14 26 27 18 11 12 15 20 13
Next, divide the vector into five buckets.
ntile(x, 5)
[1] 1 2 3 5 5 4 1 1 3 4 2
The result shows that each element of the original vector has been assigned to one of five buckets.
Bucket 1 contains the smallest values, while bucket 5 contains the largest values.
For instance:
Bucket 1 holds the three lowest values: 10, 11, and 12.
Bucket 5 holds the two highest values: 26 and 27.
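A quick way to check which values landed in each bucket is to split the original vector by its bucket numbers with base R's split(). This is a small sketch that reuses the same x and assumes dplyr is installed.

```r
library(dplyr)

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)

# Group the original values by the bucket number ntile() assigns to them;
# the result is a list with one element per bucket ("1" to "5")
buckets <- split(x, ntile(x, 5))
buckets
```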
Example 2: Use ntile() with a Data Frame
Consider the following R data frame, which displays the points scored by different basketball players:
Let’s create a data frame
df <- data.frame(player=LETTERS[1:9], points=c(102, 109, 57, 122, 824, 528, 125, 159, 195))
Now we can view the data frame
df
  player points
1      A    102
2      B    109
3      C     57
4      D    122
5      E    824
6      F    528
7      G    125
8      H    159
9      I    195
The following code uses ntile() to add a new column to the data frame that places each player into one of three buckets based on their total points.
Add a new column that assigns each player to a bucket according to their point total.
df$bucket <- ntile(df$points, 3)
Let’s view the updated data frame
df
  player points bucket
1      A    102      1
2      B    109      1
3      C     57      1
4      D    122      2
5      E    824      3
6      F    528      3
7      G    125      2
8      H    159      2
9      I    195      3
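If you prefer a dplyr pipeline over the $ assignment, the same column can be added with mutate(). This is a sketch assuming dplyr is installed; it should produce the same bucket column as the code above.

```r
library(dplyr)

df <- data.frame(player = LETTERS[1:9],
                 points = c(102, 109, 57, 122, 824, 528, 125, 159, 195))

# Equivalent to df$bucket <- ntile(df$points, 3), written as a pipeline
df <- df %>% mutate(bucket = ntile(points, 3))
df
```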
Each player is given a value between 1 and 3 in the new bucket column.
The three players with the fewest points (A, B, and C) are assigned a value of 1, while the three players with the most points (E, F, and I) are assigned a value of 3.
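To confirm the range of points each bucket covers, you could summarise the data frame by bucket. This sketch uses group_by() and summarise() from dplyr; the column names min_points, max_points, and players are illustrative choices, not part of the original example.

```r
library(dplyr)

df <- data.frame(player = LETTERS[1:9],
                 points = c(102, 109, 57, 122, 824, 528, 125, 159, 195))
df$bucket <- ntile(df$points, 3)

# Lowest score, highest score, and number of players in each bucket
bucket_summary <- df %>%
  group_by(bucket) %>%
  summarise(min_points = min(points),
            max_points = max(points),
            players = n())
bucket_summary
```

Because the buckets are built from ranks, the ranges do not overlap: every point total in bucket 1 is below every total in bucket 2, and so on.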