Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
A very common task in data processing is transforming numeric variables (continuous or discrete) into categorical ones by creating bins. For example, it is quite common to convert age into age groups. Let’s see how we can easily do that in R.
We will consider a random variable from the Poisson distribution with parameter λ=20:
library(dplyr)

# Generate 1000 observations from the Poisson distribution
# with lambda equal to 20
df <- data.frame(MyContinuous = rpois(1000, 20))

# Get the histogram
hist(df$MyContinuous)
Create specific Bins
Let’s say that you want to create the following bins:
- Bin 1: (-inf, 15]
- Bin 2: (15,25]
- Bin 3: (25, inf)
We can easily do that using the cut function:
df <- df %>%
  mutate(MySpecificBins = cut(MyContinuous, breaks = c(-Inf, 15, 25, Inf)))
head(df, 10)
Let’s have a look at the counts of each bin.
df%>%group_by(MySpecificBins)%>%count()
Note that you can also define your own labels through the labels argument of the cut function.
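As a minimal sketch of custom labels, the snippet below passes a labels vector to cut. The label names (Low, Medium, High) and the small input vector are made up for illustration and are not part of the simulated data above.

```r
# Illustrative values only; not taken from the simulated data above
x <- c(3, 15, 16, 25, 26, 40)

# One label per bin, in the same order as the intervals defined by breaks
binned <- cut(x,
              breaks = c(-Inf, 15, 25, Inf),
              labels = c("Low", "Medium", "High"))

# Count the observations per label
table(binned)
```

Because the intervals are closed on the right by default, the values 15 and 25 fall into the lower of their two neighbouring bins.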
Create Bins based on Quantiles
Let’s say that you want each bin to contain the same number of observations, for example 4 bins with 25% of the observations each. We can easily do it as follows:
numbers_of_bins <- 4
df <- df %>%
  mutate(MyQuantileBins = cut(MyContinuous,
                              breaks = unique(quantile(MyContinuous, probs = seq.int(0, 1, by = 1 / numbers_of_bins))),
                              include.lowest = TRUE))
head(df, 10)
We can check whether the MyQuantileBins contain the same number of observations and look at their ranges. Since the simulated values are integers with many ties, the counts may be only approximately equal:
df%>%group_by(MyQuantileBins)%>%count()
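A note on the unique call wrapped around quantile in the breaks: when the data contain many ties, several quantiles can coincide, and cut rejects duplicated breaks. The sketch below uses a small, deliberately tied vector (invented for illustration) to show the error and the effect of unique, which is that you may end up with fewer bins than requested.

```r
# Illustrative, heavily tied data: eight zeros followed by 1 and 2
x <- c(rep(0, 8), 1, 2)

# The 0%, 25%, 50% and 75% quantiles all equal 0, so the breaks repeat
q <- quantile(x, probs = seq(0, 1, by = 1 / 4))

# cut() refuses duplicated breaks; capture the error message instead of stopping
res <- tryCatch(cut(x, breaks = q, include.lowest = TRUE),
                error = function(e) conditionMessage(e))
print(res)

# After unique() only two distinct breaks remain, giving a single bin
bins <- cut(x, breaks = unique(q), include.lowest = TRUE)
nlevels(bins)
```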
Note that if you want to split your continuous variable into bins with an equal number of observations, you can also use the ntile function of the dplyr package, but it does not create bin labels based on the ranges.
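A short sketch of the ntile approach, assuming a small made-up vector: ntile returns integer group numbers, so the range of each group has to be recovered separately, here with a summarise step.

```r
library(dplyr)

# Illustrative values only
x <- c(12, 18, 19, 21, 22, 25, 27, 30)

# ntile() ranks the values and assigns them to 4 groups of (near) equal size
groups <- ntile(x, 4)
groups

# Recover each group's range manually, since ntile() returns only integers
data.frame(x = x, group = groups) %>%
  group_by(group) %>%
  summarise(min = min(x), max = max(x), n = n())
```

Unlike cut, ntile assigns groups by rank, so tied values can end up in different groups.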