
Customer segmentation – LifeCycle Grids with R

[This article was first published on Analyze Core » R language, and kindly contributed to R-bloggers].

In this post I want to share a very powerful approach to customer segmentation. It is based on the customer lifecycle, specifically on the frequency and recency of purchases. The idea of using these metrics comes from RFM (recency, frequency, monetary value) analysis. Recency and frequency are very important behavioral metrics. We are interested in frequent and recent purchases because frequency affects a client’s lifetime value and recency affects retention. Therefore, these metrics can help us understand the current phase of a client’s lifecycle. When we know each client’s phase, we can split the customer base into groups (segments) in order to:

For this, we will use a matrix called LifeCycle Grids. We will look at how to process the initial (transactional) data into the matrix, how to visualize it, and how to do some in-depth analysis. We will do all of these steps in R.
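To make the two metrics concrete, here is a minimal base R sketch on a tiny, made-up transaction log (the data frame `tx`, its dates, and the variable names are hypothetical; the post's own dplyr code comes later). Frequency is counted as the number of orders per client, and recency as the number of days between the reporting date and the client's most recent order.

```r
# Hypothetical toy transaction log: one row per order
tx <- data.frame(
  clientId  = c(1, 1, 1, 2, 3, 3),
  orderdate = as.Date(c('2012-01-05', '2012-02-10', '2012-04-01',
                        '2012-03-15', '2012-01-20', '2012-02-01'))
)
today <- as.Date('2012-04-11')  # reporting date, as used later in the post

# Days between each order and the reporting date
days_ago <- as.numeric(today - tx$orderdate)

# Frequency: number of orders per client
frequency <- tapply(days_ago, tx$clientId, length)
# Recency: days since the client's most recent order (the smallest gap)
recency <- tapply(days_ago, tx$clientId, min)

rfm <- data.frame(clientId  = as.numeric(names(frequency)),
                  frequency = as.vector(frequency),
                  recency   = as.vector(recency))
print(rfm)
```

For this toy log, client 1 gets frequency 3 and recency 10, client 2 gets 1 and 27, and client 3 gets 2 and 70.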

Let’s create a data sample with the following code:

# loading libraries
library(dplyr)
library(reshape2)
library(ggplot2)

# creating data sample
set.seed(10)
# order lines: 5000 product rows spread over 1000 orders ('NULL' rows are dropped below)
data <- data.frame(orderId=sample(c(1:1000), 5000, replace=TRUE),
                   product=sample(c('NULL','a','b','c'), 5000, replace=TRUE,
                                  prob=c(0.15, 0.65, 0.3, 0.15)))  # weights; sample() rescales them
# each order belongs to one of 300 clients
order <- data.frame(orderId=c(1:1000),
                    clientId=sample(c(1:300), 1000, replace=TRUE))
# gender of each client
gender <- data.frame(clientId=c(1:300),
                     gender=sample(c('male', 'female'), 300, replace=TRUE, prob=c(0.40, 0.60)))
# order date as a day offset (1-100) from 2012-01-01
date <- data.frame(orderId=c(1:1000),
                   orderdate=sample((1:100), 1000, replace=TRUE))
# joining everything into one table and dropping the 'NULL' product rows
orders <- merge(data, order, by='orderId')
orders <- merge(orders, gender, by='clientId')
orders <- merge(orders, date, by='orderId')
orders <- orders[orders$product!='NULL', ]
orders$orderdate <- as.Date(orders$orderdate, origin="2012-01-01")
rm(data, date, order, gender)

The head of our data sample looks like this:

  orderId clientId product gender  orderdate
1       1      254       a female 2012-04-03
2       1      254       b female 2012-04-03
3       1      254       c female 2012-04-03
4       1      254       b female 2012-04-03
5       2      151       a female 2012-01-31
6       2      151       b female 2012-01-31

Note that the table includes each customer's gender. We will use it later as an example of in-depth analysis. I recommend using any additional features you have to look for insights, such as the client's source, channel, campaign, geo data, and so on.

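A few words about LifeCycle Grids. It is a matrix with two dimensions: frequency, the number of purchases a client has made, and recency, the number of days since the client's last purchase.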

The first step is to think about which grids suit your business. It is impossible to work with an unlimited number of segments. Therefore, we need to define boundaries for frequency and recency that split customers into homogeneous groups (segments). Analyzing the distribution of frequency and recency values in our data set, together with knowledge of the business, can help us find suitable boundaries.

To do so, we first calculate frequency and recency for each client and then plot their distributions with the following code:

# reporting date
today <- as.Date('2012-04-11', format='%Y-%m-%d')

# processing data: one row per order, with a count column for each product
orders <- dcast(orders, orderId + clientId + gender + orderdate ~ product,
                value.var='product', fun.aggregate=length)

orders <- orders %>%
  group_by(clientId) %>%
  mutate(frequency=n(),                           # number of orders per client
         recency=as.numeric(today-orderdate)) %>% # days since each order
  filter(orderdate==max(orderdate)) %>%           # keep the client's last order date
  filter(orderId==max(orderId)) %>%               # keep one order if several share that date
  ungroup()

# exploratory analysis (geom_bar() counts each distinct value, so no binwidth is needed)
ggplot(orders, aes(x=frequency)) +
  theme_bw() +
  scale_x_continuous(breaks=c(1:10)) +
  geom_bar(alpha=0.6) +
  ggtitle("Distribution by frequency")

ggplot(orders, aes(x=recency)) +
  theme_bw() +
  geom_bar(alpha=0.6) +
  ggtitle("Distribution by recency")
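The plots can be complemented with a numeric summary when choosing boundaries. The sketch below is one possible way to do this in base R, not the author's code; the `frequency` and `recency` vectors are made-up stand-ins for `orders$frequency` and `orders$recency`. It prints how many clients made each number of purchases, the cumulative share of clients, and the recency quartiles as rough candidate cut points.

```r
# Hypothetical per-client metrics (stand-ins for orders$frequency / orders$recency)
frequency <- c(1, 1, 1, 2, 2, 3, 4, 1, 2, 6)
recency   <- c(3, 10, 15, 22, 40, 50, 60, 85, 95, 5)

# How many clients made 1, 2, 3, ... purchases
freq_counts <- table(frequency)
print(freq_counts)

# Cumulative share of clients with at most k purchases
freq_share <- cumsum(freq_counts) / length(frequency)
print(freq_share)

# Recency quartiles as rough candidate cut points for the recency dimension
rec_q <- quantile(recency, probs = c(0.25, 0.5, 0.75))
print(rec_q)
```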

Early behavior is the most important, so finer detail is useful there. Usually, there is a significant difference between customers who bought once and those who bought three times, but is there any difference between customers who bought 50 times and those who bought 53 times? That is why it makes sense to use narrow intervals for low values and progressively wider intervals for higher ones. We will use the following boundaries: 1, 2, 3, 4, 5, and more than 5 purchases for frequency; 0-6, 7-13, 14-19, 20-45, 46-80, and more than 80 days for recency.

Next, we assign each client to a frequency segment and a recency segment based on these boundaries. We will also create a new variable, ‘cart’, which lists the products in the client's last cart, for in-depth analysis later.

orders.segm <- orders %>%
  # frequency segments: 1, 2, 3, 4, 5 and more than 5 orders
  mutate(segm.freq=ifelse(between(frequency, 1, 1), '1',
                   ifelse(between(frequency, 2, 2), '2',
                   ifelse(between(frequency, 3, 3), '3',
                   ifelse(between(frequency, 4, 4), '4',
                   ifelse(between(frequency, 5, 5), '5', '>5')))))) %>%
  # recency segments: narrow intervals for recent buyers, wider ones further back
  mutate(segm.rec=ifelse(between(recency, 0, 6), '0-6 days',
                  ifelse(between(recency, 7, 13), '7-13 days',
                  ifelse(between(recency, 14, 19), '14-19 days',
                  ifelse(between(recency, 20, 45), '20-45 days',
                  ifelse(between(recency, 46, 80), '46-80 days', '>80 days')))))) %>%
  # creating last cart feature: products present in the last order, e.g. 'ab'
  mutate(cart=paste(ifelse(a!=0, 'a', ''),
                    ifelse(b!=0, 'b', ''),
                    ifelse(c!=0, 'c', ''), sep='')) %>%
  arrange(clientId)

# defining order of boundaries (row and column order of the grid)
orders.segm$segm.freq <- factor(orders.segm$segm.freq, levels=c('>5', '5', '4', '3', '2', '1'))
orders.segm$segm.rec <- factor(orders.segm$segm.rec, levels=c('>80 days', '46-80 days', '20-45 days', '14-19 days', '7-13 days', '0-6 days'))
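The nested `ifelse()` chains can also be expressed with base R's `cut()`, which maps each value to an interval and a label in one call. This is a minimal sketch of that alternative, not the author's code; the input vectors are hypothetical, and it assumes recency is a whole number of days (true for differences between `Date` values), because `cut()` uses right-closed intervals such as (6, 13].

```r
# Hypothetical per-client values
frequency <- c(1, 2, 5, 6, 12)
recency   <- c(0, 6, 7, 45, 81)

# Frequency: intervals (0,1], (1,2], ..., (4,5] and (5,Inf]
segm.freq <- cut(frequency,
                 breaks = c(0, 1, 2, 3, 4, 5, Inf),
                 labels = c('1', '2', '3', '4', '5', '>5'))

# Recency: right-closed intervals, so (-1,6] holds whole days 0-6,
# (6,13] holds 7-13, and so on up to (80,Inf] for '>80 days'
segm.rec <- cut(recency,
                breaks = c(-1, 6, 13, 19, 45, 80, Inf),
                labels = c('0-6 days', '7-13 days', '14-19 days',
                           '20-45 days', '46-80 days', '>80 days'))

# Reverse the level order to match the grid layout used in the post
segm.freq <- factor(segm.freq, levels = rev(levels(segm.freq)))
segm.rec  <- factor(segm.rec,  levels = rev(levels(segm.rec)))

print(data.frame(frequency, segm.freq, recency, segm.rec))
```

Inside the dplyr pipeline, the same `cut()` calls could be placed in `mutate()` in place of the `ifelse()` chains.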

Now we have everything we need to create the LifeCycle Grids. First, we count the number of clients in each segment with the following code:

lcg <- orders.segm %>%
 group_by(segm.rec, segm.freq) %>%
 summarise(quantity=n()) %>%
 mutate(client='client') %>%
 ungroup()

The classic matrix can be created with the following code:

lcg.matrix <- dcast(lcg, segm.freq ~ segm.rec, value.var='quantity', fun.aggregate=sum)
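For a quick check without reshape2, base R's `table()` cross-tabulates the two segment factors into the same kind of matrix. The six clients below are hypothetical; with the real data you would pass `orders.segm$segm.freq` and `orders.segm$segm.rec`. Because both factors declare all of their levels, grids with no clients appear as zeros.

```r
# Hypothetical segment labels for six clients, with the post's level order
segm.freq <- factor(c('1', '1', '2', '>5', '1', '2'),
                    levels = c('>5', '5', '4', '3', '2', '1'))
segm.rec  <- factor(c('0-6 days', '>80 days', '0-6 days', '7-13 days', '0-6 days', '>80 days'),
                    levels = c('>80 days', '46-80 days', '20-45 days',
                               '14-19 days', '7-13 days', '0-6 days'))

# Rows: frequency segments; columns: recency segments; cells: number of clients
lcg.table <- table(segm.freq, segm.rec)
print(lcg.table)
```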

However, I think the following visualization is more informative:

ggplot(lcg, aes(x=client, y=quantity, fill=quantity)) +
 theme_bw() +
 theme(panel.grid = element_blank())+
 geom_bar(stat='identity', alpha=0.6) +
 geom_text(aes(y=max(quantity)/2, label=quantity), size=4) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids")

I’ve added colored borders to make it easier to see how to work with this matrix. We have four quadrants:

Does it make sense to make the same offer to all of these customers? Certainly not! It makes sense to design different approaches not only for each quadrant but also for the border grids.

What I really like about this segmentation model is that it is both stable and dynamic. It is dynamic in terms of customer flow: every day, whether or not customers make purchases, they move from one grid to another (a purchase increases frequency and resets recency, while without a purchase recency simply grows). At the same time, it is stable in terms of the segments themselves: each grid contains customers with the same behavior profile. That means you can create suitable campaigns, offers, or emails for a single grid or a group of neighboring grids and reuse them continuously.
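The customer flow can be illustrated with a single hypothetical client whose last purchase was on 2012-04-01. In the sketch below, the helper `rec_segment()` and all dates are made up for illustration; it reuses the post's recency boundaries via `cut()` and recomputes the client's recency segment at several reporting dates. Without new purchases the client drifts toward older grids, while a new purchase resets recency to 0.

```r
# Hypothetical helper: recency (days) -> recency segment, using the post's boundaries
rec_segment <- function(recency) {
  cut(recency,
      breaks = c(-1, 6, 13, 19, 45, 80, Inf),
      labels = c('0-6 days', '7-13 days', '14-19 days',
                 '20-45 days', '46-80 days', '>80 days'))
}

last_purchase <- as.Date('2012-04-01')  # hypothetical client's last order
report_dates  <- as.Date(c('2012-04-05', '2012-04-11', '2012-04-30', '2012-06-15'))

# No new purchases: recency grows with every reporting date
recency <- as.numeric(report_dates - last_purchase)
flow <- data.frame(report_date = report_dates,
                   recency     = recency,
                   segm.rec    = rec_segment(recency))
print(flow)

# A purchase on the last reporting date resets recency to 0
# (it would also increase the client's frequency by 1)
after_purchase <- rec_segment(as.numeric(as.Date('2012-06-15') - as.Date('2012-06-15')))
print(after_purchase)
```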

OK, it’s time to look at how we can do some in-depth analysis. R allows us to create subsegments and visualize them effectively. It can be helpful to break each grid down by some features. For instance, there may be a relationship between behavior and gender. As another example, if our products have different lifecycles, it can be helpful to analyze which products were in the last cart. We can also combine these features. Let’s do this with the following code:

lcg.sub <- orders.segm %>%
 group_by(gender, cart, segm.rec, segm.freq) %>%
 summarise(quantity=n()) %>%
 mutate(client='client') %>%
 ungroup()

ggplot(lcg.sub, aes(x=client, y=quantity, fill=gender)) +
 theme_bw() +
 theme(panel.grid = element_blank())+
 geom_bar(stat='identity', position='fill', alpha=0.6) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids by gender (proportion)")


or even:

ggplot(lcg.sub, aes(x=gender, y=quantity, fill=cart)) +
 theme_bw() +
 theme(panel.grid = element_blank())+
 geom_bar(stat='identity', position='fill', alpha=0.6) +
 facet_grid(segm.freq ~ segm.rec) +
 ggtitle("LifeCycle Grids by gender and last cart (proportion)")
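The stacked bars drawn with `position='fill'` show shares, and the same numbers can be computed directly. The sketch below is a base R assumption, not part of the original workflow: it builds a small hypothetical table shaped like `lcg.sub` (covering a single grid), sums `quantity` with `xtabs()`, and uses `prop.table()` to get the gender shares within each grid.

```r
# Hypothetical subsegment counts shaped like lcg.sub (one grid only)
lcg.sub <- data.frame(
  gender    = c('female', 'female', 'male', 'male'),
  cart      = c('a', 'ab', 'a', 'b'),
  segm.rec  = '0-6 days',
  segm.freq = '1',
  quantity  = c(6, 2, 3, 1)
)

# Clients per gender in every grid (sums quantity over carts)
by_gender <- xtabs(quantity ~ gender + segm.rec + segm.freq, data = lcg.sub)

# Share of each gender within each grid (margins 2 and 3 = recency x frequency)
gender_share <- prop.table(by_gender, margin = c(2, 3))
print(gender_share)
```

With the real `lcg.sub`, grids that contain no clients would show `NaN`, since 0/0 is undefined.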

As you can see, there is a lot of room for creativity. If you want to learn much more about LifeCycle Grids and strategies for working with the quadrants, I highly recommend reading Jim Novo’s work, e.g. this blog post.

Thank you for reading this!

To leave a comment for the author, please follow the link and comment on their blog: Analyze Core » R language.
