Using recurrent neural networks to segment customers
[This article was first published on R – Gradient Metrics, and kindly contributed to R-bloggers.]
Understanding consumer segments is key to any successful business. Analytically, segmentations involve clustering a dataset to find groups of similar customers. What “similar” means is defined by the data that goes into the clustering — it could be demographic, attitudinal, or other characteristics. And the data that goes into the clustering is often limited by the clustering algorithms themselves — most require some kind of tabular data structure, and common techniques like k-Means require strictly numeric input. Breaking out of these restrictions has been one of our top priorities since starting the company.
So what do you do when you want to find segments of customers who are “similar” because they behave similarly, that is, because their experience with your brand has been similar? How would you define that? Increasingly, companies are collecting sequence data, where each entry is an interaction with a customer: a purchase, an opened email, a website visit, and so on. Given the popularity of deep learning techniques for sequence-related learning tasks, we thought applying neural networks to customer segmentation was the natural approach.
This post builds on our previous customer journey segmentation post and demonstrates a prototype of a deep learning approach to behavior sequence segmentation. We wanted to investigate whether we could use the internal state of a recurrent neural network (RNN), trained on complex sequences of data, to identify distinctive customer segments.
Turns out that we can. And it works well.
Data description
Our client recorded a row of behavioral data for each customer interaction, such as receiving an email, opening an email, or using the app, so a single user's “sequence” looks like this. Note that each sequence can have a variable number of rows.
User ID | Cancel | Sent email | Open email | Click email | App used | Site visited | Days since last interaction |
1001 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
1001 | 0 | 0 | 0 | 0 | 0 | 1 | 2 |
1001 | 0 | 0 | 0 | 0 | 0 | 1 | 4 |
1001 | 0 | 1 | 0 | 0 | 0 | 0 | 5 |
1001 | 0 | 0 | 1 | 0 | 0 | 0 | 7 |
1001 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
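As a rough sketch of how rows like these could be turned into one variable-length sequence per customer, the snippet below groups rows by user ID. The column names, the `build_sequences` helper, and user 2002 are assumptions made for illustration; only the rows for user 1001 come from the table above.

```python
from collections import defaultdict

# Assumed column names, in the same order as the table above.
COLUMNS = ["cancel", "sent_email", "open_email", "click_email",
           "app_used", "site_visited", "days_since_last"]

# Each row: (user_id, cancel, sent, open, click, app, site, days_since_last)
rows = [
    (1001, 0, 0, 0, 0, 0, 1, 0),
    (1001, 0, 0, 0, 0, 0, 1, 2),
    (1001, 0, 0, 0, 0, 0, 1, 4),
    (1001, 0, 1, 0, 0, 0, 0, 5),
    (1001, 0, 0, 1, 0, 0, 0, 7),
    (1001, 0, 0, 0, 0, 0, 1, 1),
    # Hypothetical second user, included only to show a different length.
    (2002, 0, 1, 0, 0, 0, 0, 0),
    (2002, 1, 0, 0, 0, 0, 0, 3),
]


def build_sequences(rows):
    """Group interaction rows by user, keeping the order in which they appear."""
    sequences = defaultdict(list)
    for user_id, *features in rows:
        sequences[user_id].append([float(v) for v in features])
    return dict(sequences)


sequences = build_sequences(rows)
for user_id, seq in sequences.items():
    print(user_id, "timesteps:", len(seq), "features per step:", len(seq[0]))
```

Every timestep has the same number of features, but the number of timesteps differs per customer, which is why a model that accepts variable-length input is needed.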
Developing the neural network
We developed a very simple neural network architecture, described below. For this sample of customers, we knew whether or not they had churned by the time the data was collected, so our “X’s” were the sequences of customer behavior and our “Y’s” were 0/1 labels indicating whether the customer had churned. We therefore used a recurrent input layer, which can handle variable-length sequences, and a sigmoid output layer, which predicts a churn probability between 0 and 1. We included a dense layer to make the network more powerful and to generate encodings.
Layer | Input dimension | Output dimension |
Recurrent | Variable | 10 |
Dense | 10 | 10 (used for encoding) |
Sigmoid | 10 | 1 |
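To make the shapes in this table concrete, here is a minimal forward pass in plain Python. It is a sketch under several assumptions, not the network we trained: the weights are random rather than learned, the recurrent layer is a basic tanh cell, the dense layer uses a ReLU activation, and biases are left out. The post does not specify any of these details. It only shows how sequences of different lengths all end up as a 10-value encoding and a single probability.

```python
import math
import random

N_FEATURES = 7   # columns per interaction row in the data table
HIDDEN = 10      # recurrent layer output size (from the architecture table)
ENCODING = 10    # dense layer output size (from the architecture table)

rng = random.Random(0)


def random_matrix(n_rows, n_cols):
    """Small random weights; a trained network would learn these instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)] for _ in range(n_rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_in = random_matrix(HIDDEN, N_FEATURES)    # input -> hidden
W_rec = random_matrix(HIDDEN, HIDDEN)       # hidden -> hidden (the recurrence)
W_dense = random_matrix(ENCODING, HIDDEN)   # hidden -> encoding
w_out = random_matrix(1, ENCODING)[0]       # encoding -> churn score


def forward(sequence):
    """Return (encoding, churn_probability) for one variable-length sequence."""
    hidden = [0.0] * HIDDEN
    for step in sequence:
        # The same weights are reused at every timestep, so length can vary.
        pre = [a + b for a, b in zip(matvec(W_in, step), matvec(W_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
    # Dense layer with ReLU (assumed): this output is the encoding.
    encoding = [max(0.0, v) for v in matvec(W_dense, hidden)]
    # Sigmoid output: a probability strictly between 0 and 1.
    logit = sum(w * e for w, e in zip(w_out, encoding))
    churn_probability = 1.0 / (1.0 + math.exp(-logit))
    return encoding, churn_probability


short_seq = [[0, 0, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0, 5]]
long_seq = short_seq + [[0, 0, 1, 0, 0, 0, 7], [0, 0, 0, 0, 0, 1, 1]]
for seq in (short_seq, long_seq):
    encoding, p = forward(seq)
    print(len(seq), "steps ->", len(encoding), "encoding values, p(churn) =", round(p, 3))
```

In practice a deep learning framework would handle the training and the recurrent cell; the point here is only that the encoding has a fixed length regardless of how many interactions a customer has.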
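Passing each customer's sequence through the trained network and reading the output of the dense layer gives one fixed-length encoding per customer:
User ID | Encoding_1 | Encoding_2 | Encoding_3 | Encoding_4 | Encoding_5 | … |
1001 | 0.0 | 0.0 | 0.4 | 12.8 | 0.5 | |
1002 | 0.1 | 1.3 | 0.9 | 14.7 | 141.0 | |
1003 | 0.1 | 1.3 | 0.9 | 14.7 | 141.0 | |
1004 | 0.1 | 1.3 | 0.9 | 14.7 | 141.0 | |
1005 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | |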
Clustering the RNN encodings
The encodings summarize what the network has learned about each sequence in order to predict churn. Although they do not have any inherent meaning, we can use them in a clustering algorithm to identify distinct segments, which is exactly what we did. We decided to run DBSCAN on the encoded sequence data. DBSCAN had the advantage (in this case) of being able to handle non-linear cluster shapes and of not needing the number of clusters to be specified in advance. k-Means performed similarly.
Results
The DBSCAN algorithm identified five distinct clusters, with some significant and valuable differences between them.
Segment | Percentage of customers | Avg. E-mails Clicked | Avg. E-mails Opened | Avg. App Actions | Avg. Site Visits | Avg. Churn Date | Churn percentage |
1 | 0.3% | 2.11 | 22.8 | 16.4 | 18.2 | 325 | 30.1% |
2 | 34.5% | 1.13 | 11.5 | 3.6 | 8.1 | 308 | 16.7% |
3 | 59.5% | 0.3 | 3.2 | 0.1 | 2.9 | 88 | 98% |
4 | 5.5% | 4.0 | 27.0 | 89.5 | 16.5 | 337 | 0.1% |
5 | 0.2% | 0.5 | 2.0 | 0.0 | 1.5 | 93 | 93% |
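For readers who want to see how the clustering step works mechanically, the sketch below is a small hand-rolled DBSCAN applied to made-up two-dimensional encodings. The points, the `eps` and `min_samples` values, and the user labels are all hypothetical; the real encodings had 10 dimensions, and the parameters we used are not reported here. For real work, a library implementation such as scikit-learn's `DBSCAN` would be the more sensible choice.

```python
import math
from collections import Counter

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_samples:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1                     # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


# Hypothetical 2-D encodings: two dense groups and one isolated customer.
encodings = {
    "u1": [0.0, 0.1], "u2": [0.1, 0.0], "u3": [0.1, 0.2],
    "u4": [5.0, 5.1], "u5": [5.1, 5.0], "u6": [4.9, 5.0],
    "u7": [20.0, 0.0],
}
labels = dbscan(list(encodings.values()), eps=0.5, min_samples=3)

# Share of customers per segment, analogous to the second column of the table.
for label, count in sorted(Counter(labels).items()):
    print("segment", label, f"{count / len(labels):.1%}")
```

Note that the number of clusters is never passed in: it falls out of the density parameters, and customers that belong to no dense region are labeled as noise rather than forced into a segment.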
Takeaways
- Sequence data is increasingly being captured by brands, and methods for exploring it must be developed
- Recurrent neural networks are an effective way of generating encodings for behavioral sequence data
- Clustering the encodings (results of intermediate layers) of a neural network can be an effective way of peering inside the black box