Marketing Multi-Channel Attribution model with R (part 2: practical issues)


This is the second post about the Marketing Multi-channel Attribution Model with Markov chains (here is the first one). Even though the concept of first-order Markov chains is pretty simple, you can face other issues and challenges when implementing the approach in practice. In this article, we will review some of them. I tried to organize the article so that you can use it as a framework or as a starting point for creating your own.

The main steps that we will review are the following:

  • splitting paths depending on purchase counts
  • replacing some channels/touchpoints
  • a unique channel/touchpoint case
  • consecutive duplicated channels in the path and higher-order Markov chains
  • paths that haven’t led to a conversion
  • customer journey duration
  • attributing revenue and comparing costs

As usual, we start by simulating a data sample for our experiments. It includes customer ids, the date of contact with a marketing channel, the marketing channel itself, and a conversion mark (0/1).

library(tidyverse)
library(reshape2)
library(ChannelAttribution)
library(markovchain)
library(visNetwork)
library(expm)
library(stringr)


##### simulating the "real" data #####
set.seed(454)
df_raw <- data.frame(customer_id = paste0('id', sample(c(1:20000), replace = TRUE)),
 date = as.Date(rbeta(80000, 0.7, 10) * 100, origin = "2016-01-01"),
 channel = paste0('channel_', sample(c(0:7), 80000, replace = TRUE, prob = c(0.2, 0.12, 0.03, 0.07, 0.15, 0.25, 0.1, 0.08)))
 ) %>%
 group_by(customer_id) %>%
 mutate(conversion = sample(c(0, 1), n(), prob = c(0.975, 0.025), replace = TRUE)) %>%
 ungroup() %>%
 dmap_at(c(1, 3), as.character) %>%
 arrange(customer_id, date)

df_raw <- df_raw %>%
 mutate(channel = ifelse(channel == 'channel_2', NA, channel))

In addition, I’ve replaced channel_2 with NA values to simulate an unknown touchpoint. The first rows of the resulting data sample can be inspected with head(df_raw).

1. Splitting paths depending on purchase counts

It makes sense to attribute paths of the first purchase and of the n-th purchase separately.
We assume that each subsequent purchase moves the customer deeper into her lifecycle with the company. We can expect a huge difference between the first customer journey and, for instance, the tenth. Therefore, marketing channels work differently for customers in different phases of their lifecycle. I recommend splitting paths by purchase and computing the model for first-time buyers and n-time buyers separately. For rational splitting, the concept of Life-Cycle Grids can be very helpful.
For instance, if the customer’s path over her lifetime with the company looks like this:
C1 -> C4 -> C2 -> C3 -> conversion (first purchase) -> C2 -> C3 -> conversion (second purchase) -> C3 -> conversion (third purchase) -> C5.
We can split it like this:
a) C1 -> C4 -> C2 -> C3 -> conversion (first purchase),
b) C2 -> C3 -> conversion (second purchase),
c) C3 -> conversion (third purchase),
d) C5.
After this, we compute the attribution models for paths a), b) and c) separately. Path d) can be used for the generic probabilistic model (see point #5).
We can add the serial number of the path by using, for example, the lagged cumulative sum of conversion binary marks with the following simple code:
##### splitting paths #####
df_paths <- df_raw %>%
 group_by(customer_id) %>%
 mutate(path_no = ifelse(is.na(lag(cumsum(conversion))), 0, lag(cumsum(conversion))) + 1) %>%
 ungroup()
You can see now that, for example, customer id18055 has 4 paths:
1) channel_1 -> conversion #1
2) channel_4 -> channel_0 -> channel_6 -> conversion #2
3) channel_4 -> conversion #3
4) channel_6 -> NA -> channel_5
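To check this for yourself, a quick filter on the customer id does the trick (a minimal sketch using the df_paths data frame created above):
# inspecting all touchpoints and path numbers of a single customer
df_paths %>%
 filter(customer_id == 'id18055') %>%
 select(path_no, date, channel, conversion)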
Using the Life-Cycle Grids concept (or a different one), we can split the data set into different sets and compute attribution separately for different customer segments. For simplicity, we will compute attribution for first-time buyers only. Therefore, the code is the following:
df_paths_1 <- df_paths %>%
 filter(path_no == 1) %>%
 select(-path_no)

2. Replacing/removing touchpoints

It makes sense to replace or remove some channels when:
1) the marketing channel is unknown (NA value) due to a variety of reasons,
2) there is a specific channel in the path that we don’t want to attribute, such as the Direct channel.
There are two main methods for these cases: remove the NA/Direct channels or replace them with the previous channel in the path. We can also combine both methods: remove NAs and replace the Direct channel.
Note: with first-order Markov chains, the replacing method gives the same outcomes as the removing method, because duplicated touchpoints don’t affect the results (see point #4).
The following are examples of typical transformations from the combined approach:
1) initial path “C1 -> C2 -> Direct -> C3 -> conversion” we transform to “C1 -> C2 -> C2 -> C3 -> conversion”
2) initial path “Direct -> C3 -> C1 -> C2 -> conversion” we transform to “C3 -> C1 -> C2 -> conversion”
3) initial path “C1 -> C2 -> NA -> C3 -> conversion” we transform to “C1 -> C2 -> C3 -> conversion”
4) initial path “C3 -> NA -> Direct -> C2 -> conversion” we transform to “C3 -> C3 -> C2 -> conversion”.
Let’s assume that we want to replace channel_6 with the previous non-channel_6 touchpoint and skip unknown (NA) touchpoints. Note the following assumptions:
1) we will remove paths like NA -> conversion. My point is that it doesn’t make sense to attribute a conversion to an unknown channel, even if it brought one, so we won’t use this information.
2) we will remove Direct (channel_6) if it is the first channel in the path, because we don’t have a previous channel to replace it with. Again, for a Direct -> conversion path we will remove the whole path; my reasoning is the same as in the NA case.
On the other hand, you can adapt the code to your own point of view. We will use the following code:
##### replace some channels #####
df_path_1_clean <- df_paths_1 %>%
 # removing NAs
 filter(!is.na(channel)) %>%
 
 # adding order of channels in the path
 group_by(customer_id) %>%
 mutate(ord = c(1:n()),
 is_non_direct = ifelse(channel == 'channel_6', 0, 1),
 is_non_direct_cum = cumsum(is_non_direct)) %>%
 
 # removing Direct (channel_6) when it is the first in the path
 filter(is_non_direct_cum != 0) %>%
 
 # replacing Direct (channel_6) with the previous touch point
 mutate(channel = ifelse(channel == 'channel_6', channel[which(channel != 'channel_6')][is_non_direct_cum], channel)) %>%
 
 ungroup() %>%
 select(-ord, -is_non_direct, -is_non_direct_cum)

3. One- and multi-channel paths issue

It makes sense to split one-channel and multi-channel paths.
As I mentioned in the previous article, when using the Removal Effect, you should calculate the weighted importance of each channel/touchpoint because the sum of the Removal Effects doesn’t equal 1.
In case we have a path with a unique channel, the Removal Effect and importance of this channel for that exact path is 1. However, weighting it together with other, multi-channel paths will decrease the importance of the one-channel occurrences. That means that a channel that occurs in one-channel paths will usually be underestimated if it is attributed together with multi-channel paths.
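As a toy illustration of that weighting (the removal effect values below are made up for the example, they are not taken from our simulated data):
# hypothetical removal effects for three channels
removal_effects <- c(C1 = 0.5, C2 = 0.3, C3 = 0.4)

# weighted importance: each removal effect divided by their sum, so the weights add up to 1
weighted_importance <- removal_effects / sum(removal_effects)

# distributing, say, 100 total conversions according to these weights
round(weighted_importance * 100, 1)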
There is also pretty straightforward logic behind splitting: for one-channel paths, we definitely know the channel that brought the conversion and we don’t need to distribute its value among other channels.
We can do this with a simple algorithm:
  1. Split the data into paths that consist of a single unique channel and paths with more than one channel
  2. Calculate total conversions for one-channel paths and compute the Markov model for multi-channel paths
  3. Summarize results for each channel.
Let’s check the difference between when we don’t split the data and when we do split it with the following code:
##### one- and multi-channel paths #####
df_path_1_clean <- df_path_1_clean %>%
 group_by(customer_id) %>%
 mutate(uniq_channel_tag = ifelse(length(unique(channel)) == 1, TRUE, FALSE)) %>%
 ungroup()

df_path_1_clean_uniq <- df_path_1_clean %>%
 filter(uniq_channel_tag == TRUE) %>%
 select(-uniq_channel_tag)

df_path_1_clean_multi <- df_path_1_clean %>%
 filter(uniq_channel_tag == FALSE) %>%
 select(-uniq_channel_tag)

### experiment ###
# attribution model for all paths
df_all_paths <- df_path_1_clean %>%
 group_by(customer_id) %>%
 summarise(path = paste(channel, collapse = ' > '),
 conversion = sum(conversion)) %>%
 ungroup() %>%
 filter(conversion == 1)

mod_attrib <- markov_model(df_all_paths,
 var_path = 'path',
 var_conv = 'conversion',
 out_more = TRUE)
mod_attrib$removal_effects
mod_attrib$result
d_all <- data.frame(mod_attrib$result)

# attribution model for split multi- and unique-channel paths
df_multi_paths <- df_path_1_clean_multi %>%
 group_by(customer_id) %>%
 summarise(path = paste(channel, collapse = ' > '),
 conversion = sum(conversion)) %>%
 ungroup() %>%
 filter(conversion == 1)

mod_attrib_alt <- markov_model(df_multi_paths,
 var_path = 'path',
 var_conv = 'conversion',
 out_more = TRUE)
mod_attrib_alt$removal_effects
mod_attrib_alt$result

# adding unique paths
df_uniq_paths <- df_path_1_clean_uniq %>%
 filter(conversion == 1) %>%
 group_by(channel) %>%
 summarise(conversions = sum(conversion)) %>%
 ungroup()

d_multi <- data.frame(mod_attrib_alt$result)

d_split <- full_join(d_multi, df_uniq_paths, by = c('channel_name' = 'channel')) %>%
 mutate(result = total_conversions + conversions)

sum(d_all$total_conversions)
sum(d_split$result)
As you can see, the total sums are equal but the attribution is different, for the reason that I mentioned earlier.
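If you want to see the per-channel differences side by side, a simple join of the two result sets can help (a minimal sketch based on the d_all and d_split objects created above):
# comparing attribution without and with splitting, per channel
d_compare <- d_all %>%
 select(channel_name, non_split_conv = total_conversions) %>%
 full_join(d_split %>% select(channel_name, split_conv = result), by = 'channel_name') %>%
 mutate(diff = split_conv - non_split_conv)
d_compare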

4. Higher-order Markov chains and consecutive duplicated channels in the path

For first-order Markov chains, it doesn’t matter whether we skip consecutive duplicates or not.
The ChannelAttribution package allows us to change the order of the Markov chains via the “order” parameter. This means we can compute transition probabilities based on the previous two, three or more channels.
When using first-order Markov chains, a subsequence of the same channel in a path (duplicates) can be reduced to a single occurrence (for example, C2 in the path C1 → C2 → C2 → C2 → C3 can be reduced to C1 → C2 → C3). Mathematically, it doesn’t matter how many times C2 loops back into itself in the transition matrix; it will end up in the C3 state eventually. Therefore, we obtain different transition matrices but the same Removal Effect for each channel, with or without consecutive duplicates.
However, for higher-order Markov graphs the results will differ depending on whether we collapse the duplicated C2s, because probabilities are computed based on, for example, two subsequent channels treated as one segment of the path in the case of a second-order graph. Therefore, removing consecutive duplicated channels can have a huge effect for higher-order Markov chains.
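By the way, if you want to collapse consecutive duplicates yourself before fitting a higher-order model, a small helper like the following sketch can do it (my own function, not part of the ChannelAttribution package):
# collapsing consecutive duplicated channels in a path string,
# e.g. "C1 > C2 > C2 > C2 > C3" becomes "C1 > C2 > C3"
collapse_duplicates <- function(path, sep = ' > ') {
 channels <- unlist(strsplit(path, split = sep, fixed = TRUE))
 paste(rle(channels)$values, collapse = sep)
}

collapse_duplicates('C1 > C2 > C2 > C2 > C3')
# the cleaned paths can then be passed to markov_model() with, for example, order = 2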
In order to check the effect of skipping duplicates in the first-order Markov chain, we will use my script for a “manual” calculation, because the package skips duplicates automatically. The following is the code for computing the attribution “manually”:
##### Higher order of Markov chains and consequent duplicated channels in the path #####

# computing transition matrix - 'manual' way
df_multi_paths_m <- df_multi_paths %>%
 mutate(path = paste0('(start) > ', path, ' > (conversion)'))
m <- max(str_count(df_multi_paths_m$path, '>')) + 1 # maximum path length

df_multi_paths_cols <- colsplit(string = df_multi_paths_m$path, pattern = ' > ', names = c(1:m))
colnames(df_multi_paths_cols) <- paste0('ord_', c(1:m))
df_multi_paths_cols[df_multi_paths_cols == ''] <- NA

df_res <- vector('list', ncol(df_multi_paths_cols) - 1)

for (i in c(1:(ncol(df_multi_paths_cols) - 1))) {
 
 df_cache <- df_multi_paths_cols %>%
 select(num_range("ord_", c(i, i+1))) %>%
 na.omit() %>%
 group_by_(.dots = c(paste0("ord_", c(i, i+1)))) %>%
 summarise(n = n()) %>%
 ungroup()
 
 colnames(df_cache)[c(1, 2)] <- c('channel_from', 'channel_to')
 df_res[[i]] <- df_cache
}

df_res <- do.call('rbind', df_res)

df_res_tot <- df_res %>%
 group_by(channel_from, channel_to) %>%
 summarise(n = sum(n)) %>%
 ungroup() %>%
 group_by(channel_from) %>%
 mutate(tot_n = sum(n),
 perc = n / tot_n) %>%
 ungroup()

df_dummy <- data.frame(channel_from = c('(start)', '(conversion)', '(null)'),
 channel_to = c('(start)', '(conversion)', '(null)'),
 n = c(0, 0, 0),
 tot_n = c(0, 0, 0),
 perc = c(0, 1, 1))

df_res_tot <- rbind(df_res_tot, df_dummy)

# comparing transition matrices
trans_matrix_prob_m <- dcast(df_res_tot, channel_from ~ channel_to, value.var = 'perc', fun.aggregate = sum)
trans_matrix_prob <- data.frame(mod_attrib_alt$transition_matrix)
trans_matrix_prob <- dcast(trans_matrix_prob, channel_from ~ channel_to, value.var = 'transition_probability')

# computing attribution - 'manual' way
channels_list <- df_path_1_clean_multi %>%
 filter(conversion == 1) %>%
 distinct(channel)
channels_list <- c(channels_list$channel)

df_res_ini <- df_res_tot %>% select(channel_from, channel_to)
df_attrib <- vector('list', length(channels_list))

for (i in c(1:length(channels_list))) {
 
 channel <- channels_list[i]
 
 df_res1 <- df_res %>%
 mutate(channel_from = ifelse(channel_from == channel, NA, channel_from),
 channel_to = ifelse(channel_to == channel, '(null)', channel_to)) %>%
 na.omit()
 
 df_res_tot1 <- df_res1 %>%
 group_by(channel_from, channel_to) %>%
 summarise(n = sum(n)) %>%
 ungroup() %>%
 
 group_by(channel_from) %>%
 mutate(tot_n = sum(n),
 perc = n / tot_n) %>%
 ungroup()
 
 df_res_tot1 <- rbind(df_res_tot1, df_dummy) # adding (start), (conversion) and (null) states
 
 df_res_tot1 <- left_join(df_res_ini, df_res_tot1, by = c('channel_from', 'channel_to'))
 df_res_tot1[is.na(df_res_tot1)] <- 0
 
 df_trans1 <- dcast(df_res_tot1, channel_from ~ channel_to, value.var = 'perc', fun.aggregate = sum)
 
 trans_matrix_1 <- df_trans1
 rownames(trans_matrix_1) <- trans_matrix_1$channel_from
 trans_matrix_1 <- as.matrix(trans_matrix_1[, -1])
 
 inist_n1 <- dcast(df_res_tot1, channel_from ~ channel_to, value.var = 'n', fun.aggregate = sum)
 rownames(inist_n1) <- inist_n1$channel_from
 inist_n1 <- as.matrix(inist_n1[, -1])
 inist_n1[is.na(inist_n1)] <- 0
 inist_n1 <- inist_n1['(start)', ]
 
 res_num1 <- inist_n1 %*% (trans_matrix_1 %^% 100000)
 
 df_cache <- data.frame(channel_name = channel,
 conversions = as.numeric(res_num1[1, 1]))
 
 df_attrib[[i]] <- df_cache
}

df_attrib <- do.call('rbind', df_attrib)

# computing removal effect and results
tot_conv <- sum(df_multi_paths_m$conversion)

df_attrib <- df_attrib %>%
 mutate(tot_conversions = sum(df_multi_paths_m$conversion),
 impact = (tot_conversions - conversions) / tot_conversions,
 tot_impact = sum(impact),
 weighted_impact = impact / tot_impact,
 attrib_model_conversions = round(tot_conversions * weighted_impact)
 ) %>%
 select(channel_name, attrib_model_conversions)
As you can see, even though we obtained different transition matrices (trans_matrix_prob_m vs. trans_matrix_prob), the removal effects and attribution results are the same (df_attrib vs. mod_attrib_alt$result) for the package (which skips duplicated subsequent channels) and for the “manual” calculation (which keeps the duplicates).

5. Dealing with paths that haven’t led to a conversion

If you can track user journeys that did and did not end in a conversion, doing so will help you obtain extra value from more advanced methods.
We only need paths with conversions for attributing marketing channels. However, if you collect customer journeys both with and without conversions, that gives you some advanced opportunities: a generic probabilistic model with both positive (conversion) and negative (null) outcomes. Such a model allows you to look at the complete “picture” of the business, not just the valuable transitions.
Additionally, the complete probabilistic model can be used as a predictive model:
  1. At the least, you can model:
  • customers’ flow through marketing channels,
  • a projection of how many customers will touch an exact marketing channel after N touches, or how many conversions you can obtain if you attract traffic (N visits) from this or that channel,
  • which channels have a higher probability of user churn, and thus whether or not we should change how we acquire traffic through them.
  2. At most, an advanced approach that uses transition probabilities together with other features such as visit recency, frequency, durations, demographics, etc. in order to predict conversions with machine learning algorithms.
These are rather complex questions that cover topics such as channel capacity (when a linear increase in acquisition costs doesn’t lead to a linear increase in conversions), marketing budget allocation, a possible reduction in customer lifetime value from attracting less loyal customers or one-time buyers, etc. These aspects are not the topic of this article. I just want you to pay attention to the fact that the real world is more complex than the Markov chain model. Nonetheless, using the approach with the right amount of attention and understanding of the business domain can bring good results.
Ok, let’s compute the complete probabilistic model for the first-purchase paths we have with the following code:
##### Generic Probabilistic Model #####
df_all_paths_compl <- df_path_1_clean %>%
 group_by(customer_id) %>%
 summarise(path = paste(channel, collapse = ' > '),
 conversion = sum(conversion)) %>%
 ungroup() %>%
 mutate(null_conversion = ifelse(conversion == 1, 0, 1))

mod_attrib_complete <- markov_model(
 df_all_paths_compl,
 var_path = 'path',
 var_conv = 'conversion',
 var_null = 'null_conversion',
 out_more = TRUE
)

trans_matrix_prob <- mod_attrib_complete$transition_matrix %>%
 dmap_at(c(1, 2), as.character)

##### viz #####
edges <-
 data.frame(
 from = trans_matrix_prob$channel_from,
 to = trans_matrix_prob$channel_to,
 label = round(trans_matrix_prob$transition_probability, 2),
 font.size = trans_matrix_prob$transition_probability * 100,
 width = trans_matrix_prob$transition_probability * 15,
 shadow = TRUE,
 arrows = "to",
 color = list(color = "#95cbee", highlight = "red")
 )

nodes <-
 data_frame(id = c(
 c(trans_matrix_prob$channel_from),
 c(trans_matrix_prob$channel_to)
 )) %>%
 distinct(id) %>%
 arrange(id) %>%
 mutate(
 label = id,
 color = ifelse(
 label %in% c('(start)', '(conversion)'),
 '#4ab04a',
 ifelse(label == '(null)', '#ce472e', '#ffd73e')
 ),
 shadow = TRUE,
 shape = "box"
 )

visNetwork(nodes,
 edges,
 height = "2000px",
 width = "100%",
 main = "Generic Probabilistic model's Transition Matrix") %>%
 visIgraphLayout(randomSeed = 123) %>%
 visNodes(size = 5) %>%
 visOptions(highlightNearest = TRUE)
This time I used the really cool R package visNetwork for the Markov graph visualization. It looks a bit messy with nine nodes, but it is interactive: you can move nodes, highlight them and the edges, and do other transformations thanks to the flexibility this package provides.
The following is an example of how you can model customers’ transitions through marketing channels and project the number of conversions. Let’s assume that we are going to attract 1,000 visits from channel_5 and we want to model how many conversions we will obtain, or see which channels customers will contact within several steps. The script manipulates the matrix of transition probabilities associated with the Markov chain.
Once we have the transition matrix computed, we can project in which state (channel) a customer contact will be after, for example, 5 steps (the 5th power of the matrix), or how many conversions we can obtain (we use the 100,000th power to make sure that all customers have passed through the transition matrix).
##### modeling states and conversions #####
# transition matrix preprocessing
trans_matrix_complete <- mod_attrib_complete$transition_matrix
trans_matrix_complete <- rbind(trans_matrix_complete,
 df_dummy %>%
 mutate(transition_probability = perc) %>%
 select(channel_from, channel_to, transition_probability))
trans_matrix_complete$channel_to <- factor(trans_matrix_complete$channel_to, levels = c(levels(trans_matrix_complete$channel_from)))
trans_matrix_complete <- dcast(trans_matrix_complete, channel_from ~ channel_to, value.var = 'transition_probability')
trans_matrix_complete[is.na(trans_matrix_complete)] <- 0
rownames(trans_matrix_complete) <- trans_matrix_complete$channel_from
trans_matrix_complete <- as.matrix(trans_matrix_complete[, -1])

# creating empty matrix for modeling
model_mtrx <- matrix(data = 0,
 nrow = nrow(trans_matrix_complete), ncol = 1,
 dimnames = list(c(rownames(trans_matrix_complete)), '(start)'))
# adding modeling number of visits
model_mtrx['channel_5', ] <- 1000

c(model_mtrx) %*% (trans_matrix_complete %^% 5) # after 5 steps
c(model_mtrx) %*% (trans_matrix_complete %^% 100000) # after 100000 steps

6. Customer journey duration

Usually, we compute the model regularly for the company’s standard reporting period.
When speaking about the attribution model, it is quite simple to deal with durations. Once we have a conversion, we extract the retrospective contacts with marketing channels. We can limit the retrospective period to, for example, 30 or 90 days, based on our knowledge of the typical duration of the customer lifecycle, and omit contacts that happened earlier (or even compute the model without any limitation). For defining the retrospective period, we can look at the distribution of durations from the first touch to the conversion and choose, for example, the period that covers 95% of occurrences.
In R we can obtain these analytics with the following code:
##### Customer journey duration #####
# computing time lapses from the first contact to conversion/last contact
df_multi_paths_tl <- df_path_1_clean_multi %>%
 group_by(customer_id) %>%
 summarise(path = paste(channel, collapse = ' > '),
 first_touch_date = min(date),
 last_touch_date = max(date),
 tot_time_lapse = round(as.numeric(last_touch_date - first_touch_date)),
 conversion = sum(conversion)) %>%
 ungroup()

# distribution plot
ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
 theme_minimal() +
 geom_histogram(fill = '#4e79a7', binwidth = 1)

# cumulative distribution plot
ggplot(df_multi_paths_tl %>% filter(conversion == 1), aes(x = tot_time_lapse)) +
 theme_minimal() +
 stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) +
 geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) +
 geom_vline(xintercept = 23, color = '#e15759', size = 1.5, linetype = 2)
You can see that a 23-day period covers 95% of the paths.
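If we decide to use this 23-day window as a limit, one possible sketch (my own approach based on the objects defined above) is to keep only the touchpoints that occurred within 23 days before the conversion:
# keeping only touchpoints within 23 days before the conversion date
df_paths_23d <- df_path_1_clean_multi %>%
 group_by(customer_id) %>%
 filter(sum(conversion) == 1) %>% # converted paths only
 mutate(conv_date = max(date[conversion == 1])) %>%
 filter(as.numeric(conv_date - date) <= 23) %>%
 select(-conv_date) %>%
 ungroup()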
It is much more complex to deal with duration for the generic probabilistic model. We can use the same approach for paths that finished in a conversion as of the reporting date but, in addition, we need to manage the paths that did not. We can’t be sure which of them we should treat as fruitless and which of them have a higher chance of bringing us a conversion in the next reporting period.
Let’s visualize this issue for better understanding with the following code, assuming the reporting date is January 10, 2016:
### for generic probabilistic model ###
df_multi_paths_tl_1 <- melt(df_multi_paths_tl[c(1:50), ] %>% select(customer_id, first_touch_date, last_touch_date, conversion),
 id.vars = c('customer_id', 'conversion'),
 value.name = 'touch_date') %>%
 arrange(customer_id)
rep_date <- as.Date('2016-01-10', format = '%Y-%m-%d')

ggplot(df_multi_paths_tl_1, aes(x = as.factor(customer_id), y = touch_date, color = factor(conversion), group = customer_id)) +
 theme_minimal() +
 coord_flip() +
 geom_point(size = 2) +
 geom_line(size = 0.5, color = 'darkgrey') +
 geom_hline(yintercept = as.numeric(rep_date), color = '#e15759', size = 2) +
 geom_rect(xmin = -Inf, xmax = Inf, ymin = as.numeric(rep_date), ymax = Inf, alpha = 0.01, color = 'white', fill = 'white') +
 theme(legend.position = 'bottom',
 panel.border = element_blank(),
 panel.grid.major = element_blank(),
 panel.grid.minor = element_blank(),
 axis.ticks.x = element_blank(),
 axis.ticks.y = element_blank()) +
 guides(colour = guide_legend(override.aes = list(size = 5)))
You can see that id10072 finished the path with a conversion, so we can add its retrospective touchpoints into the model. On the other hand, the paths of id10010, id1001 and id10005 are fruitless as of the reporting date, but customer id10010 will purchase on January 19, 2016, customer id1001 will contact a marketing channel on January 15, 2016 but won’t purchase, and customer id10005 won’t have any new contacts with marketing channels, so its path really is fruitless.
Therefore, when computing the generic model our task is to identify which paths are complete as of the reporting date, whether they ended in a conversion or not. For example, we should use the paths of customers id10072 and id10005 for computing the model, because we don’t expect new contacts or first purchases from them anymore.
We need to develop some criteria for identifying whether an exact path has a higher chance to finish in a conversion or not. Again, we can analyze the stats that characterize successful paths and assume that if a path violates the common stats of success, then it is fruitless with high probability.
There are at least two values I can suggest to start with:
1) time lapse from the first contact,
2) time lapse between the conversion date and a previous contact.
These values can be used as a combination of rules. From the above analysis we know that 95% of purchases were made within 23 days, and we can compute the time lapses between the conversion date and the previous contact with the following code:
df_multi_paths_tl_2 <- df_path_1_clean_multi %>%
 group_by(customer_id) %>%
 mutate(prev_touch_date = lag(date)) %>%
 ungroup() %>%
 filter(conversion == 1) %>%
 mutate(prev_time_lapse = round(as.numeric(date - prev_touch_date)))
 
# distribution
ggplot(df_multi_paths_tl_2, aes(x = prev_time_lapse)) +
 theme_minimal() +
 geom_histogram(fill = '#4e79a7', binwidth = 1)

# cumulative distribution
ggplot(df_multi_paths_tl_2, aes(x = prev_time_lapse)) +
 theme_minimal() +
 stat_ecdf(geom = 'step', color = '#4e79a7', size = 2, alpha = 0.7) +
 geom_hline(yintercept = 0.95, color = '#e15759', size = 1.5) +
 geom_vline(xintercept = 12, color = '#e15759', size = 1.5, linetype = 2)
We can see that 95% of customers made their purchase within 12 days of the previous contact. Therefore, we can assume that if a customer’s first contact with a marketing channel occurred more than 23 days ago and/or the customer hasn’t contacted a marketing channel within the last 12 days, then the path is fruitless. Hint: for more accurate assumptions, these rules can be computed separately for each marketing channel.
We can extract the data for the generic probabilistic model when both rules hold with the following code:
# extracting data for generic model
df_multi_paths_tl_3 <- df_path_1_clean_multi %>%
 group_by(customer_id) %>%
 mutate(prev_time_lapse = round(as.numeric(date - lag(date)))) %>%
 summarise(path = paste(channel, collapse = ' > '),
 tot_time_lapse = round(as.numeric(max(date) - min(date))),
 prev_touch_tl = prev_time_lapse[which(max(date) == date)],
 conversion = sum(conversion)) %>%
 ungroup() %>%
 mutate(is_fruitless = ifelse(conversion == 0 & tot_time_lapse > 23 & prev_touch_tl > 12, TRUE, FALSE)) %>% # thresholds from the 23-day / 12-day rules above
 filter(conversion == 1 | is_fruitless == TRUE)

7. Attributing revenue and comparing costs

Once we have the marketing channels attributed, we can compute their effectiveness. The first way is to compare the cost per action, or CPA (cost per conversion in our case), across channels. For this, we divide the total cost spent on each channel by its modeled number of conversions and compare the results.
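Here is a minimal sketch of such a comparison, assuming a hypothetical df_costs data frame with the total spend per channel (the cost figures are invented purely for illustration):
# hypothetical spend per channel - replace with your real cost data
df_costs <- data.frame(channel_name = paste0('channel_', c(0:7)),
 total_cost = c(500, 350, 250, 300, 400, 700, 150, 200))

# cost per conversion based on the attributed conversions (d_split from point #3)
df_cpa <- d_split %>%
 left_join(df_costs, by = 'channel_name') %>%
 mutate(CPA = round(total_cost / result, 2)) %>%
 arrange(CPA)
df_cpa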
Additionally, the ChannelAttribution package allows us to distribute revenue across channels. For this, we add the parameter “var_value”, pointing to a column of revenues, to the markov_model() function. This makes it possible to compare the channels’ gross margin as well.
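A sketch of what such a call could look like, assuming we had built a path-level data frame (here a hypothetical df_paths_revenue, analogous to df_multi_paths) that also carries the revenue of each converted path:
# hypothetical df_paths_revenue: same structure as df_multi_paths plus a 'revenue' column per path
mod_attrib_value <- markov_model(df_paths_revenue,
 var_path = 'path',
 var_conv = 'conversion',
 var_value = 'revenue',
 out_more = TRUE)

# the result should now also contain the attributed conversion value per channel
mod_attrib_value$result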

Conclusions

As you can see, the Markov chains approach provides an effective way to attribute marketing channels. There are quite a lot of other uses for the approach:
  • the Markov model can be effectively used for LTV prediction
  • we can include additional types of interactions into the model (for example, banner impressions, TV ads, calls to the call center and so on) and evaluate their influence
  • we can develop a “likely to buy” predictive model using ML algorithms
and so on.
In a future post, I’m going to share another method for attributing marketing channels that is a mix of probabilistic and heuristic models based on the sales funnel. It is very interesting, don’t miss it!
