Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
By Gabriel Vasconcelos & Yuri Fonseca
Introduction
This post is the second of a series of examples of the BooST (Boosting Smooth Trees) model. You can see an introduction to the model here and the first example here. Our objective in this post is to use the derivatives of the BooST to obtain prices that maximize the profit for a given set of products. We will use a very simple setup that we know the true optimal prices to compare with the estimated prices. The tricky thing here is that the demand functions we defined are for substitute products. Therefore, if we increase the price of product A it will affect positively the demand for product B.
Little Math for those who like it
Suppose we have
where
For the profit function, we can write it in the following form in matrix notation:
where
Now, we can take the derivative of
where
Example
In this example we are going to generate data for the demand function of three substitute products, for example, three different TVs. The data will be generated from the equations in the previous section (we included an error term in the demand equation to introduce some noise). The code below generates the data and presents the optimal analytical prices for each of the three TVs.
library(tidyverse) library(reshape2) library(BooST) # Declare intercept alpha1 = 2000 alpha2 = 2500 alpha3 = 6000 alpha = c(alpha1, alpha2, alpha3) # Declare betas (effect in the own demand, must be negative) b11 = - 3 b22 = - 3 b33 = - 5 # (effect in the demand of other products, must be positive but not too big) b12 = 0.5 ; b13 = 0.1 b21 = 0.5 ; b23 = 0.5 b31 = 0.1; b32 = 0.5 # Matrix beta of the demand function B = matrix(data = c(b11,b12,b13,b21,b22,b23,b31,b32,b33), byrow = T, ncol = 3) # Declare costs c1 = 200 c2 = 500 c3 = 1000 # Vector of costs c = c(c1,c2,c3) # Formula for optimal prices op = as.vector(solve(t(B) + B)%*%(-alpha + t(B)%*%c)) op ## [1] 556.3723 854.3220 1169.5596
Now we are going to generate some data from the demand equations. For each equation we generated 300 observations with prices following a uniform distribution centered in the true optimal prices. Each equation was adjusted to have and R2 of approximately 67\%
set.seed(1) N = 300 p1 = runif(N,op[1]-0.5*op[1],op[1]+0.5*op[1]) p2 = runif(N,op[2]-0.5*op[2],op[2]+0.5*op[2]) p3 = runif(N,op[3]-0.5*op[3],op[3]+0.5*op[3]) # q1 = alpha1+b11*p1+p2*b12+p3*b13 q2 = alpha2+b21*p1+p2*b22+p3*b23 q3 = alpha3+b31*p1+p2*b32+p3*b33 set.seed(1) q1 = q1 + rnorm(N,sd = sd(q1)/2) q2 = q2 + rnorm(N,sd = sd(q2)/2) q3 = q3 + rnorm(N,sd = sd(q3)/2)
The plot below gives us an idea of the three demand functions. Note that we plotted each demand against only their respective price. However, each demand function also depends on the prices of the other TVs.
ind = as.factor(c(rep(1,300),rep(2,300),rep(3,300))) dfplot = data.frame(product = ind, q = c(q1,q2,q3), p = c(p1,p2,p3)) ggplot(data = dfplot) + geom_point(aes(x = p, y = q, color = product))
Next, we use the generated data to run the BooST for each demand. Note that all models must have the three prices to capture the substitution effect.
# Run BooST # df = data.frame(q1,q2,q3,p1,p2,p3) y1 = df$q1 y2 = df$q2 y3 = df$q3 x = as.matrix(df[,-c(1:3)]) set.seed(1) BooST1 = BooST(x,y1, v = 0.1, M = 50) BooST2 = BooST(x,y2, v = 0.1, M = 50) BooST3 = BooST(x,y3, v = 0.1, M = 50)
Before estimating the optimal prices we can explore the resulting demand functions. Recall that the demand of product 1 also depends on the prices of products 2 and 3 and cannot be visualized in two dimensions. To make the example more visual we fixed the prices of products 2 and 3 in their median and estimated the fitted demand curve for product one (black line in the plot below).
dfplot1 = as.data.frame(x) dfplot1$p2 = median(dfplot1$p2) dfplot1$p3 = median(dfplot1$p3) dfplot1$fitted = predict(BooST1,dfplot1) dfplot1$q1 = q1 ggplot(data = dfplot1) + geom_point(aes(x = p1, y = q1),color = "blue") + geom_line(aes(x=p1,y=fitted))
We can follow the same idea from the previous plot to obtain the price elasticity for product 1.
dfplot1$d1 = estimate_derivatives(BooST1,dfplot1[,1:3],1) dfplot1$elast = dfplot1$d1*(dfplot1$p1/dfplot1$fitted) ggplot(data = dfplot1) + geom_point(aes(x = p1,y = elast))
Since our objective is to estimate the optimal prices, the next step is to create a gradient function that estimates the derivatives from each BooST and use them to obtain the gradient of the profit.
gr = function(x,BooST1,BooST2,BooST3){ X = matrix(c(x[1],x[2],x[3]),nrow = 1) q1p1 = estimate_derivatives(BooST1,X,1) q1p2 = estimate_derivatives(BooST2,X,1) q1p3 = estimate_derivatives(BooST3,X,1) q2p1 = estimate_derivatives(BooST1,X,2) q2p2 = estimate_derivatives(BooST2,X,2) q2p3 = estimate_derivatives(BooST3,X,2) q3p1 = estimate_derivatives(BooST1,X,3) q3p2 = estimate_derivatives(BooST2,X,3) q3p3 = estimate_derivatives(BooST3,X,3) q1 = predict(BooST1, X) q2 = predict(BooST2, X) q3 = predict(BooST3, X) g1 = q1+(x[1]-c1)*q1p1+(x[2]-c2)*q1p2+(x[3]-c3)*q1p3 g2 = q2+(x[1]-c1)*q2p1+(x[2]-c2)*q2p2+(x[3]-c3)*q2p3 g3 = q3+(x[1]-c1)*q3p1+(x[2]-c2)*q3p2+(x[3]-c3)*q3p3 g = c(g1,g2,g3) return(g) }
All we have to do now is to move on the gradient until the prices converge. The code below does 10 iterations in a very naive algorithm to obtain the optimal prices. The starting values are far from the true optimal prices but they must be in the support of observed prices.
p_est = matrix(NA,10,3) p_est[1,] = c(300,1200,1700) for(i in 2:10){ p_est[i,] = p_est[i-1,]+0.1*gr(p_est[i-1,],BooST1,BooST2,BooST3) } p_est = as.data.frame(p_est) colnames(p_est) = c("p1","p2","p3") p_est$iter = 1:10
Finally, the results. The dashed lines show the true optimal prices and the continuous lines show the algorithm moving towards the optimal values as we increase the iterations.
dfm = melt(p_est,id.vars = "iter") ggplot(data = dfm) + geom_line(aes(x = iter, y = value, color = variable)) + theme_light() + geom_hline(yintercept = op[1],col = "red",linetype = 2) + geom_hline(yintercept = op[2],col = "green",linetype = 2) + geom_hline(yintercept = op[3],col = "blue",linetype = 2)
Some observations
-
You could also use the gradient function we defined in the R function optim and find similar results. It is always good to use the analytical gradient in the optim, especially if we are optimizing on many variables.
-
We estimated three BooSTs, one for each product. However, it is possible to obtain the same results with only one BooST if we estimate the model straight on the profits without looking at the demand. The downside is that we will recover the individual demand functions separated from the profit.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.