LogRatio Regression – A Simple Way to Model Compositional Data
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The compositional data are proportionals of mutually exclusive groups that would be summed up to the unity. Statistical models for compositional data have been applicable in a number of areas, e.g. the product or channel mix in the marketing research and asset allocations of a investment portfolio.
In the example below, I will show how to model compositional outcomes with a simple LogRatio regression. The underlying idea is very simple. With the D-dimension outcome [p_1, p_2…p_D], we can derive a [D-1]-dimension outcome [log(p_2 / p_1)…log(p_D / p_1)] and then estimate a multivariate regression based on the new outcome.
df = get("ArcticLake", envir = asNamespace('DirichletReg')) # sand silt clay depth #1 0.775 0.195 0.030 10.4 #2 0.719 0.249 0.032 11.7 #3 0.507 0.361 0.132 12.8 lm(cbind(log(silt / sand), log(clay / sand)) ~ depth, data = df) #Response log(silt/sand): #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -0.649656 0.236733 -2.744 0.0093 ** #depth 0.037522 0.004269 8.790 1.36e-10 *** # #Response log(clay/sand) : #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -2.614897 0.421383 -6.206 3.31e-07 *** #depth 0.062181 0.007598 8.184 8.00e-10 ***
Since log(x / y) = log(x) – log(y), we can also estimate the model with log(sand) as an offset term.
lm(cbind(log(silt), log(clay)) ~ depth + offset(log(sand)), data = df) #Response log(silt) : #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -0.649656 0.236733 -2.744 0.0093 ** #depth 0.037522 0.004269 8.790 1.36e-10 *** # #Response log(clay) : #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -2.614897 0.421383 -6.206 3.31e-07 *** #depth 0.062181 0.007598 8.184 8.00e-10 ***
Alternatively, we can also use the comp.reg function in the Compositional package.
Compositional::comp.reg(as.matrix(df[, 1:3]), df[, 4]) #$be # [,1] [,2] #(Intercept) -0.64965598 -2.61489731 #x 0.03752186 0.06218069 # #$seb # [,1] [,2] #(Intercept) 0.236733203 0.421382652 #x 0.004268588 0.007598043
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.