Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There are multiple add-on packages available in R to fit choice models in a Bayesian framework. These include bayesm, ChoiceModelR, and flipChoice. In this article, I will focus on the use of flipChoice, which relies on the state-of-the-art probabilistic programming language Stan to fit the choice model.
Getting Started
I assume you are already familiar with choice models, so I’ll focus solely on their estimation using flipChoice. You can install flipChoice from GitHub using the devtools package; if devtools is not yet installed, first run install.packages("devtools") in R. You can then install flipChoice using devtools::install_github("Displayr/flipChoice"). The main function that handles fitting of choice models in the package is FitChoiceModel.
FitChoiceModel offers great flexibility in the types of inputs it accepts. Data can be in any of five different formats, including Sawtooth CHO file format, Q experiment question format, a ChoiceModelDesign output from flipChoice::ChoiceModelDesign, Sawtooth Dual file format, and JMP file format.
Inputs/Function Arguments
I’ll key you in on some of the arguments/parameters of the FitChoiceModel function you should be aware of to fit your own choice model.
- design – use this argument if you have a ChoiceModelDesign output from flipChoice
- choices – for use with design, a data.frame with the choices made by each respondent (one column per question)
- questions – for use with design, a data.frame with the IDs of the tasks presented to each respondent (with one column per question and one row for each respondent)
- data – use if you have a data.frame in Q experiment question format (example below)
- cho.file – A text string giving the path to a CHO file
- respondent.ids – for use with cho.file, a vector of respondent IDs
- design.file – A text string giving the path (possibly a URL) to a Sawtooth or JMP design file
- attribute.levels.file – A text string giving the path to the attribute levels file that accompanies a JMP or CHO file
- hb.iterations, hb.chains, hb.max.tree.depth, hb.adapt.delta – parameters controlling the Stan Monte Carlo algorithm. The function tries to provide sensible defaults for these, but they should be adjusted if there are any warnings about sampling problems or if the diagnostic functions (mentioned below) indicate any convergence issues.
- classes – For multi-class HB, the number of latent classes. The default is to only have one class.
- tasks.left.out – If you wish to perform cross validation to check the predictive accuracy of your model, this specifies the number of tasks (questions) to leave out for each respondent.
- cov.formula, cov.data – can be used to specify respondent-specific covariates (see here)
- hb.prior.mean, hb.prior.sd – Optional vectors specifying the prior mean and prior standard deviation of the mean parameters
Additional parameters are discussed in the documentation; type ?flipChoice::FitChoiceModel at the R prompt to view it. The function also accepts arguments which will be passed on to rstan::sampling and rstan::stan.
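To make the layout of the choices and questions arguments concrete, here is a minimal base R sketch that builds toy versions of both data frames. The sizes, column names, and the use of integer codes for alternatives and task IDs are my own assumptions for illustration; check ?flipChoice::FitChoiceModel for the exact encoding the function expects.

```r
# Toy illustration only: all sizes and names below are hypothetical.
set.seed(1)
n.respondents <- 3       # rows in both data frames
n.questions <- 4         # columns in both data frames
n.alternatives <- 3      # alternatives shown in each question
n.tasks.in.design <- 8   # number of distinct tasks in the design

# choices: one row per respondent, one column per question,
# each cell holding the index of the alternative that was chosen
choices <- as.data.frame(matrix(
    sample(n.alternatives, n.respondents * n.questions, replace = TRUE),
    nrow = n.respondents))
names(choices) <- paste0("Q", seq_len(n.questions))

# questions: same shape, each cell holding the ID of the task shown;
# a respondent never sees the same task twice
questions <- as.data.frame(t(replicate(
    n.respondents, sample(n.tasks.in.design, n.questions))))
names(questions) <- paste0("Q", seq_len(n.questions))

# Both inputs must describe the same respondents and questions
stopifnot(identical(dim(choices), dim(questions)))
print(choices)
print(questions)
```

Both data frames would then be passed alongside the design object, so row i of choices and row i of questions must refer to the same respondent.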
Example data
I’ll walk you through the rest of this method using data from a discrete choice experiment. This experiment investigated consumer preferences when purchasing a 12-pack of chicken eggs from a supermarket. Each of the 380 respondents was asked eight questions requiring them to choose between three alternative 12-packs of eggs.
The alternatives varied in their levels of seven attributes:
- egg weight (55g, 60g, 65g, or 70g)
- egg quality (caged, barn, or free range)
- price ($2, $3, $4, $5, or $6)
- chicken feed used (not stated, grain and fish, or vegetables only)
- egg uniformity (all eggs the same or some different)
- if the eggs were organic (not stated or antibiotic and hormone free)
- whether any proceeds from the sale went to charity (not stated or 10% donated to RSPCA).
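As a rough sense of scale, the attribute list above can be written down in base R to count the possible egg packs and the part-worths per respondent. This is a sketch under my own assumptions: the list names are made up, and every attribute (including price) is treated as categorical with one level as the baseline, which may not match how you choose to code price.

```r
# Attribute levels as listed above; the list names are hypothetical labels.
attribute.levels <- list(
    weight     = c("55g", "60g", "65g", "70g"),
    quality    = c("caged", "barn", "free range"),
    price      = c("$2", "$3", "$4", "$5", "$6"),
    feed       = c("not stated", "grain and fish", "vegetables only"),
    uniformity = c("all eggs the same", "some different"),
    organic    = c("not stated", "antibiotic and hormone free"),
    charity    = c("not stated", "10% donated to RSPCA")
)

n.levels <- lengths(attribute.levels)

# Number of distinct 12-packs that could be described (full factorial)
n.profiles <- prod(n.levels)

# Part-worths per respondent if each attribute is dummy coded
# with one baseline level (assumption)
n.parameters <- sum(n.levels - 1)

# Total number of observed choices: 380 respondents x 8 questions
n.observed.choices <- 380 * 8

print(c(profiles = n.profiles,
        parameters = n.parameters,
        choices = n.observed.choices))
```

Each respondent answers only eight questions but has more than eight part-worths under this coding, which is why pooling information across respondents in a hierarchical model is useful here.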
I’ve provided an example question below:
Fitting your Choice Model
The eggs data is included in the package in multiple formats, so you can experiment and try out the different input formats accepted by FitChoiceModel.
To fit a choice model to the eggs data in CHO file format, we can use the following commands:
library(flipChoice)
# Paths to the example CHO file and its attribute labels, both shipped with the package
cho.file <- system.file("testdata", "Training.cho", package = "flipChoice")
attribute.lvls.file <- system.file("testdata", "Attribute_labels_-_Training.xlsx", package = "flipChoice")
data(cho.ids, package = "flipChoice") # vector of respondent IDs
# Fit the hierarchical Bayes model with 500 iterations on each of 2 chains
fit <- FitChoiceModel(cho.file = cho.file,
                      attribute.levels.file = attribute.lvls.file,
                      respondent.ids = respondent.ids,
                      hb.iterations = 500, hb.chains = 2)
The function will issue a warning if it detects any issues with sampling, but we should also check for convergence by examining the effective sample sizes and Rhat values for the mean and standard deviation parameters. These are included in the summary statistics returned by ExtractParameterStats(fit). We can examine trace plots for the same parameters using TracePlots(fit). Estimates of the posterior medians and 95% credible intervals for these parameters can be plotted using the function PlotPosteriorIntervals. My colleague, Justin, has discussed the diagnostics in more detail (including images) for a very similar model (MaxDiff) here.
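For intuition about what an Rhat value measures, here is a minimal base R sketch of the classic Gelman-Rubin statistic applied to simulated draws. It is an illustration only: the function name and the simulated chains are my own, and the Rhat that Stan reports for a fitted model may come from a refined variant of this formula.

```r
# Classic Gelman-Rubin potential scale reduction factor.
# chains: numeric matrix with one row per draw and one column per chain.
gelman.rubin.rhat <- function(chains) {
    n <- nrow(chains)
    between <- n * var(colMeans(chains))     # between-chain variance B
    within <- mean(apply(chains, 2, var))    # within-chain variance W
    pooled <- (n - 1) / n * within + between / n
    sqrt(pooled / within)
}

set.seed(42)
# Two chains sampling the same distribution: Rhat should be close to 1
well.mixed <- matrix(rnorm(2000), ncol = 2)
# Shift the second chain so the chains disagree: Rhat should be well above 1
stuck <- well.mixed
stuck[, 2] <- stuck[, 2] + 3

rhat.well.mixed <- gelman.rubin.rhat(well.mixed)
rhat.stuck <- gelman.rubin.rhat(stuck)
print(c(well.mixed = rhat.well.mixed, stuck = rhat.stuck))
```

Values close to 1 suggest the chains agree with each other; values noticeably above 1 suggest running more iterations or adjusting the sampler settings described earlier.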
We hope this post has helped you use Hierarchical Bayes to create your choice model. But if you’re feeling a bit confused and overwhelmed, it’s OK! Feel free to check out our guide on how to use Hierarchical Bayes to create a choice model in Displayr. It’s easier, we promise.