
The bms Function Explained


This text ‘pedagogically’ explains how the bms function works code-wise and is intended for people who prefer to program customized adjustments to the BMS package.

bms is the workhorse function for the sampling part of Bayesian Model Averaging in the BMS package. The bms function code in the package includes many different options and usability checks, and is tweaked for computational speed. Although it can be difficult to understand, it basically relies on object-oriented subfunctions that can easily be re-shuffled. To demonstrate this, the following paragraphs introduce code that replicates a very basic BMA sampling operation, stripped of all options and speed features.

A very basic bms function

The easiest version of BMA to build focuses on a specific g-prior (here we use the BRIC prior by Fernández, Ley and Steel) and a uniform model prior, so the model prior term can be omitted from the sampler.
To simplify things further, let’s only use MCMC frequencies as an approximation to analytical likelihoods. Moreover, fix the start model of the chain at the null model, because this needs the fewest code lines. A BMA sampling function that produces posterior inclusion probabilities and coefficients can then be sketched as in the example below. The comments indicate the use of subfunctions, mainly .fls.samp to sample candidate models, .ols.terms2 to calculate OLS quantities such as the residual sum of squares, and .lprob.constg.init to calculate Bayesian posterior statistics for each model.
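For orientation, the posterior statistics in question are based on Zellner's g-prior. The following is a minimal sketch of the resulting log marginal likelihood (up to a model-independent constant), assuming the standard Fernández, Ley and Steel expression; it is meant as an illustration, not as the package's internal code.

# Sketch (assumption, not BMS internals): g-prior log marginal likelihood,
# up to a constant, for a model with k regressors and residual sum of squares ymy
lprob.sketch = function(ymy, k, yty, N, g) {
  # proportional to (1+g)^(-k/2) * ( ymy + (yty - ymy)/(1+g) )^(-(N-1)/2)
  -k/2*log(1+g) - (N-1)/2*log(ymy + (yty - ymy)/(1+g))
}
# Under the BRIC prior, g = max(N, K^2); for the null model (k=0, ymy=yty)
# this collapses to -(N-1)/2*log(yty), the starting value used in the code below.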

library(BMS)

basicbms = function(X.data, burn, iter) {
#'pedagogical' explanation of bms function from the BMS package:
# BMA based on MCMC frequencies with uniform model priors
# and a BRIC g-prior (i.e. fixed g=max(N,K^2))
# equivalent to bms(X.data, burn, iter, mprior="uniform",start.value=0)
#INPUTS: #######
#* X.data: data.frame
#* burn: number of burn-ins
#* iter: number of iterations on top of burn-ins
#CAUTION: This code is stripped of all other desirable features!
# For 'honest' bma, use the bms function from the BMS package!


# DATA PROCESSING ####
X.data = as.matrix(X.data) # transform data.frame to matrix
N=nrow(X.data); K=ncol(X.data)-1 # get dimensions
# demean data (so we can omit the intercept)
y = X.data[,1]-matrix(mean(X.data[,1]),N,1,byrow=TRUE)
X = X.data[,-1]-matrix(colMeans(X.data[,-1]),N,ncol(X.data)-1,byrow=TRUE)
# do the matrix multiplication beforehand to save time:
yty=as.vector(crossprod(y))
XtX = crossprod(X)
Xty = crossprod(X,y)


# INITIALIZATION ####
 # init the number of accepted models after burn-in phase:
keptmodels=0

#init the vectors to add-up coefficients and posterior inclusion probabilities:
betavec=numeric(K); pipvec=numeric(K);

#init the object that computes posterior stats based on OLS results:
gobject = .lprob.constg.init(max(N,K^2),N,K,yty)

# our starting model here is the null model (for simplicity):
currentdraw=numeric(K)  #the null model does not include any of the K regressors
current.lprob = -(N-1)/2*log(yty) #log-likelihood of the null model
#note that, with the constants as defined in gobject, the null-model likelihood is (y'y)^(-(N-1)/2)

# SAMPLING CHAIN #####
for(i in 1:(burn+iter)) {

      #.fls.samp proposes a new model as in Fernandez, Ley and Steel (2001)
     candidraw=as.logical(.fls.samp(currentdraw,K)$mnewdraw)

     # .ols.terms2 computes the needed OLS quantities; Note that its use here is not very speedy
     olsr=.ols.terms2(which(candidraw), yty, sum(candidraw), N, K, XtX, Xty)$full.results()
     
     # gobject$lprob.all computes the log-likelihood of the candidate model
     postr = gobject$lprob.all(olsr$ymy,sum(candidraw),olsr$bhat,olsr$diag.inverse)
     
     # now check whether candidate model is accepted
     # by checking whether candidate_likelihood/current_likelihood > x with x ~ U(0,1)
     if (log(runif(1))< postr$lprob-current.lprob) {

        # if the candidate is accepted, then it becomes the new current model
         current.lprob = postr$lprob
         currentdraw=candidraw
         
         if (i>burn) {
            # if the burn-in stage is over, then aggregate the results
            keptmodels=keptmodels+1 #yes, we have one more accepted model
            pipvec = pipvec + currentdraw #add-up which covariates were included
            
            # add up the coefficients such that they fit into the right places
            betavec[as.logical(currentdraw)]= betavec[as.logical(currentdraw)] + as.vector(postr$b1new)
         }
     }
     
  }
  
  #make output: divide pipvec and betavec by keptmodels...
  # ...to get a frequency-based average
  output=cbind(pipvec,betavec)/keptmodels
  colnames(output)=c("PIP", "E(beta|y)")
  rownames(output) = colnames(X.data[,-1])
  
  return(output)
}

This function can be called via a command such as:

data(datafls)
bresult=basicbms(datafls, burn=5000, iter=10000)
print(bresult)

Note that the above command can of course be replicated with bms. Just type

bres2=bms(datafls, 5000, 10000, mprior="uniform",start.value=0) #sampling as above
pipbeta2=coef(bres2, order=F, exact=F)[,1:2] # extract PIPs and betas (MCMC-based)
plot(pipbeta2[,1], bresult[,1]) # scatterplot of PIPs from bms vs. basicbms

The basic bms function with analytical likelihoods

Approximating posterior model probabilities by MCMC frequencies is not very popular. Most authors prefer instead to base their posterior statistics on analytical likelihood expressions, which correspond to the term current.lprob used in the code above.
For that purpose, one must forgo aggregating coefficients on the fly. Instead, one should retain the accepted models encountered and do the aggregation based on their analytical PMPs after the sampling. For a huge number of sampling steps this is computationally cumbersome or infeasible, so the usual approach is to store only the best x models encountered, which should cover most of the explored model space.
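To illustrate what aggregation based on analytical PMPs means, here is a small stand-alone sketch with made-up numbers (it does not use BMS functions): given the log marginal likelihoods of the retained models and their 0/1 inclusion patterns, a covariate's exact PIP is the PMP-weighted share of retained models that include it.

# Stand-alone sketch with hypothetical numbers (not BMS code):
lprobs  = c(-10.2, -11.0, -12.5)               # log marginal likelihoods of 3 retained models
inclmat = rbind(c(1,0,1), c(1,1,0), c(0,1,1))  # their 0/1 inclusion patterns (models x regressors)
pmp = exp(lprobs - max(lprobs)); pmp = pmp/sum(pmp)  # renormalized analytical PMPs
exact.pip = as.vector(crossprod(inclmat, pmp))       # PIP_j = sum over models of PMP * inclusion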

In order to achieve this, we use the topmod object (check the help with ?topmod). The previous code basically just needs to be altered by two lines. Before the sampling loop, the following is added to initialize the object:

tmo = topmod(nbmodels=nmodel, nmaxreg=K, bbeta =TRUE)

In the model acceptance part (after the pipvec adding-up), the following line needs to be added:

tmo$addmodel(current.lprob, vec01=currentdraw, vbeta=as.vector(postr$b1new), vbeta2=as.vector(postr$b2new))

tmo then keeps track of the best nmodel models by likelihood. After the sampling chain, the code needs to be altered somewhat to process this information. The entire, augmented function basicbms_wtop is in the code fragment below.

basicbms_wtop = function(X.data, burn, iter, nmodel=10) {
#'pedagogical' explanation of bms function from the BMS package:
# BMA based on best nmodel models with uniform model priors
# and a BRIC g-prior (i.e. fixed g=max(N,K^2))
# equivalent to bms(X.data, burn, iter, mprior="uniform", start.value=0, nmodel=nmodel)
#INPUTS: #######
#* X.data: data.frame
#* burn: number of burn-ins
#* iter: number of iterations on top of burn-ins
#CAUTION: This code is stripped of all other desirable features!
# For 'honest' bma, use the bms function from the BMS package!


# DATA PROCESSING ####
X.data = as.matrix(X.data) # transform data.frame to matrix
N=nrow(X.data); K=ncol(X.data)-1 # get dimensions
# demean data (so we can omit the intercept)
y = X.data[,1]-matrix(mean(X.data[,1]),N,1,byrow=TRUE)
X = X.data[,-1]-matrix(colMeans(X.data[,-1]),N,ncol(X.data)-1,byrow=TRUE)
# do the matrix multiplication beforehand to save time:
yty=as.vector(crossprod(y))
XtX = crossprod(X)
Xty = crossprod(X,y)


# INITIALIZATION ####
 # init the number of accepted models after burn-in phase:
keptmodels=0

#init the vectors to add-up coefficients and posterior inclusion probabilities:
betavec=numeric(K); pipvec=numeric(K);

#init the object that computes posterior stats based on OLS results:
gobject = .lprob.constg.init(max(N,K^2),N,K,yty)

# our starting model here is the null model (for simplicity):
currentdraw=numeric(K)  #the null model does not include any of the K regressors
current.lprob = -(N-1)/2*log(yty) #log-likelihood of the null model
#note that, with the constants as defined in gobject, the null-model likelihood is (y'y)^(-(N-1)/2)

#set up the topmod object
tmo = topmod(nbmodels=nmodel, nmaxreg=K, bbeta =TRUE)

# SAMPLING CHAIN #####
for(i in 1:(burn+iter)) {

      #.fls.samp proposes a new model as in Fernandez, Ley and Steel (2001)
     candidraw=as.logical(.fls.samp(currentdraw,K)$mnewdraw)

     # .ols.terms2 computes the needed OLS quantities; Note that its use here is not very speedy
     olsr=.ols.terms2(which(candidraw), yty, sum(candidraw), N, K, XtX, Xty)$full.results()
     
     # gobject$lprob.all computes the log-likelihood of the candidate model
     postr = gobject$lprob.all(olsr$ymy,sum(candidraw),olsr$bhat,olsr$diag.inverse)
     
     # now check whether candidate model is accepted
     # by checking whether candidate_likelihood/current_likelihood > x with x ~ U(0,1)
     if (log(runif(1))< postr$lprob-current.lprob) {

        # if the candidate is accepted, then it becomes the new current model
         current.lprob = postr$lprob
         currentdraw=candidraw
         
         if (i>burn) {
            # if the burn-in stage is over, then aggregate the results
            keptmodels=keptmodels+1 #yes, we have one more accepted model
            pipvec = pipvec + currentdraw #add-up which covariates were included
            
            # add up the coefficients such that they fit into the right places
            betavec[as.logical(currentdraw)]= betavec[as.logical(currentdraw)] + as.vector(postr$b1new)
            
            #add the accepted model to the topmod object
            tmo$addmodel(current.lprob, vec01=currentdraw, vbeta=as.vector(postr$b1new), vbeta2=as.vector(postr$b2new))
         }
     }
  }
  
  #fake a bma object to use the coef.bma function
  bmao=list(); bmao$topmod =tmo; class(bmao)="bma";
  printoutput=coef.bma(bmao,exact=TRUE, order=FALSE)
  rownames(printoutput)=colnames(X.data[,-1])
  print(printoutput)
  return(invisible(bmao))
}

A call to basicbms_wtop now produces output based on exact analytical likelihoods:

bres3= basicbms_wtop(datafls, 5000, 10000, nmodel=100)

The result prints output based on the best nmodel (here 100) models encountered by the sampling chain. (The topmod object can be further inspected with the command print(bres3$topmod).)
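Beyond printing, the retained models can be inspected directly. The following two commands are a sketch, assuming the accessor methods documented under ?topmod:

pmp.bma(bres3$topmod)  # analytical vs. MCMC-frequency PMPs of the retained models
bres3$topmod$lik()     # log marginal likelihoods of the retained models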

Equivalence to Koop Code

Note that the building blocks above are enough to reproduce the Matlab code chapter11.m by Gary Koop. His code uses both frequency-based and analytical inference, and starts from a more customized starting model. Therefore the function basicbms_wtop needs to be slightly adjusted to produce basicbms_koop:

basicbms_koop = function(X.data, burn=100, iter=1000, nmodel=10) {
#'pedagogical' explanation of bms function from the BMS package:
# code equivalent to Gary Koop's:
# http://www.wiley.com/legacy/wileychi/koopbayesian/supp/chapter11.m
# equivalent to bms(X.data, burn, iter, mprior="uniform",start.value=0, nmodel=10)
#INPUTS: #######
#* X.data: data.frame
#* burn: number of burn-ins
#* iter: number of iterations on top of burn-ins
#CAUTION: This code is stripped of all other desirable features!
# For 'honest' bma, use the bms function from the BMS package!


# DATA PROCESSING ####
X.data = as.matrix(X.data) # transform data.frame to matrix
N=nrow(X.data); K=ncol(X.data)-1 # get dimensions
# demean data (so we can omit the intercept)
y = X.data[,1]-matrix(mean(X.data[,1]),N,1,byrow=TRUE)
X = X.data[,-1]-matrix(colMeans(X.data[,-1]),N,ncol(X.data)-1,byrow=TRUE)
# do the matrix multiplication beforehand to save time:
yty=as.vector(crossprod(y))
XtX = crossprod(X)
Xty = crossprod(X,y)


# INITIALIZATION ####
 # init the number of accepted models after burn-in phase:
keptmodels=0

#init the vectors to add-up coefficients (1st & 2nd moments) and posterior inclusion probabilities:
betavec=numeric(K); betavec2=numeric(K); pipvec=numeric(K);

#init the object that computes posterior stats based on OLS results:
gobject = .lprob.constg.init(max(N,K^2),N,K,yty)



# our starting model here is as in Koop
# the one containing all OLS regressors with a t-stat>0.5
# the .starter function does that for us
currentdraw=as.logical(.starter(K,K,y,N, XtX, Xty, X)$molddraw)
# next three lines compute the likelihood of the starter model
olsr=.ols.terms2(which(currentdraw), yty, sum(currentdraw), N, K, XtX, Xty)$full.results()
postr = gobject$lprob.all(olsr$ymy,sum(currentdraw),olsr$bhat,olsr$diag.inverse)
current.lprob = postr$lprob

#set up the topmod object
tmo = topmod(nbmodels=nmodel, nmaxreg=K, bbeta =TRUE)

# SAMPLING CHAIN #####
for(i in 1:(burn+iter)) {

      #.fls.samp proposes a new model as in Fernandez, Ley and Steel (2001)
     candidraw=as.logical(.fls.samp(currentdraw,K)$mnewdraw)

     # .ols.terms2 computes the needed OLS quantities; Note that its use here is not very speedy
     olsr=.ols.terms2(which(candidraw), yty, sum(candidraw), N, K, XtX, Xty)$full.results()

     # gobject$lprob.all computes the log-likelihood of the candidate model
     postr = gobject$lprob.all(olsr$ymy,sum(candidraw),olsr$bhat,olsr$diag.inverse)

     # now check whether candidate model is accepted
     # by checking whether candidate_likelihood/current_likelihood > x with x ~ U(0,1)
     if (log(runif(1))< postr$lprob-current.lprob) {

        # if the candidate is accepted, then it becomes the new current model
         current.lprob = postr$lprob
         currentdraw=candidraw

         if (i>burn) {
            # if the burn-in stage is over, then aggregate the results
            keptmodels=keptmodels+1 #yes, we have one more accepted model
            pipvec = pipvec + currentdraw #add-up which covariates were included

            # add up the coefficients such that they fit into the right places
            betavec[as.logical(currentdraw)]= betavec[as.logical(currentdraw)] + as.vector(postr$b1new)

            # add up the coefficient second moments in a similar way
            betavec2[as.logical(currentdraw)]= betavec2[as.logical(currentdraw)] + as.vector(postr$b2new)

            #add the accepted model to the topmod object
            tmo$addmodel(current.lprob, vec01=currentdraw, vbeta=as.vector(postr$b1new), vbeta2=as.vector(postr$b2new))
         }
     }
  }


  #### now print stuff like Koop does ######
  printoutput=matrix(pipvec/keptmodels,K,1);
  rownames(printoutput)=colnames(X.data[,-1])
  cat("\n proportion of models visited containg each variable (PIP)\n")
  print(printoutput)
  cat("\n mean nb of regressors in models\n")
  print(sum(pipvec)/keptmodels) # using that post model size = sum(PIP)
  cat(paste("\nprob of best", nmodel,"models (after deleteing rest of models)\n"))
  printoutput = pmp.bma(tmo)
  colnames(printoutput)=c("Analytical", "Numerical")
  print(printoutput)
  cat("\nPosterior mean and stand dev of each coefficient\n")
  printoutput = cbind(betavec/keptmodels, sqrt(betavec2/keptmodels-(betavec/keptmodels)^2))
  rownames(printoutput)=colnames(X.data[,-1])
  colnames(printoutput)=c("E(beta|Y)", "SD(beta|Y)")
  print(printoutput)
}

Calling basicbms_koop now produces the same output as Koop’s Matlab code. Note that the actual numerical results can differ a lot, as the very few iterations Koop suggests are not a good approximation. Moreover, Koop’s dataset is ordered differently from datafls, which necessitates a re-ordering:

datakoop = datafls[,c(1, 11:16, 36:41, 9, 42, 10, 7, 8, 31:33, 6, 34, 35, 2, 17, 5, 18:21, 4, 22:27, 3, 28:30)]
basicbms_koop(datakoop)

Note that the Koop output can easily be reproduced with basic BMS commands as well:

 bmao=bms(datakoop, burn=100, iter=1000, mprior="uniform", nmodel=10, user.int=F)
  print("proportion of models visited containg each variable (PIP)")
  print(coef(bmao,order=F)[,1])
  print(summary(bmao)[1])
  print("prob of best models (after deleting rest of models)")
  print(pmp.bma(bmao$topmod))
  print("Posterior mean and stand dev of each coefficient")
  print(coef(bmao,order=F)[,2:3])

Conclusion

The above examples show how the building blocks of BMS can be used to construct basic BMA sampling functions. Particularly vital are the topmod object and .ols.terms2, as well as the sampling functions (here .fls.samp) and the likelihood functions (here .lprob.constg.init). For more about these functions, check out their source code in BMS.RData, which includes code comments.
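For instance, after library(BMS) the source of these building blocks (including the comments) can be displayed by typing the function name without parentheses; if a function is not visible in your version of the package, prefixing it with BMS::: should work.

.fls.samp           # prints the model sampler's source code
.ols.terms2         # OLS building block
.lprob.constg.init  # fixed-g likelihood building block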

