Bayes models from SAS PROC MIXED in R, post 2
This is my second post converting SAS's PROC MCMC examples to R. The task this week is determining the transformation parameter in a Box-Cox transformation. The SAS example only monitors lambda, but I am not so sure that is all I want. What I used to do was get an interval for lambda, select an interpretable value (e.g. 1/2 is square root, -1 is inverse) and proceed with that value. However, if that is my approach in a Bayesian context, maybe I should restrict my prior on lambda to just those values I find acceptable. On the other hand, if I model lambda as continuous, should the uncertainty in lambda not also affect the uncertainty in the remainder of the model? In this post I will keep everything continuous. Hence the first step is to get SAS to give me those values too, for which I added the extra parameters to the code and ran it through SAS Studio. Below follows the code for the various R approaches. For readability I kept the code next to the text, while the output is at the bottom of the post.
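All implementations below share the same two building blocks: the Box-Cox transform ys = (y^lambda - 1)/lambda and, because the normal likelihood is placed on the transformed scale, the Jacobian term (lambda - 1)*log(y) that has to be added to the log-likelihood. As a minimal R sketch (the function names are mine, for illustration only):
boxcox_trans <- function(y, lambda) (y^lambda - 1) / lambda
# per-observation log-likelihood on the transformed scale,
# including the Jacobian term (lambda - 1) * log(y)
boxcox_loglik <- function(y, mu, sd, lambda) {
  ys <- boxcox_trans(y, lambda)
  (lambda - 1) * log(y) + dnorm(ys, mu, sd, log = TRUE)
}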
Note on previous post
Last week I failed to get some of the samplers working. I now realize that some of those problems might have been solved with the ones trick or the zeros trick. In addition, LaplacesDemon has dnormm(), which is more elegant than my approach.
SAS
This is just the model from the SAS manual with extra parameters monitored.
proc mcmc data=boxcox nmc=50000 propcov=quanew seed=12567 monitor=(lda beta0 beta1 sd);
ods select PostSumInt TADpanel;
parms beta0 0 beta1 0 lda 1 s2 1;
beginnodata;
prior beta: ~ general(0);
prior s2 ~ gamma(shape=3, scale=2);
prior lda ~ unif(-2, 2);
sd=sqrt(s2);
endnodata;
ys=(y**lda-1)/lda;
mu=beta0+beta1*x;
ll=(lda-1)*log(y)+lpdfnorm(ys, mu, sd);
model general(ll);
run;
STAN
In Stan the Jacobian needed for this exercise can be added directly to the log posterior via increment_log_prob(), so it was fairly straightforward to put it all together.
library(plyr)
library(dplyr)
# stan
library(rstan)
datain <- list(n=100,x=set2$x,y=set2$y)
parameters <- c('lda','beta0','beta1','std')
fit <- '
data {
int<lower=0> n;
vector[n] y;
vector[n] x;
}
parameters {
real<lower=-2, upper=2> lda;
real beta0;
real beta1;
real<lower=0> s2;
}
transformed parameters {
vector[n] ys;
vector[n] ll;
vector[n] mu;
real<lower=0> std;
std <- sqrt(s2);
for (i in 1:n) {
ys[i] <- (pow(y[i],lda)-1)/lda;
}
mu <- beta0+beta1*x;
ll <- (lda-1)*log(y);
}
model {
s2 ~ gamma(3, 0.5);
increment_log_prob(ll);
increment_log_prob(normal_log(ys,mu,std));
}
' %>%
stan(model_code = .,
data = datain,
pars=parameters)
fit
JAGS
The JAGS code is not straightforward for two reasons. The first reason is that JAGS does not have the concept of a Jacobian. However, upon some investigation I found the zeros trick, which basically means that an arbitrary contribution can be added to the deviance by transforming that contribution and pretending to JAGS that the values come from a Poisson distribution.
The second reason is that JAGS has directed acyclic graphs. At least, I think that is the reason. One cannot first calculate transformed values and then make them dependent on a distribution. My solution was to subtract transformed and estimated values and make the result normal around 0.
To get acceptable results in terms of n.eff and Rhat I increased the number of samples to 10000 per chain.
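In isolation the zeros trick boils down to the fragment below, written in the same model-function style as the full model that follows. Here L[i] stands for the extra log-likelihood contribution and C for a constant large enough to keep the Poisson mean positive (100 in the model below); the fragment is illustrative only.
zerotrick <- function() {
for (i in 1:n) {
# an observed zero with Poisson mean -L[i]+C adds L[i]-C to the log-likelihood;
# the constant C does not influence the posterior
phi[i] <- -L[i] + C
zero[i] ~ dpois(phi[i])
}
}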
library(R2jags)
datain <- list(n=100,
x=set2$x,
y=set2$y,
zero1 = rep(0,100),
zero2 = rep(0,100))
parameters <- c('LDA','beta0','beta1','std')
jmodel <- function() {
tau <- 1/s2
std <- sqrt(s2)
for (i in 1:n) {
mu[i] <- beta0+x[i]*beta1
ys[i] <- (pow(y[i],LDA)-1)/LDA -mu[i] # transformed y minus its prediction
zero1[i] ~ dnorm(ys[i],tau)           # the difference is modelled as normal around 0
ll[i] <- -((LDA-1)*log(y[i]))+100     # zeros trick: minus the Jacobian term plus a constant
zero2[i] ~ dpois(ll[i])
}
beta0 ~ dnorm(0,.0001)
beta1 ~ dnorm(0,.0001)
s2 ~ dgamma(3,.5)
LDA ~ dunif(-2,2)
}
inits <- function() {
list(beta0=rnorm(1),
beta1=rnorm(1),
LDA=runif(1,-1,1),
s2=runif(1,.1,3))
}
jj <- jags(model.file=jmodel,
data=datain,
inits=inits,
n.iter=10000,
parameters=parameters)
jj
MCMC
As such it is relatively easy to build the model for the mcmc package. Based on the demo vignette I was also able to set the scale to obtain an acceptance rate of about 0.2. After some additional calculations I came to a batch length of 10000, and chose 20 batches to keep computation time somewhat limited. All in all it took quite some attempts to get to these settings in terms of nbatch and batch length. metrop() only returns summary statistics (means) per batch; these are subsequently summarized. The results seem a bit different from SAS and JAGS. In addition, a small tweak in the output function was needed to get SD as output rather than the internal variable s2.
library(mcmc)
mydens <- function(x,...) {
if (x[4]<0 ) return(-Inf)
std <- sqrt(x[4])
mu <- x[2]+set2$x*x[3]
ys <- (set2$y^x[1]-1)/x[1]
ll <- (x[1]-1)*log(set2$y)
sum(dnorm(ys,mu,std,log=TRUE)+ll) + dgamma(x[4],shape=3,scale=2,log=TRUE) # log prior on s2
}
sam <- metrop(mydens,
initial=c(.46,-2,1.8,1.3),
nbatch=20,
blen=10000,
scale=.025,
outfun=function(z, ...) {
z[4] <- sqrt(z[4])
c(z, z^2)}
)
sam$accept
foo <- apply(sam$batch, 2, mean)
mu <- foo[1:4]
sigmasq <- foo[5:8] - mu^2
data.frame(param=c('LDA','beta0','beta1','SD'),mu,sigmasq)
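Since sam$batch contains one mean per batch, the spread of those batch means also gives a rough Monte Carlo standard error for each posterior mean; a small sketch, reusing the sam object from above:
batchmeans <- sam$batch[, 1:4] # LDA, beta0, beta1, SD (after the sqrt in outfun)
mcse <- apply(batchmeans, 2, sd) / sqrt(nrow(batchmeans))
round(mcse, 4)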
MCMCpack
Obviously, the density function is the same as for the mcmc package. Based on the ACF a thinning of 20 was chosen. I did not see an obvious reason to change from the default 20000 samples.
library(MCMCpack)
library(coda)
mydens <- function(x,...) {
if (x[4]<0 ) return(-Inf)
std <- sqrt(x[4])
mu <- x[2]+set2$x*x[3]
ys <- (set2$y^x[1]-1)/x[1]
ll <- (x[1]-1)*log(set2$y)
sum(dnorm(ys,mu,std,log=TRUE)+ll) + dgamma(x[4],shape=3,scale=2,log=TRUE) # log prior on s2
}
sam <- MCMCmetrop1R(mydens,
theta.init=c(.46,-2,1.8,1.3),
tune=1.5,thin=20)
summary(sam)
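The thinning choice can be checked afterwards on the returned mcmc object, for instance with coda (loaded above); a quick check might look like this:
autocorr.plot(sam)
effectiveSize(sam)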
rcppbugs
Rcppbugs has many of the same limits as JAGS in terms of available distributions. The same tricks as used in JAGS are hence used in rcppbugs, with one exception: where JAGS has a Poisson distribution, rcppbugs lacks it, hence the ones trick is used, which is the equivalent of the zeros trick but uses a Bernoulli distribution. The ACF showed quite a long lag, hence I increased both the number of samples and the thinning.
library(rcppbugs)
x <- set2$x
y <- set2$y
one <- rep(1,100)
yzero <- rep(0,100)
beta <- mcmc.normal(c(.1,.1),mu=0,tau=0.0001)
LDA <- mcmc.uniform(.5,-2,2)
tau.y <- mcmc.gamma(sd(as.vector(y)),alpha=0.1,beta=0.1)
y.hat <- deterministic(function(x,beta,y,LDA) {
ypred=beta[1]+beta[2]*x
ytrans=(y^LDA-1)/LDA
ypred-ytrans
}, x, beta,y,LDA)
Jac <- deterministic(function(LDA,y)
exp((LDA-1)*log(y))/100,LDA,y)
cJac <- mcmc.bernoulli(one,Jac,observed=TRUE)
y.lik <- mcmc.normal(yzero,mu=y.hat,tau=tau.y,observed=TRUE)
m <- create.model(beta,LDA, tau.y,
y.hat,
y.lik,
Jac,
cJac)
ans <- run.model(m, iterations=1e6L, burn=1e4L, adapt=1e3L, thin=1000L)
mysum <- function(x) c(mean=mean(x),sd=sd(x),quantile(x,c(.025,.5,.975)))
rbind(LDA=mysum(ans$LDA),
beta0=mysum(ans$b[,1]),
beta1=mysum(ans$b[,2]),
sd=mysum(1/sqrt(ans$tau.y)))
LaplacesDemon
LaplacesDemon also needed a lot of samples. At a later attempt I tried the HARM sampler, which worked much better. Getting the right sampler is important.
library('LaplacesDemon')
mon.names <- "LP"
parm.names <- c('beta0','beta1','LDA','std')
PGF <- function(Data) c(rnorm(2,0,1),runif(1,.4,.5),runif(1,.5,2))
MyData <- list(mon.names=mon.names,
parm.names=parm.names,
PGF=PGF,
y=set2$y,
x=set2$x)
Model <- function(parm, Data)
{
### Parameters
std <- interval(parm[4], 1e-100, Inf)
parm[4] <- std
s2 <- parm[4]^2
mu <- parm[1]+Data$x*parm[2]
ys <- (Data$y^parm[3]-1)/parm[3]
ll <- (parm[3]-1)*log(Data$y)
LL <- sum(dnorm(ys,mu,std,log=TRUE)+ll)
LP <- LL+dgamma(s2,shape=3,scale=2,log=TRUE) # log prior on s2
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
yhat=1,
parm=parm)
return(Modelout)
}
# very dependent on initial values
Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Fit <- LaplacesDemon(Model,
Data=MyData,
Algorithm='HARM',
Specs=list(alpha.star=0.234, B=NULL),
Initial.Values = Initial.Values,
Iterations=40000,Status=2000,Thinning=30
)
Fit
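LaplacesDemon's Consort() function gives advice on burn-in, thinning and the choice of sampler for a fitted object, which is useful when, as here, the sampler matters; applied to the Fit object above:
Consort(Fit)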
Output
SAS
Parameter     N      Mean   Standard Deviation     95% HPD Interval
lda 50000 0.4701 0.0290 0.4142 0.5281
beta0 50000 -2.2064 0.4215 -3.0564 -1.4201
beta1 50000 1.9260 0.1806 1.5854 2.2826
sd 50000 1.3275 0.1482 1.0648 1.6328
STAN
Inference for Stan model: __LHS.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
lda 0.47 0.00 0.03 0.41 0.45 0.47 0.49 0.53 595 1.01
beta0 -2.20 0.02 0.40 -3.01 -2.45 -2.18 -1.94 -1.44 684 1.01
beta1 1.92 0.01 0.17 1.59 1.81 1.91 2.02 2.28 571 1.01
std 1.32 0.01 0.15 1.06 1.22 1.31 1.42 1.65 691 1.00
lp__ -309.43 0.05 1.45 -313.02 -310.13 -309.09 -308.36 -307.64 977 1.00
Samples were drawn using NUTS(diag_e) at Sat Oct 4 15:14:25 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
JAGS
Inference for Bugs model at "/tmp/RtmpSgjWNR/modelef330519173.txt", fit using jags,
3 chains, each with 10000 iterations (first 5000 discarded), n.thin = 5
n.sims = 3000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5%
LDA 0.472 0.030 0.413 0.451 0.472 0.491 0.535
beta0 -2.229 0.443 -3.200 -2.512 -2.199 -1.920 -1.450
beta1 1.938 0.192 1.608 1.805 1.925 2.054 2.356
std 1.336 0.155 1.079 1.227 1.319 1.433 1.680
deviance 20620.595 3.365 20616.435 20618.142 20619.763 20622.197 20629.012
Rhat n.eff
LDA 1.009 2900
beta0 1.003 1400
beta1 1.008 1500
std 1.002 3000
deviance 1.011 480
For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
DIC info (using the rule, pD = var(deviance)/2)
pD = 5.6 and DIC = 20626.2
DIC is an estimate of expected predictive error (lower deviance is better).
MCMC
param mu sigmasq
1 LDA 0.4614172 0.0007537545
2 beta0 -2.0808097 0.1414263712
3 beta1 1.8693639 0.0267559710
4 SD 1.2832247 0.0180115043
MCMCpack
Note that MCMCpack has s2 rather than sd in the output.
Iterations = 501:20481
Thinning interval = 20
Number of chains = 1
Sample size per chain = 1000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
[1,] 0.4657 0.02927 0.0009256 0.0009256
[2,] -2.1565 0.40274 0.0127358 0.0113576
[3,] 1.8988 0.17764 0.0056175 0.0052342
[4,] 1.7048 0.36930 0.0116782 0.0133882
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
var1 0.4058 0.4479 0.466 0.4854 0.5215
var2 -2.9944 -2.4064 -2.146 -1.8799 -1.4346
var3 1.5606 1.7810 1.890 2.0099 2.2647
var4 1.1204 1.4473 1.652 1.9261 2.5331
rcppbugs
mean sd 2.5% 50% 97.5%
LDA 0.4597165 0.02875052 0.4050122 0.4592264 0.5207108
beta0 -2.0937414 0.39740763 -2.9511520 -2.0668905 -1.4015904
beta1 1.8647679 0.17411536 1.5640795 1.8566258 2.2509009
sd 1.2612126 0.14011464 1.0228554 1.2502009 1.5634482
LaplacesDemon
Call: LaplacesDemon(Model = Model, Data = MyData, Initial.Values = Initial.Values,
Iterations = 40000, Status = 2000, Thinning = 30, Algorithm = "HARM",
Specs = list(alpha.star = 0.234, B = NULL))
Acceptance Rate: 0.22825
Algorithm: Hit-And-Run Metropolis
Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)
beta0 beta1 LDA std
0.1329756466 0.0283948376 0.0007961449 0.0174295808
Covariance (Diagonal) History: (NOT SHOWN HERE)
Deviance Information Criterion (DIC):
All Stationary
Dbar 619.952 619.768
pD 9.531 3.530
DIC 629.483 623.298
Initial Values:
[1] -0.2820944 0.3190142 0.4778030 1.3380326
Iterations: 40000
Log(Marginal Likelihood): -313.4286
Minutes of run-time: 0.12
Model: (NOT SHOWN HERE)
Monitor: (NOT SHOWN HERE)
Parameters (Number of): 4
Posterior1: (NOT SHOWN HERE)
Posterior2: (NOT SHOWN HERE)
Recommended Burn-In of Thinned Samples: 532
Recommended Burn-In of Un-thinned Samples: 15960
Recommended Thinning: 31
Specs: (NOT SHOWN HERE)
Status is displayed every 2000 iterations
Summary1: (SHOWN BELOW)
Summary2: (SHOWN BELOW)
Thinned Samples: 1333
Thinning: 30
Summary of All Samples
                  Mean        SD        MCSE       ESS           LB       Median           UB
beta0       -2.0923727 0.3614098 0.091343057  17.08625   -2.8552455   -2.0861434   -1.4732055
beta1        1.8711745 0.1631220 0.037952180  22.13164    1.5969508    1.8620763    2.2047884
LDA          0.4612769 0.0282230 0.004279672  34.00465    0.4086439    0.4616171    0.5171681
std          1.2653917 0.1320557 0.020805981  98.82246    1.0352842    1.2574597    1.5495557
Deviance   619.9523160 4.3660058 0.415674235 188.93047  616.3618665  619.2464350  626.7398118
LP        -309.9042345 2.1778219 0.206890192 190.00073 -313.2903260 -309.5529196 -308.1117717
Summary of Stationary Samples
                  Mean         SD        MCSE        ESS           LB       Median           UB
beta0       -2.0498690 0.36586611 0.118090993   8.836406   -2.9348999   -1.9902222   -1.5178152
beta1        1.8509120 0.16921671 0.050214255  15.798638    1.5936363    1.8236876    2.2326418
LDA          0.4577846 0.02886563 0.006775378  21.788913    0.4078682    0.4558953    0.5206561
std          1.2449204 0.12994746 0.027000671  55.838883    1.0332563    1.2320277    1.5344502
Deviance   619.7681778 2.65715568 0.326080727 121.032423  616.3311632  619.3065650  626.3929182
LP        -309.8148748 1.32309998 0.206890192 123.122008 -313.0799005 -309.5806926 -308.1020531
Appendix: data used
s1 <- scan(what=list(double(),double()),text='
10.0 3.0 72.6 8.3 59.7 8.1 20.1 4.8 90.1 9.8 1.1 0.9
78.2 8.5 87.4 9.0 9.5 3.4 0.1 1.4 0.1 1.1 42.5 5.1
57.0 7.5 9.9 1.9 0.5 1.0 121.1 9.9 37.5 5.9 49.5 6.7
8.3 1.8 0.6 1.8 53.0 6.7 112.8 10.0 40.7 6.4 5.1 2.4
73.3 9.5 122.4 9.9 87.2 9.4 121.2 9.9 23.1 4.3 7.1 3.5
12.4 3.3 5.6 2.7 113.0 9.6 110.5 10.0 3.1 1.5 52.4 7.9
80.4 8.1 0.6 1.6 115.1 9.1 15.9 3.1 56.5 7.3 85.4 9.8
32.5 5.8 43.0 6.2 0.1 0.8 21.8 5.2 15.2 3.5 5.2 3.0
0.2 0.8 73.5 8.2 4.9 3.2 0.2 0.3 69.0 9.2 3.6 3.5
0.2 0.9 101.3 9.9 10.0 3.7 16.9 3.0 11.2 5.0 0.2 0.4
80.8 9.4 24.9 5.7 113.5 9.7 6.2 2.1 12.5 3.2 4.8 1.8
80.1 8.3 26.4 4.8 13.4 3.8 99.8 9.7 44.1 6.2 15.3 3.8
2.2 1.5 10.3 2.7 13.8 4.7 38.6 4.5 79.1 9.8 33.6 5.8
9.1 4.5 89.3 9.1 5.5 2.6 20.0 4.8 2.9 2.9 82.9 8.4
7.0 3.5 14.5 2.9 16.0 3.7 29.3 6.1 48.9 6.3 1.6 1.9
34.7 6.2 33.5 6.5 26.0 5.6 12.7 3.1 0.1 0.3 15.4 4.2
2.6 1.8 58.6 7.9 81.2 8.1 37.2 6.9
')
set2 <- data.frame(x=s1[[2]],y=s1[[1]])