SAS PROC MCMC example in R; Poisson Regression
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this post I will try to copy the calculations of SAS’s PROC MCMC example 61.5 (Poisson Regression) into the various R solutions. In this post Jags, RStan, MCMCpack, LaplacesDemon solutions are shown. Compared to the first post in this series, rcppbugs and mcmc are not used. Rcppbugs has no poisson distribution and while I know how to go around that (ones trick, second post in series) I just don’t think that should be the approach for a standard distribution such as poisson. Mcmc has a weird summary which I dislike.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data and model
Data are insurance claims. To quote SAS manual ‘The variable n represents the number of insurance policy holders and the variable c represents the number of insurance claims. The variable car is the type of car involved (classified into three groups), and it is coded into two levels. The variable age is the age group of a policy holder (classified into two groups)‘.There is a nice trick in the model, again to quote; ‘The logarithm of the variable n is used as an offset—that is, a regression variable with a constant coefficient of 1 for each observation. Having the offset constant in the model is equivalent to fitting an expanded data set with 3000 observations, each with response variable y observed on an individual level. The log link relates the mean and the factors car and age‘.
There was an nice SAS trick in the data. The raw data has car as one variable and a design matrix is needed, for which PROC TRANSREG is used. In R using model.matrix() is at least a bit more transparant.
insure <- read.table(text='
n c car age
500 42 small 0
1200 37 medium 0
100 1 large 0
400 101 small 1
500 73 medium 1
300 14 large 1′,
skip=1,header=TRUE)
insure$ln=log(insure$n)
insure$car <- relevel(insure$car,'small')
input_insure <- model.matrix(~ car + age,data=insure)
Model
The models are all straightforward implementations of the SAS code.MCMCpack
Probably the shortest code is for MCMCpack.library(MCMCpack)
MCMCfun <- function(parm) {
mu <- input_insure %*% parm
LL <- sum(dnorm(parm,0,1e3,log=TRUE))
yhat <- exp(mu+insure$ln)
LP <- LL+sum(dpois(insure$c,yhat,log=TRUE))
return(LP)
}
mcmcout <- MCMCmetrop1R(MCMCfun,rep(0,4))
summary(mcmcout)
LaplacesDemon
LaplacesDemon has a bit more code around essentially the same likelihood code as MCMCpack. In addition one needs to chose the sampling regime. I chose RWM with a first run providing values for parameter variances/covariances as a input for the final run. Of the four R based methods, LaplacesDemon() is the function which creates most output.library(‘LaplacesDemon’)
mon.names <- "LP"
parm.names <- c('alpha','beta_car1','beta_car2','beta_age')
PGF <- function(Data) c(rnorm(4,0,1))
MyData <- list(mon.names=mon.names,
parm.names=parm.names,
PGF=PGF,
y=insure$c,
x=input_insure,
ln=insure$ln)
Model <- function(parm, Data) {
mu <- Data$x %*% parm
LL <- sum(dnorm(parm,0,1e3,log=TRUE))
yhat <- exp(mu+Data$ln)
LP <- LL+sum(dpois(Data$y,yhat,log=TRUE))
Modelout <- list(LP=LP, Dev=-2*LL, Monitor=LP,
yhat=yhat,
parm=parm)
return(Modelout)
}
Initial.Values <- GIV(Model, MyData, PGF=TRUE)
Fit1 <- LaplacesDemon(Model,
Data=MyData,
Algorithm=’RWM’,
Covar=rep(.1,4),
Initial.Values = Initial.Values
)
Fit2 <- LaplacesDemon(Model,
Data=MyData,
Algorithm=’RWM’,
Covar=var(Fit1$Posterior2),
Initial.Values = apply(Fit1$Posterior2,2,median)
)
Fit2
JAGS
The code is so simple, no comments are needed.library(R2jags)
datain <- list(
car1=input_insure[,2],
car2=input_insure[,3],
agev=insure$age,
ln=insure$ln,
c=insure$c,
n=nrow(insure))
parameters <- c('a0','c1','c2','age')
jmodel <- function() {
for (i in 1:n) {
mu[i] <- a0+car1[i]*c1+car2[i]*c2+agev[i]*age
y[i] <- exp(mu[i]+ln[i])
c[i] ~ dpois(y[i])
}
a0 ~ dnorm(0,.0001)
c1 ~ dnorm(0,.0001)
c2 ~ dnorm(0,.0001)
age ~ dnorm(0,.0001)
}
jj <- jags(model.file=jmodel,
data=datain,
parameters=parameters)
jj
Stan
The code is again straightforward. As in previous posts, datain and parameters from Jags are re-used. Since I am still struggling a bit with Rstan I am using dplyr to combine model and simulation in one function. Stan is the slowest of all R based methods, the overhead of compiling is relative large for such a simple modellibrary(dplyr)
library(rstan)
fit <- '
data {
int
int c[n];
vector[n] agev;
vector[n] ln;
vector[n] car1;
vector[n] car2;
}
parameters {
real a0;
real c1;
real c2;
real age;
}
transformed parameters {
vector[n] y;
vector[n] mu;
mu <- a0+car1*c1+car2*c2+agev*age;
y <- exp(mu+ln);
}
model {
c ~ poisson(y);
a0 ~ normal(0,100);
c1 ~ normal(0,100);
c2 ~ normal(0,100);
age ~ normal(0,100);
}
‘ %>%
stan(model_code = .,
data = datain,
pars=parameters)
fit
Results
MCMCpack
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@The Metropolis acceptance rate was 0.37195
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> summary(mcmcout)
Iterations = 501:20500
Thinning interval = 1
Number of chains = 1
Sample size per chain = 20000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
[1,] -2.643 0.1304 0.0009223 0.003448
[2,] -1.787 0.2736 0.0019350 0.007368
[3,] -0.696 0.1273 0.0008999 0.003315
[4,] 1.327 0.1335 0.0009440 0.003497
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
var1 -2.9082 -2.7309 -2.6388 -2.5552 -2.3911
var2 -2.3606 -1.9639 -1.7772 -1.5944 -1.2809
var3 -0.9456 -0.7789 -0.6962 -0.6091 -0.4504
var4 1.0712 1.2363 1.3268 1.4188 1.5914
LaplacesDemon
Call:LaplacesDemon(Model = Model, Data = MyData, Initial.Values = apply(Fit1$Posterior2, 2, median), Covar = var(Fit1$Posterior2), Algorithm = “RWM”)
Acceptance Rate: 0.3294
Algorithm: Random-Walk Metropolis
Covariance Matrix: (NOT SHOWN HERE; diagonal shown instead)
alpha beta_car1 beta_car2 beta_age
0.02421653 0.09052011 0.01687834 0.02480825
Covariance (Diagonal) History: (NOT SHOWN HERE)
Deviance Information Criterion (DIC):
All Stationary
Dbar 62.614 62.614
pD 0.000 0.000
DIC 62.614 62.614
Initial Values:
alpha beta_car1 beta_car2 beta_age
-2.6482937 -1.7698730 -0.6848875 1.3175961
Iterations: 10000
Log(Marginal Likelihood): -33.58429
Minutes of run-time: 0.03
Model: (NOT SHOWN HERE)
Monitor: (NOT SHOWN HERE)
Parameters (Number of): 4
Posterior1: (NOT SHOWN HERE)
Posterior2: (NOT SHOWN HERE)
Recommended Burn-In of Thinned Samples: 0
Recommended Burn-In of Un-thinned Samples: 0
Recommended Thinning: 30
Specs: (NOT SHOWN HERE)
Status is displayed every 100 iterations
Summary1: (SHOWN BELOW)
Summary2: (SHOWN BELOW)
Thinned Samples: 1000
Thinning: 10
Summary of All Samples
Mean SD MCSE ESS LB Median
alpha -2.6488552 1.342421e-01 6.565916e-03 582.9301 -2.9160940 -2.650964
beta_car1 -1.8137509 2.822791e-01 1.474688e-02 508.9384 -2.3852812 -1.815035
beta_car2 -0.6856638 1.283016e-01 6.234377e-03 701.7594 -0.9401333 -0.684194
beta_age 1.3241569 1.353041e-01 6.180607e-03 752.9705 1.0582629 1.324364
Deviance 62.6135632 1.379071e-06 6.585127e-08 643.3371 62.6135607 62.613563
LP -49.8707670 1.445016e+00 6.295406e-02 648.5900 -53.5791639 -49.554267
UB
alpha -2.4029950
beta_car1 -1.2906070
beta_car2 -0.4358381
beta_age 1.5855214
Deviance 62.6135661
LP -48.0070303
Summary of Stationary Samples
Mean SD MCSE ESS LB Median
alpha -2.6488552 1.342421e-01 6.565916e-03 582.9301 -2.9160940 -2.650964
beta_car1 -1.8137509 2.822791e-01 1.474688e-02 508.9384 -2.3852812 -1.815035
beta_car2 -0.6856638 1.283016e-01 6.234377e-03 701.7594 -0.9401333 -0.684194
beta_age 1.3241569 1.353041e-01 6.180607e-03 752.9705 1.0582629 1.324364
Deviance 62.6135632 1.379071e-06 6.585127e-08 643.3371 62.6135607 62.613563
LP -49.8707670 1.445016e+00 6.295406e-02 648.5900 -53.5791639 -49.554267
UB
alpha -2.4029950
beta_car1 -1.2906070
beta_car2 -0.4358381
beta_age 1.5855214
Deviance 62.6135661
LP -48.0070303
Jags
Inference for Bugs model at “/tmp/RtmpgHy1qo/modelc8f275c94f1.txt”, fit using jags,3 chains, each with 2000 iterations (first 1000 discarded)
n.sims = 3000 iterations saved
mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff
a0 -2.649 0.131 -2.908 -2.737 -2.647 -2.559 -2.394 1.001 2700
age 1.327 0.136 1.063 1.235 1.323 1.419 1.588 1.001 3000
c1 -1.795 0.280 -2.393 -1.976 -1.787 -1.595 -1.295 1.003 990
c2 -0.696 0.128 -0.948 -0.783 -0.695 -0.610 -0.436 1.002 1400
deviance 36.929 2.817 33.407 34.867 36.332 38.308 43.995 1.001 3000
For each parameter, n.eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
DIC info (using the rule, pD = var(deviance)/2)
pD = 4.0 and DIC = 40.9
DIC is an estimate of expected predictive error (lower deviance is better).
Stan
Inference for Stan model: __LHS.4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
a0 -2.65 0.00 0.13 -2.93 -2.74 -2.64 -2.56 -2.40 1248 1
c1 -1.79 0.01 0.28 -2.34 -1.98 -1.78 -1.60 -1.27 1706 1
c2 -0.69 0.00 0.13 -0.94 -0.78 -0.69 -0.61 -0.44 1859 1
age 1.33 0.00 0.14 1.06 1.23 1.32 1.42 1.60 1411 1
lp__ 835.44 0.04 1.43 831.88 834.75 835.76 836.50 837.20 1294 1
Samples were drawn using NUTS(diag_e) at Sat Nov 15 14:15:39 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.