Mixed distribution

Posted on October 28, 2012 by Wingfeet in Uncategorized | 0 Comments

[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the football data there was some reason to use a mixed distribution in the football data (ref) so I tried doing that. It was more difficult than I expected. Not only is a mixture of distributions fairly difficult, also the system was over parameterized, causing grief.

mixed distribution

data from a mixed distribution is simply said data which is from a combination of distributions. Most obvious example is length of people; males are longer than females. Combine their lengths and a joint non normal distribution appears. Separate them, two normal distributions. This problem is equivalent to the eyes problem in winbugs. Except that I use JAGS and it was supposed to be part of a larger model. The example section in JAGS shows two ways how to do this. In the end I chose to use the dnormmix distribution in the mix module. To quote the JAGS manual: ‘The mix module defines a novel distribution dnormmix(mu,tau,pi) representing a finite mixture of normal distributions‘ and ‘If you want to use the dnormmix distribution but do not care about label switching, then you can disable the tempered transition sampler with
set factory “mix::TemperedMix” off, type(sampler)’
That is me, so here is how to do that in R:

library(R2jags)

Loading required package: coda

Loading required package: lattice

Loading required package: R2WinBUGS

Loading required package: rjags

linking to JAGS 3.3.0

module basemod loaded

module bugs loaded

Loading required package: abind

Loading required package: parallel

Attaching package: ‘R2jags’

The following object(s) are masked from ‘package:coda’:

traceplot

load.module(“mix”)

module mix loaded

set.factory(“mix::TemperedMix”, ‘sampler’, FALSE)

NULL

list.factories(‘sampler’)

factory status

1 mix::TemperedMix FALSE

2 bugs::DSum TRUE

3 bugs::Conjugate TRUE

4 bugs::Dirichlet TRUE

5 bugs::MNormal TRUE

6 base::Finite TRUE

7 base::Slice TRUE

football model

This is the previous model with dnormmix grafted in and overall strength with attack/defense removed in an attempt to simplify and understand the estimation problems . Note that Astr[1] is fixed to 1, Dstr[1] is almost fixed to 0 and all other AStr and DStr are free. I am not 100% sure about those parts. Astr needs to be fixed because otherwise the whole set of Astr[] and Dstr[] changes in a similar way over the MCMC samples. It is their relative numbers that count. Somehow, there is still a need to fix Dstr[1], as there is still an overall change. Yet, I have the feeling fixing Astr[1] gives information to determine Dstr[2:nclub] hence Astr[2:nclub] hence Dstr[1]. Yet, they all move similar, and Dstr[1] is clearly not 0 as suggested by the prior.

fbmodel1 <- function() {

for (i in 1:N) {

HomeMadeGoals[i] ~ dpois(HS[i])

OutMadeGoals[i] ~ dpois(OS[i])

log(HS[i]) <- Home + Dstr[OutClub[i]] + Astr[HomeClub[i]]

log(OS[i]) <- Dstr[HomeClub[i]] + Astr[OutClub[i]]

}

Astr[1] <- 1

Dstr[1] ~ dnorm(0,10)

for (i in 2:nclub) {

Astr[i] ~ dnormmix(MAStr,tauAStr1,EtaAStr1)

Dstr[i] ~ dnormmix(MDStr,tauDStr1,EtaDStr1)

}

for (i in 1:3) {

MAStr[i] ~ dnorm(0,.01)

MDStr[i] ~ dnorm(0,.01)

tauDStr1[i] <- tauDStr

tauAStr1[i] <- tauAStr

eee[i] <- 3

}

EtaAStr1[1:3] ~ ddirch(eee[1:3])

EtaDStr1[1:3] ~ ddirch(eee[1:3])

sigmaAstr <- 1/sqrt(tauAStr)

tauAStr ~ dgamma(.001,.001)

sigmaDstr <- 1/sqrt(tauDStr)

tauDStr ~ dgamma(.001,.001)

Home ~ dnorm(0,.0001)

}

params <- c("Dstr","Astr","sigmaAstr","sigmaDstr","Home")

inits <- function(){

list(TopStr=rnorm(JAGSData$nclub),

AD=rnorm(JAGSData$nclub),

sigmaAD=runif(1),

sigmaStr=runif(0,.5),

Home=rnorm(1)

)

}

jagsfit <- jags(JAGSData, model=fbmodel1, inits=inits,

parameters=params,progress.bar=”gui”,

n.iter=5000)

jagsfit

Inference for Bugs model at “C:/Users/…/RtmpSGssTD/modeld94a9b3849.txt”, fit using jags,

3 chains, each with 15000 iterations (first 7500 discarded), n.thin = 7

n.sims = 3216 iterations saved

mu.vect sd.vect 2.5% 25% 50% 75% 97.5% Rhat n.eff

Astr[1] 1.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 1

Astr[2] 1.641 0.160 1.338 1.533 1.639 1.748 1.971 1.033 69

Astr[3] 1.328 0.196 0.943 1.196 1.324 1.463 1.710 1.023 91

Astr[4] 0.878 0.185 0.513 0.753 0.880 1.007 1.228 1.021 120

Astr[5] 0.753 0.207 0.326 0.617 0.755 0.894 1.143 1.017 140

Astr[6] 0.949 0.177 0.598 0.836 0.950 1.065 1.304 1.021 110

Astr[7] 1.559 0.158 1.257 1.450 1.559 1.669 1.875 1.032 73

Astr[8] 1.167 0.193 0.804 1.034 1.161 1.293 1.566 1.022 120

Astr[9] 1.426 0.180 1.078 1.304 1.429 1.551 1.756 1.029 84

Astr[10] 1.117 0.187 0.764 0.990 1.114 1.237 1.499 1.024 99

Astr[11] 1.006 0.179 0.660 0.889 1.003 1.121 1.368 1.024 98

Astr[12] 0.955 0.181 0.603 0.830 0.957 1.073 1.312 1.024 91

Astr[13] 1.599 0.159 1.296 1.490 1.599 1.712 1.919 1.034 71

Astr[14] 0.927 0.179 0.566 0.809 0.928 1.053 1.277 1.025 90

Astr[15] 1.178 0.195 0.810 1.046 1.174 1.308 1.574 1.016 160

Astr[16] 1.539 0.164 1.216 1.429 1.538 1.656 1.854 1.034 70

Astr[17] 1.040 0.183 0.693 0.917 1.039 1.154 1.414 1.019 130

Astr[18] 0.968 0.180 0.614 0.849 0.971 1.090 1.315 1.029 91

Dstr[1] -0.640 0.162 -0.977 -0.747 -0.638 -0.530 -0.331 1.031 79

Dstr[2] -1.183 0.179 -1.533 -1.301 -1.183 -1.062 -0.844 1.034 66

Dstr[3] -1.203 0.181 -1.568 -1.323 -1.196 -1.083 -0.853 1.023 97

Dstr[4] -0.709 0.160 -1.011 -0.814 -0.710 -0.604 -0.400 1.040 58

Dstr[5] -0.697 0.162 -1.020 -0.804 -0.697 -0.591 -0.375 1.050 46

Dstr[6] -0.840 0.177 -1.208 -0.956 -0.833 -0.719 -0.508 1.031 70

Dstr[7] -1.054 0.189 -1.413 -1.191 -1.055 -0.927 -0.675 1.039 62

Dstr[8] -0.869 0.181 -1.237 -0.987 -0.862 -0.743 -0.538 1.023 91

Dstr[9] -1.177 0.177 -1.533 -1.293 -1.178 -1.059 -0.832 1.022 100

Dstr[10] -0.821 0.169 -1.174 -0.929 -0.812 -0.704 -0.503 1.022 98

Dstr[11] -0.944 0.188 -1.307 -1.074 -0.935 -0.812 -0.593 1.023 95

Dstr[12] -1.092 0.177 -1.433 -1.216 -1.092 -0.972 -0.739 1.024 87

Dstr[13] -1.030 0.188 -1.386 -1.162 -1.036 -0.900 -0.650 1.023 100

Dstr[14] -1.031 0.189 -1.384 -1.160 -1.028 -0.899 -0.665 1.035 64

Dstr[15] -0.735 0.164 -1.065 -0.843 -0.734 -0.624 -0.418 1.042 54

Dstr[16] -0.836 0.178 -1.212 -0.952 -0.830 -0.714 -0.500 1.025 88

Dstr[17] -1.116 0.178 -1.456 -1.237 -1.121 -0.998 -0.758 1.042 55

Dstr[18] -0.681 0.163 -1.005 -0.791 -0.681 -0.571 -0.367 1.041 54

Home 0.334 0.062 0.210 0.293 0.335 0.376 0.453 1.012 170

sigmaAstr 0.185 0.103 0.042 0.102 0.169 0.254 0.410 1.014 220

sigmaDstr 0.135 0.079 0.028 0.067 0.123 0.190 0.301 1.033 68

deviance 1889.263 8.560 1873.823 1883.430 1888.635 1894.484 1907.580 1.001 2400

For each parameter, n.eff is a crude measure of effective sample size,

and Rhat is the potential scale reduction factor (at convergence, Rhat=1).

DIC info (using the rule, pD = var(deviance)/2)

pD = 36.6 and DIC = 1925.9

DIC is an estimate of expected predictive error (lower deviance is better).

To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Mixed distribution

mixed distribution

football model

Related

mixed distribution

football model

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)