Simple Distributions for Mixtures?

Posted on February 3, 2016 by arthur charpentier in R bloggers | 0 Comments

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers].

The idea of GLMs is that, given some covariates $X$ , $Y|X$ has a distribution in the exponential family (Gaussian, Poisson, Gamma, etc.). But that does not mean that $Y$ has a similar distribution, so there is no reason to test for a Gamma model for $Y$ before running a Gamma regression, for instance. But are there cases where it might work, where the unconditional distribution is the same (or at least in the same family) as the conditional ones?

For instance, if $(X,Y)$ has a joint Gaussian distribution, then both marginals are Gaussian, and so is $Y|X$ . So, in that case, if the covariate is normally distributed, it is possible to have a Gaussian distribution for $Y$ as well. The econometric interpretation is that, in a standard Gaussian linear model, if $X$ is normally distributed, then not only is the conditional distribution of $Y|X$ Gaussian, but so is the unconditional distribution of $Y$ .

> set.seed(1)
> n=1e3
> X=rnorm(n,10,2)
> Y=1+3*X+rnorm(n)
> plot(X,Y,xlim=c(4,20))

Indeed, here the distribution of $Y$ is also Gaussian:

> library(nortest)
> ad.test(Y)

	Anderson-Darling normality test

data:  Y
A = 0.23155, p-value = 0.802

> shapiro.test(Y)

	Shapiro-Wilk normality test

data:  Y
W = 0.99892, p-value = 0.8293

(This is not only a statistical finding: the theory of Gaussian random vectors confirms that the unconditional distribution is indeed Gaussian.)
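The theoretical moments can be checked against the simulation. Since $X\sim\mathcal{N}(10,2^2)$ and the noise is an independent $\mathcal{N}(0,1)$, $Y=1+3X+\varepsilon$ should be $\mathcal{N}(31,37)$. Here is a minimal sketch, repeating the simulation above (the names `mu_Y` and `var_Y` are mine, introduced only for illustration):

```r
set.seed(1)
n <- 1e3
X <- rnorm(n, 10, 2)
Y <- 1 + 3 * X + rnorm(n)

# Theoretical moments of Y = 1 + 3X + eps, X ~ N(10, 2^2), eps ~ N(0, 1)
mu_Y  <- 1 + 3 * 10       # E[Y] = 31
var_Y <- 3^2 * 2^2 + 1    # Var[Y] = 9 * 4 + 1 = 37

# Empirical versus theoretical values
c(empirical = mean(Y), theoretical = mu_Y)
c(empirical = var(Y),  theoretical = var_Y)
```

The empirical values should be close to 31 and 37, up to sampling noise.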

Here $X$ is continuous. What if we consider a finite mixture instead, i.e. $X$ takes only a finite number of values? Actually, Teicher (1963) proved that it is then not possible to have an unconditional Gaussian distribution for $Y$ . But in practice, would we really reject the Gaussian assumption for $Y$ ? If the number of classes is too small, yes. But with a large number of classes (a sufficiently large number of mixture components), we might not:

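> pv=function(k=2){
+ n=1e4
+ X=rnorm(n,10,2)
+ # class boundaries: the k-quantiles of X
+ Q=quantile(X,(0:k)/k)
+ # lower the first break so that min(X) falls in the first class
+ Q[1]=0
+ Xc=cut(X,Q,labels=1:k)
+ # replace each X by the mean of its class (k distinct values)
+ XcN=tapply(X,Xc,mean)
+ Xn=XcN[as.numeric(Xc)]
+ Y=1+3*Xn+rnorm(n)
+ # p-value of the Anderson-Darling normality test (package nortest)
+ ad.test(Y)$p.value}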
 
> plot(2:100,Vectorize(pv)(2:100),type="l")
> abline(h=.05,col="red")

So here, it could also be possible to have a Gaussian distribution for $Y$ , or at least to accept that assumption, statistically.
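To see the two regimes without the nortest package, here is a sketch of the same experiment using only base R. It is a variant, not the code above: `pv_base` is a hypothetical name, the Shapiro-Wilk test replaces the Anderson-Darling test, and the sample size is reduced to 5000 because `shapiro.test` does not accept larger samples.

```r
# Variant of pv() using base R only (Shapiro-Wilk instead of Anderson-Darling)
pv_base <- function(k = 2, n = 5000) {
  X <- rnorm(n, 10, 2)
  # class boundaries: the k-quantiles of X
  Q <- quantile(X, (0:k) / k)
  # include.lowest = TRUE keeps min(X) in the first class
  Xc <- cut(X, Q, labels = 1:k, include.lowest = TRUE)
  # replace each X by the mean of its class, so X takes k distinct values
  Xn <- as.numeric(tapply(X, Xc, mean))[as.integer(Xc)]
  Y <- 1 + 3 * Xn + rnorm(n)
  shapiro.test(Y)$p.value
}

set.seed(1)
p2  <- pv_base(2)    # two classes: Y is clearly bimodal
p50 <- pv_base(50)   # fifty classes: the mixture is much smoother
c(k2 = p2, k50 = p50)
```

With two classes the mixture has two well-separated modes, so normality is rejected. With many classes the p-value depends on the sample, which is the point of the plot above.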

In the context of a Poisson regression, it is well known that it is not possible to have at the same time $Y|X$ Poisson distributed (that is the Poisson regression) and $Y$ Poisson distributed. That simply comes from the fact that

$\mathbb{E}[Y]=\mathbb{E}[\mathbb{E}[Y\vert X]]$

while

$\text{Var}[Y]=\mathbb{E}[\text{Var}[Y\vert X]]+\text{Var}[\mathbb{E}[Y\vert X]]$

and because of the conditional Poisson distribution, then

$\text{Var}[Y\vert X]=\mathbb{E}[Y\vert X]$

Thus,

$\text{Var}[Y]=\mathbb{E}[Y]+\underbrace{\text{Var}[\mathbb{E}[Y\vert X]]}_{>0}$

So $Y$ cannot have a Poisson distribution, since its variance exceeds its mean as soon as $\mathbb{E}[Y\vert X]$ is not constant. But again, if heterogeneity is not too large, it could be possible to accept the null hypothesis of a Poisson distribution for $Y$ .
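The overdispersion can be illustrated on a small simulation. This is only a sketch under assumptions of mine: a two-class covariate, each class with probability 1/2, and a log-link with coefficients 0.5 and 1 chosen arbitrarily.

```r
set.seed(1)
n <- 1e4
# Hypothetical binary covariate, each class with probability 1/2
X <- sample(c(0, 1), n, replace = TRUE)
# Conditional Poisson model: Y | X ~ Poisson(exp(0.5 + X))
Y <- rpois(n, exp(0.5 + 1 * X))

# Theoretical moments from the variance decomposition
lam <- exp(0.5 + 1 * c(0, 1))     # the two conditional means
EY  <- mean(lam)                  # E[Y] = E[E[Y|X]]
VY  <- EY + mean((lam - EY)^2)    # Var[Y] = E[Y] + Var[E[Y|X]]

c(mean = mean(Y), variance = var(Y))   # empirical
c(mean = EY, variance = VY)            # theoretical
```

The empirical variance should be clearly larger than the empirical mean, which is incompatible with a Poisson distribution for $Y$ .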

More generally, it is very difficult to have a family of distributions for $Y|X$ that also contains the distribution of the unconditional variable $Y$ . In the context of a finite mixture ( $X$ takes a finite number of values), Teicher (1963) proved that it is not possible, neither for the Gaussian distribution nor for the Gamma distribution. To go further, check Monfrini (2002) (thanks to Romuald for pointing out the reference).

Hence, as I keep saying, before running a regression model on $Y|X$ with some given family, it is never a good idea to check whether the unconditional distribution of $Y$ belongs to the same family, because there is usually no reason for it to remain in that family.
