GLM, non-linearity and heteroscedasticity

Posted on October 22, 2013 by arthur charpentier in R bloggers | 0 Comments

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Last week in non-life insurance course, we’ve seen the theory of the Generalized Linear Models, emphasizing the two important components

the link function (which is actually the key component in predictive modeling)
the distribution, or the variance function

Just to illustrate, consider my favorite dataset

lin.mod = lm(dist~speed,data=cars)

A linear model means here

$Y_i=\beta_0+\beta_1 X_i +\varepsilon_i$

where the residuals are assumed to be centered, independent, and with identical variance. If we visualize that linear regression, we usually see something like that

The idea here (in GLMs) is to assume

$Y\vert X=x\sim\mathcal{N}( \beta_0+\beta_1 x,\sigma^2)$

which will produce the same model as the one describe previously, based on some error term. That model can be visualized below,

attach(cars)
n=2
X= cars$speed 
Y=cars$dist
df=data.frame(X,Y)
vX=seq(min(X)-2,max(X)+2,length=n)
vY=seq(min(Y)-15,max(Y)+15,length=n)
mat=persp(vX,vY,matrix(0,n,n),zlim=c(0,.1),theta=-30,ticktype ="detailed", box = FALSE)
reggig=glm(Y~X,data=df,family=gaussian(link="identity"))
x=seq(min(X),max(X),length=501)
C=trans3d(x,predict(reggig,newdata=data.frame(X=x),type="response"),rep(0,length(x)),mat)
lines(C,lwd=2)
sdgig=sqrt(summary(reggig)$dispersion)
x=seq(min(X),max(X),length=501)
y1=qnorm(.95,predict(reggig,newdata=data.frame(X=x),type="response"), sdgig)
C=trans3d(x,y1,rep(0,length(x)),mat)
lines(C,lty=2)
y2=qnorm(.05,predict(reggig,newdata=data.frame(X=x),type="response"), sdgig)
C=trans3d(x,y2,rep(0,length(x)),mat)
lines(C,lty=2)
C=trans3d(c(x,rev(x)),c(y1,rev(y2)),rep(0,2*length(x)),mat)
polygon(C,border=NA,col="yellow")
C=trans3d(X,Y,rep(0,length(X)),mat)
points(C,pch=19,col="red")
n=8
vX=seq(min(X),max(X),length=n)
mgig=predict(reggig,newdata=data.frame(X=vX))
sdgig=sqrt(summary(reggig)$dispersion)
for(j in n:1){
stp=251
x=rep(vX[j],stp)
y=seq(min(min(Y)-15,qnorm(.05,predict(reggig,newdata=data.frame(X=vX[j]),type="response"), sdgig)),max(Y)+15,length=stp)
z0=rep(0,stp)
z=dnorm(y, mgig[j], sdgig)
C=trans3d(c(x,x),c(y,rev(y)),c(z,z0),mat)
polygon(C,border=NA,col="light blue",density=40)
C=trans3d(x,y,z0,mat)
lines(C,lty=2)
C=trans3d(x,y,z,mat)
lines(C,col="blue")}

We do have two parts here: the linear increase of the average, $\mathbb{E}(Y\vert X=x)=\beta_0+\beta_1 x$ and the constant variance of the normal distribution $\text{Var}(Y\vert X=x)=\sigma^2$ .

On the other hand, if we assume a Poisson regression,

poisson.reg = glm(dist~speed,data=cars,family=poisson(link="log"))

we have something like

This time, two things have changed simultaneously: our model is no longer linear, it is an exponential one $\mathbb{E}(Y\vert X=x)=e^{\beta_0+\beta_1 x}$ , and the variance is also increasing with the explanatory variable $\text{Var}(Y\vert X=x)=e^{\beta_0+\beta_1 x}$ , since with a Poisson regression,

$Y\vert X=x\sim\mathcal{P}(e^{ \beta_0+\beta_1 x})$

If we adapt the previous code, we get

The problem is that we changed two things when we introduced the Poisson regression from the linear model. So let us look at what happens when we change the two components independently. First, we can change the link function, with a Gaussian model but this time a multiplicative model (with a logarithm link function)

gaussian.reg = glm(dist~speed,data=cars,family=gaussian(link="log"))

which is still, here, an homoscedasctic model, but this time non-linear. Or we can change the link function in the Poisson regression, to get a linear model, but heteroscedastic

poisson.lin = glm(dist~speed,data=cars,family=poisson(link="identity"))

So this is basically what GLMs are about….

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

GLM, non-linearity and heteroscedasticity

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)