More rainfall calculations – REML
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I wanted to have a look at various REML methods for a long time. The rainfall data seemed a nice example. On top of that, FreshBiostats had a blog post ‘Mixed Models in R: lme4, nlme, or both?’. So lme4 it is.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The data
As described before; the data are from KNMI; ROYAL NETHERLANDS METEOROLOGICAL INSTITUTE. The actual page for the data is http://www.knmi.nl/klimatologie/monv/reeksen/select_rr.html. The data selected were from 6 locations; De Bilt, Groningen, Heerde, Kerkwerve, Ter Apel and Winterswijk (Sibinkweg). The blog starts with data entered in a data.frame named all which was derived a few weeks ago.
Analysis
A simple analysis would ignore years. The data are plotted below, the figure indicates a clear effect.
sel1 <- all[(all$year <1916 | all$year>2003) & all$Month==’Jan’ ,]
sel1$decade <- factor(c('1906-1915','2004-2013')[1+(sel1$year>1950)])
sel1$Rain <- as.numeric(sel1$RD>0)
ag1 <- aggregate(sel1$Rain,list(Year=sel1$Year,
location=sel1$location,
decade=sel1$decade),FUN=sum)
ag1$Year <- factor(ag1$Year)
p <- ggplot(ag1, aes(decade, x/31))
p + geom_boxplot(notch=TRUE)
Analysis part 2
The problem appears once years are plotted; It seems more of the years in the first decade have a low proportion days with rain, but some of the last decade also have less days with rain.p <- ggplot(ag1, aes(decade, x/31,col=Year))
p + geom_boxplot()
As written lme4 was my package of choice. This does give a significant effect, but it only at 3%.
library(lme4)
l1 <- glmer(cbind(31-x,x) ~ (1|Year) ,data=ag1,family=binomial)
l1
Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(31 – x, x) ~ (1 | Year)
Data: ag1
AIC BIC logLik deviance
231.3 236.9 -113.7 227.3
Random effects:
Groups Name Variance Std.Dev.
Year (Intercept) 0.38069 0.617
Number of obs: 120, groups: Year, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.3950 0.1423 -2.775 0.00552 **
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
l2
Generalized linear mixed model fit by the Laplace approximation
Formula: cbind(31 – x, x) ~ decade + (1 | Year)
Data: ag1
AIC BIC logLik deviance
228.5 236.9 -111.3 222.5
Random effects:
Groups Name Variance Std.Dev.
Year (Intercept) 0.29496 0.5431
Number of obs: 120, groups: Year, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.1019 0.1783 -0.572 0.5674
decade2004-2013 -0.5862 0.2528 -2.319 0.0204 *
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Correlation of Fixed Effects:
(Intr)
dc2004-2013 -0.705
anova(l1,l2)
Data: ag1
Models:
l1: cbind(31 – x, x) ~ (1 | Year)
l2: cbind(31 – x, x) ~ decade + (1 | Year)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
l1 2 231.31 236.88 -113.65
l2 3 228.53 236.90 -111.27 4.772 1 0.02893 *
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.