[This article was first published on Wiekvoet, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Last week I examined rainwater composition 1992 to 2005. There is additional data, but National Institute for Public Health has changed equipment in 2005. This week I will add those data.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Data
Data is in a number of spreadsheets. The script is a variation on last week, with an extra wrapper for spreadsheets. The final parts are merging locations (Station Leiduin and Witteveen have been replaced by De Zilk and Valthermond respectively) and adding the equipment variable. The code is at the bottom of the post.
pH
It seems pH is still slightly increasing. It also seems there are like two levels of pH. For instance, in Gilze-Rijen the pH is either around 5 or around 6. But 5.5 is less probable.
Iron
For Iron there are just four locations with complete data. Hence these are displayed. It would seem the trend downward continues.
Iron model
Since I wanted to give the two machines each a separate variance, I opted to use nlme in this particular calculation. In addition, rather than following the plot, all locations are used, and the replacement stations are not merged together. The machines seem to give similar variation, which probably means variation in iron concentration is much larger than machine accuracy. The decrease estimate is still around 5% per year.
Linear mixed-effects model fit by REML
Data: Fe2
AIC BIC logLik
3135.969 3184.729 -1559.984
Random effects:
Formula: ~StartDate | Location
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 1.018836e-05 (Intr)
StartDate 6.680830e-06 -0.05
Formula: ~1 | machine %in% Location
(Intercept) Residual
StdDev: 0.09556604 0.3791318
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | machine
Parameter estimates:
Van Essen/ECN vanger Eigenbrodt NSA 181 KHT
1.000000 1.049395
Fixed effects: LAmount ~ StartDate
Value Std.Error DF t-value p-value
(Intercept) 0.4373752 0.06188798 3258 7.067208 0
StartDate -0.0000640 0.00000549 3258 -11.645303 0
Correlation:
(Intr)
StartDate -0.887
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-3.752718189 -0.655477969 -0.003628072 0.645709629 4.226135302
Number of Observations: 3280
Number of Groups:
Location machine %in% Location
17 21
#decrease estimate
> (10^fixef(lme1)[‘StartDate’])^365
StartDate
0.9476403
Code
Import data
library(XLConnect)
readsheet <- function(wb,cursheet) {
df = readWorksheet(wb, sheet = cursheet , header = TRUE)
topline <- 8
test <- which(df[6,]==’C’)+1
numin <- df[topline:nrow(df),test]
names(numin) <- make.names( gsub(‘ ‘,”,df[6,test]))
for(i in 1:ncol(numin)) numin[,i] <- as.numeric(gsub(‘,’,’.’,numin[,i]))
status = df[topline:nrow(df),test-1]
names(status) <- paste(‘C’,make.names( gsub(‘ ‘,”,df[6,test])),sep=”)
numin$StartDate <- as.Date(substr(df[topline:nrow(df),1],1,10))
numin$EndDate <- as.Date(substr(df[topline:nrow(df),2],1,10))
numin <- cbind(numin,status)
tall <- reshape(numin,
varying=list(make.names( gsub(‘ ‘,”,df[6,test])),paste(‘C’,make.names( gsub(‘ ‘,”,df[6,test])),sep=”)),
v.names=c(‘Amount’,’Status’),
timevar=’Compound’,
idvar=c(‘StartDate’,’EndDate’),
times=paste(df[6,test],'(‘,df[5,test],’)’,sep=”),
direction=’long’)
tall$Compound <- factor(gsub(‘ )’,’)’,gsub(‘ +’,’ ‘,tall$Compound)))
row.names(tall)<- NULL
tall$Location <- cursheet
tall
}
readfile <- function(fname) {
wb <- loadWorkbook(fname)
print(wb)
sheets <- getSheets(wb)[-1]
tt <- lapply(sheets,readsheet,wb=wb)
tt2 <- do.call(rbind,tt)
tt2$Location <- factor(tt2$Location)
tt2$fname <- fname
tt2
}
fnames <- dir(pattern=’*.xls’)
rf <- lapply(fnames,readfile)
rf2 <- do.call(rbind,rf)
(ll <- levels(rf2$Location))
ll[c language=”(13,17)”][/c] <- ‘V&W’
ll[c language=”(9,4)”][/c] <- ‘L&DZ’
rf2$Location2 <- factor(ll[rf2$Location])
head(rf2)
rf2$machine <- factor(ifelse(rf2$fname==”lmre_1992-2005.xls”,
‘Van Essen/ECN vanger’,
‘ Eigenbrodt NSA 181 KHT’))
pH plot
library(ggplot2)
library(scales)
levels(rf2$Compound)
(cc <- levels(rf2$Compound)[22])
rf2$Amount2 <- rf2$Amount
rf2$Amount2[rf2$Amount<1 & rf2$Compound==cc] <- NA
rf2[rf2$Amount < 1 & rf2$Compound==cc & !is.na(rf2$Amount),]
pp <- ggplot(rf2[rf2$Compound==cc & !(rf2$Location2 %in% ll[c language=”(2,5,7,15)”][/c]),],
aes( StartDate,Amount2,col=machine))
pp + geom_point() +
facet_wrap(~Location2) +
ylab(cc) + xlab(‘Year’) + ylim(c(4,7)) +
scale_x_date(breaks=as.Date(paste(c(‘1992′,’2000′,’2010′),’-01-01′,sep=”),
format=(‘%Y-%m-%d’)),
labels = date_format(“%Y”)) +
theme(legend.position = “bottom”)
Fe Plot
(cc <- levels(rf2$Compound)[9])
#levels(rf2$Location2)
pp <- ggplot(rf2[rf2$Compound==cc & (rf2$Location2 %in%
levels(rf2$Location2)[c language=”(7,8,10,13)”][/c]),],
aes( StartDate,Amount,color=machine))
pp + geom_point() +
scale_y_log10() +
facet_wrap(~Location2) +
ylab(cc) + xlab(‘Year’) +
scale_x_date(breaks=as.Date(paste(c(‘1992′,’2000′,’2010′),’-01-01′,sep=”),
format=(‘%Y-%m-%d’)),
labels = date_format(“%Y”)) +
theme(legend.position = “bottom”)
Fe model
Fe <- rf2[rf2$Compound==levels(rf2$Compound)[9],]
# plot(density(Fe$Amount[!is.na(Fe$Status) & Fe$Status!=0],adjust=.5))
Fe$LAmount <- log10(Fe$Amount)
Fe$LAmount[Fe$Amount<=0] <- NA
Fe$Location <- factor
#nlme
library(nlme)
Fe2 <- Fe[complete.cases(Fe),]
lme1 <- lme(LAmount ~ StartDate,
# random= ~ 1|machine, # machine = groups
# random=~StartDate | Location, # location=groups
random=list(Location=~StartDate, machine=~1),
weights=varIdent(form=~1 | machine),
data=Fe2 )
summary(lme1)
To leave a comment for the author, please follow the link and comment on their blog: Wiekvoet.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.