[This article was first published on R snippets, and kindly contributed to R-bloggers].
Last week I posted about using simulation metamodeling to verify the results of an analytical solution of a model. After publishing it I realized that the solution presented there can be improved by using knowledge of the simulation model's structure.
The story concerns calculating the variance of the random variable given by the formula:
q(p - 2 + q/N)
where p and N are model parameters and q is a random variable with a binomial distribution with N Bernoulli trials and success probability 2 - p. As I wrote there, the properly specified metamodel formula has the following form:
v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 /N))
where v is an estimate of the variance. This model was discussed in more detail in my last post, so I will not repeat its description here.
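As a quick sanity check (my own illustration, not from the original post), R's model.matrix can be used to expand this formula on a small grid and confirm that it produces the 15 terms (intercept, four powers of p, N, 1/N, and their eight interactions) whose coefficients are analyzed below:

```r
# Expand the metamodel formula to inspect the terms it generates
d <- expand.grid(p = c(1, 1.5, 2), N = 1:3)
mm <- model.matrix(~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 / N)), d)
ncol(mm)      # 15 columns: intercept, p..p^4, N, 1/N, 8 interactions
colnames(mm)  # the term labels used for the coefficient tables below
```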
However, one can easily notice that for p = 1 and p = 2 the variance of the random variable equals 0. This is because in these cases the success probability 2 - p equals 1 or 0, so q has no variance (it is deterministically equal to N and 0 respectively) and the whole expression is constant.
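This can be confirmed exactly, without simulation, by summing over the binomial support (a small sketch of my own, not from the original post):

```r
# Exact Var[q(p - 2 + q/N)] for q ~ Binomial(N, 2 - p), by enumeration
exact.var <- function(p, N) {
  q <- 0:N
  pr <- dbinom(q, N, 2 - p)
  x <- q * (p - 2 + q / N)
  sum(pr * x ^ 2) - sum(pr * x) ^ 2
}
exact.var(1, 10)    # 0: q is always N, so the expression is constant
exact.var(2, 10)    # 0: q is always 0
exact.var(1.5, 10)  # strictly positive between the endpoints
```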
In estimation of the metamodel one can take this into account in two ways:
- the weight of cases where p equals 1 or 2 might be increased (so we use Weighted Least Squares)
- a restriction on values of prediction in points where p equals 1 or 2 can be made to ensure that it is exactly 0 (so we use Constrained Least Squares)
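Before applying the second approach to the metamodel, here is a minimal, self-contained sketch of how limSolve's lsei imposes equality constraints (the toy data are my own, not from the post): we fit a straight line by least squares while forcing the intercept to be exactly 0.

```r
library(limSolve)
# lsei minimizes ||A %*% x - B|| subject to E %*% x == F
x <- c(0, 1, 2, 3)
y <- c(0.1, 1.1, 1.9, 3.2)
A <- cbind(1, x)                # design matrix: intercept and slope
B <- y
E <- matrix(c(1, 0), nrow = 1)  # constraint row selecting the intercept
F <- 0                          # forces the intercept to equal 0
fit <- lsei(A = A, B = B, E = E, F = F)
fit$X[1]  # intercept is exactly 0 by construction
```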
The code given below estimates the original model and the two corrected versions:
library(plyr)
library(limSolve)
sim <- function(param) {
    q <- rbinom(1000, param$N, 2 - param$p)
    c(v = var((param$p - 2 + q / param$N) * q))
}
set.seed(1)
data.set <- expand.grid(p = seq(1, 2, len = 101), N = 1:10)
data.set <- ddply(data.set, .(p, N), sim, .progress = "text")
lm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
    * (N + I(1 / N)), data = data.set))
weights <- ifelse(data.set$v == 0, 100000, 1)
wlm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
    * (N + I(1 / N)), data = data.set,
    weights = weights))
A <- with(data.set, cbind(1, p, p ^ 2, p ^ 3, p ^ 4, N, 1 / N,
    p * N, p / N, p ^ 2 * N, p ^ 2 / N,
    p ^ 3 * N, p ^ 3 / N, p ^ 4 * N, p ^ 4 / N))
colnames(A) <- names(lm.coef)
B <- data.set$v
E <- A[B == 0, ]
F <- B[B == 0]
lsei.coef <- lsei(A, B, E, F)$X
Now we can compare the coefficients from these models to their analytical values given in the last post:
true.coef <- c(32, -88, 88, -38, 6, -8, -26, 20,
    75, -18, -79, 7, 36, -1, -6)
coef.dev <- cbind(lm.coef, wlm.coef, lsei.coef) - true.coef
print(coef.dev, digits = 2)
dotchart(coef.dev[, 1], pch = 19)
points(coef.dev[, 2], 1:nrow(coef.dev), col = "red", pch = 19)
abline(v = 0, col = "gray", lty = 2)
The code first tabulates the deviations of the estimates from the true values:
                lm.coef wlm.coef lsei.coef
(Intercept)     -0.6056    0.060     0.060
p                1.5061   -0.235    -0.235
I(p^2)          -1.3576    0.323     0.323
I(p^3)           0.5243   -0.185    -0.185
I(p^4)          -0.0731    0.038     0.038
N                0.0088   -0.060    -0.060
I(1/N)           1.1510    0.354     0.354
p:N             -0.0065    0.173     0.173
p:I(1/N)        -3.0646   -0.965    -0.965
I(p^2):N        -0.0102   -0.182    -0.182
I(p^2):I(1/N)    2.9942    0.953     0.953
I(p^3):N         0.0115    0.084     0.084
I(p^3):I(1/N)   -1.2730   -0.404    -0.404
I(p^4):N        -0.0029   -0.014    -0.014
I(p^4):I(1/N)    0.1991    0.062     0.062
and then plots a visualization of the differences (black points represent the OLS estimates and red points the WLS estimates):
It can be seen that:
- the precision of estimation is improved by using additional information on problem structure;
- the estimates from WLS and CLS are identical to the third decimal place.