Simulation metamodeling with constraints
[This article was first published on R snippets, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Last week I have posted about using simulation metamodeling to verify results of analytical solution of the model. After posting it I realized that the solution presented there can be improved by using knowledge of simulation model structure.Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The story is about calculation of the variance of the random variable given by formula:
q(p-2+q/N)
where p and N are model parameters and q is a random variable that has binomial distribution with N Bernoulli trials and probability of success 2-p. As I have written the properly specified formula for the metamodel has the following form:
v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 /N))
where v is an estimate of the variance. Such a model was discussed in my last post (it is described in more detail there so I omit the repetition in this post).
However one can easily notice that for p=1 and p=2 the variance of the random variable is equal to 0. This is because in such case q has no variance (it is equal to 1 and 0 respectively).
In estimation of the metamodel one can take this into account in two ways:
- the weight of cases where p equals 1 or 2 might be increased (so we use Weighted Least Squares)
- a restriction on values of prediction in points where p equals 1 or 2 can be made to ensure that it is exactly 0 (so we use Constrained Least Squares)
The code given below estimates: original model and two versions of corrected models:
library(plyr)
library(limSolve)
sim <- function(param) {
q <- rbinom(1000, param$N, 2 – param$p)
c(v = var((param$p – 2 + q / param$N) * q))
}
set.seed(1)
data.set <- expand.grid(p = seq(1, 2, len = 101), N = 1:10)
data.set <- ddply(data.set, .(p,N), sim,.progress = “text”)
lm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
* (N + I(1 / N)), data = data.set))
weights <- ifelse(data.set$v==0, 100000, 1)
wlm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
* (N + I(1 / N)), data = data.set,
weights = weights))
A <- with(data.set, cbind(1, p, p^2, p^3, p^4, N, 1 / N,
p * N, p / N, p ^ 2 * N, p ^ 2 / N,
p ^ 3 * N, p ^ 3 / N, p ^ 4 * N, p ^ 4 / N))
colnames(A) <- names(lm.coef)
B <- data.set$v
E <- A[B == 0,]
F <- B[B == 0]
lsei.coef <- lsei(A, B, E, F)$X
Now we can compare the coefficients from these models to their analytical values given in the last post:
true.coef <- c(32, –88, 88, –38, 6, –8, –26, 20,
75, –18, –79, 7, 36, –1, –6)
coef.dev <- cbind(lm.coef, wlm.coef, lsei.coef) – true.coef
print(coef.dev, digits = 2)
dotchart(coef.dev[,1], pch = 19)
points(coef.dev[,2], 1:nrow(coef.dev), col = “red”, pch = 19)
abline(v = 0, col = “gray”, lty = 2)
The code first tabulates the deviation of estimates from true values:
lm.coef wlm.coef lsei.coef
(Intercept) -0.6056 0.060 0.060
p 1.5061 -0.235 -0.235
I(p^2) -1.3576 0.323 0.323
I(p^3) 0.5243 -0.185 -0.185
I(p^4) -0.0731 0.038 0.038
N 0.0088 -0.060 -0.060
I(1/N) 1.1510 0.354 0.354
p:N -0.0065 0.173 0.173
p:I(1/N) -3.0646 -0.965 -0.965
I(p^2):N -0.0102 -0.182 -0.182
I(p^2):I(1/N) 2.9942 0.953 0.953
I(p^3):N 0.0115 0.084 0.084
I(p^3):I(1/N) -1.2730 -0.404 -0.404
I(p^4):N -0.0029 -0.014 -0.014
I(p^4):I(1/N) 0.1991 0.062 0.062
and next plots the visualization of the differences (on the plot black points represent OLS estimates and red – WLS):
It can be seen that:
- the precision of estimation is improved by using additional information on problem structure;
- the estimates from WLS and CLS are identical to the third decimal place.
To leave a comment for the author, please follow the link and comment on their blog: R snippets.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.