Monte Carlo-based prediction intervals for nonlinear regression
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Calculation of the propagated uncertainty using (1), where is the gradient and the covariance matrix of the coefficients , is called the “Delta Method” and is widely applied in nonlinear least-squares (NLS) fitting. However, this method is based on first-order Taylor expansion and thus assummes linearity around . The second-order approach can partially correct for this restriction by using a second-order polynomial around , which is (2), where is the matrix trace and is the Hessian.
Confidence and prediction intervals for NLS models are calculated using (3) or (4), respectively, where the residual variance (5).
Now, how can we employ the matrix notation of error propagation for creating Taylor expansion- and Monte Carlo-based prediction intervals?
The inclusion of in the prediction interval can be implemented as an extended gradient and “augmented” covariance matrix. So instead of using (6), we take (7) as the expression and augment the covariance matrix to an covariance matrix, where . Partial differentiation and matrix multiplication will then yield, for example with two coefficients and and their corresponding covariance matrix :
(8)
(9)
, which then goes into (4).
The advantage of the augmented covariance matrix is that it can be exploited for creating Monte Carlo-based prediction intervals. This is new from propagate version 1.0-6 and is based on the paradigm that we add another dimension by employing the augmented covariance matrix of (8) in the multivariate t-distribution random number generator (in our case rtmvt
), with . All simulations are then evaluated with (7) and the usual quantiles calculated for the prediction interval. Using the original covariance matrix with (6) will deliver the MC-based confidence interval.
Application of second-order Taylor expansion or the MC-based approach demonstrates nicely that for the majority of nonlinear models, the confidence/prediction intervals around are quite asymmetric, which the classical Delta Method does not capture:
library(propagate)
DNase1 <- subset(DNase, Run == 1)
fm3DNase1 <- nls(density ~ Asym/(1 + exp((xmid - log(conc))/scal)),
data = DNase1, start = list(Asym = 3, xmid = 0, scal = 1))
## first-order prediction interval
set.seed(123)
PROP1 <- predictNLS(fm3DNase1, newdata = data.frame(conc = 2), nsim = 1000000,
second.order = FALSE, interval = "prediction")
t(PROP1$summary)
## second-order prediction interval and MC
set.seed(123)
PROP2 <- predictNLS(fm3DNase1, newdata = data.frame(conc = 2), nsim = 1000000,
second.order = TRUE, interval = "prediction")
t(PROP2$summary)
What we see here is that
i) the first-order prediction interval [0.70308712; 0.79300731] is symmetric and slightly down-biased compared to the second-order one [0.70317629; 0.79309874],
and
ii) the second-order prediction interval tallies nicely up to the 4th decimal with the new MC-based interval (0.70318286 and 0.70317629; 0.79311987 and 0.79309874).
I believe this clearly demonstrates the usefulness of the MC-based approach for NLS prediction interval estimation…
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.