predictNLS (Part 1, Monte Carlo simulation): confidence intervals for ‘nls’ models

anspiess

9 years ago

[This article was first published on Rmazing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Those that do a lot of nonlinear fitting with the nls function may have noticed that predict.nls does not have a way to calculate a confidence interval for the fitted value. Using confint you can obtain the error of the fit parameters, but how about the error in fitted values? ?predict.nls says: “At present se.fit and interval are ignored.” What a pity… This is largely to the fact that confidence intervals for nonlinear fits are not easily calculated and under some debate, see http://r.789695.n4.nabble.com/plotting-confidence-bands-from-predict-nls-td3505012.html or http://thr3ads.net/r-help/2011/05/1053390-plotting-confidence-bands-from-predict.nls. In principle, since calculating the error in the fitted values is a matter of “error propagation”, two different approaches can be used:

1) Error propagation using approximation by (first-order) Taylor expansion around ,

2) Error propagation using Monte Carlo simulation.

Topic 1) will be subject of my next post, today we will stick with the MC approach.

When calculating the error in the fitted values, we need to propagate the error of all variables, i.e. the error in all predictor variables and the error of the fit parameters , to the response . Often (as in the ‘Examples’ section of nls), there is no error in the -values. The errors of the fit parameters are obtained, together with their correlations, in the variance-covariance matrix from vcov(object).

A Monte Carlo approach to nonlinear error propagation does the following:

1) Use as input and of all predictor variables and the vcov matrix of the fit parameters .

2) For each variable , we create samples from a multivariate normal distribution using the variance-covariance matrix: .

3) We evaluate the function on each simulated variable:

4) We calculate statistics (mean, s.d., median, mad) and quantile-based confidence intervals on the vector .

This is exactly what the following function does: It takes an nls object, extracts the variables/parameter values/parameter variance-covariance matrix, creates an “augmented” covariance matrix (with the variance/covariance values from the parameters and predictor variables included, the latter often being zero), simulates from a multivariate normal distribution (using mvrnorm of the ‘MASS’ package), evaluates the function (object$call$formula) on the values and finally collects statistics. Here we go:

predictNLS <- function(
object, 
newdata,
level = 0.95, 
nsim = 10000,
...
)
{
  require(MASS, quietly = TRUE)
  
  ## get right-hand side of formula
  RHS <- as.list(object$call$formula)[[3]]
  EXPR <- as.expression(RHS)
  
  ## all variables in model
  VARS <- all.vars(EXPR)
  
  ## coefficients
  COEF <- coef(object)
  
  ## extract predictor variable    
  predNAME <- setdiff(VARS, names(COEF))  
  
  ## take fitted values, if 'newdata' is missing
  if (missing(newdata)) {
    newdata <- eval(object$data)[predNAME]
    colnames(newdata) <- predNAME
  }
    
  ## check that 'newdata' has same name as predVAR
  if (names(newdata)[1] != predNAME) stop("newdata should have name '", predNAME, "'!")
  
  ## get parameter coefficients
  COEF <- coef(object)
    
  ## get variance-covariance matrix
  VCOV <- vcov(object)
  
  ## augment variance-covariance matrix for 'mvrnorm' 
  ## by adding a column/row for 'error in x'
  NCOL <- ncol(VCOV)
  ADD1 <- c(rep(0, NCOL))
  ADD1 <- matrix(ADD1, ncol = 1)
  colnames(ADD1) <- predNAME
  VCOV <- cbind(VCOV, ADD1)
  ADD2 <- c(rep(0, NCOL + 1))
  ADD2 <- matrix(ADD2, nrow = 1)
  rownames(ADD2) <- predNAME
  VCOV <- rbind(VCOV, ADD2) 
        
  ## iterate over all entries in 'newdata' as in usual 'predict.' functions
  NR <- nrow(newdata)
  respVEC <- numeric(NR)
  seVEC <- numeric(NR)
  varPLACE <- ncol(VCOV)   
  
  ## define counter function
  counter <- function (i) 
  {
    if (i%%10 == 0) 
      cat(i)
    else cat(".")
    if (i%%50 == 0) 
      cat("\n")
    flush.console()
  }
  
  outMAT <- NULL  
  
  for (i in 1:NR) {
    counter(i)
    
    ## get predictor values and optional errors
    predVAL <- newdata[i, 1]
    if (ncol(newdata) == 2) predERROR <- newdata[i, 2] else predERROR <- 0
    names(predVAL) <- predNAME  
    names(predERROR) <- predNAME  
    
    ## create mean vector for 'mvrnorm'
    MU <- c(COEF, predVAL)
    
    ## create variance-covariance matrix for 'mvrnorm'
    ## by putting error^2 in lower-right position of VCOV
    newVCOV <- VCOV
    newVCOV[varPLACE, varPLACE] <- predERROR^2
    
    ## create MC simulation matrix
    simMAT <- mvrnorm(n = nsim, mu = MU, Sigma = newVCOV, empirical = TRUE)
    
    ## evaluate expression on rows of simMAT
    EVAL <- try(eval(EXPR, envir = as.data.frame(simMAT)), silent = TRUE)
    if (inherits(EVAL, "try-error")) stop("There was an error evaluating the simulations!")
    
    ## collect statistics
    PRED <- data.frame(predVAL)
    colnames(PRED) <- predNAME   
    FITTED <- predict(object, newdata = data.frame(PRED))
    MEAN.sim <- mean(EVAL, na.rm = TRUE)
    SD.sim <- sd(EVAL, na.rm = TRUE)
    MEDIAN.sim <- median(EVAL, na.rm = TRUE)
    MAD.sim <- mad(EVAL, na.rm = TRUE)
    QUANT <- quantile(EVAL, c((1 - level)/2, level + (1 - level)/2))
    RES <- c(FITTED, MEAN.sim, SD.sim, MEDIAN.sim, MAD.sim, QUANT[1], QUANT[2])
    outMAT <- rbind(outMAT, RES)
  }
  
  colnames(outMAT) <- c("fit", "mean", "sd", "median", "mad", names(QUANT[1]), names(QUANT[2]))
  rownames(outMAT) <- NULL
  
  cat("\n")
  
  return(outMAT)  
}

The input is an ‘nls’ object, a data.frame ‘newdata’ of values to be predicted with
the value in the first column and (optional) “errors-in-x” (as ) in the second column.
The number of simulations can be tweaked with nsim as well as the alpha-level for the
confidence interval.
The output is (fitted value), (mean of simulation), (s.d. of simulation), (median of simulation), (mad of simulation) and the lower/upper confidence interval.

Ok, let’s go to it (taken from the ‘?nls’ documentation):

DNase1 <- subset(DNase, Run == 1) fm1DNase1 <- nls(density ~ SSlogis(log(conc), Asym, xmid, scal), DNase1) ## usual predict.nls has no confidence intervals implemented predict(fm1DNase1, newdata = data.frame(conc = 5), interval = "confidence") [1] 1.243631 attr(,"gradient") Asym xmid scal [1,] 0.5302925 -0.5608912 -0.06804642
In the next post we will see how to use the gradient attribute to calculate a first-order Taylor expansion around …

However, predictNLS gives us the error and confidence interval at :

predictNLS(fm1DNase1, newdata = data.frame(conc = 5)) . fit mean sd median mad 2.5% 97.5% [1,] 1.243631 1.243293 0.009462893 1.243378 0.009637439 1.224608 1.261575
Interesting to see, how close the mean of the simulation comes to the actual fitted value…
We could also add some error in to propagate to :
> predictNLS(fm1DNase1, newdata = data.frame(conc = 5, error = 0.1)) . fit mean sd median mad 2.5% 97.5% [1,] 1.243631 1.243174 0.01467673 1.243162 0.01488567 1.214252 1.272103

Have fun. If anyone know how to calculate a “prediction interval” (maybe quantile regression) give me hint…

Cheers,
Andrej

Filed under: General, R Internals Tagged: confidence interval, fitting, Monte Carlo, nls, nonlinear

To leave a comment for the author, please follow the link and comment on their blog: Rmazing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.