Maximum Entropy Bootstrap Rescale and Symmetrize

Guest post by Hrishikesh D. Vinod* (Professor of Economics, Fordham University, New York, E-mail: vinod@fordham.edu. June 25, 2013)

Complete paper with R code freely available at: http://ssrn.com/abstract=2285041

This post gives R code for two tasks that R programmers commonly encounter: changing the scale of a distribution without changing its mean, and making a probability distribution symmetric. We provide code for both tasks in the context of the maximum entropy bootstrap (meboot) package in R.
Why study the bootstrap? It is a vital computer-intensive tool for statistical inference (not estimation). It is particularly suited to complicated nonlinear problems where traditional (asymptotic) confidence intervals tend to be too wide and are difficult to derive. Vinod (textbook ch. 9, http://www.worldscibooks.com/economics/6895.html) explains that traditional inference for time series, based on Wiener-Kolmogorov-Khintchine (WKK) theory and using higher mathematics, was developed in the 1930s, long before we had powerful computers. WKK construct a population of time series called the ensemble Ω, relying heavily on the stationarity assumption. It is time we bring the ensemble into the modern era.
The meboot algorithm, available as an R package also called meboot, offers a computer-intensive construction of Ω. See the package vignette at http://www.jstatsoft.org/v29/i05/. It has proved useful for applications where the time series is short, non-stationary, perhaps with regime changes, gaps and jump discontinuities. While the moving block bootstrap (MBB) has been used for mildly dependent (m-dependent) series, maximum entropy bootstrap (meboot) is the only tool for highly dependent nonstationary time series. The basic R tools described here are of wider applicability beyond the bootstrap.
The maximum entropy (ME) density is maximally noncommittal about unavailable information regarding its functional form. It is constructed from the order statistics x(t) of the time series xt. The construction defines exactly T intervals, each of which contains exactly one x(t). Bootstrap resamples draw from each such interval with probability 1/T. This is called the mass-preserving constraint. The ME density also imposes a mean-preserving constraint. For a toy example of five observations in x, the following R code creates J=4 resamples x(t,j) in a T×J matrix representing the ensemble. In a realistic ensemble for inference purposes, T is usually much larger and J>999. The aim, of course, is to see what might happen to the shape of the time series in a large population of time series.
library(meboot)
set.seed(234)
x <- c(4, 12, 36, 20, 8)
xtj <- meboot(x, reps = 4)$ensemble   # T x J matrix of resamples
xtj
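The interval construction described above can be sketched in base R for the toy series. This is only an illustration, not the meboot implementation: the tail rule (extend by the trimmed mean of absolute first differences) and the interval-mean weights are taken from the findKapa listing in this post, and the names limits, m and inside are introduced here for illustration.

```r
# Sketch of the ME density intervals for the toy series (base R only).
x  <- c(4, 12, 36, 20, 8)
n  <- length(x)
xx <- sort(x)                          # order statistics x(t)

# Interior interval limits: midpoints of adjacent order statistics
z <- (xx[-1] + xx[-n]) / 2

# Tail limits: extend by the trimmed mean of absolute first differences
dvtrim <- mean(abs(diff(x)), trim = 0.1)
limits <- c(xx[1] - dvtrim, z, xx[n] + dvtrim)

# Desired interval means under the mean-preserving constraint
m <- c(0.75 * xx[1] + 0.25 * xx[2],
       0.25 * xx[1:(n - 2)] + 0.5 * xx[2:(n - 1)] + 0.25 * xx[3:n],
       0.25 * xx[n - 1] + 0.75 * xx[n])

# Check that each order statistic lies inside its own interval
inside <- xx > limits[1:n] & xx < limits[2:(n + 1)]

print(limits)                                  # -11 6 10 16 28 51
print(m)                                       # 5 8 13 22 32
print(all(inside))                             # TRUE
print(isTRUE(all.equal(mean(m), mean(x))))     # TRUE: the mean is preserved
```

The T = 5 intervals each hold exactly one order statistic, and the average of the desired interval means equals mean(x) = 16, which is the mean-preserving constraint in this small case.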
The overall variance of the ME density is smaller than that of the original data. The enhancement in the following R code equates the population variance of the ME density to that of the data. The basic idea is a linear transformation: to each resampled value we add kappa times its deviation from the population mean, which multiplies the deviations by (1 + kappa) for a suitably found constant kappa.
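Read as a formula, the variance calculation behind this adjustment can be sketched as follows. It mirrors the quantities in the findKapa listing, treating the spread within each interval as that of a uniform density. The symbols m_t (desired interval means) and z_{t-1}, z_t (interval limits, including the two tail limits) are notation introduced here for illustration.

```latex
\sigma^2_{ME} = \frac{1}{T}\sum_{t=1}^{T}\left[(m_t - \bar{x})^2 + \frac{(z_t - z_{t-1})^2}{12}\right],
\qquad
1 + \kappa = \frac{\sigma_x}{\sigma_{ME}},
\qquad
y(t,j) = x(t,j) + \kappa\,\bigl(x(t,j) - \bar{x}\bigr).
```

Because the transformation is linear with slope 1 + κ around the mean, it leaves the mean unchanged and scales every standard deviation by the same factor.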

findKapa <- function(x, trim = 0.1) {
   # find kappa by which to multiply sd of each ensemble
   n <- length(x)
   xx <- sort(x)                      # order statistics
   # ordxx <- order(x)
   z <- rowMeans(embed(xx, 2))        # interior interval limits (midpoints)
   dv <- abs(diff(as.numeric(x)))
   dvtrim <- mean(dv, trim = trim)    # trimmed mean of absolute differences
   xmin <- xx[1] - dvtrim             # lower limit of the left tail
   xmax <- xx[n] + dvtrim             # upper limit of the right tail
   aux <- colSums(t(embed(xx, 3)) * c(0.25, 0.5, 0.25))

   # the following ensures the mean-preserving constraint
   desintxb <- c(0.75 * xx[1] + 0.25 * xx[2], aux,
                 0.25 * xx[n - 1] + 0.75 * xx[n])   # desired means
   zz <- c(xmin, z, xmax)             # extended list of z values
   v <- diff(zz)^2 / 12               # within-interval variances

   xb <- mean(x)
   s1 <- sum((desintxb - xb)^2)
   uv <- (s1 + sum(v)) / n            # ME density variance
   desired.sd <- sd(x)
   actualME.sd <- sqrt(uv)
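   # The source listing is cut off at this point. The remaining lines are a
   # reconstruction inferred from the surrounding text (1 + kappa equals
   # desired.sd / actualME.sd); they are an assumption, not the original code.
   if (actualME.sd <= 0) stop("actualME.sd <= 0")
   kappa <- desired.sd / actualME.sd - 1
   return(kappa)
}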

For the toy example, κ = 0.184718. The transformation changes only the population variance. The sample standard deviation of y(t,j) for any particular j-th resample (column of ytj) need not equal σx = 12.64911. The last line of the code verifies that the standard deviations of the transformed data are multiplied by the common factor 1.184718.
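The effect of the linear transformation on the column standard deviations can be checked with a short sketch. Everything here is illustrative: rescaleEnsemble is a hypothetical helper, not part of meboot; the matrix xtj is a stand-in built with rnorm so that the sketch runs without the package; and kappa = 0.2 is an arbitrary value rather than the one for the toy example.

```r
# Hypothetical helper: add kappa times the deviation from the center,
# which stretches deviations from the center by the factor (1 + kappa).
rescaleEnsemble <- function(ens, center, kappa) {
  ens + kappa * (ens - center)
}

x <- c(4, 12, 36, 20, 8)

# Stand-in 5 x 4 ensemble so the sketch runs without the meboot package;
# in practice this would be meboot(x, reps = 4)$ensemble.
set.seed(234)
xtj <- sapply(1:4, function(j) x + rnorm(length(x), sd = 2))

kappa <- 0.2   # illustrative value only; in practice use findKapa(x)
ytj   <- rescaleEnsemble(xtj, center = mean(x), kappa = kappa)

# Ratio of column standard deviations after and before the transformation
sd.ratio <- apply(ytj, 2, sd) / apply(xtj, 2, sd)
print(sd.ratio)   # every entry equals 1 + kappa
```

Each column's standard deviation is scaled by the same factor 1 + kappa, while the individual column values of sd(ytj[, j]) still differ from one resample to the next.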

What is the motivation behind this scale adjustment? Given a time series xt, the unadjusted meboot constructs a large number J of similar time series x(t,j), forming an ensemble that represents the population of time series under the ME density. Our scale adjustment from x(t,j) to y(t,j) makes sure that the population standard deviation of the transformed series equals σx. This is intuitively desirable.
Since many sample statistics have asymptotically Normal distributions (based on central limit theorem type arguments), it may be desirable to have symmetric sampling distributions. This motivation leads to the next enhancement, symmetrizing. Theil (1980) first considered this problem for a version of the ME density having exponential tails. He suggested a symmetrizing transform that yields new order statistics y(t). The adjustment needs to get into the guts of the meboot algorithm. The R code for symmetrizing is too long to describe briefly here. It is provided, with detailed descriptions, examples and graphs, at http://ssrn.com/abstract=2285041. It creates an R function called mebootSym, which provides an option to set sym=TRUE when symmetrizing is desired.