Maximum Entropy Bootstrap Rescale and Symmetrize
				
            Guest post by Hrishikesh D. Vinod*
(∗Professor of Economics, Fordham University, New York, E-mail: [email protected]. June 25, 2013)
Complete paper with R code freely available at: http://ssrn.com/abstract=
This post gives R code for two tasks that R programmers commonly encounter: changing the scale of a distribution without changing its mean, and making a probability distribution symmetric. We provide code for both tasks in the context of the maximum entropy bootstrap (meboot) package in R.
Why study the bootstrap? It is a vital computer-intensive tool for statistical inference (not estimation). It is particularly suited to complicated nonlinear problems where traditional (asymptotic) confidence intervals tend to be too wide and are difficult to obtain. Vinod (textbook ch.9 http://www.worldscibooks.com/
The maximum entropy (ME) density is maximally noncommittal about unavailable information regarding its functional form. It is constructed from the order statistics x(t) of the time series xt. The construction forms exactly T intervals, each of which contains exactly one x(t). The bootstrap resamples contain one observation from each such interval with probability 1/T. This is called the mass-preserving constraint. The ME density also imposes a mean-preserving constraint. For a toy example of five observations in x, the following R code creates J=4 resamples x(t,j) in a T × J matrix representing the ensemble. A realistic ensemble for inference uses a much larger T and J (typically J > 999). The aim, of course, is to see what might happen to the shape of the time series in a large population of time series.
library(meboot); set.seed(234); x <- c(4, 12, 36, 20, 8); xtj <- meboot(x, reps = 4)$ensemble; xtj
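The interval construction described above can be sketched in base R for the toy series. This is a minimal illustration rather than the meboot source: the variable names follow the findKapa function in this post, and the tail limits assume the trimmed mean of absolute first differences (trim = 0.1) used there.

```r
# Toy series from the text
x  <- c(4, 12, 36, 20, 8)
n  <- length(x)
xx <- sort(x)                          # order statistics x(t)

# Interior interval limits: midpoints of adjacent order statistics
z <- rowMeans(embed(xx, 2))

# Tail limits: extend by the trimmed mean of absolute first differences
dvtrim <- mean(abs(diff(x)), trim = 0.1)
limits <- c(xx[1] - dvtrim, z, xx[n] + dvtrim)

# Desired interval means that keep the overall mean equal to mean(x)
aux <- colSums(t(embed(xx, 3)) * c(0.25, 0.5, 0.25))
desintxb <- c(0.75 * xx[1] + 0.25 * xx[2], aux,
              0.25 * xx[n - 1] + 0.75 * xx[n])

limits                                 # T + 1 limits defining T intervals
desintxb                               # one desired mean per interval
all.equal(mean(desintxb), mean(x))     # mean-preserving check
# Each interval contains exactly one order statistic
all(xx > limits[-(n + 1)] & xx < limits[-1])
```

For this series the T = 5 intervals each hold one order statistic, and the average of the desired interval means equals mean(x), which is the mean-preserving constraint in action.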
The overall variance of the ME density is smaller than that of the original data. The following enhancement equates the population variance of the ME density to that of the data. The basic idea is to use a linear transformation that multiplies the deviations of the resampled data from the population mean by a suitably chosen constant determined by kappa.
findKapa <- function(x, trim = 0.1) {
   # find kappa by which to multiply sd of each ensemble
   n <- length(x)
   xx <- sort(x)
   # ordxx <- order(x)
   z <- rowMeans(embed(xx, 2)) # interval limits: midpoints of adjacent order statistics
   dv <- abs(diff(as.numeric(x)))
   dvtrim <- mean(dv, trim = trim)
   xmin <- xx[1] - dvtrim
   xmax <- xx[n] + dvtrim
   aux <- colSums(t(embed(xx, 3)) * c(0.25, 0.5, 0.25))
   # following ensures mean preserving constraint
   desintxb <- c(0.75 * xx[1] + 0.25 * xx[2], aux,
                 0.25 * xx[n - 1] + 0.75 * xx[n]) # desired means
   zz <- c(xmin, z, xmax) # extended list of z values
   v <- diff(zz)^2 / 12 # within-interval variances (uniform density on each interval)
   xb <- mean(x)
   s1 <- sum((desintxb - xb)^2)
   uv <- (s1 + sum(v)) / n # ME density variance
   desired.sd <- sd(x)
   actualME.sd <- sqrt(uv)
   if (actualME.sd
For the toy example, κ = 0.184718. The transformation changes only the population variance. The sample standard deviation of y(t,j) for any particular j-th resample (column of ytj) need not equal σx = 12.64911. The last line of the code verifies that the standard deviations of the transformed data are multiplied by the common factor 1 + κ = 1.184718.
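The linear transformation described above can be sketched as follows. This is an illustration under stated assumptions, not the author's code: the matrix xtj is a simulated stand-in (it is not meboot output), kappa is simply copied from the text, and the form y = mean(x) + (1 + kappa) * (xtj - mean(x)) is inferred from the stated common factor 1.184718.

```r
set.seed(234)
x     <- c(4, 12, 36, 20, 8)
kappa <- 0.184718                 # value quoted in the text, not recomputed here

# Hypothetical stand-in for an ensemble: 4 columns of noisy copies of x.
# With meboot installed, meboot(x, reps = 4)$ensemble could be used instead.
xtj <- sapply(1:4, function(j) x + rnorm(length(x), sd = 2))

# Scale deviations from the population mean by (1 + kappa)
xb  <- mean(x)
ytj <- xb + (1 + kappa) * (xtj - xb)

# Ratio of column standard deviations after and before the transformation
apply(ytj, 2, sd) / apply(xtj, 2, sd)
```

Because the standard deviation is unaffected by a shift, every column's ratio equals 1 + kappa, whatever the individual column standard deviations happen to be.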
What is the motivation behind this scale adjustment? Given a time series xt, the unadjusted meboot constructs a large number J of similar time series x(t,j), forming an ensemble that represents the population of time series under the ME density. Our scale adjustment from x(t,j) to y(t,j) ensures that the population variance of the transformed series equals σx². This is intuitively desirable.
Since many sample statistics have asymptotically Normal distributions (based on central limit theorem type arguments), it may be desirable to have symmetric sampling distributions. This motivation leads to the next enhancement, symmetrizing. Theil (1980) first considered this problem for a version of the ME density having exponential tails. He suggested a symmetrizing transformation yielding new order statistics y(t). The adjustment needs to get into the guts of the meboot algorithm. The R code for symmetrizing is too long to describe briefly here. It is provided with detailed descriptions at: http://ssrn.com/abstract=
		
            			