
Minimum Expected Shortfall Portfolio, Part 1

[This article was first published on Adventures in Statistical Computing, and kindly contributed to R-bloggers.]

A few days ago, I wrote a piece on finding the minimum expected shortfall portfolio. A few astute commenters quickly picked up where I was going with this: using it as an alternative to low/minimum-volatility portfolios. What follows are the results of a small study I put together. This post deals with the portfolio selection and the calculation of the portfolio weights. The next post will deal with computing portfolio statistics and comparing that portfolio to alternatives.

Note: In my last post, I mentioned the lack of speed in R during the NL optimization. I was unable to rectify that situation. I got tired of waiting on R while working on the code, so I switched to SAS IML for the NL optimization. I use the SAS-provided methods to export the needed data into R objects. I will use R and the handy PerformanceAnalytics package in the next post. If anyone has ways to speed this up in R, please post them below.

Portfolio Selection
I wanted a set of stocks that is representative of the larger market, but not so many that they bog down the analysis. I chose the stocks of the Dow.

I don't have access to the historic components of the Dow, so I have to stick with the current composition. I want a fairly long history, but I don't want to add too much survivorship bias. For example, if I go back to the late 80s, Cisco is basically a penny stock and will skew the results positive. However, I wanted a period long enough to capture large market swings. I decided to run the portfolio from 2000 through the end of 2011.

One problem is that Kraft Foods is part of the current Dow and didn't trade publicly until mid-2001, so it has no history at the start of the test period. I dropped it (KFT), leaving me with 29 stocks.

Methodology
We will use weekly return data. This smooths some of the noise in the series and, hopefully, some of the time dependence. Using two years of weekly returns (approximately 104 data points), we subtract the mean from each series. We then fit a Student's t copula to the empirical distributions (previous example here). Should the MLE fail to converge, we fail over to a Gaussian copula.
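
If you want to prototype the copula step in R rather than SAS, a minimal sketch using the copula package might look like the following. This is illustrative only: the study itself uses PROC COPULA (shown in the code section below), and the function name fit_copula and the input rets (a matrix of roughly 104 weekly returns for the 29 stocks) are just placeholders.

library(copula)

fit_copula <- function(rets) {
  rets <- scale(rets, center = TRUE, scale = FALSE)   # subtract the mean from each series
  u    <- pobs(rets)                                  # empirical (pseudo) margins
  d    <- ncol(rets)
  # Try the Student-t copula first; fall back to a Gaussian copula if the fit fails
  fit <- tryCatch(fitCopula(tCopula(dim = d, dispstr = "un"), u, method = "mpl"),
                  error = function(e) NULL)
  if (is.null(fit)) {
    fit <- fitCopula(normalCopula(dim = d, dispstr = "un"), u, method = "mpl")
  }
  fit
}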

We will rebalance the portfolio quarterly. We assume there is no time dependence in the return series. Using the fitted copula and empirical distributions, we simulate 13 weeks ahead for 5,000 iterations (65,000 total draws in the simulation). Each iteration's 13 weekly returns are aggregated into one quarterly return. From the 5,000 simulated quarterly returns, we run the minimum expected shortfall optimization.

Because the mean was taken out of the empirical distributions, our simulation assumes a zero expected return over the period.
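
Continuing that R sketch, the simulation and the minimum expected shortfall optimization could be set up roughly as below. This assumes the fit object from the previous sketch, a de-meaned weekly return matrix rets, and the Rsolnp package for the constrained optimizer; the study itself does this step with PROC COPULA and PROC IML's nlpqn, shown later. The constraints mirror the IML setup: long-only, a 10% cap per name, and weights summing to one.

library(copula)
library(Rsolnp)

simulate_quarters <- function(fit, rets, weeks = 13, iterations = 5000) {
  u <- rCopula(weeks * iterations, fit@copula)            # 65,000 weekly draws
  # Map the uniforms back through each stock's empirical quantile function
  sims <- vapply(seq_len(ncol(rets)),
                 function(j) unname(quantile(rets[, j], probs = u[, j], type = 1)),
                 numeric(nrow(u)))
  # Sum each block of 13 weekly returns into one quarterly return per iteration
  iter <- rep(seq_len(iterations), each = weeks)
  apply(sims, 2, function(x) tapply(x, iter, sum))
}

min_es_weights <- function(qtr_rets, alpha = 0.05, cap = 0.10) {
  n  <- ncol(qtr_rets)
  es <- function(w) {                                     # 5% expected shortfall
    port <- sort(as.vector(qtr_rets %*% w))
    -mean(port[seq_len(floor(alpha * length(port)))])
  }
  sol <- solnp(pars = rep(1 / n, n), fun = es,
               eqfun = function(w) sum(w), eqB = 1,       # fully invested
               LB = rep(0, n), UB = rep(cap, n),          # long-only, 10% cap
               control = list(trace = 0))
  sol$pars
}

A call like min_es_weights(simulate_quarters(fit, rets)) would then give one quarter's weight vector; looping that over the rebalance dates is essentially what the SAS macro below does.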

Code
The following SAS code makes use of the macros in the copula example with a few modifications.  You can download the new version of the macros here.

The SAS code can be downloaded here. It is fairly well commented, so I will not subject you to a walk-through. If there are questions, feel free to post them.

ods html close;

%let stocks = mmm aa axp t bac ba cat cvx csco ko dd xom ge hpq hd intc ibm jnj jpm mcd mrk msft pfe pg trv utx vz wmt dis;

libname dow30 "C:\temp\9.3\dow30";

proc datasets lib=work nolist kill;
quit;

/*Download stocks and prices*/
%get_stocks(&stocks,01JAN1998,30JUL2012,keepPrice=1);

/*output prices and returns to permanent location*/
data dow30.returns;
set returns;
run;

/*Put Week, Quarter, and Year on returns for aggregation*/
data returns;
format week qtr year 4.;
set returns;
week = week(date);
year = year(date);
qtr = qtr(date);
run;

data dow30.prices;
format week qtr year 4.;
set prices;
week = week(date);
year = year(date);
qtr = qtr(date);
run;

data dow30.prices_wk;
set dow30.prices;
by year week;
if last.week;
run;

/*Sum returns over each week to create weekly returns*/
proc summary data=returns;
var &stocks;
by year week;
output out=weekly(drop=_type_ _freq_) sum=;
run;

/*Copy weekly returns to perm library*/
proc datasets lib=work nolist;
copy out=dow30;
select weekly;
run;
quit;

/*Add a record index*/
data weekly(index=(indx));
set weekly;
indx = _N_;
run;

/*Select the index values for each quarter end,
  starting with year end 1999, and ending with
  year end 2011 */
proc sql noprint;
select indx, year, week
into :starts separated by ' '
from weekly
where (2012 > year > 1999
       and week in (13,26,39,52)
      )
   or (year = 1999 and week = 52)
order by year, week;
quit;

%put &starts;

/*Macro will loop over the starting index values
  starts = period starting index values
  obs    = number of observations for copula fitting
  fwd    = number of periods forward to simulate
  draws  = number of draws for simulation
  max    = number of times to loop; max<1 means loop over all
           values in starts */
%macro loop(starts,obs=100,fwd=1,draws=100,max=0);
/*Total draws needed in simulation*/
%let nDraws = %eval(&fwd*&draws);
/*Number of items in starts*/
%let nSims = %sysfunc(countw(&starts));

/*update max with nSims if max < 1*/
%if %sysevalf(&max < 1) %then
      %let max=&nSims;
;

/*Delete optES if it exists*/
proc datasets lib=work nolist;
delete optES;
run;
quit;

/*Delete cov if it exists*/
proc datasets lib=dow30 nolist;
delete cov;
run;
quit;


/*Start main loop*/
%do i=1 %to &max;
      /*start = current starting index*/
      %let start = %scan(&starts,&i);
      /*e = ending index*/
      %let e = %eval(&start - &obs);

      /*create subset of weekly returns;
        get the as-of week and year*/
      data test;
      set weekly(where=(&e <= indx <= &start)) end=last;
      if _n_ = 1 then do;
         call symput("foy",put(year,4.));
         call symput("fow",put(week,2.));
      end;
      if last then do;
         call symput("aoy",put(year,4.));
         call symput("aow",put(week,2.));
      end;
      run;

      /*de-mean the weekly returns over the subset*/
      proc standard mean=0 data=test out=test;
      var &stocks;
      run;

      /*Create the covariance matrix for the subset,
        store in perm location */
      proc corr data=test
            out=cov(where=(_type_="COV")) cov noprint;
      var &stocks;
      run;

      data cov;
      format year week 4.;
      set cov;
      year=&aoy;
      week=&aow;
      run;

      proc append base=dow30.cov data=cov force;
      run;

      %put AS OF = &start;
      %put From &foy, Week &fow: To &aoy, Week &aow;

      ods select NONE;

      /*Fit a T copula and simulate from it*/
      proc copula data=test;
      var &stocks;
      fit t /
            marginals=empirical
            method=MLE;

      simulate /
            ndraws=&nDraws
            seed=54321
            out=sim;
      run;

      /*Check the sim data.  If no observations then the
        fitting failed.  Fail over to a Normal/Gaussian
        Copula */
      proc sql noprint;
      select count(*) format=5. into :nsim from sim;
      quit;

      %if &nsim = 0 %then %do;
            %put NSIM=&nsim, switching to NORMAL Model;
            proc copula data=test;
            var &stocks;
            fit normal /
                  marginals=empirical;

            simulate /
                  ndraws=&nDraws
                  seed=54321
                  out=sim;
            run;
      %end;
      ods select default;

      /*Create an iteration number in the simulations*/
      data sim_&aoy._%left(&aow);
      format iter 8.;
      set sim;
      iter = mod(_N_,&draws);
      run;

      proc sort data=sim_&aoy._%left(&aow);
      by iter;
      run;

      /*Aggregate the iterations into a final number for
        the forward period*/
      proc summary data=sim_&aoy._%left(&aow);
      var &stocks;
      by iter;
      output out=sim_&aoy._%left(&aow)(drop=_type_ _freq_ iter) sum=;
      run;

      /*Use IML for the NL optimization */
      proc iml noprint;

      /*Read the simulations into a matrix*/
      use sim_&aoy._%left(&aow);
      read all var _num_ into sim[colname=cnames];
      close sim_&aoy._%left(&aow);

      s = ncol(sim);
      w0 = repeat(1/s,s);

      /*Function calculating expected shortfall*/
      start F_ES(x) global(sim);
            v = sim*x`;
            call sort(v);
            cutoff = floor(nrow(sim)*.05);
            es = v[1:cutoff][:];
            return (-es);
      finish F_ES;

      /*Set up constraints*/
      lb = repeat(0,1,s);
      ub = repeat(.1,1,s);
      ones = repeat(1,1,s);
      addc = ones || {0 1};

      con = (lb || {. .}) //
            (ub || {. .}) //
            addc;
      optn = {0 0};
      x = w0`;

      /*Call the NL quasi-Newton optimizer*/
      call nlpqn(rc,w,"F_ES",x,optn,con);

      /*Put the optimal weights into a data set*/
      create results_&aoy._%left(&aow) from w[colname=cnames];
      append from w;
      close results_&aoy._%left(&aow);

      quit;

      /*Add the week and year to the results*/
      data results_&aoy._%left(&aow);
      year = &aoy;
      week = &aow;
      set results_&aoy._%left(&aow);
      run;

      /*Append the results into the final results*/
      proc append data=results_&aoy._%left(&aow) base=optES force;
      run;

%end;


/*Delete the simulation and result files*/
proc datasets lib=work nolist;
delete sim: results:;
run;
quit;

%mend;


/*Call the simulations and optimizations:
  use 104 weeks (2 years) in each subset,
  simulate 13 weeks (1 qtr) forward,
  draw 5000 iterations   */
%let st = %sysfunc(datetime());
options nonotes nomprint;
*loop(starts,obs=100,fwd=1,draws=100,max=0);
%loop(&starts,obs=104,fwd=13,draws=5000,max=0);
options notes nomprint;
%let el = %sysevalf(%sysfunc(datetime()) - &st);
%put Took: &el;

/*Copy the final results and covariance matrices to
  the output location */
proc datasets lib=work nolist;
copy out=dow30;
select optES;
run;
quit;

/*Download DIA and SPY returns*/
%get_stocks(dia spy,03JAN2000,30JUL2012,keepPrice=0);

/*Finally, export the optimal portfolio weights
  into R and save the data frames*/
proc iml;
call ExportDataSetToR("dow30.optES","optES");
call ExportDataSetToR("dow30.cov","cov");
call ExportDataSetToR("dow30.prices","prices");
call ExportDataSetToR("dow30.prices_wk","prices_wk");
call ExportDataSetToR("returns","dia_spy");
submit / R;
      save(optES, file="C:\\temp\\dow30\\optES");
      save(cov, file="C:\\temp\\dow30\\cov");
      save(prices, file="C:\\temp\\dow30\\prices");
      save(prices_wk, file="C:\\temp\\dow30\\prices_wk");
      save(dia_spy, file="C:\\temp\\dow30\\dia");
endsubmit;
quit;

In the next post, I will share the R code to analyze the performance and compare this portfolio to a minimum variance portfolio, an equal-weight portfolio, a cap-weight portfolio, the Dow (DIA), and the S&P 500 (SPY).
