A simple portfolio optimization problem is used to look at several R functions that use randomness in various ways to do optimization.
Orientation
Some optimization problems are really hard. In these cases sometimes the best approach is to use randomness to get an approximate answer.
Once you decide to go down this route, you need to decide on two things:
- how to formulate the problem you want to solve
- what algorithm to use
These decisions should seldom be taken independently. The best algorithm may well depend on the formulation, and how you formulate the problem may well depend on the algorithm you use.
The heuristic algorithms we will look at mostly fall into three broad categories:
- simulated annealing
- traditional genetic algorithm
- evolutionary algorithms
Genetic algorithms and evolutionary algorithms are really the same thing, but the two traditions have different ideas about the specifics.
The Portfolio Probe optimization algorithm is a blend of these traditions plus additional techniques.
The test case
The problem is to maximize a mean-variance utility where the universe is 10 assets and we have the constraints that the portfolio is long-only (weights must be non-negative), the weights must sum to 1, and there can be at most 5 assets in the portfolio.
In terms of portfolio optimization this is a tiny, trivial problem. Portfolio Probe solves it consistently to 6 decimal places in about the same time as the algorithms tested here.
Actually there are two problems. The variance matrix is the same in both but there are two expected return vectors. In one the optimal answer contains only 3 assets so the integer constraint of at most 5 is non-binding. In the other case the integer constraint is binding.
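To make the setup concrete, here is a minimal sketch of such an objective in R. The names (negUtility, expret, varmat, riskAversion) and the random data are hypothetical stand-ins, not the actual inputs used in the tests (those are linked in Appendix R):

# illustrative stand-ins for the real inputs
set.seed(1)
varmat <- crossprod(matrix(rnorm(100), 10))  # a positive-definite 10x10 variance
expret <- rnorm(10, mean = 0.05, sd = 0.02)  # a hypothetical expected return vector

negUtility <- function(weight, expret, varmat, riskAversion = 2) {
  # mean-variance utility: expected return penalized by variance,
  # negated because most of the optimizers below minimize
  -(sum(weight * expret) - riskAversion * drop(weight %*% varmat %*% weight))
}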
Formulation
What we really want is a vector of length 10 with non-negative numbers that sum to 1 and at most 5 positive numbers. The tricky part is how to specify which five of the ten are to be allowed to be positive.
The solution used here is to optimize a vector that is twice as long as the weight vector — 20 in this case. The second half of the vector holds the weights (which are not normalized to sum to 1). The first half of the vector holds numbers that order the assets by their desirability to be in the portfolio. So the assets with the five largest numbers in this first half are allowed to have positive weights.
The first half of the solution vector tells us which assets are to be included in the portfolio. Then the weight vector is prepared: it is extracted from the solution vector, the weights for assets outside the portfolio are set to zero, and the weights are normalized to sum to 1.
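A minimal sketch of that decoding step (hypothetical names, not the exact code used for the tests):

solution2weight <- function(solution, nAssets = 10, maxAssets = 5) {
  # first half of the solution ranks the assets,
  # second half holds the raw (unnormalized) weights
  ranking <- solution[1:nAssets]
  weight <- solution[(nAssets + 1):(2 * nAssets)]
  # zero out all but the maxAssets highest-ranked assets, then normalize
  weight[rank(-ranking) > maxAssets] <- 0
  weight / sum(weight)
}

Combined with the utility sketch above, the function handed to the optimizers is then something like obj <- function(x) negUtility(solution2weight(x), expret, varmat).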
The original intention was that all the numbers in the solution vector should be between 0 and 1. However, not all of the optimizers support such constraints. The constraint of being less than 1 is purely arbitrary anyway. We’ll see an interesting result related to this.
The optimizers
Here are the R packages or functions that appear. If you are looking for optimization routines in R, then have a look at the optimization task view.
Rmalschains package
The Rmalschains package has the malschains function. The name stands for “memetic algorithm with local search chains”. I haven’t looked, but I suspect it has substantial similarities with genopt.
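A hedged example call, reusing the hypothetical obj helper from the formulation sketch; the argument values are illustrative, not the settings used in the tests:

library(Rmalschains)
opt <- malschains(obj, lower = rep(0, 20), upper = rep(1, 20), maxEvals = 20000)
solution2weight(opt$sol)  # decode the best solution found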
GenSA package
The GenSA package implements generalized simulated annealing.
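A sketch of a call, again with illustrative settings and the hypothetical obj from above; the max.time control conveniently caps a run at about a second:

library(GenSA)
opt <- GenSA(fn = obj, lower = rep(0, 20), upper = rep(1, 20),
             control = list(max.time = 1))
solution2weight(opt$par)  # decode the best solution found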
genopt function
The genopt function is the horse that I have in the race. It is not in a package, but you can source the genopt.R file to use it. You can get a sense of how to use it from S Poetry. The line of thinking that went into it can be found in “An Introduction to Genetic Algorithms”.
DEoptim package
The DEoptim package implements the differential evolution algorithm.
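A sketch of a call with illustrative control settings (obj as above):

library(DEoptim)
opt <- DEoptim(fn = obj, lower = rep(0, 20), upper = rep(1, 20),
               control = DEoptim.control(itermax = 200, trace = FALSE))
solution2weight(opt$optim$bestmem)  # decode the best member found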
soma package
The soma package gives us a self-organizing migrating algorithm.
rgenoud package
The rgenoud package implements an algorithm that combines a genetic algorithm and derivative-based optimization.
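A sketch of a call (illustrative settings, hypothetical obj as above); Domains supplies the box constraints as a two-column matrix of lower and upper bounds:

library(rgenoud)
opt <- genoud(fn = obj, nvars = 20,
              Domains = cbind(rep(0, 20), rep(1, 20)),
              pop.size = 100, print.level = 0)
solution2weight(opt$par)  # decode the best solution found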
GA package
The GA package is a reasonably complete implementation in the realm of genetic algorithms.
NMOF package
The NMOF package contains a set of functions that are introductory examples of various algorithms. This package is support for the book Numerical Methods and Optimization in Finance.
The optimizers that this package contributes to the race are:
- DEopt — another implementation of differential evolution (a call sketch follows this list)
- LSopt — stochastic local search, which is very much like simulated annealing
- TAopt — threshold accepting algorithm, another relative of simulated annealing
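As one example from this trio, here is a hedged sketch of calling DEopt with illustrative settings (obj as above). Note that by default min and max mainly scale the initial population rather than enforcing bounds:

library(NMOF)
opt <- DEopt(OF = obj,
             algo = list(nP = 100, nG = 300,
                         min = rep(0, 20), max = rep(1, 20)))
solution2weight(opt$xbest)  # decode the best solution found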
SANN method of optim function
optim(method="SANN", ...) does a simulated annealing optimization.
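A sketch (obj as above); note that SANN ignores box constraints, so in practice the objective may need to tolerate or penalize values outside [0, 1]:

opt <- optim(par = rep(0.5, 20), fn = obj, method = "SANN",
             control = list(maxit = 20000))
solution2weight(opt$par)  # decode the best solution found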
The results
Each optimizer was run 100 times on each of the two problems. The computational time and the utility achieved were recorded for each run. One or more control parameters were adjusted so that a typical run took about a second on my machine (which is about 3 years old and running Windows 7).
The figures show the difference in utilities between the runs and the optimal solution as found by Portfolio Probe. The optimizers are sorted by the median deficiency.
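A hypothetical harness along those lines, where runFun wraps a single optimizer run and returns the utility achieved, and optUtility is the optimal utility from Portfolio Probe:

benchmark <- function(runFun, optUtility, trials = 100) {
  # record elapsed time and utility deficiency for each run
  seconds <- deficiency <- numeric(trials)
  for (i in seq_len(trials)) {
    seconds[i] <- system.time(utility <- runFun())["elapsed"]
    deficiency[i] <- optUtility - utility
  }
  data.frame(seconds = seconds, deficiency = deficiency)
}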
Figure 1: Difference in utility from optimal for all optimizers on the non-binding problem.
Figure 2: Difference in utility from optimal for all optimizers on the binding problem.
GenSA, which hails from simulated annealing land, does very well.
I’m guessing that genoud would have done better if the differentiation were applied only to the weights and not to the first part of the solution vector.
The other thing of note is that DEoptim is a more robustly developed implementation of differential evolution than DEopt. However, DEopt outperforms DEoptim. DEopt does not enforce box constraints, so its solution vectors grow in magnitude as the algorithm progresses. This seems to make the problem easier: a weakness of DEopt turned out to be a strength.
Figures 3 and 4 show the results for the six best optimizers — same picture, different scale.
Figure 3: Difference in utility from optimal for the best optimizers on the non-binding problem.
Figure 4: Difference in utility from optimal for the best optimizers on the binding problem.
Caveat
You should never (ever) think that you understand the merits of optimizers from one example.
I have no doubt that very different results could be obtained by modifying the control parameters of the optimizers. In particular the results are highly dependent on the time allowed. Some optimizers will be good at getting approximately right but not good at homing in on the exact solution — these will look good when little time is allowed. Other algorithms will be slow to start but precise once they are in the neighborhood — these will look good when a lot of time is allowed.
For genetic and evolutionary algorithms there is a big interaction between time allowed and population size. A small population will get a rough approximation fast. A large population will optimize much slower but (generally) achieve a better answer in the end.
Which optimizer is best depends quite strongly on the exact circumstances.
Summary
If your problem is anything like this problem, then the Rmalschains and GenSA packages are worth test driving.
Appendix R
The functions that ran the optimizers plus the code and data for the problems are in heuristic_objs.R. (The 10a problem is non-binding and 10s is binding.)
In case you want to test routines on these problems outside R: the variance is in variance10.csv and the two expected return vectors are in expectedreturns10.csv.