Introducing Stepwise Correlation Rank

Ilya Kipnis

7 years ago

[This article was first published on QuantStrat TradeR » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

So in the last post, I attempted to replicate the Flexible Asset Allocation paper. I’d like to offer a thanks to Pat of Intelligent Trading Tech (not updated recently, hopefully this will change) for helping me corroborate the results, so that I have more confidence there isn’t an error in my code.

One of the procedures the authors of the FAA paper used is a correlation rank, which I interpreted as the average correlation of each security to the others.

The issue, pointed out to me in a phone conversation I had with David Varadi is that when considering correlation, shouldn’t the correlations the investor is concerned about be between instruments within the portfolio, as opposed to simply all the correlations, including to instruments not in the portfolio? To that end, when selecting assets (or possibly features in general), conceptually, it makes more sense to select in a stepwise fashion–that is, start off at a subset of the correlation matrix, and then rank assets in order of their correlation to the heretofore selected assets, as opposed to all of them. This was explained in Mr. Varadi’s recent post.

Here’s a work in progress function I wrote to formally code this idea:

stepwiseCorRank <- function(corMatrix, startNames=NULL, stepSize=1, bestHighestRank=FALSE) {
  if(is.null(startNames)) {
    corSums <- rowSums(corMatrix)
    corRanks <- rank(corSums)
    startNames <- names(corRanks)[corRanks <= stepSize]
  }
  nameList <- list()
  nameList[[1]] <- startNames
  rankList <- list()
  rankCount <- 1
  rankList[[1]] <- rep(rankCount, length(startNames))
  rankedNames <- do.call(c, nameList)
  
  while(length(rankedNames) < nrow(corMatrix)) {
    rankCount <- rankCount+1
    subsetCor <- corMatrix[, rankedNames]
    if(class(subsetCor) != "numeric") {
      subsetCor <- subsetCor[!rownames(corMatrix) %in% rankedNames,]
      if(class(subsetCor) != "numeric") {
        corSums <- rowSums(subsetCor)
        corSumRank <- rank(corSums)
        lowestCorNames <- names(corSumRank)[corSumRank <= stepSize]
        nameList[[rankCount]] <- lowestCorNames
        rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
      } else { #1 name remaining
        nameList[[rankCount]] <- rownames(corMatrix)[!rownames(corMatrix) %in% names(subsetCor)]
        rankList[[rankCount]] <- rankCount
      }
    } else {  #first iteration, subset on first name
      subsetCorRank <- rank(subsetCor)
      lowestCorNames <- names(subsetCorRank)[subsetCorRank <= stepSize]
      nameList[[rankCount]] <- lowestCorNames
      rankList[[rankCount]] <- rep(rankCount, min(stepSize, length(lowestCorNames)))
    }    
    rankedNames <- do.call(c, nameList)
  }
  
  ranks <- do.call(c, rankList)
  names(ranks) <- rankedNames
  if(bestHighestRank) {
    ranks <- 1+length(ranks)-ranks
  }
  ranks <- ranks[colnames(corMatrix)] #return to original order
  return(ranks)
}

So the way the function works is that it takes in a correlation matrix, a starting name (if provided), and a step size (that is, how many assets to select per step, so that the process doesn’t become extremely long when dealing with larger amounts of assets/features). Then, it iterates–subset the correlation matrix on the starting name, and find the minimum value, and add it to a list of already-selected names. Next, subset the correlation matrix columns on the selected names, and the rows on the not selected names, and repeat, until all names have been accounted for. Due to R’s little habit of wiping out labels when a matrix becomes a vector, I had to write some special case code, which is the reason for two nested if/else statements (the first one being for the first column subset, and the second being for when there’s only one row remaining).

Here’s a test script I wrote to test this function out:

#mid 1997 to end of 2012
getSymbols(mutualFunds, from="1997-06-30", to="2012-12-31")
tmp <- list()
for(fund in mutualFunds) {
  tmp[[fund]] <- Ad(get(fund))
}

#always use a list hwne intending to cbind/rbind large quantities of objects
adPrices <- do.call(cbind, args = tmp)
colnames(adPrices) <- gsub(".Adjusted", "", colnames(adPrices))

adRets <- Return.calculate(adPrices)

subset <- adRets["2012"]
corMat <- cor(subset)


tmp <- list()
for(i in 1:length(mutualFunds)) {
  rankRow <- stepwiseCorRank(corMatrix, startNames=mutualFunds[i])
  tmp[[i]] <- rankRow
}
rankDemo <- do.call(rbind, tmp)
rownames(rankDemo) <- mutualFunds
origRank <- rank(rowSums(corMat))
rankDemo <- rbind(rankDemo, origRank)
rownames(rankDemo)[8] <- "Original (VBMFX)"

heatmap(-rankDemo, Rowv=NA, Colv=NA, col=heat.colors(8), margins=c(6,6))

Essentially, using the 2012 year of returns for the 7 FAA mutual funds, I compared how different starting securities changed the correlation ranking sequence.

Here are the results:

               VTSMX FDIVX VEIEX VFISX VBMFX QRAAX VGSIX
VTSMX              1     6     7     4     2     3     5
FDIVX              6     1     7     4     2     5     3
VEIEX              6     7     1     4     2     3     5
VFISX              2     6     7     1     3     4     5
VBMFX              2     6     7     4     1     3     5
QRAAX              5     6     7     4     2     1     3
VGSIX              5     6     7     4     2     3     1
Non-Sequential     5     6     7     2     1     3     4

In short, the algorithm is rather robust to starting security selection, at least judging by this small example. However, comparing VBMFX start to the non-sequential ranking, we see that VFISX changes from rank 2 in the non-sequential to rank 4, with VTSMX going from rank 5 to rank 2. From an intuitive perspective, this makes sense, as both VBMFX and VFISX are bond funds, which have a low correlation with the other 5 equity-based mutual funds, but a higher correlation with each other, thus signifying that the algorithm seems to be working as intended, at least insofar as this small example demonstrates. Here’s a heatmap to demonstrate this in visual form.

The ranking order (starting security) is the vertical axis, and the horizontal are the ranks, from white being first, to red being last. Notice once again that the ranking orders are robust in general (consider each column of colors descending), but each particular ranking order is unique.

So far, this code still has to be tested in terms of its applications to portfolio management and asset allocation, but for those interested in such an idea, it’s my hope that this provides a good reference point.

Thanks for reading.

To leave a comment for the author, please follow the link and comment on their blog: QuantStrat TradeR » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.