So, first off: I just finished a Thinkful data science in Python bootcamp program that was supposed to take six months, in about four months. I applied all of my capstone projects to volatility trading; long story short, none of the ML techniques worked, and the more complex the technique I tried, the worse it performed. Is there a place for data science in Python in the world? Of course. Some firms swear by it. However, R currently has many more libraries developed specifically for quantitative finance, such as PerformanceAnalytics, quantstrat, PortfolioAnalytics, and so on. Even for more basic portfolio management tasks, I use functions such as Return.Portfolio and charts.PerformanceSummary in R, for which I have not seen equivalents in Python. While there are some websites with their own dialects built on top of Python, such as QuantConnect and Quantopian, those are more their own special brands of syntax, as opposed to letting you build freeform portfolio backtesting strategies straight from pandas.
In any case, here’s my Python portfolio from the bootcamp I completed. Because Yahoo’s data ingestion broke on the SHORTVOL index, the supervised and unsupervised notebooks need their data input replaced by the one in the final capstone project. You can look at the notebooks to see exactly what I tried, but to cut to the chase, none of the techniques worked. Random forests, SVMs, XGBoost, UMAP…they don’t really apply to predicting returns. The features I used included at least some of those from my own trading strategy, so it wasn’t a case of “garbage in, garbage out”. And the more advanced the technique, the worse the results. In the words of one senior quant trading partner: “Auto-ML = auto-bankrupt”. So when people say “we use AI and machine learning to generate superior returns”, they’ve either found something absolutely spectacular (highly unlikely), or are just using the latest hype terms. After all, even linear regression can be thought of as a learning model.
Even taking PCAs of various term structure features did a worse job than my base volatility trading strategy. Of course, that strategy has gotten better since then, as I added more risk management to it and caught a nice chunk of the coronavirus long vol move in March. You can subscribe to it here.
So yes, I code in Python now (if the previous post wasn’t any indication), so those who need some Python development for quant work using the usual numpy/scipy/pandas stack should feel free to reach out to me.
Anyway, this post is about adding some Corey Hoffstein-style analysis to asset allocation strategies, this time in R, because this is a technique I used on a very recent project for an asset allocation firm I freelance for (off and on). I call it Corey Hoffstein-style because on Twitter, he’s always talking about analyzing the impact of timing luck. His blog at Newfound Research is terrific for thinking about elements one doesn’t see in many other places, such as analyzing trend-following strategies in the context of option payoffs, the impact of timing luck and various lookback-window parameters, and so on.
The quick idea is this: when you rebalance a portfolio every month, you want to know how varying the particular trading day on which you rebalance affects your results. This is what Walter does over at AllocateSmartly.
But a more interesting question is what happens when a portfolio is rebalanced on longer timeframes–that is, what happens when you rebalance a portfolio only once a quarter, once every six months, or once a year? What if, instead of rebalancing quarterly in January, April, and so on, you rebalance in February, May, etc.?
This is a piece of code (in R, so far) that does exactly this:
offset_monthly_endpoints <- function(returns, k, offset) {

  # because the first endpoint is indexed to 0 and is the first index, add 1 to offset;
  # this also makes sure we don't have a 7-month offset on a 6-month rebalance--that's just 1
  mod_offset <- (offset + 1) %% k

  # get monthly endpoints
  eps <- endpoints(returns, on = 'months')

  # create indices from 1 to the number of endpoints
  indices <- 1:length(eps)

  # only select endpoints that have the proper offset when modded by k
  selected_eps <- eps[indices %% k == mod_offset]

  # append start and end of data
  selected_eps <- unique(c(0, selected_eps, nrow(returns)))

  return(selected_eps)
}
Essentially, the idea behind this function is fairly straightforward: given that we want to subset on monthly endpoints at some interval (that is, k = 3 for quarterly, k = 6 for every six months, k = 12 for annual endpoints), and that we want to be able to offset those endpoints, we use a modulo operator to say “hey, if you want to offset by 4 but rebalance every 3 months, that’s just the same thing as offsetting by 1 month”. One other thing to note is that since R is a language that starts at index 1 (rather than 0), there’s a 1 added to the offset, so that offsetting by 0 will get the first monthly endpoint. Beyond that, it’s simply a matter of creating an index going from 1 to the number of endpoints (that is, if you have around 10 years of data, you have ~120 monthly endpoints), then seeing which endpoints are the first, second, or third month of every three.
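As a quick sanity check of that modulo arithmetic (a standalone snippet just to illustrate the wraparound, not part of the function):

(7 + 1) %% 6 == (1 + 1) %% 6 # TRUE: a 7-month offset on a 6-month rebalance is the same as a 1-month offset
(4 + 1) %% 3 == (1 + 1) %% 3 # TRUE: likewise, a 4-month offset on a quarterly rebalance is just a 1-month offset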
So here’s how it works, with some sample data:
require(quantmod)
require(PerformanceAnalytics)

getSymbols('SPY', from = '1990-01-01')

> head(SPY[offset_monthly_endpoints(Return.calculate(Ad(SPY)), 3, 1)])
           SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1993-01-29 43.96875 43.96875 43.75000  43.93750    1003200     26.29929
1993-04-30 44.12500 44.28125 44.03125  44.03125      88500     26.47986
1993-07-30 45.09375 45.09375 44.78125  44.84375      75300     27.15962
1993-10-29 46.81250 46.87500 46.78125  46.84375      80700     28.54770
1994-01-31 48.06250 48.31250 48.00000  48.21875     313800     29.58682
1994-04-29 44.87500 45.15625 44.81250  45.09375     481900     27.82893

> head(SPY[offset_monthly_endpoints(Return.calculate(Ad(SPY)), 3, 2)])
           SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1993-02-26 44.43750 44.43750 44.18750  44.40625      66200     26.57987
1993-05-28 45.40625 45.40625 45.00000  45.21875      79100     27.19401
1993-08-31 46.40625 46.56250 46.34375  46.56250      66500     28.20059
1993-11-30 46.28125 46.56250 46.25000  46.34375     230000     28.24299
1994-02-28 46.93750 47.06250 46.81250  46.81250     333000     28.72394
1994-05-31 45.73438 45.90625 45.65625  45.81250     160000     28.27249

> head(SPY[offset_monthly_endpoints(Return.calculate(Ad(SPY)), 3, 3)])
           SPY.Open SPY.High  SPY.Low SPY.Close SPY.Volume SPY.Adjusted
1993-03-31 45.34375 45.46875 45.18750  45.18750     111600     27.17521
1993-06-30 45.12500 45.21875 45.00000  45.06250     437600     27.29210
1993-09-30 46.03125 46.12500 45.84375  45.93750      99300     27.99539
1993-12-31 46.93750 47.00000 46.56250  46.59375     312900     28.58971
1994-03-31 44.46875 44.68750 43.53125  44.59375     788800     27.52037
1994-06-30 44.82812 44.84375 44.31250  44.46875     271900     27.62466
Notice how we get different quarterly rebalancing end dates. This also works with semi-annual, annual, and so on. The one caveat to this method, however, is that when doing tactical asset allocation analysis in R, I subset by endpoints. And since I usually use monthly endpoints in intervals of one (that is, every monthly endpoint), it’s fairly simple for me to incorporate measures of momentum over any monthly lookback period: 1 month, 3 months, etc. are all fairly simple when rebalancing every month. However, if one were to rebalance every quarter and take only quarterly endpoints, then getting a one-month momentum measure every quarter would take a bit more work. And if one wanted to do quarterly rebalancing, tranche it every month, and also rebalance multiple times *throughout* the month rather than simply at month-end, that would require even more meticulousness.
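To make the simpler of those cases concrete, here’s one way I’d approach it (a hedged sketch reusing the SPY data from the demo above; the variable names are mine): compute the one-month momentum measure at every monthly endpoint first, and only then subset down to the offset quarterly endpoints.

rets <- Return.calculate(Ad(SPY))

# compute 1-month momentum at EVERY monthly endpoint...
monthly_prices <- Ad(SPY)[endpoints(rets, on = 'months')]
one_month_mom <- ROC(monthly_prices, n = 1, type = 'discrete')

# ...then subset that monthly series down to the offset quarterly endpoints
quarterly_eps <- offset_monthly_endpoints(rets, k = 3, offset = 1)
one_month_mom_quarterly <- one_month_mom[index(rets[quarterly_eps])]

This way, the lookback computation always happens on the full monthly grid, and the rebalancing schedule only determines which of those values you actually act on.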
However, a second, “kludge-y” method to go about this would be to run the backtest to find all the weights, and then apply a similar indexing methodology to the *weights*. For instance, if you have a time series of monthly weights, just create an index ranging from 1 to the length of the weights, then, depending on how often you want to rebalance, subset for every index where mod 3 == 0, 1, or 2. More generally, if you rebalance once every k months, you create an index ranging from 1 to n if the language is base 1 (R), or 0 to n-1 if Python, where n is the number of weight rows. Then you simply see which indices give a remainder of 0 through k-1 when taken modulo k, and that’s it. This gives you k different rebalancing tranches by taking the indices of those endpoints, and you can still offset those endpoints daily as well. The caveat here, of course, is that you need to run the backtest at every individual month, and if you have a complex optimization routine, this can take an unnecessarily long time. So which method you use depends on the task at hand. This second method, however, is what I would use as a wrapper to a monthly rebalancing algorithm that already exists, such as my KDA asset allocation algorithm, as sketched below.
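Here’s a rough sketch of what that wrapper might look like (hypothetical code: monthly_weights stands in for the xts of weights your existing monthly backtest already produced, and asset_returns for the returns of the underlying assets):

k <- 3 # rebalance every 3 months
idx <- 1:nrow(monthly_weights) # base-1 index, since this is R

tranche_returns <- list()
for (offset in 0:(k - 1)) {
  # keep only every k-th set of weights, shifted by the current offset
  tranche_weights <- monthly_weights[idx %% k == offset, ]
  # Return.portfolio rebalances to these weights on their respective dates
  tranche_returns[[offset + 1]] <- Return.portfolio(R = asset_returns, weights = tranche_weights)
}

# average the k offset tranches into one diversified-timing return stream
all_tranches <- do.call(cbind, tranche_returns)
avg_tranche_returns <- xts(rowMeans(all_tranches, na.rm = TRUE), order.by = index(all_tranches))

Since each tranche starts on a different rebalance date, the cbind produces leading NAs, and rowMeans with na.rm = TRUE simply averages whichever tranches are live on a given day.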
That’s it for this post. In terms of things I want to build going forward: I’d like to port some basic R functionality, such as Return.Portfolio and charts.PerformanceSummary, over to Python, and once I get that working, demonstrate how to do a lot of the same asset allocation work I’ve done in R…in Python as well.
Thanks for reading.
NOTE: I am currently searching for a full-time role to make use of my R (and now Python) skills. If you are hiring, or know of someone that is, don’t hesitate to reach out to me on my LinkedIn.