ETF Tracking Error Minimization using R code
[This article was first published on K & L Fintech Modeling, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post explains how to set up ETF tracking error (TE) minimization and introduces R packages that perform (sparse) index tracking. An ETF (exchange-traded fund) is a fund that is listed and traded on an exchange. An ETF tries to mimic, or follow, a target benchmark index (BM) such as the S&P 500. Keeping the ETF's returns as close as possible to those of the benchmark is called tracking error (TE) minimization.
Index Tracking
An ETF selects a small subset of the constituents of a stock or bond index to mimic the BM index. Since the ETF does not hold all constituents of the BM index (full replication), a tracking error (TE) arises. Furthermore, the optimal subset is not fixed but changes with market developments, so frequent rebalancing is required.
The number of constituents of a BM index is often so large that full replication is impractical because of transaction costs and liquidity problems. Index tracking therefore means finding the combination of a subset of securities that minimizes the tracking error. Its objective function is formulated as follows.
\[\begin{align} TE = \frac{1}{T} \sum_{t=1}^{T} \left( \sum_{i=1}^{N} w_i r_{it} - R_t^I \right)^2 \end{align}\] Here, \(R_t^I\) and \(r_{it}\) are the time-\(t\) returns of the BM index and of its constituents, respectively, and \(w_i\) is the weight of the \(i\)-th constituent.
Using vector-matrix notation, the above problem is reformulated with its constraints as follows. \[\begin{align} &\min_{w} \frac{1}{T} || Rw - R^I ||_2^2 \\ \text{subject to}& \\ &e^T w = 1 \\ &\eta_i Z_i \leq w_i \leq \delta_i Z_i \\ &\sum_{i=1}^{N} Z_i = K \\ &Z_i = 0 \quad \text{or} \quad 1, \quad i=1,2,\ldots,N \end{align}\] Here, \(N\) is the number of constituents of the BM index and \(K\) is the number of constituents of the ETF. \(R^I=(R_1^I,R_2^I,\ldots,R_T^I )^T\) is a \(T \times 1\) vector of BM index returns and \(R=(R_1,R_2,\ldots,R_N)\) is a \(T \times N\) matrix formed by horizontally concatenating the \(T \times 1\) vectors \(R_i=(r_{i1},r_{i2},\ldots,r_{iT})^T\). \(w=(w_1,w_2,\ldots,w_N )^T\) is an \(N \times 1\) vector of allocation weights, \(e\) is an \(N \times 1\) vector of ones, and \(\eta_i\) and \(\delta_i\) are the lower and upper bounds on \(w_i\).
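As a quick numerical check of the two expressions, the following minimal sketch computes the tracking error once as an explicit sum over time and once in vector-matrix form. The data are simulated, and the names `R`, `R_I`, `w` and `T_obs` are illustrative assumptions rather than part of the original post.

```r
# Simulated returns: T_obs observations of N constituents (illustrative data only)
set.seed(1)
T_obs <- 250
N     <- 5
R     <- matrix(rnorm(T_obs * N, sd = 0.01), nrow = T_obs, ncol = N)

# Hypothetical weights and a synthetic index return built from them plus noise
w   <- rep(1 / N, N)
R_I <- as.vector(R %*% w) + rnorm(T_obs, sd = 0.001)

# Sum form: average over time of (portfolio return - index return)^2
te_sum <- mean(sapply(seq_len(T_obs), function(t) {
  (sum(w * R[t, ]) - R_I[t])^2
}))

# Vector-matrix form: (1/T) * ||R w - R^I||_2^2
te_mat <- sum((R %*% w - R_I)^2) / T_obs

c(te_sum = te_sum, te_mat = te_mat)
```

Both entries of the printed vector should agree up to floating-point error, which is all the vector-matrix notation claims.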
Looking at the constraints above, the first condition is the so-called budget constraint, which means that all capital is invested in the ETF portfolio. The second condition sets the lower and upper bounds for the allocation weights. The third condition is a cardinality constraint: each \(Z_i\) takes the value 0 or 1 and their sum is \(K\). This constraint means that only \(K\) of the \(N\) securities are held.
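To make the three constraints concrete, here is a small sketch that checks whether a given weight vector satisfies them. The function `is_feasible()`, its tolerance, and the example bounds are hypothetical helpers for illustration; the indicator \(Z_i\) is inferred from whether a weight is nonzero.

```r
# Check budget, bound and cardinality constraints for a weight vector w.
# eta and delta are the lower/upper bounds applied to selected securities.
is_feasible <- function(w, K, eta, delta, tol = 1e-8) {
  z <- as.integer(abs(w) > tol)                    # Z_i = 1 if security i is held
  c(
    budget      = abs(sum(w) - 1) < tol,           # e'w = 1
    bounds      = all(w >= eta * z - tol &
                      w <= delta * z + tol),       # eta*Z_i <= w_i <= delta*Z_i
    cardinality = sum(z) == K                      # sum(Z_i) = K
  )
}

# Example: 5 securities in the universe, 3 of them held
w <- c(0.5, 0.3, 0.2, 0, 0)
is_feasible(w, K = 3, eta = 0.05, delta = 0.6)
is_feasible(w, K = 2, eta = 0.05, delta = 0.6)   # violates the cardinality constraint
```

Note that a zero weight automatically satisfies the bound constraint, because both bounds are multiplied by \(Z_i = 0\).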
This problem is hard because the cardinality constraint makes it NP-hard; in other words, \(\sum_{i=1}^{N} Z_i = K\) turns it into a high-dimensional discrete problem. Finding the exact optimum requires, for example, mixed integer programming, which in the worst case has to compare all combinations, and the number of combinations is too large to enumerate. Because only a few of the \(N\) weights are allowed to be nonzero, this problem is also called the sparse index tracking problem. More recently, Fengmin, Xu, and Xue (2015) suggested \(L_{1/2}\) regularization for this problem.
In this post, we use the sparseIndexTracking R package for sparse index tracking and the ROI.plugin.ecos R package for index tracking, and finally compare the two results.
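The combinatorial growth is easy to see numerically. The sketch below first counts the number of possible subsets and then runs an exhaustive search on a deliberately tiny simulated universe. It is only an illustration under simplifying assumptions: the helper `solve_budget_ls()` is hypothetical, it enforces the budget constraint alone (weight bounds are ignored), and the index is constructed so that the true subset is known.

```r
# Number of K-subsets of N constituents
choose(30, 10)               # 30045015
n_full <- choose(386, 30)    # far too many subsets to enumerate

# Least squares with the budget constraint sum(w) = 1 for one fixed subset,
# solved through the KKT linear system (no bound constraints in this sketch)
solve_budget_ls <- function(Xs, y) {
  k   <- ncol(Xs)
  kkt <- rbind(cbind(2 * crossprod(Xs), rep(1, k)),
               c(rep(1, k), 0))
  rhs <- c(2 * crossprod(Xs, y), 1)
  solve(kkt, rhs)[1:k]
}

# Tiny simulated universe: the "index" is an exact mix of securities 2 and 5
set.seed(123)
n_obs <- 50; N <- 6; K <- 2
X <- matrix(rnorm(n_obs * N, sd = 0.01), n_obs, N)
y <- as.vector(0.6 * X[, 2] + 0.4 * X[, 5])

# Exhaustive search: evaluate the tracking error of every K-subset
subsets <- combn(N, K)
te <- apply(subsets, 2, function(idx) {
  Xs <- X[, idx, drop = FALSE]
  w  <- solve_budget_ls(Xs, y)
  mean((Xs %*% w - y)^2)
})

best   <- subsets[, which.min(te)]
best_w <- solve_budget_ls(X[, best, drop = FALSE], y)
list(subset = best, weights = round(best_w, 4), n_subsets = ncol(subsets))
```

With 6 securities and \(K = 2\) there are only 15 subsets, so enumeration is trivial; the same loop over `n_full` subsets would never finish, which is why regularization-based methods are used instead.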
Second-order conic programming (SOCP)
For index tracking, we use ROI and ROI.plugin.ecos. In particular, ROI.plugin.ecos provides the solver for second-order cone programming (SOCP).
What is a SOCP and what is the relationship between SOCP and index tracking?
Second-order cone programming (SOCP) problems are convex optimization problems in which a linear function is minimized over the intersection of an affine linear manifold with the Cartesian product of second-order cones.
The index tracking problem is typically rewritten in SOCP form, and ROI.plugin.ecos and other conic solvers expect this form as input. Therefore we need to transform our tracking error minimization problem into a second-order cone program.
We present the original and the transformed problem one after the other, which makes the concept of SOCP easy to see in the context of index tracking.
For example, we try to mimic the benchmark index by minimizing the tracking error. The TE problem is as follows.
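\[\begin{align} &\min_{w} \sqrt{\sum_{t=1}^{T} \left( R_t^I - \sum_{i=1}^{N} w_i r_{it} \right)^2} \\ \text{subject to}& \\ &e^T w = 1 \\ &w \geq 0 \end{align}\]
Here, \(w = (w_1 , w_2 , \ldots, w_N) \) and \(r = (r_1, r_2, \ldots, r_N) \).
Introducing an auxiliary scalar variable \(t\), the same problem is rewritten as follows.
\[\begin{align} &\min_{w,\,t} \; t \\ \text{subject to}& \\ &\sqrt{\sum_{t=1}^{T} \left( R_t^I - \sum_{i=1}^{N} w_i r_{it} \right)^2} \leq t \\ &e^T w = 1 \\ &w \geq 0 \end{align}\]
Here, the decision vector is extended to \((w_1 , w_2 , \ldots, w_N, t) \). The scalar \(t\) on the right-hand side of the first constraint is this new variable, not the time index of the summation.
It is worth noting that the decision vector differs between the two problems. The second problem includes \(t\) as an additional control variable and treats the objective function of the first problem as an additional constraint. In the R code below, this shows up as an extra zero column appended to the return matrix (so that \(t\) does not enter the tracking residual) and a zero coefficient on \(t\) in the budget constraint. For convenience, both problems omit \(\frac{1}{T}\), since it is a constant, and use the square root so that the constraint has the form of a second-order cone: a Euclidean norm bounded by a linear term.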
Although the definition of SOCP seems somewhat difficult, its characteristics are easy to see from the two formulations above. The bottom line is that the convex objective function is moved into a constraint and the objective is replaced by a linear function.
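The following sketch shows, on a small simulated example, how the constraint "norm of the tracking residual is at most \(t\)" can be encoded as a cone membership. It builds the same kind of matrix `A` and right-hand side as the R code later in this post and checks membership by hand. It assumes the convention that the vector `rhs - A %*% x` must lie in the second-order cone, which is how the later code reads; the helper `in_soc()` and the simulated data are illustrative assumptions, and no solver is called.

```r
# Small simulated problem (illustrative data only)
set.seed(42)
nobs <- 6; nX <- 3
X <- matrix(rnorm(nobs * nX, sd = 0.01), nobs, nX)
y <- as.vector(X %*% c(0.5, 0.3, 0.2)) + rnorm(nobs, sd = 0.001)

# Decision vector x = (w_1, ..., w_nX, t_aux)
A   <- rbind(c(rep(0, nX), -1),   # first row picks out -t_aux
             cbind(X, 0))         # remaining rows give X w (t_aux does not enter)
rhs <- c(0, y)

# Second-order cone membership: s = (s_1, s_rest) with ||s_rest||_2 <= s_1
in_soc <- function(s) sqrt(sum(s[-1]^2)) <= s[1]

# For a fixed w, the residual norm is the smallest admissible t_aux
w          <- c(0.4, 0.4, 0.2)
resid_norm <- sqrt(sum((y - X %*% w)^2))

s_loose <- rhs - as.vector(A %*% c(w, resid_norm + 0.01))  # t_aux above the norm
s_bad   <- rhs - as.vector(A %*% c(w, 0.5 * resid_norm))   # t_aux below the norm

in_soc(s_loose)   # feasible
in_soc(s_bad)     # infeasible
```

The slack vector is \((t, y - Xw)\), so pushing \(t\) down as far as the cone allows is the same as minimizing the tracking residual norm over \(w\).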
R package
Using ROI and ROI.plugin.ecos, we can minimize the tracking error. In this setup, however, there is no cardinality constraint, so we would have to select the subset of securities ourselves. The sparseIndexTracking R package, by contrast, handles the cardinality constraint through a regularization parameter (\(\lambda\)). The higher the \(\lambda\), the more the coefficients are shrunk towards zero.
R code
The following R code implements the two index tracking problems. We use the data embedded in the sparseIndexTracking R package. For expositional purposes, we restrict the stock universe to 30 names, because it is difficult to present the results as a table or figure when using all 386 stocks. After covering the main content, we also look at the 386-stock case.
```r
#==============================================================
# Financial Econometrics & Derivatives, ML/DL
# using R, Python, Keras, Tensorflow
# by Sang-Heon Lee
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------------
# Index Tracking Error Minimization
# using ROI.ecos and sparseIndexTracking
#==============================================================

graphics.off()  # clear all graphs
rm(list = ls()) # remove all objects from the workspace

library(sparseIndexTracking)
library(ROI)
library(ROI.plugin.ecos)

#------------------------------------------------
# Data
#------------------------------------------------

# load stock index data
data(INDEX_2010)

y = as.vector(INDEX_2010$SP500)
X = as.matrix(INDEX_2010$X)

# comment it when full data is used
X <- X[,1:30]

nobs = length(y); nX = ncol(X)

#------------------------------------------------
# 1) Using ROI and ROI.ecos
#------------------------------------------------
# w  = c( w1,  w2,  w3, t)'
# Xn = c(Xn1, Xn2, Xn3, 0)   # row n of cbind(X, 0) below
#
# min sqrt( (y1 - X1'*w)^2 + (y2 - X2'*w)^2
#         + (y3 - X3'*w)^2 + (y4 - X4'*w)^2
#         + (y5 - X5'*w)^2
#         )
# s.t.
#   w1 + w2 + w3 = 1
#   w1, w2, w3 >= 0
#
# --> Rewritten into the standard form
#
# minimize t
# s.t.
#   sqrt( (y1 - X1'*w)^2 + (y2 - X2'*w)^2
#       + (y3 - X3'*w)^2 + (y4 - X4'*w)^2
#       + (y5 - X5'*w)^2
#       ) <= t
#   w1 + w2 + w3 = 1
#   w1, w2, w3 >= 0
#------------------------------------------------
# Index tracking error minimization
# using second order cone programming
#------------------------------------------------

# cone constraint: (t, y - X w) lies in the second-order cone
A <- rbind(c(rep(0,nX), -1),
           cbind(X,0))

# no explicit bounds are set, so ROI's default
# lower bound of 0 applies to w and t
soc <- OP(objective   = L_objective(c(rep(0,nX), 1)),
          constraints = c(
              C_constraint(A, K_soc(nobs+1), c(0,y)),
              L_constraint(c(rep(1,nX), 0), "==", 1))
)

soc_sol <- ROI_solve(soc, solver = "ecos")
wgt_roi <- soc_sol$solution[1:nX]

#------------------------------------------------
# 2) Using sparseIndexTracking
#------------------------------------------------

# fit portfolio under error measure ETE
# (Empirical Tracking Error)

# Unconstrained
wgt_sps <- spIndexTrack(X, y, lambda = 1e-180,
                        u = 1, measure = 'ete',
                        thres = 1e-180)
# Constrained
# wgt_sps <- spIndexTrack(X, y, lambda = 1e-7,
#                         u = 1, measure = 'ete')

#------------------------------------------------
# 3) Comparison for allocation weights
#------------------------------------------------

round(cbind(wgt_roi, wgt_sps),4)
```
With the arguments for the unconstrained case (\(\lambda=1e-180\)) and a subset of \(n=30\) stocks, running the above R code gives the following weight allocations from the two R packages: ROI with ROI.plugin.ecos and sparseIndexTracking.
```
> #------------------------------------------------
> # 3) Comparison for allocation weights
> #------------------------------------------------
>
> round(cbind(wgt_roi, wgt_sps),4)
                   wgt_roi wgt_sps
1436513D UN Equity  0.0270  0.0270
1500785D UN Equity  0.0220  0.0220
1518855D US Equity  0.0319  0.0319
9876566D UN Equity  0.0607  0.0607
A UN Equity         0.0149  0.0149
AA UN Equity        0.0426  0.0426
AAPL UW Equity      0.0444  0.0444
ABC UN Equity       0.0151  0.0151
ABT UN Equity       0.1330  0.1330
ADBE UW Equity      0.0114  0.0114
ADM UN Equity       0.0127  0.0127
ADP UW Equity       0.1440  0.1440
ADSK UW Equity      0.0113  0.0113
AEE UN Equity       0.0453  0.0453
AEP UN Equity       0.0158  0.0159
AES UN Equity       0.0074  0.0074
AET UN Equity       0.0132  0.0132
AFL UN Equity       0.0413  0.0413
AGN UN Equity       0.0145  0.0146
AIG UN Equity       0.0002  0.0002
AIV UN Equity       0.0452  0.0452
AIZ UN Equity       0.0202  0.0202
AKAM UW Equity      0.0000  0.0000
ALL UN Equity       0.0348  0.0348
ALTR UW Equity      0.0172  0.0172
AMAT UW Equity      0.0336  0.0336
AMGN UW Equity      0.0411  0.0411
AMP UN Equity       0.0503  0.0503
AMT UN Equity       0.0437  0.0437
AMZN UW Equity      0.0051  0.0051
```
For sparse index tracking, with the regularization parameter set to \(\lambda=1e-6\) and the same subset of \(n=30\) stocks, running the above R code with the constrained call gives the following weight allocations from the two R packages: ROI with ROI.plugin.ecos and sparseIndexTracking. The selection effect of sparse index tracking is clearly visible: many weights are exactly zero.
```
> #------------------------------------------------
> # 3) Comparison for allocation weights
> #------------------------------------------------
>
> round(cbind(wgt_roi, wgt_sps),4)
                   wgt_roi wgt_sps
1436513D UN Equity  0.0270  0.0397
1500785D UN Equity  0.0220  0.0000
1518855D US Equity  0.0319  0.0379
9876566D UN Equity  0.0607  0.0656
A UN Equity         0.0149  0.0000
AA UN Equity        0.0426  0.0445
AAPL UW Equity      0.0444  0.0510
ABC UN Equity       0.0151  0.0000
ABT UN Equity       0.1330  0.1598
ADBE UW Equity      0.0114  0.0000
ADM UN Equity       0.0127  0.0000
ADP UW Equity       0.1440  0.1783
ADSK UW Equity      0.0113  0.0000
AEE UN Equity       0.0453  0.0652
AEP UN Equity       0.0158  0.0000
AES UN Equity       0.0074  0.0000
AET UN Equity       0.0132  0.0000
AFL UN Equity       0.0413  0.0473
AGN UN Equity       0.0145  0.0000
AIG UN Equity       0.0002  0.0000
AIV UN Equity       0.0452  0.0543
AIZ UN Equity       0.0202  0.0000
AKAM UW Equity      0.0000  0.0000
ALL UN Equity       0.0348  0.0418
ALTR UW Equity      0.0172  0.0000
AMAT UW Equity      0.0336  0.0507
AMGN UW Equity      0.0411  0.0499
AMP UN Equity       0.0503  0.0595
AMT UN Equity       0.0437  0.0543
AMZN UW Equity      0.0051  0.0000
```
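The selection effect can be quantified directly from the printed table. The sketch below retypes the two weight columns shown above (rounded to four decimals, so the sums are only approximately 1) and counts the nonzero entries. The vectors are copied by hand from the output, not recomputed, and the names `n_roi` and `n_sps` are illustrative.

```r
# Weights copied from the regularized run printed above (rounded to 4 decimals)
wgt_roi <- c(0.0270, 0.0220, 0.0319, 0.0607, 0.0149, 0.0426, 0.0444, 0.0151,
             0.1330, 0.0114, 0.0127, 0.1440, 0.0113, 0.0453, 0.0158, 0.0074,
             0.0132, 0.0413, 0.0145, 0.0002, 0.0452, 0.0202, 0.0000, 0.0348,
             0.0172, 0.0336, 0.0411, 0.0503, 0.0437, 0.0051)
wgt_sps <- c(0.0397, 0.0000, 0.0379, 0.0656, 0.0000, 0.0445, 0.0510, 0.0000,
             0.1598, 0.0000, 0.0000, 0.1783, 0.0000, 0.0652, 0.0000, 0.0000,
             0.0000, 0.0473, 0.0000, 0.0000, 0.0543, 0.0000, 0.0000, 0.0418,
             0.0000, 0.0507, 0.0499, 0.0595, 0.0543, 0.0000)

# Cardinality: number of securities with a nonzero (rounded) weight
n_roi <- sum(wgt_roi > 0)
n_sps <- sum(wgt_sps > 0)

# Budget check: both columns should sum to roughly 1 despite rounding
c(n_roi = n_roi, n_sps = n_sps,
  sum_roi = sum(wgt_roi), sum_sps = sum(wgt_sps))
```

On these printed figures, the SOCP solution holds 29 of the 30 stocks, while the regularized sparse solution holds 15, with the remaining weight redistributed across the selected names.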
The two figures below show the weight allocations for the two cases. When there is no regularization for the cardinality constraint, the two results are the same.
When there is regularization for the cardinality constraint, the two results differ because sparse index tracking selects a subset of securities from the 30-stock universe.
When we use all 386 securities, the following two figures are obtained.
In the full-data case above, we can observe some discrepancies in the allocation weights, but the overall distributions of weights are similar. With so many variables, numerical errors are likely to accumulate.
For more precise results, we think that further investigation with varying hyperparameters (\(\lambda\) and so on) is also needed.
These two approaches are complementary because sparse index tracking selects variables that are statistically significant, not necessarily those that are economically significant. \(\blacksquare\)