The most popular models for analysing portfolio risk are factor models, since stocks tend to move together: the first principal component of a set of securities often explains a large share of their variance. Since we are mostly concerned with multiple assets that form a portfolio, we need to account for this co-movement. A typical question might be: why do stocks with low Price to Book ratios outperform stocks with higher Price to Book ratios? Here the “Price” part of the ratio is simply the price per share, and the “Book” part is “Shareholders’ Equity” divided by “Shares Outstanding”, both items found on a company’s balance sheet.
Another important reason factor models are popular is dimensionality. Assume \(N\) assets, which give \(N\) variances and \(N(N-1)/2\) correlations. A factor model with \(F\) factors has \(N\) idiosyncratic variances, \(F\) factor variances and \(NF\) betas. As long as:
\[2F < N(N-1)/(N+1)\]
we have fewer parameters in the factor model.
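As a quick sanity check on the inequality above, here is a minimal sketch (the variable names are mine, not from the post) counting parameters for \(N = 10\) assets, as in the portfolio below, with a single factor:

```r
# Count parameters: full covariance matrix vs. an F-factor model
n_assets <- 10  # matches the 10-asset portfolio below
n_factors <- 1  # a single-factor model such as the CAPM

# N variances + N(N-1)/2 correlations
full_params <- n_assets + n_assets * (n_assets - 1) / 2
# N idiosyncratic variances + F factor variances + N*F betas
factor_params <- n_assets + n_factors + n_assets * n_factors

full_params    # 55
factor_params  # 21

# the condition 2F < N(N-1)/(N+1) holds, so the factor model is cheaper
2 * n_factors < n_assets * (n_assets - 1) / (n_assets + 1)  # TRUE
```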
The Capital Asset Pricing Model (Sharpe, 1964) is the simplest factor model, consisting of a single factor.
\[r_i = a_i + b_if + e_i\]
The mean-variance parameters are:
\[\overline{r}_i = a_{i} + b_{i}\overline{f}\] \[\sigma_{i}^2 = b_{i}^2\sigma_{f}^2 + \sigma_{e_{i}}^2\]
\[\sigma_{ij} = b_{i}b_{j}\sigma_{f}^2\]
The \(Cov(r_{i}, f) = Cov(a_{i} + b_{i}f + e_{i}, f) = b_{i}\sigma_{f}^2\) and therefore \(b_{i}\) is:
\[b_{i} = \dfrac{Cov(r_{i}, f)}{\sigma_{f}^2}\]
This is alternatively expressed as:
\[b_{i} = Cov(r_{i}, f) / Var(f)\]
where \(b_{i}\) is the slope of the regression line that minimises the squared distance between the returns and the line \(a_{i} + b_{i}f\), and \(a_{i}\) is the intercept, or \(\alpha\).
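Since \(b\) is just the OLS slope, a small simulated check (illustrative toy data, not the portfolio used below) confirms that \(Cov(r, f)/Var(f)\) matches the slope returned by `lm()`:

```r
set.seed(1)
f <- rnorm(250, mean = 0, sd = 0.01)          # simulated factor (market) returns
r <- 0.0002 + 1.3 * f + rnorm(250, 0, 0.01)   # one simulated asset's returns

b_manual <- cov(r, f) / var(f)                # b = Cov(r, f) / Var(f)
b_ols <- unname(coef(lm(r ~ f))[2])           # OLS slope

all.equal(b_manual, b_ols)                    # TRUE
```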
I will go through these calculations using base R functions, but first we need some data, and for this I like the tidy functions, which require a few packages.
I download daily price data from Yahoo Finance using `quantmod`, or rather `tidyquant`'s wrapper around the `quantmod` package. The difference is that `quantmod` collects data and stores it as an `xts` object, while `tidyquant` collects the data and stores it as a `tibble`. From there we can more easily use functions from the `tidyverse` way of handling data, and we convert the data back to an `xts` object using the `tk_xts` function from the `timetk` package.
The firms are Apple, Harley Davidson, Nvidia, Microsoft, AMD, Ford, General Electric, 3M, Intel and Amazon.
```r
library(quantmod)
library(ggplot2)
library(tidyquant)
library(tidyr)
library(dplyr)
library(timetk)
library(tibble)
library(rvest)
library(corrplot)
library(stringr)

start_date <- "2017-01-01"
end_date <- "2019-01-01"

symbols <- c("AAPL", "HOG", "NVDA", "MSFT", "AMD", "F", "GE", "MMM", "INTL", "AMZN")

portfolio_prices <- tq_get(
  symbols,
  from = start_date,
  to = end_date
) %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  tk_xts(date_var = date)
```
The data looks like the following. I removed the `Open`, `High`, `Low`, `Close` and `Volume` data and kept just the `Adjusted` prices, where each asset is its own column. The data has been converted into a time series (`xts`) object, where the `date` is stored as the index (or rownames) and is not visible in the table below.
AAPL | HOG | NVDA | MSFT | AMD | F | GE | MMM | INTL | AMZN |
---|---|---|---|---|---|---|---|---|---|
110.9539 | 53.59669 | 101.0116 | 59.49657 | 11.43 | 10.43042 | 28.13062 | 165.3643 | 40.41 | 753.67 |
110.8297 | 54.18777 | 103.3683 | 59.23036 | 11.43 | 10.91093 | 28.13950 | 165.6151 | 40.86 | 757.18 |
111.3933 | 54.24233 | 100.7443 | 59.23036 | 11.24 | 10.57954 | 27.97971 | 165.0485 | 38.77 | 780.45 |
112.6351 | 53.74219 | 102.0910 | 59.74376 | 11.32 | 10.57126 | 28.05961 | 165.5315 | 38.71 | 795.99 |
113.6668 | 52.87832 | 106.2301 | 59.55362 | 11.49 | 10.46355 | 27.92646 | 164.6399 | 38.33 | 796.92 |
113.7815 | 53.19659 | 105.4280 | 59.53460 | 11.44 | 10.64582 | 27.84656 | 163.9991 | 38.46 | 795.90 |
I also collect the S&P500 data using the same method:
```r
SPY <- tq_get(
  "^GSPC",
  from = start_date,
  to = end_date
) %>%
  select(date, adjusted) %>%
  tk_xts(date_var = date)
```
Which looks similar to the individual asset prices.
adjusted |
---|
2257.83 |
2270.75 |
2269.00 |
2276.98 |
2268.90 |
2268.90 |
We can plot the data using the `chartSeries` function.

```r
chartSeries(SPY)
```
I compute the daily log returns for all the assets in `portfolio_returns`, and the `SPY` returns:
```r
portfolio_returns <- diff(log(portfolio_prices), na.pad = FALSE)
SPY_returns <- diff(log(SPY), na.pad = FALSE)
```
The returns data for each asset looks like:
AAPL | HOG | NVDA | MSFT | AMD | F | GE | MMM | INTL | AMZN |
---|---|---|---|---|---|---|---|---|---|
-0.001119731 | 0.010967925 | 0.023062871 | -0.0044844490 | 0.000000000 | 0.0450387901 | 0.0003155849 | 0.001515279 | 0.011074335 | 0.004646413 |
0.005072384 | 0.001006252 | -0.025713094 | 0.0000000000 | -0.016762633 | -0.0308429974 | -0.0056944999 | -0.003426713 | -0.052504862 | 0.030269695 |
0.011086527 | -0.009263191 | 0.013278832 | 0.0086305184 | 0.007092228 | -0.0007833275 | 0.0028513212 | 0.002921873 | -0.001548813 | 0.019715919 |
0.009117836 | -0.016204986 | 0.039742771 | -0.0031877844 | 0.014906019 | -0.0102405428 | -0.0047565496 | -0.005400948 | -0.009865007 | 0.001167666 |
0.001008053 | 0.006000910 | -0.007578959 | -0.0003193095 | -0.004361106 | 0.0172690614 | -0.0028650791 | -0.003899774 | 0.003385783 | -0.001280696 |
0.005358706 | -0.008411454 | -0.012380417 | 0.0090613598 | -0.021202208 | -0.0141067872 | 0.0031828276 | 0.007391130 | 0.009059257 | 0.003912422 |
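For reference, `diff(log(prices))` gives continuously compounded (log) returns, which for small daily moves are close to simple returns. A toy example with made-up prices:

```r
p <- c(100, 101, 99.5, 102)          # made-up prices
log_ret <- diff(log(p))              # log returns
simple_ret <- diff(p) / head(p, -1)  # simple returns, for comparison

round(log_ret, 4)
round(simple_ret, 4)
```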
We can use the `autoplot` function to plot `xts` or time series data with `ggplot` functionality (I only plot the first 3 assets):

```r
autoplot(portfolio_returns[, 1:3]) +
  xlab("Date") +
  ggtitle("Asset Log Returns") +
  theme_tq()
```
Technical R:
From here I will talk a little more technically but will keep it as simple as possible.
Next I compute the calculations described at the beginning of the post. We first need to define a few things:
- The number of assets we have in the portfolio, denoted previously as \(N\)
- The number of days we apply our model which is usually denoted as \(T\).
```r
N_portfolio <- ncol(portfolio_prices)
days <- nrow(portfolio_prices)
```
(a) We can compute \(\beta\) as follows:
```r
beta <- cov(portfolio_returns, SPY_returns) / as.numeric(var(SPY_returns))
```
Which returns:
adjusted | |
---|---|
AAPL | 1.2722563 |
HOG | 0.8957108 |
NVDA | 1.9681667 |
MSFT | 1.4221581 |
AMD | 1.8544079 |
F | 0.8626349 |
However, in order to understand it better we can break down each of the computations.
The covariance part of the \(\beta\) formula looks like:

adjusted | |
---|---|
AAPL | 0.0000852 |
HOG | 0.0000600 |
NVDA | 0.0001318 |
MSFT | 0.0000953 |
AMD | 0.0001242 |
F | 0.0000578 |
GE | 0.0000548 |
MMM | 0.0000714 |
INTL | 0.0000576 |
AMZN | 0.0001056 |
We can compute the covariances using base R as follows:
\[Cov(x, y) = \dfrac{\sum_{i=1}^n (x_i - \overline{x})(y_i - \overline{y})}{n-1}\] Where \(x_{i}\) is our asset's return and \(y_{i}\) is the S&P 500 return. In R we can compute this as:
```r
x = portfolio_returns[,1] # takes the first asset in the portfolio_returns data
x_bar = colMeans(x)
y = SPY_returns[,1]
y_bar = colMeans(y)
n = length(x)

sum((x - x_bar) * (y - y_bar)) / (n - 1)
## [1] 0.00008521578
```
Which corresponds to the first result, `AAPL`, in the `cov(portfolio_returns, SPY_returns)` calculation. We can change the \(1\) in `x = portfolio_returns[,1]` to \(2\), \(3\), \(4\), …, \(N\) to compute the covariances for each of the assets (`HOG`, `NVDA` and `MSFT`, respectively), or we can wrap the above into a for loop to compute them for all assets. Note: I only changed the \(1\) to an \(i\) in the for loop; everything else is constant.
```r
for(i in 1:ncol(portfolio_returns)){
  x = portfolio_returns[,i]
  x_bar = colMeans(x)
  y = SPY_returns[,1]
  y_bar = colMeans(y)
  cov_x_y = sum((x - x_bar) * (y - y_bar)) / (n - 1)
  print(cov_x_y)
}
## [1] 0.00008521578
## [1] 0.00005999475
## [1] 0.0001318279
## [1] 0.00009525621
## [1] 0.0001242083
## [1] 0.00005777932
## [1] 0.00005477266
## [1] 0.00007136762
## [1] 0.0000575575
## [1] 0.0001056044
```
The variance of the SPY returns is:
x |
---|
0.000067 |
Which is calculated as:
\[Var(y) = \dfrac{\sum_{i=1}^n (y_i - \overline{y})^2}{n-1}\] In base R we can simply calculate it as:
```r
n = length(SPY_returns)
y = SPY_returns[,1]
y_bar = colMeans(y)

sum((y - y_bar)**2) / (n - 1)
## [1] 0.00006698004
```
Putting all this together, we can compute \(\beta\). Recall that \(\beta = \dfrac{Cov(r_{i}, y)}{Var(y)}\), where \(r_{i}\) is each of the assets in our portfolio and \(y\) is the market (S&P 500) returns.
To compute \(beta\) for each of our assets using base R we can wrap the above code into a function:
```r
myBetaComputation <- function(ASSET){
  x = ASSET # the asset in the portfolio_returns data we want to compute beta for
  x_bar = colMeans(x)
  y = SPY_returns[,1]
  y_bar = colMeans(y)
  n = length(SPY_returns)
  cov_x_y = sum((x - x_bar) * (y - y_bar)) / (n - 1)
  var_y = sum((y - y_bar)**2) / (n - 1)
  return(cov_x_y / var_y)
}
```
We can apply this function to individual assets in our data and then all of them:
```r
myBetaComputation(portfolio_returns[,1]) # Computes for the first asset
## [1] 1.272256

myBetaComputation(portfolio_returns[,2]) # Computes for the second asset
## [1] 0.8957108

sapply(portfolio_returns, myBetaComputation) # Computes for all assets
##      AAPL       HOG      NVDA      MSFT       AMD         F        GE       MMM
## 1.2722563 0.8957108 1.9681667 1.4221581 1.8544079 0.8626349 0.8177459 1.0655057
##      INTL      AMZN
## 0.8593232 1.5766544
```
The interpretation: a \(\beta\) of 1 indicates that the asset moves in perfect correlation with the market; values \(>\) 1 indicate that the asset moves more (with more volatility) than the market; values \(<\) 1 indicate that the asset moves less than the market.
Typically, tech stocks have a higher market \(\beta\) whereas mature non-tech stocks have a lower market \(\beta\). Here, AAPL, NVDA, MSFT and AMD all have betas above 1, whereas HOG and Ford (F) have betas less than 1.
(b) We can compute \(\alpha\) as follows:
Now that we have \(\beta\) we can estimate \(\alpha\) for each of our assets in our portfolio.
Recall, that alpha is defined as;
\[\alpha = \overline{r_{i}} - \beta_{i}\overline{f}\]
Where \(\overline{r_{i}}\) is the average returns of our individual assets in our portfolio, \(\beta_{i}\) is the beta of each asset over our sample period and \(\overline{f}\) is the average returns of SPY. Therefore, we can compute \(\alpha\) as;
```r
alpha <- colMeans(portfolio_returns) - beta * colMeans(SPY_returns)
```
The \(\alpha\) for our assets looks like:
adjusted | |
---|---|
AAPL | 0.0004068 |
HOG | -0.0011516 |
NVDA | 0.0001393 |
MSFT | 0.0007480 |
AMD | 0.0005696 |
F | -0.0009272 |
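As with \(\beta\), the \(\alpha\) formula reproduces the intercept of an OLS regression of asset returns on market returns. A quick simulated check (toy numbers, not the portfolio data above):

```r
set.seed(3)
f <- rnorm(250, 0, 0.01)                      # simulated market returns
r <- 0.0005 + 1.1 * f + rnorm(250, 0, 0.01)   # simulated asset returns

b <- cov(r, f) / var(f)
a_manual <- mean(r) - b * mean(f)             # alpha = r_bar - beta * f_bar
a_ols <- unname(coef(lm(r ~ f))[1])           # OLS intercept

all.equal(a_manual, a_ols)                    # TRUE
```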
There are other ways to calculate \(\beta\) and \(\alpha\), but for a single factor model we can use the `CAPM.beta` and `CAPM.alpha` functions from the `PerformanceAnalytics` package.
```r
library(PerformanceAnalytics)

CAPM.beta(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0)
##                    AAPL       HOG     NVDA     MSFT      AMD         F
## Beta: adjusted 1.272256 0.8957108 1.968167 1.422158 1.854408 0.8626349
##                       GE      MMM      INTL     AMZN
## Beta: adjusted 0.8177459 1.065506 0.8593232 1.576654

CAPM.alpha(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0)
##                        AAPL          HOG         NVDA         MSFT          AMD
## Alpha: adjusted 0.000406813 -0.001151611 0.0001393021 0.0007480208 0.0005695635
##                             F           GE          MMM          INTL
## Alpha: adjusted -0.0009272497 -0.002875334 0.00001141014 -0.0003782044
##                        AMZN
## Alpha: adjusted 0.001047154
```
Which is much simpler than what we just did. `Ra` is the asset returns, `Rb` is the market returns and `Rf` is the risk-free rate of return.
Upon investigating the `CAPM.beta` function in the `PerformanceAnalytics` package, I noticed it also has `CAPM.beta.bull` and `CAPM.beta.bear` functions, so I wanted to see how these would look plotted for each asset.
```r
rbind(
  CAPM.beta.bull(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0),
  CAPM.beta(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0),
  CAPM.beta.bear(Ra = portfolio_returns, Rb = SPY_returns, Rf = 0)
) %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column("betas") %>%
  pivot_longer(2:ncol(.)) %>%
  ggplot(aes(x = name, y = value, color = betas)) +
  geom_point(size = 5, shape = 19) +
  scale_color_brewer(type = 'seq', palette = 'Reds') +
  theme_tq() +
  ggtitle("Asset Betas:", subtitle = "Bear, Beta, Bull")
```
Interesting…
Visualising the covariance matrix of the returns
In order to visualise the covariance matrix, some further formulas are needed.
Ultimately we want to compute the following;
\[\hat{\Sigma} = var(y)\hat{\beta} \hat{\beta}^T + \hat{\psi}\] Where: \[\hat{\psi} = diag(\hat{\sigma}_{1}^2, \ldots, \hat{\sigma}_{i}^2, \ldots, \hat{\sigma}_{N}^2)\]
We use our \(\beta\) and \(\alpha\) results from before, as well as \(N\). I created a function which takes in the asset and computes the residuals and Sigma values. What we are calculating is the following:
`err`

\[\hat{\epsilon}_{i} = x_{i} - \alpha_{i} - \beta_{i}y\]

Where \(i = 1, \ldots, N\)

`Sigma`

\[\hat{\sigma}_{i}^2 = \dfrac{1}{T-2}\hat{\epsilon}_{i}^{\top}\hat{\epsilon}_{i}\]

Where \(T\) is the number of observations (stored as `N` in the code below) and \(\hat{\epsilon}_{i}^{\top}\) is the transpose of the residual vector.
The code in base R for the above equations is:
```r
beta <- cov(portfolio_returns, SPY_returns) / as.numeric(var(SPY_returns))
alpha <- colMeans(portfolio_returns) - beta * colMeans(SPY_returns)
N = length(SPY_returns)

mySigmaFunction <- function(ASSET){
  err = portfolio_returns[, ASSET] - alpha[ASSET ,] - beta[ASSET ,] * SPY_returns
  Sigma = (1 / (N - 2)) %*% t(err) %*% err
  return(Sigma)
}

mySigmaFunction("AAPL") # For Apple
##             AAPL
## [1,] 0.0001178595

mySigmaFunction("F") # For Ford
##                F
## [1,] 0.0001599711

sapply(colnames(portfolio_returns), mySigmaFunction) # For all assets in the portfolio
##          AAPL           HOG          NVDA          MSFT           AMD
## 0.00011785947 0.00021461371 0.00056144410 0.00006590872 0.00126760753
##             F            GE           MMM          INTL          AMZN
## 0.00015997110 0.00035027461 0.00007165307 0.00019184188 0.00017665040
```
Now that we have the \(\hat{\sigma}^2\) values, we want to create a matrix with them down the diagonal.
```r
Sigma2 <- sapply(colnames(portfolio_returns), mySigmaFunction)
diag_Sigma2 <- diag(Sigma2)
colnames(diag_Sigma2) = colnames(portfolio_returns)
rownames(diag_Sigma2) = colnames(portfolio_returns)
```
Which creates our diagonal matrix as follows:
AAPL | HOG | NVDA | MSFT | AMD | F | GE | MMM | INTL | AMZN | |
---|---|---|---|---|---|---|---|---|---|---|
AAPL | 0.0001179 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
HOG | 0.0000000 | 0.0002146 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
NVDA | 0.0000000 | 0.0000000 | 0.0005614 | 0.0000000 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
MSFT | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000659 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
AMD | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0012676 | 0.00000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
F | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00016 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 |
GE | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.0003503 | 0.0000000 | 0.0000000 | 0.0000000 |
MMM | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000717 | 0.0000000 | 0.0000000 |
INTL | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000000 | 0.0001918 | 0.0000000 |
AMZN | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.00000 | 0.0000000 | 0.0000000 | 0.0000000 | 0.0001767 |
Now that we have our diagonal matrix \(\hat{\psi}\), we can compute the covariance matrix of the returns using: \[\Sigma = var(y)\beta \beta^T + \psi\]
Then we use the `cov2cor` function from the `stats` package.
```r
Sigma <- as.numeric(var(SPY_returns)) * beta %*% t(beta) + diag_Sigma2

corrplot(cov2cor(Sigma), main = "Covariance matrix of Portfolio Assets")
```
The correlation table looks like:
AAPL | HOG | NVDA | MSFT | AMD | F | GE | MMM | INTL | AMZN | |
---|---|---|---|---|---|---|---|---|---|---|
AAPL | 1.0000000 | 0.3097538 | 0.3891498 | 0.5677312 | 0.2714307 | 0.3373737 | 0.2330694 | 0.4966757 | 0.3133839 | 0.4821634 |
HOG | 0.3097538 | 1.0000000 | 0.2515805 | 0.3670311 | 0.1754765 | 0.2181079 | 0.1506764 | 0.3210946 | 0.2025988 | 0.3117126 |
NVDA | 0.3891498 | 0.2515805 | 1.0000000 | 0.4611084 | 0.2204547 | 0.2740132 | 0.1892978 | 0.4033975 | 0.2545288 | 0.3916107 |
MSFT | 0.5677312 | 0.3670311 | 0.4611084 | 1.0000000 | 0.3216216 | 0.3997582 | 0.2761668 | 0.5885171 | 0.3713324 | 0.5713213 |
AMD | 0.2714307 | 0.1754765 | 0.2204547 | 0.3216216 | 1.0000000 | 0.1911233 | 0.1320346 | 0.2813684 | 0.1775330 | 0.2731471 |
F | 0.3373737 | 0.2181079 | 0.2740132 | 0.3997582 | 0.1911233 | 1.0000000 | 0.1641118 | 0.3497257 | 0.2206640 | 0.3395071 |
GE | 0.2330694 | 0.1506764 | 0.1892978 | 0.2761668 | 0.1320346 | 0.1641118 | 1.0000000 | 0.2416026 | 0.1524423 | 0.2345432 |
MMM | 0.4966757 | 0.3210946 | 0.4033975 | 0.5885171 | 0.2813684 | 0.3497257 | 0.2416026 | 1.0000000 | 0.3248576 | 0.4998164 |
INTL | 0.3133839 | 0.2025988 | 0.2545288 | 0.3713324 | 0.1775330 | 0.2206640 | 0.1524423 | 0.3248576 | 1.0000000 | 0.3153656 |
AMZN | 0.4821634 | 0.3117126 | 0.3916107 | 0.5713213 | 0.2731471 | 0.3395071 | 0.2345432 | 0.4998164 | 0.3153656 | 1.0000000 |
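The same construction can be checked on small simulated inputs (the betas, idiosyncratic variances and factor variance below are made up for illustration): \(\Sigma = var(y)\beta \beta^T + \psi\) is a valid covariance matrix, and `cov2cor` turns it into a correlation matrix with a unit diagonal.

```r
beta_vec <- c(1.2, 0.9, 1.5)         # made-up betas for three assets
psi <- diag(c(2e-04, 1e-04, 3e-04))  # made-up idiosyncratic variances on the diagonal
var_y <- 7e-05                       # made-up factor variance

Sigma_toy <- var_y * beta_vec %*% t(beta_vec) + psi
C <- cov2cor(Sigma_toy)

diag(C)          # all ones
isSymmetric(C)   # TRUE
```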
ETF analysis and a randomly sampled portfolio
Analysing my randomly selected portfolio and a series of U.S. funds
Since factor modelling is all about risk and portfolio analysis, I thought it would be interesting to compare the performance of a number of U.S. Exchange Traded Funds (ETFs) against a randomly selected portfolio of assets from the S&P 500.
In order to construct the random portfolio I first scraped the Wikipedia page that lists the S&P 500 firms along with their ticker symbols. I filter out all Class A, B and C shares, since a few firms have multiple share classes and I don't want to sample the same firm twice.
Just as before I collect the data and put it into time series format.
```r
start_date <- "2016-01-01"
end_date <- "2017-12-31"
set.seed(4746)

url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"

symbols <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="constituents"]') %>%
  html_table() %>%
  .[[1]] %>%
  filter(!str_detect(Security, "Class A|Class B|Class C")) %>% # Removes multi-class shares
  sample_n(20) %>%
  pull(Symbol)

asset_prices <- tq_get(
  symbols,
  from = start_date,
  to = end_date
) %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  tk_xts(date_var = date)
```
Which looks like:
MMM | MNST | TJX | TMUS | NWL | INTC | EIX | XRX | GPN | HES | TIF | IR | GRMN | DRI | GLW | ZTS | CCI | BK |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
132.8015 | 48.11333 | 33.27088 | 38.95 | 38.54331 | 30.45350 | 51.77293 | 24.64223 | 62.47044 | 44.85172 | 68.47929 | 51.15904 | 30.85206 | 55.28202 | 16.24429 | 45.97239 | 73.96610 | 36.90321 |
133.3804 | 48.69000 | 33.79228 | 40.22 | 38.52557 | 30.31014 | 51.79048 | 24.59438 | 63.43876 | 44.74931 | 68.27712 | 50.61163 | 30.96944 | 56.28635 | 16.23521 | 46.69207 | 75.48198 | 36.76471 |
130.6940 | 48.76667 | 32.92327 | 40.05 | 37.46084 | 29.63818 | 51.44798 | 24.28336 | 63.63840 | 41.95655 | 67.00899 | 49.53537 | 30.12268 | 55.86901 | 15.87241 | 46.70180 | 74.84963 | 35.84144 |
127.5101 | 48.40000 | 32.87630 | 40.51 | 35.78388 | 28.52719 | 51.19327 | 23.61348 | 61.25259 | 40.60672 | 65.17110 | 48.06017 | 29.25916 | 55.47833 | 15.50961 | 45.28187 | 72.98727 | 34.93664 |
127.0759 | 48.08333 | 31.43894 | 39.88 | 34.94097 | 28.23153 | 51.18450 | 22.94359 | 60.07465 | 40.56948 | 62.72673 | 47.60556 | 28.12736 | 55.06099 | 15.55497 | 44.62054 | 71.96515 | 34.29958 |
127.0488 | 48.68000 | 31.84760 | 39.68 | 34.91435 | 28.72431 | 51.99249 | 22.48903 | 58.41755 | 38.81003 | 62.34076 | 47.76327 | 27.83393 | 55.01660 | 15.70009 | 43.35622 | 72.49354 | 34.54886 |
Next, I collect the U.S. ETFs. I scraped a website (whose name I can no longer remember) for the ETF symbols.
```r
myETFs <- c("SPY", "IVV", "VTI", "VOO", "QQQ", "VEA", "IEFA", "AGG", "VWO", "EFA",
            "IEMG", "VTV", "IJH", "IWF", "BND", "IJR", "IWM", "VUG", "GLD", "IWD",
            "VIG", "VNQ", "USMV", "LQD", "VO", "VYM", "EEM", "VB", "VCSH", "XLF",
            "VCIT", "VEU", "XLK", "ITOT", "IVW", "BNDX", "VGT", "DIA", "BSV", "SHV",
            "IWB", "IWR", "TIP", "SCHF", "MBB", "SDY", "MDY", "SCHX", "IEF", "HYG",
            "DVY", "XLV", "SHY", "IXUS", "TLT", "IVE", "PFF", "IAU", "VXUS", "RSP",
            "SCHB", "VV", "GOVT", "EMB", "MUB", "QUAL", "XLY", "VBR", "EWJ", "XLP",
            "VGK", "SPLV", "MINT", "BIV", "IGSB", "EFAV", "VT", "GDX", "XLU", "IWS",
            "XLI", "SCHD", "IWP", "ACWI", "VMBS", "XLE", "JNK", "VOE", "FLOT", "IWV",
            "JPST", "SCZ", "IEI", "IWN", "DGRO", "VBK", "IGIB", "IWO")

etf_prices <- tq_get(
  myETFs,
  from = start_date,
  to = end_date
) %>%
  select(symbol, date, adjusted) %>%
  pivot_wider(names_from = symbol, values_from = adjusted) %>%
  tk_xts(date_var = date)
```
The data looks like:
SPY | IVV | VTI | VOO | QQQ | VEA | IEFA | AGG | VWO | EFA | IEMG | VTV | IJH | IWF | BND | IJR | IWM | VUG | GLD | IWD | VIG | VNQ | USMV | LQD | VO | VYM | EEM | VB | VCSH | XLF | VCIT | VEU | XLK | ITOT | IVW | BNDX | VGT | DIA | BSV | SHV | IWB | IWR | TIP | SCHF | MBB | SDY | MDY | SCHX | IEF | HYG | DVY | XLV | SHY | IXUS | TLT | IVE | PFF | IAU | VXUS | RSP | SCHB | VV | GOVT | EMB | MUB | QUAL | XLY | VBR | EWJ | XLP | VGK | SPLV | MINT | BIV | IGSB | EFAV | VT | GDX | XLU | IWS | XLI | SCHD | IWP | ACWI | VMBS | XLE | JNK | VOE | FLOT | IWV | JPST | SCZ | IEI | IWN | DGRO | VBK | IGIB | IWO |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
186.8362 | 187.3943 | 95.69553 | 171.0293 | 105.7754 | 32.31758 | 48.12894 | 97.70714 | 28.82161 | 51.85117 | 35.29821 | 73.21329 | 129.8436 | 93.32362 | 72.46418 | 51.13569 | 104.60918 | 99.84885 | 102.89 | 88.57166 | 71.16028 | 67.15640 | 38.30936 | 100.1451 | 112.0916 | 58.70417 | 29.25530 | 103.10962 | 71.86420 | 13.61419 | 73.99710 | 38.23042 | 39.89517 | 42.61963 | 107.9444 | 48.90886 | 101.77430 | 156.9451 | 74.18727 | 105.2681 | 104.11566 | 37.17856 | 101.8868 | 24.86581 | 97.72220 | 63.99211 | 238.7614 | 44.54472 | 98.23306 | 65.49518 | 65.93277 | 66.68895 | 80.17432 | 44.14714 | 110.3894 | 80.04479 | 31.46391 | 10.38 | 39.78462 | 71.26929 | 45.05346 | 85.64318 | 23.57018 | 87.65514 | 101.0075 | 59.58526 | 72.85894 | 90.46735 | 45.07748 | 45.11800 | 43.21453 | 34.96312 | 93.40020 | 75.13937 | 48.21395 | 57.09138 | 51.90483 | 13.86359 | 38.17657 | 62.81808 | 48.48287 | 34.12284 | 87.50332 | 50.81642 | 48.00686 | 53.82844 | 80.80476 | 78.58987 | 46.91737 | 110.6951 | NA | 45.01355 | 115.1113 | 84.09489 | 23.20199 | 115.5700 | 47.86354 | 131.6090 |
187.1522 | 187.7839 | 95.90974 | 171.3355 | 105.5919 | 32.28184 | 48.04810 | 97.75242 | 28.88509 | 51.77047 | 35.38096 | 73.44967 | 129.9475 | 93.46677 | 72.55397 | 51.29261 | 104.83739 | 99.86796 | 103.18 | 88.74566 | 71.39217 | 68.45230 | 38.56908 | 100.1978 | 112.2051 | 58.99791 | 29.32071 | 103.16643 | 71.84602 | 13.66662 | 74.01466 | 38.14082 | 39.79135 | 42.69417 | 108.1532 | 48.88116 | 101.28743 | 157.0367 | 74.17794 | 105.2777 | 104.32082 | 37.26314 | 101.8220 | 24.83831 | 97.79476 | 64.25666 | 238.8375 | 44.63781 | 98.20529 | 65.61782 | 66.33069 | 67.00954 | 80.12685 | 44.10166 | 109.9440 | 80.26437 | 31.49616 | 10.40 | 39.71283 | 71.43853 | 45.09076 | 85.81048 | 23.54203 | 87.88744 | 101.1264 | 59.74447 | 72.76409 | 90.67188 | 45.64425 | 45.40763 | 42.95955 | 35.24865 | 93.40020 | 75.22964 | 48.19091 | 57.19847 | 51.91398 | 13.79471 | 38.45059 | 62.96579 | 48.61266 | 34.31977 | 87.66763 | 50.86277 | 48.09793 | 54.03376 | 80.97237 | 78.76543 | 46.90808 | 110.9662 | NA | 44.89528 | 115.1582 | 84.47743 | 23.28449 | 115.6185 | 47.92159 | 131.6866 |
184.7914 | 185.3255 | 94.62438 | 169.1641 | 104.5776 | 31.69214 | 47.20373 | 98.12343 | 28.36815 | 50.91853 | 34.74654 | 72.40415 | 128.1251 | 92.48373 | 72.87716 | 50.70773 | 103.23986 | 98.76987 | 104.67 | 87.39035 | 70.52956 | 68.26473 | 38.32790 | 100.5933 | 110.1342 | 58.22349 | 28.76008 | 101.62282 | 71.87331 | 13.45691 | 74.25190 | 37.48678 | 39.30056 | 42.08852 | 106.9669 | 49.06577 | 99.91275 | 154.7919 | 74.26179 | 105.2586 | 102.94081 | 36.64525 | 102.1459 | 24.37987 | 98.11230 | 63.51594 | 235.4678 | 44.06064 | 98.79820 | 65.61782 | 65.65863 | 66.46268 | 80.17432 | 43.33769 | 111.4255 | 79.06572 | 31.42361 | 10.57 | 39.03973 | 70.13145 | 44.53121 | 84.68593 | 23.63587 | 87.87912 | 101.5744 | 59.03268 | 72.05276 | 89.25890 | 44.85076 | 45.25377 | 42.24736 | 34.93548 | 93.40947 | 75.53665 | 48.22778 | 56.64514 | 51.14435 | 14.02101 | 38.37987 | 61.87635 | 47.86177 | 33.90801 | 86.25658 | 50.09352 | 48.20723 | 51.95383 | 80.90055 | 77.24093 | 46.90808 | 109.4701 | NA | 44.14021 | 115.4954 | 83.48843 | 22.96364 | 113.8152 | 47.99304 | 129.3380 |
180.3579 | 180.8912 | 92.30511 | 165.0441 | 101.3029 | 31.09350 | 46.30547 | 98.11440 | 27.45217 | 49.86035 | 33.69835 | 70.71315 | 124.8108 | 90.12631 | 72.88611 | 49.40485 | 100.48224 | 96.22995 | 106.15 | 85.32989 | 69.23102 | 66.93474 | 37.70643 | 100.6724 | 107.6568 | 56.96841 | 27.87242 | 98.88598 | 71.90975 | 13.07825 | 74.29585 | 36.68042 | 38.13966 | 41.07753 | 104.2529 | 48.99192 | 96.69561 | 151.1452 | 74.37366 | 105.2681 | 100.43254 | 35.80653 | 102.0626 | 23.93059 | 98.23930 | 62.23734 | 229.4516 | 43.00870 | 99.01125 | 65.19263 | 64.49137 | 65.11437 | 80.20277 | 42.43728 | 111.6255 | 77.26316 | 31.22207 | 10.72 | 38.23201 | 68.54225 | 43.44009 | 82.62268 | 23.68278 | 87.46435 | 101.7207 | 57.62780 | 70.57320 | 87.00002 | 44.17064 | 44.71072 | 41.46484 | 34.35522 | 93.40020 | 75.63598 | 48.24622 | 55.87762 | 49.95324 | 14.64089 | 38.12353 | 60.51917 | 46.56396 | 33.20084 | 84.22703 | 48.92575 | 48.27098 | 50.68622 | 80.46945 | 75.47622 | 46.91737 | 106.8519 | NA | 43.46700 | 115.7578 | 81.26778 | 22.53279 | 110.6160 | 48.06894 | 125.7083 |
178.3782 | 178.8873 | 91.25257 | 163.2903 | 100.4722 | 30.70930 | 45.71261 | 98.33157 | 27.20731 | 49.25951 | 33.37653 | 69.87674 | 123.1961 | 89.22913 | 72.97586 | 48.58223 | 98.75159 | 95.33240 | 105.68 | 84.23099 | 68.51684 | 66.03955 | 37.32612 | 100.7515 | 106.2667 | 56.32751 | 27.57342 | 97.51283 | 71.91882 | 12.87435 | 74.32222 | 36.24140 | 37.83765 | 40.59300 | 103.2470 | 48.98268 | 95.86506 | 149.5692 | 74.45753 | 105.2777 | 99.38823 | 35.36956 | 102.1459 | 23.63719 | 98.34815 | 61.63771 | 226.3864 | 42.52462 | 99.26135 | 65.02093 | 64.04923 | 64.13377 | 80.25970 | 41.90978 | 112.1254 | 76.28411 | 31.31075 | 10.66 | 37.78327 | 67.70534 | 42.94580 | 81.73975 | 23.71093 | 87.38972 | 101.7664 | 56.97219 | 69.81445 | 85.77296 | 43.22602 | 44.36678 | 41.10435 | 34.02365 | 93.39094 | 75.83461 | 48.27848 | 55.38676 | 49.36685 | 14.28668 | 38.10586 | 59.79903 | 46.09118 | 32.84278 | 83.17358 | 48.36041 | 48.30741 | 50.03457 | 80.18208 | 74.31207 | 46.88947 | 105.6269 | NA | 42.88478 | 115.9920 | 79.95219 | 22.25777 | 108.9970 | 48.12699 | 123.5344 |
178.5548 | 179.0729 | 91.19671 | 163.3088 | 100.7813 | 30.84332 | 45.87430 | 98.05104 | 27.08941 | 49.46577 | 33.33975 | 69.83128 | 122.7901 | 89.27684 | 72.67066 | 48.58699 | 98.32368 | 95.42790 | 104.74 | 84.19437 | 68.66522 | 66.41467 | 37.38177 | 100.4263 | 105.7561 | 56.38982 | 27.56408 | 97.03933 | 71.98254 | 12.89183 | 74.19040 | 36.31308 | 38.08304 | 40.58370 | 103.5317 | 48.92731 | 96.33284 | 150.1098 | 74.44821 | 105.2681 | 99.32295 | 35.24034 | 101.6092 | 23.71054 | 98.25743 | 61.62007 | 225.7010 | 42.51531 | 98.94638 | 64.92279 | 64.19955 | 63.37006 | 80.28816 | 41.99163 | 110.8984 | 76.18345 | 31.14146 | 10.57 | 37.86405 | 67.48906 | 42.91782 | 81.75836 | 23.67340 | 87.24041 | 101.6019 | 57.05649 | 70.36455 | 85.55912 | 43.52830 | 44.78313 | 41.22744 | 34.09733 | 93.41876 | 75.65400 | 48.26467 | 55.53847 | 49.39434 | 13.69632 | 38.30916 | 59.57745 | 46.11899 | 32.90544 | 82.78702 | 48.41602 | 48.23457 | 48.96336 | 80.06235 | 74.02564 | 46.90808 | 105.5428 | NA | 42.95756 | 115.8515 | 79.92420 | 22.32194 | 108.0954 | 48.04214 | 122.4960 |
I also collect the S&P 500 index.
```r
SPY_prices <- tq_get(
  "^GSPC",
  from = start_date,
  to = end_date
) %>%
  select(date, adjusted) %>%
  tk_xts(date_var = date)
```
Which looks like:
adjusted |
---|
2012.66 |
2016.71 |
1990.26 |
1943.09 |
1922.03 |
1923.67 |
Now we have three data sets: the daily adjusted close prices for the assets randomly selected from the S&P 500 Wikipedia page, the ETFs, and the S&P 500 index itself. Next I calculate the daily returns.
```r
asset_returns <- diff(log(asset_prices), na.pad = FALSE)
etf_returns <- diff(log(etf_prices), na.pad = FALSE)
SPY_returns <- diff(log(SPY_prices), na.pad = FALSE)
```
The `asset_returns` data looks like:
MMM | MNST | TJX | TMUS | NWL | INTC | EIX | XRX | GPN | HES | TIF | IR | GRMN | DRI | GLW | ZTS | CCI | BK |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.0043496940 | 0.011914276 | 0.015549833 | 0.032085611 | -0.0004605231 | -0.004718686 | 0.0003390193 | -0.001943595 | 0.015381467 | -0.0022860685 | -0.002956558 | -0.010757820 | 0.003797292 | 0.0180042454 | -0.0005589373 | 0.0155331932 | 0.020287018 | -0.003760040 |
-0.0203468502 | 0.001573356 | -0.026052579 | -0.004235761 | -0.0280260856 | -0.022418948 | -0.0066352645 | -0.012726538 | 0.003142188 | -0.0644414487 | -0.018747883 | -0.021494331 | -0.027722605 | -0.0074420504 | -0.0225996450 | 0.0002084719 | -0.008412745 | -0.025433678 |
-0.0246630740 | -0.007547151 | -0.001427791 | 0.011420159 | -0.0457985304 | -0.038205692 | -0.0049630049 | -0.027973817 | -0.038210941 | -0.0327010158 | -0.027810684 | -0.030233211 | -0.029085489 | -0.0070174571 | -0.0231226086 | -0.0308759534 | -0.025196097 | -0.025568744 |
-0.0034106468 | -0.006564266 | -0.044704812 | -0.015673838 | -0.0238374478 | -0.010418300 | -0.0001713849 | -0.028779059 | -0.019418177 | -0.0009174858 | -0.038228556 | -0.009504269 | -0.039450060 | -0.0075510143 | 0.0029199836 | -0.0147124623 | -0.014103137 | -0.018403085 |
-0.0002136353 | 0.012332679 | 0.012914805 | -0.005027688 | -0.0007621181 | 0.017304330 | 0.0156624749 | -0.020011095 | -0.027971612 | -0.0443372159 | -0.006172111 | 0.003307394 | -0.010486917 | -0.0008065403 | 0.0092862449 | -0.0287440451 | 0.007315467 | 0.007241532 |
0.0028436862 | 0.008590726 | 0.016384277 | 0.009531073 | 0.0156331210 | 0.019154091 | 0.0033726404 | 0.013734922 | 0.000000000 | -0.0514319659 | 0.008367407 | 0.016185405 | -0.001507319 | 0.0088375907 | 0.0017316113 | 0.0087102953 | 0.004411381 | 0.007985230 |
However, I want to treat all of these assets as a single portfolio, so I average over the rows and join the result to the ETFs, calling it `all_returns`.
```r
portfolio_returns <- xts(rowMeans(asset_returns), index(asset_returns))
colnames(portfolio_returns) <- c("myPortfolio")

all_returns <- cbind(portfolio_returns, etf_returns)
```
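`rowMeans` here implicitly builds an equal-weighted portfolio: each asset gets weight \(1/N\). A toy check (made-up return matrix) that this matches an explicit weight vector:

```r
R <- matrix(c(0.01, -0.02, 0.03,
              0.00,  0.01, -0.01),
            nrow = 2, byrow = TRUE)         # 2 days x 3 assets of made-up returns
w <- rep(1 / ncol(R), ncol(R))              # equal weights, 1/N each

all.equal(rowMeans(R), as.numeric(R %*% w)) # TRUE
```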
The `all_returns` data looks like the following, where we can now see `myPortfolio` has been joined with the ETF dataset.
myPortfolio | SPY | IVV | VTI | VOO | QQQ | …
---|---|---|---|---|---|---
0.006099968 | 0.0016898660 | 0.002077056 | 0.0022360355 | 0.0017890863 | -0.001736693 | …
-0.017322911 | -0.0126945384 | -0.013178202 | -0.0134924019 | -0.0127544774 | -0.009652265 | …
-0.022721172 | -0.0242840940 | -0.024218220 | -0.0248157103 | -0.0246568769 | -0.031814151 | …
-0.016273779 | -0.0110371371 | -0.011139358 | -0.0114683030 | -0.0106831096 | -0.008234532 | …
-0.003287114 | 0.0009894013 | 0.001036607 | -0.0006123565 | 0.0001135276 | 0.003072098 | …
0.005140787 | 0.0080360599 | 0.007689359 | 0.0068197139 | 0.0082617000 | 0.011531038 | …

(The remaining 92 ETF columns are omitted for brevity.)
Next, I compute (just as before) the \(\beta\) and \(\alpha\) of the portfolios, this time using the CAPM.beta and CAPM.alpha functions in the PerformanceAnalytics package.
library(PerformanceAnalytics)
betas <- CAPM.beta(Ra = all_returns, Rb = SPY_returns, Rf = 0)
alphas <- CAPM.alpha(Ra = all_returns, Rb = SPY_returns, Rf = 0)
beta_alphas <- cbind(t(betas), t(alphas))
colnames(beta_alphas) <- c("beta", "alpha")
I rank the ETFs by their \(\alpha\) value. We can see that GDX has the highest \(\alpha\) and a negative \(\beta\), which makes sense since it is a gold miners ETF tracking the Arca Gold Miners Index (GDMNTR), an index of companies involved in the gold mining industry. As far as I know there are very few gold mining companies listed in the S&P 500. Either way, this is a nice way to present and rank ETFs by their \(\alpha\) values and see their corresponding \(\beta\) values.
beta_alphas %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column("ETFs") %>%
  arrange(desc(alpha)) %>%
  head(10) %>% # I only take the top 10 observations since there are many ETFs in the data
  pivot_longer(cols = c("beta", "alpha")) %>%
  ggplot(aes(x = reorder(ETFs, if_else(name == "alpha", value, -Inf), FUN = max), y = value)) +
  geom_bar(color = "black", fill = "darkred", stat = "identity") +
  scale_color_brewer(type = 'seq', palette = 'Reds') +
  facet_wrap(~name, scales = "free") +
  theme_tq() +
  coord_flip() +
  ggtitle("Highest ranked alpha stocks", subtitle = "With corresponding beta values") +
  ylab("Alpha / Beta value") +
  xlab("ETF")
We can also rank the ETFs by taking the ratio \(\dfrac{\alpha}{\beta}\) and plot the results.
beta_alphas %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column("ETFs") %>%
  mutate(alpha_divided_beta = alpha / beta) %>%
  arrange(desc(alpha_divided_beta)) %>%
  head(10) %>%
  ggplot(aes(x = ETFs, y = alpha_divided_beta)) +
  geom_bar(color = "black", fill = "darkred", stat = "identity") +
  scale_color_brewer(type = 'seq', palette = 'Reds') +
  theme_tq() +
  coord_flip() +
  ggtitle("Best Performers: Alpha over Beta ranked ETFs") +
  ylab("Alpha / Beta value") +
  xlab("ETF")
By changing a few lines of code we can plot the worst performers, those with negative \(\alpha\) to \(\beta\) ratios.
beta_alphas %>%
  as_tibble(rownames = NA) %>%
  rownames_to_column("ETFs") %>%
  mutate(alpha_divided_beta = alpha / beta) %>%
  arrange(alpha_divided_beta) %>%
  head(10) %>%
  ggplot(aes(x = ETFs, y = alpha_divided_beta)) +
  geom_bar(color = "black", fill = "darkblue", stat = "identity") +
  scale_color_brewer(type = 'seq', palette = 'Reds') +
  theme_tq() +
  coord_flip() +
  ggtitle("Worst Performers: Alpha over Beta ranked ETFs") +
  ylab("Alpha / Beta value") +
  xlab("ETF")
We can take our ETF \(\beta\) and \(\alpha\) values from before and compute the factor-model covariance matrix \(\Sigma\).
beta <- cov(all_returns, SPY_returns) / as.numeric(var(SPY_returns))
alpha <- colMeans(all_returns) - beta * colMeans(SPY_returns)
N <- length(SPY_returns)

# Idiosyncratic (residual) variance for a single asset
mySigmaFunction <- function(ASSET){
  err <- all_returns[, ASSET] - alpha[ASSET, ] - beta[ASSET, ] * SPY_returns
  Sigma <- (1 / (N - 2)) %*% t(err) %*% err
  return(Sigma)
}

Sigma2 <- sapply(colnames(all_returns), mySigmaFunction)
diag_Sigma2 <- diag(Sigma2)
colnames(diag_Sigma2) <- colnames(all_returns)
rownames(diag_Sigma2) <- colnames(all_returns)

# Factor-model covariance: market variance times beta outer product plus idiosyncratic variances
Sigma <- as.numeric(var(SPY_returns)) * beta %*% t(beta) + diag_Sigma2
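As a quick sanity check on these formulas (a hypothetical simulation, not the post's data), we can generate returns from a known single-factor model and confirm that \(\sigma_{y}^{2}\beta\beta^{T}\) plus the diagonal matrix of idiosyncratic variances closely approximates the sample covariance:

```r
# Hypothetical sanity check: simulate a one-factor market and verify that
# the factor-model covariance approximates the sample covariance.
set.seed(42)
T_obs <- 5000; N_assets <- 4
y    <- rnorm(T_obs, mean = 0.0003, sd = 0.01)   # market factor
beta <- c(0.8, 1.0, 1.2, 0.5)                    # true betas
eps  <- sapply(rep(0.005, N_assets), function(s) rnorm(T_obs, sd = s))
X    <- outer(y, beta) + eps                     # asset returns, T x N

b_hat <- as.vector(cov(X, y) / var(y))           # estimated betas
resid_var <- sapply(seq_len(N_assets),
                    function(i) var(X[, i] - b_hat[i] * y))
Sigma_fm <- var(y) * outer(b_hat, b_hat) + diag(resid_var)

max(abs(Sigma_fm - cov(X)))  # small: the two estimates agree closely
```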
Finally, I take a random sample of the ETF’s (since there are too many to analyse) and plot the correlations between the ETF’s.
library(corrplot)
set.seed(1234)
Sigma_subsample <- sample(colnames(Sigma), size = 10)
Sigma_sampled <- Sigma[Sigma_subsample, Sigma_subsample, drop = FALSE] # subset rows and columns by the sampled names
col <- colorRampPalette(c("#BB4444", "#EE9988", "#FFFFFF", "#77AADD", "#4477AA"))
corrplot(cov2cor(Sigma_sampled), method = "color", col = col(200),
         main = "Covariance matrix of ETFs", type = "upper", order = "hclust",
         number.cex = .7, addCoef.col = "black", tl.col = "black",
         tl.srt = 90, diag = FALSE)
Sharpe Ratio, CAPM and Fama-French Factor Analysis
Using simple plots still doesn't give us enough information about an ETF, portfolio or asset. The Sharpe ratio Sharpe (1966) is a better measure. The Sharpe ratio is a reward-to-variability ratio which allows us to compare the performance of a portfolio to that of a risk-free asset, after adjusting for its risk. It takes the difference between the portfolio's return and the risk-free return and divides it by the standard deviation, which measures the portfolio's volatility.
The Sharpe ratio tells us the additional return we can expect per unit of increase in risk. It is defined as:
\[S = \dfrac{\overline{d}}{\sigma_{d}}\]
Where \(d = R_{F} - R_{B}\) is the differential return of the fund over the benchmark, \(\overline{d}\) is its average over the period and \(\sigma_{d}\) its standard deviation.
The single asset model with just the market factor looks like:
\[ x_t = \alpha + \beta y_{t} + \epsilon_{t}\]
Taking the expected value of \(x\) and \(y\) at time \(t\) we have:
\[E[x_{t}] = \alpha + \beta E[y_{t}]\] With variance:
\[var[x_{t}] = \beta^2var[y_t] + \sigma^2\]
Thus, the Sharpe ratio becomes:
\[\dfrac{E[x_{t}]}{\sqrt{var[x_{t}]}}\]
The denominator of the Sharpe ratio is \(\sqrt{var[x_t]}\); plugging in the values for \(E[x_{t}]\) and \(var[x_t]\) defined previously, the equation becomes:
\[\dfrac{\alpha + \beta E[y_{t}]}{\sqrt{\beta^2 var[y_t] + \sigma^2}}\] Dividing numerator and denominator by \(\beta\), we obtain:
\[\dfrac{\dfrac{\alpha}{\beta} + E[y_{t}]}{\sqrt{var[y_{t}] + \dfrac{\sigma^{2}}{\beta^{2}}}}\]
Finally, when the idiosyncratic risk is small relative to the factor exposure (\(\sigma^{2}/\beta^{2} \approx 0\)), this reduces to:
\[\dfrac{\dfrac{\alpha}{\beta} + E[y_t]}{\sqrt{var[y_t]}}\]
We can represent the Sharpe ratio as \(\dfrac{\overline{x}}{\sqrt{\mathrm{diag}(\Sigma)}}\), where \(\overline{x}\) is the average value of \(x\) over the historic period from \(t=1\) to \(T\), simply computed as \(\overline{x} = \dfrac{1}{T}\sum_{t=1}^{T} x_{t}\). In R we can simply take colMeans(all_returns) / sqrt(diag(var(all_returns))).
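As a small illustration (simulated data, not the ETFs above), the column-wise Sharpe ratios fall out of exactly that one-liner:

```r
# Hypothetical example: per-period Sharpe ratios of three simulated assets,
# computed column-wise as mean return over standard deviation.
set.seed(1)
n <- 100000
rets <- cbind(A = rnorm(n, mean = 5e-4, sd = 0.01),
              B = rnorm(n, mean = 3e-4, sd = 0.01),
              C = rnorm(n, mean = 1e-4, sd = 0.01))
sharpe <- colMeans(rets) / sqrt(diag(var(rets)))
round(sharpe, 3)  # A has the highest mean return per unit of risk
```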
Machine Learning and Clustering
We can cluster our ETFs based on their \(\beta\), \(\alpha\) and Sharpe values. Why would we want to do this? Perhaps with more variables than the 3 here, the model could cluster these firms in a higher-dimensional space, and thus we could select our ETFs based on the clusters and use it as a portfolio diversification tool; that is, one cluster might contain riskier ETFs whereas another might contain value stocks or growth stocks. With so few variables here it doesn't quite work, but with more variables we could better classify ETFs.
betas <- CAPM.beta(Ra = all_returns, Rb = SPY_returns, Rf = 0)
alphas <- CAPM.alpha(Ra = all_returns, Rb = SPY_returns, Rf = 0)
sharpe <- colMeans(all_returns) / sqrt(diag(var(all_returns)))
beta_alphas_sharpe <- cbind(t(betas), t(alphas), sharpe)
colnames(beta_alphas_sharpe) <- c("beta", "alpha", "sharpe")
beta_alphas_sharpe <- na.omit(beta_alphas_sharpe)

library(factoextra)
kmeans_model <- kmeans(beta_alphas_sharpe, centers = 5, iter.max = 10, nstart = 1)
fviz_cluster(kmeans_model, data = beta_alphas_sharpe)
The beta_alphas_sharpe
data looks like:
ETF | beta | alpha | sharpe
---|---|---|---
myPortfolio | 1.0977199 | 0.0000345 | 0.0834437
SPY | 0.9990763 | 0.0000799 | 0.0988232
IVV | 0.9923880 | 0.0000872 | 0.1000217
VTI | 1.0282825 | 0.0000718 | 0.0969068
VOO | 0.9978598 | 0.0000848 | 0.0995858
QQQ | 1.1472461 | 0.0000931 | 0.0870931
We can plot the \(\beta\), \(\alpha\), and Sharpe values in a 3D plot and colour them according to the clusters from the kmeans model. It also gives me the opportunity to use the threejs package, which we can interact with.
#library(rgl)
#options(rgl.printRglwidget = TRUE)
#plot3d(beta_alphas_sharpe, col = kmeans_model$cluster, size = 2, type = 's')
library(threejs)
scatterplot3js(beta_alphas_sharpe, color = factor(kmeans_model$cluster), size = 0.5, bg = "black")
Factor Modelling Fama and French
Finally, I analyse the performance of the various ETFs. The CAPM formula tries to explain the performance of a portfolio through a single factor, the market as a whole. CAPM is defined as follows:
\[R_{a} = R_{rf} + \beta_{a}(R_{m} - R_{rf})\]
We can take the model further by adding Factors to the model. Thus, the 3 Factor model looks like:
\[R_{a} = R_{rf} + \beta_{a}(R_{m} - R_{rf}) + \beta_{b}(SMB) + \beta_{c}(HML)\]
Which takes the market factor from the CAPM and adds 2 new factors, \(SMB\) and \(HML\), or Small-minus-Big and High-minus-Low. The \(SMB\) factor tries to capture the size effect: small market cap companies should outperform the market over the long run. The \(HML\) factor aims to capture the value versus growth effect by sorting assets based on their book-to-market ratios (defined at the beginning of the post). Firms with a high book-to-market ratio are typically value stocks and firms with a low book-to-market ratio are growth stocks; the literature also shows that value stocks outperform growth stocks over a long-term horizon.
The Fama French factor model allows us to analyse a fund manager's performance: if an asset manager's portfolio can be explained by the Fama French factors, then that fund manager has not added any value through portfolio selection and skill, and has subsequently obtained no alpha.
More formally, our equation becomes:
\[X^T = a^T + B F^T + E^T\]
Where \(F\) is a matrix of factors and \(B\) is a vector of \(\beta\)'s such that \(B = [b_{1}, b_{2}, b_{3}]\), in which \(b_{1}\) is the market \(\beta\), \(b_{2}\) is the \(SMB\) \(\beta\) and \(b_{3}\) is the \(HML\) \(\beta\). Extending to include more factors, up to \(N\) of them, we have \(B = [\beta_{1}, \dots, \beta_{N}]^T\).
We want to fit coefficients which minimise the squared errors: \[\underset{a,\, B}{\text{minimise}} \;\; \|X^T - a^T - BF^T\|_{F}^2\]
The solution becomes \([a, B] = X^T \widetilde{F}(\widetilde{F}^T\widetilde{F})^{-1}\) which we can solve in R using the following:
(For completeness I re-download the data.)
- I download the data as before, randomly selecting from Wikipedia, and convert the daily prices to daily returns (I set a seed such that we collect the same data using set.seed).
library(quantmod)
library(tidyquant)
library(timetk)
library(rvest)

start_date <- "2016-01-01"
end_date <- "2019-11-15"
set.seed(4746)

url <- "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
symbols <- url %>%
  read_html() %>%
  html_nodes(xpath = '//*[@id="constituents"]') %>%
  html_table() %>%
  .[[1]] %>%
  filter(!str_detect(Security, "Class A|Class B|Class C")) %>% # Removes firms with Class A shares
  sample_n(20) %>%
  pull(Symbol)

asset_returns <- tq_get(symbols, from = start_date, to = end_date) %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted, mutate_fun = periodReturn,
               period = "daily", type = "arithmetic") %>%
  select(symbol, date, daily.returns) %>%
  pivot_wider(names_from = symbol, values_from = daily.returns) %>%
  .[-1, ]
- I download the ETFs just as before and convert to daily returns.
myETFs <- c("SPY", "IVV", "VTI", "VOO", "QQQ", "VEA", "IEFA", "AGG", "VWO", "EFA",
            "IEMG", "VTV", "IJH", "IWF", "BND", "IJR", "IWM", "VUG", "GLD", "IWD",
            "VIG", "VNQ", "USMV", "LQD", "VO", "VYM", "EEM", "VB", "VCSH", "XLF",
            "VCIT", "VEU", "XLK", "ITOT", "IVW", "BNDX", "VGT", "DIA", "BSV", "SHV",
            "IWB", "IWR", "TIP", "SCHF", "MBB", "SDY", "MDY", "SCHX", "IEF", "HYG",
            "DVY", "XLV", "SHY", "IXUS", "TLT", "IVE", "PFF", "IAU", "VXUS", "RSP",
            "SCHB", "VV", "GOVT", "EMB", "MUB", "QUAL", "XLY", "VBR", "EWJ", "XLP",
            "VGK", "SPLV", "MINT", "BIV", "IGSB", "EFAV", "VT", "GDX", "XLU", "IWS",
            "XLI", "SCHD", "IWP", "ACWI", "VMBS", "XLE", "JNK", "VOE", "FLOT", "IWV",
            "JPST", "SCZ", "IEI", "IWN", "DGRO", "VBK", "IGIB", "IWO")

etf_returns <- tq_get(myETFs, from = start_date, to = end_date) %>%
  group_by(symbol) %>%
  tq_transmute(select = adjusted, mutate_fun = periodReturn,
               period = "daily", type = "arithmetic") %>%
  select(symbol, date, daily.returns) %>%
  pivot_wider(names_from = symbol, values_from = daily.returns) %>%
  .[-1, ]
- I take the average daily return of the randomly selected stocks and join the data with the ETFs and set the data as a time series object. I also download the daily Fama French 3 factors from Kenneth French’s website and clean the data up a little.
portfolio_and_etfs <- asset_returns %>%
  mutate(myPortfolio = rowMeans(select(., -date), na.rm = TRUE)) %>%
  select(date, myPortfolio) %>%
  inner_join(etf_returns, by = c("date")) %>%
  tk_xts(date_var = date)

# Daily Fama French factors
library(glue)
library(readr)
temp <- tempfile()
base <- "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
factor <- "F-F_Research_Data_Factors_daily"
format <- "_CSV.zip"
full_url <- glue(base, factor, format, sep = "")
download.file(full_url, temp, quiet = TRUE)

North_America_3_Factors <- read_csv(unz(temp, "F-F_Research_Data_Factors_daily.CSV"), skip = 4) %>%
  rename(date = X1) %>%
  mutate(date = ymd(date)) %>%
  drop_na(date) %>%
  rename(Mkt_Rf_3 = `Mkt-RF`, SMB_3 = SMB, HML_3 = HML, RF_3 = RF) %>%
  select(-RF_3) %>%
  mutate_at(vars(c("Mkt_Rf_3", "SMB_3", "HML_3")), funs(./100)) %>%
  tk_xts(date_var = date)
Finally we can compute the solution \([a, B] = X^T \widetilde{F}(\widetilde{F}^T\widetilde{F})^{-1}\)
# Ensure the dates from both time series match
three_factors <- North_America_3_Factors[index(North_America_3_Factors) %in%
  c(min(index(portfolio_and_etfs)):max(index(North_America_3_Factors)))]
portfolio_and_etfs_new <- portfolio_and_etfs[index(portfolio_and_etfs) %in%
  c(min(index(three_factors)):max(index(three_factors)))]
# There is probably a much nicer way to go about doing this but it works...

FF_3Factors <- cbind(alpha = 1, three_factors)
Gamma <- t(solve(t(FF_3Factors) %*% FF_3Factors, t(FF_3Factors) %*% portfolio_and_etfs_new))
Which we can see as:
ETF | alpha | Mkt_Rf_3 | SMB_3 | HML_3
---|---|---|---|---
myPortfolio | 0.0001697 | 0.9844870 | 0.0155642 | 0.0629606
SPY | 0.0000391 | 0.9784199 | -0.1331473 | -0.0085326
IVV | 0.0000423 | 0.9771912 | -0.1341942 | -0.0113980
VTI | 0.0000440 | 0.9727841 | -0.0164012 | -0.0019078
VOO | 0.0000430 | 0.9769683 | -0.1356460 | -0.0099922
QQQ | 0.0000440 | 1.1645777 | -0.1440829 | -0.4955734
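Before moving on, the closed-form solution can be sanity-checked on simulated data (a hypothetical example, not the post's ETF data): stacking a column of ones onto the factor matrix and solving the normal equations should reproduce the coefficients from lm exactly.

```r
# Hypothetical check: the closed form [a, B] = X' F~ (F~' F~)^{-1}
# reproduces lm()'s intercept and slopes on simulated factor data.
set.seed(7)
T_obs <- 1000
F_mat <- cbind(Mkt = rnorm(T_obs), SMB = rnorm(T_obs), HML = rnorm(T_obs))
X <- as.vector(0.001 + F_mat %*% c(1.1, 0.3, -0.2) + rnorm(T_obs, sd = 0.01))

F_tilde <- cbind(alpha = 1, F_mat)  # prepend a column of ones for the intercept
aB <- t(solve(t(F_tilde) %*% F_tilde, t(F_tilde) %*% X))
fit <- lm(X ~ Mkt + SMB + HML, data = as.data.frame(F_mat))

max(abs(as.vector(aB) - coef(fit)))  # numerically zero
```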
Instead, we can estimate these using linear regression models. For my random portfolio myPortfolio we can use the lm function to fit a linear model and then use the tidy function from the broom package to tidy the output up a little:
library(broom)
regdata <- cbind(FF_3Factors, portfolio_and_etfs_new)
#sapply(regdata, function(x) sum(is.na(x))) # One column was causing problems since it consisted of just NA values. I remove it below.
regdata$JPST <- NULL # This ETF failed to download or does not exist on the Yahoo Finance site
myPortfolioReg <- lm(myPortfolio ~ Mkt_Rf_3 + SMB_3 + HML_3, data = regdata, na.action = na.exclude)
tidy(myPortfolioReg)
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept) 0.000170  0.000112     1.51  0.131
## 2 Mkt_Rf_3    0.984     0.0136      72.6   0
## 3 SMB_3       0.0156    0.0226       0.690 0.490
## 4 HML_3       0.0630    0.0200       3.15  0.00167
We can apply this to all the ETFs in the data using the apply command with our own custom lm function.
regressionLists <- apply(regdata, 2, function(y)
  lm(y ~ Mkt_Rf_3 + SMB_3 + HML_3, data = regdata, na.action = na.exclude))
We can also apply the tidy command to individual ETFs and then use stars.pval to convert the p-values to significance stars, making the output tidier still.
library(gtools)
tidy(regressionLists$myPortfolio) %>% mutate(p.value = stars.pval(p.value))
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) 0.000170  0.000112     1.51  " "
## 2 Mkt_Rf_3    0.984     0.0136      72.6   ***
## 3 SMB_3       0.0156    0.0226       0.690 " "
## 4 HML_3       0.0630    0.0200       3.15  **
tidy(regressionLists$VTV) %>% mutate(p.value = stars.pval(p.value))
## # A tibble: 4 x 5
##   term         estimate std.error statistic p.value
##   <chr>           <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) 0.0000514 0.0000446      1.15 " "
## 2 Mkt_Rf_3    0.918     0.00539      170.   ***
## 3 SMB_3       -0.131    0.00897      -14.7  ***
## 4 HML_3       0.265     0.00794       33.4  ***
tidy(regressionLists$LQD) %>% mutate(p.value = stars.pval(p.value))
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) 0.000241  0.0000882     2.74 **
## 2 Mkt_Rf_3    -0.0119   0.0106       -1.12 " "
## 3 SMB_3       -0.0325   0.0177       -1.84 .
## 4 HML_3       -0.157    0.0157      -10.0  ***
Finally, we can apply this same method to all our ETFs, using the lapply function to tidy the data and the map function to mutate, or convert, the p-values to stars. I then take a random sample of 5 ETF regressions.
library(purrr)
myTidyRegressions <- lapply(regressionLists[4:length(regressionLists)], tidy)
myTidiedRegressions <- map(myTidyRegressions, ~mutate(., p.value = stars.pval(p.value)))
library(rlist)
list.sample(myTidiedRegressions, 5)
## $DGRO
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) 0.000154 0.0000507      3.04 **
## 2 Mkt_Rf_3    0.890    0.00612      146.   ***
## 3 SMB_3       -0.145   0.0102       -14.2  ***
## 4 HML_3       0.121    0.00902       13.4  ***
##
## $EFAV
## # A tibble: 4 x 5
##   term           estimate std.error statistic p.value
##   <chr>             <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) -0.00000220 0.000142    -0.0155 " "
## 2 Mkt_Rf_3    0.572       0.0171      33.5    ***
## 3 SMB_3       -0.112      0.0285      -3.94   ***
## 4 HML_3       -0.0172     0.0252      -0.684  " "
##
## $XLE
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) -0.000264 0.000270    -0.978 " "
## 2 Mkt_Rf_3    1.09      0.0326      33.5   ***
## 3 SMB_3       0.0920    0.0543       1.69  .
## 4 HML_3       0.716     0.0481      14.9   ***
##
## $IVW
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) 0.0000434 0.0000407     1.07 " "
## 2 Mkt_Rf_3    1.01      0.00491     206.   ***
## 3 SMB_3       -0.180    0.00817     -22.0  ***
## 4 HML_3       -0.312    0.00724     -43.1  ***
##
## $VEA
## # A tibble: 4 x 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl> <chr>
## 1 (Intercept) -0.000113 0.000154    -0.730 " "
## 2 Mkt_Rf_3    0.852     0.0186      45.8   ***
## 3 SMB_3       -0.0850   0.0310      -2.74  **
## 4 HML_3       0.150     0.0274       5.45  ***
A few notes here: we should be modelling the excess returns on the ETFs and not just the raw ETF returns. This is simple enough: replace, for example, myPortfolio in the lm regressions with \((myPortfolio - RF_3)\), where \(RF_3\) is the risk-free rate which comes with the Fama and French data. We should also probably use Newey-West adjusted standard errors by running coeftest(myPortfolioReg, NeweyWest(myPortfolioReg, lag = N, prewhite = FALSE)). The literature suggests a lag of \(4(N/100)^{2/9}\) should be used, where \(N\) is the number of observations.
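A sketch of that adjustment on toy data (assuming the sandwich and lmtest packages, which provide NeweyWest and coeftest, are installed):

```r
# Sketch: Newey-West adjusted standard errors on a toy regression with
# serially correlated errors, using the rule-of-thumb lag 4 * (N/100)^(2/9).
library(sandwich)
library(lmtest)
set.seed(99)
N <- 500
x <- rnorm(N)
y <- 0.5 * x + as.numeric(arima.sim(list(ar = 0.4), n = N, sd = 0.1))  # AR(1) errors
fit <- lm(y ~ x)
nw_lag <- floor(4 * (N / 100)^(2/9))  # = 5 for N = 500
coeftest(fit, vcov. = NeweyWest(fit, lag = nw_lag, prewhite = FALSE))
```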
We could also rank the ETFs by their alphas as before, go long the high-alpha ETFs and short the low-alpha ones, then run this hedge portfolio through a Fama French regression and see whether we obtain a better performance.
Finally, here I only used the 3 Factor model; there are more factors from the literature we could use. On the Kenneth French website we can collect data on \(Market\), \(SMB\), \(HML\), \(RMW\), \(CMA\) and \(MOM\), where \(RMW\) is a profitability factor, \(CMA\) an investment factor and \(MOM\) a momentum factor.
Below I pull in the Fama and French North America 5 Factors data and plot the factors.
library(glue)
library(timetk)
library(plotly)
library(pipeR)

temp <- tempfile()
base <- "http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/"
factor <- "North_America_5_Factors"
format <- "_CSV.zip"
full_url <- glue(base, factor, format, sep = "")
download.file(full_url, temp, quiet = TRUE)

North_America_Factors <- read_csv(unz(temp, "North_America_5_Factors.csv"), skip = 6) %>%
  rename(date = X1) %>%
  mutate_at(vars(-date), as.numeric) %>%
  mutate(date = rollback(ymd(parse_date_time(date, "%Y%m") + months(1)))) %>%
  drop_na(date) %>%
  rename(Mkt_Rf = `Mkt-RF`) %>%
  select(-RF) %>%
  mutate_at(vars(c("Mkt_Rf", "SMB", "HML", "RMW", "CMA")), funs(./100)) %>%
  tk_xts(date_var = date)

ax <- list(
  zeroline = TRUE, showline = TRUE, mirror = "ticks",
  gridcolor = toRGB('white'), gridwidth = 2,
  zerolinecolor = toRGB('white'), zerolinewidth = 4,
  linecolor = toRGB('white'), linewidth = 6, color = 'white'
)

North_America_Factors %>>%
  (cumprod(1 + .)) %>>%
  ROC(36, type = "discrete") %>>%
  na.omit() %>>%
  (
    plot_ly(
      x = colnames(.), y = as.Date(index(.)), z = data.matrix(.),
      type = "surface", colors = c("darkblue", "yellow", "darkred")
    )
  ) %>%
  plotly::layout(
    xaxis = ax, yaxis = ax,
    paper_bgcolor = 'rgb(0,0,0)',
    #plot_bgcolor = 'rgb(0,0,0)',
    title = "Fama French Factors",
    scene = list(
      xaxis = c(ax, title = "Factors", tickangle = -45),
      yaxis = c(ax, title = "Date", tickangle = -45),
      zaxis = c(ax, title = "Fama French Factor")
    ),
    title = list(family = "Agency FB", size = 50, color = 'white')
  )