
Better Portfolio Performance with Factor Model Monte Carlo In R

[This article was first published on R Codes – Light Finance, and kindly contributed to R-bloggers.]

A common problem when evaluating a portfolio manager is that the history of returns is often so short that estimates of risk and performance measures can be highly unreliable. A similar problem occurs when testing a new trading strategy. Even if you have a fairly long history for the strategy’s performance, you often only have observations over a single market cycle, which makes it difficult to evaluate how your strategy would have held up in other markets. If you trade stocks, you have probably heard the refrain: “I’ve never seen a bad back test”.

One method to address this deficiency is Factor Model Monte Carlo (FMMC). Using FMMC, we can estimate a factor model based on a set of financial and economic factors that reliably explain the returns of the fund manager. We can then simulate returns to determine how the manager would have performed under a wide variety of market environments. The end result is a model that produces considerably better estimates of risk and performance than if we simply used the short return series available to us.

The Task and Set Up

For this case study, we will be analyzing the returns of a new hedge fund, Aric’s Hedge Fund; hereafter known as AHF. The hedge fund case is particularly interesting because hedge funds can use leverage, invest in any asset class, go long or short, and use many different instruments. Hedge funds are also often very secretive about their strategies and holdings, so it can be difficult to tell what they are doing. Thus, having a reliable risk model to explain the source of their returns is essential.

Keep in mind that Aric’s Hedge Fund is not a real hedge fund (I’m Aric, I don’t have a hedge fund), but this is a real series of returns. The returns belong to an operating hedge fund that we invest in where I work, so the results of this study are applicable to a real-world scenario.

We have data for Aric’s Hedge Fund from January 2010 to March 2020. For the purpose of this post and evaluating the accuracy of our model we will pretend as though AHF is pretty new to the scene and that we only have data from January 2017 through March 2020. To overcome the data deficiency, we will build a factor model on the basis of this “observed data” and then utilize the entire data series to evaluate the accuracy of our simulation for assessing the risk and performance statistics.

The below graph shows the cumulative return of AHF since January 2010. The data to the right of the red line represents the “observed period”.

We will be conducting the analysis in R using its extensive library of packages, including PerformanceAnalytics and quantmod. Aside from the hedge fund return series, all of the factor data can be obtained freely from Yahoo! Finance, the Federal Reserve Bank of St. Louis FRED database, and the Credit Suisse Hedge Fund Indices; you have to sign up with Credit Suisse to access the indices, but still…free.

Model Estimation

A common technique in empirical finance is to explain changes in asset prices based on a set of common risk factors. The simplest and most well-known factor model is the Capital Asset Pricing Model (CAPM) of William Sharpe. The CAPM is specified as follows:

r_i - r_f = β_i (r_m - r_f) + ε_i

Where:

  - Market risk, or “systematic” risk, serves as a kind of summary measure for all of the risks to which financial assets are exposed. This may include recessions, inflation, changes in interest rates, political turmoil, natural disasters, etc. Market risk is usually proxied by the returns on a large index like the S&P 500 and cannot be reduced through diversification.
  - β (Beta) represents an asset’s exposure to market risk. A Beta = 1 implies that the asset is as risky as the market, a Beta > 1 implies more risk than the market, and a Beta < 1 implies less risk.
  - ε is idiosyncratic risk and represents the portion of the return that cannot be explained by the market risk factor.
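To make the mechanics concrete, here is a small illustrative sketch (not part of the AHF analysis; all numbers are made up) of estimating a CAPM beta by regressing simulated asset excess returns on market excess returns with lm():

```r
## Illustrative only: estimating beta with lm() on simulated excess returns.
## All numbers here are hypothetical; this is not AHF data.
set.seed(42)
mkt.excess   <- rnorm(120, mean = 0.006, sd = 0.04)        # 10 years of monthly market excess returns
asset.excess <- 1.3 * mkt.excess + rnorm(120, sd = 0.01)   # true beta = 1.3
capm.fit     <- lm(asset.excess ~ mkt.excess)
coef(capm.fit)["mkt.excess"]  # estimated beta, close to the true value of 1.3
```

The slope coefficient is the Beta described above; the regression residuals play the role of ε.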

We will extend the CAPM to include additional risk factors which the literature has shown to be important for explaining asset returns. Aric’s Hedge Fund runs a complicated strategy using many different asset classes and instruments so it’s certainly plausible that it would be exposed to a broader set of risks beyond the traditional market index. The general form of our factor model is as follows:

r = α + β_1*r_1 + β_2*r_2 + … + β_k*r_k + ε

All the above says is that returns (r) are explained by a set of risk factors j = 1…k, where r_j is the return of factor j and β_j is the exposure. α is the intercept and ε is the idiosyncratic error.

Thus, if we can estimate the β_j, then we can leverage the long history of factor returns (r_j) that we have to calculate conditional returns for AHF. Finally, if we can reasonably estimate the distribution of ε, then we can build randomness into AHF’s return series. This enables us to fully capture the variety of returns that we could observe.
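To see what this means for a single month, here is a toy sketch (all numbers hypothetical): the conditional return is the fitted factor return, and one simulated return adds a residual drawn from the error distribution.

```r
## Toy sketch with hypothetical numbers: one simulated monthly return is the
## model-fitted (conditional) return plus a drawn residual.
alpha <- 0.001                       # intercept
beta  <- c(0.5, 0.2)                 # exposures to two factors
r.fac <- c(0.010, -0.004)            # one month of factor returns
r.fit <- alpha + sum(beta * r.fac)   # conditional return = 0.0052
eps   <- rnorm(1, sd = 0.005)        # draw from the residual distribution
r.sim <- r.fit + eps                 # one simulated return for that month
```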

The FMMC method will take place in three parts:

Part A: Data Acquisition, Clean Up and Processing

For the factor model I will be using a set of financial and economic variables aimed at measuring different sources of risk and return. Again, all the data used in this study are freely available from Yahoo! Finance, the FRED Database, and Credit Suisse.

We’ll begin with the FRED data. Next to each variable I have placed the unique identifier that you can query from the database.

FRED Variables:

The FRED API leaves something to be desired and does not allow you to pull data in a consistent way. The returns of AHF are monthly so our model will need to be estimated using monthly data. However, FRED retrieves data at the highest available frequency so daily data always comes in as daily. Furthermore, the data is retrieved from the beginning of the series, so you end up getting a lot of NAs. As such, we will need to do a little clean up before we proceed.

The following segments of R code show loading the identifiers into variables and separate queries to FRED for the daily and monthly data. The daily data is cleaned and converted to a monthly frequency. I’ve tried to comment the code as much as possible so you can see what’s happening.

## ------------------------------------ FACTOR DATA ----------------------------------- ##

## Set 1: FRED Data. Each series has a unique ID which we assign to the name. Units listed to the right.

inflation <- "T5YIFRM" ## Expected inflation rate over the five-year period that begins five years from today. Monthly

term <- "T10Y3M" ## Term premium. 10 Year minus 3 month Treasury spread, Constant Maturity. Daily.

credit <- "BAA10Y" ##Credit spread premium. Moody's Baa corp bond minus 10 year Treasury, Constant maturity. Daily.

t.bill <- "DGS3MO" ## 3-month T-bill rate. Daily.

ted <- "TEDRATE" ## TED Spread. 3-Month LIBOR Minus 3-Month Treasury Yield. Daily

intl.bonds <- "IRLTLT01EZM156N" ## 10 year government bond yields for Euro Area. Monthly

corp.tr <- "BAMLCC0A0CMTRIV" ## ICE BofAML Corp bond master total return index; in levels. Daily.

vix <- "VIXCLS" ## CBOE Volatility Index. Daily.

treas.vix <- "VXTYN" ## CBOE Volatility Index of US 10-Year Treasuries. Daily.


## Pull daily data from FRED
start.date <- "2009-12-01"
end.date <- "2020-03-31"

fred.symbols.daily <- c(term, credit, t.bill, ted, corp.tr, vix, treas.vix)

fred.daily <- NULL

for (i in 1:length(fred.symbols.daily)){
        
        symbol_price <- getSymbols(fred.symbols.daily[i], 
                                   from= start.date, 
                                   to= end.date, 
                                   src = "FRED", 
                                   auto.assign = FALSE)
        
        fred.daily <- cbind(fred.daily, symbol_price)
}

## Some of the observations for the 3-month Treasury Rate are zero which messes with the calculation. Fill in zero values
## with .01 to bound the rate.

fred.daily$DGS3MO[ fred.daily$DGS3MO == 0 ] <- .01

##Convert daily FRED data to monthly frequency

fred.day.to.month <- to.monthly(fred.daily, OHLC = FALSE)

colnames(fred.day.to.month) <- c("term", "credit", "t.bill", "ted", "corp.tr", "vix", "treas.vix")

fred.day.to.month[,5:7] <- CalculateReturns(fred.day.to.month[,5:7], method = "log")

## Subset data to remove NA from first observation on return variables

fred.day.to.month <- fred.day.to.month["2003-02/"]

## Pull monthly data from FRED

fred.symbols.monthly <- c(inflation, intl.bonds)

fred.monthly <- NULL

for (i in 1:length(fred.symbols.monthly)){
        
        symbol_price <- getSymbols(fred.symbols.monthly[i], 
                                   from="2003-01-02", 
                                   to="2020-03-31", 
                                   src = "FRED", 
                                   auto.assign = FALSE)
        
        fred.monthly <- cbind(fred.monthly, symbol_price)
}

fred.monthly <- xts(coredata(fred.monthly), order.by = as.yearmon(index(fred.monthly)))

fred.monthly <- na.locf(fred.monthly["2003-01-02/2020-03-31"])

colnames(fred.monthly) <- c("inflation", "intl.bonds")

fred.monthly <- na.omit(CalculateReturns(fred.monthly, method = "log"))

## Combine FRED data

fred.data <- merge.xts(fred.day.to.month, fred.monthly, join = "inner")

any(is.na(fred.data)) ## Check if any values are NA
any(is.infinite(fred.data)) ## Check if any values are Inf

The FRED data is good to go. The other set of variables that we will need are financial market indices. Growth, Value and Size indices feature prominently in asset pricing models such as the Fama-French 3-Factor Model and I take the same approach here. Returns from financial indices are obtained from the venerable Yahoo! Finance.

Yahoo! Finance Variables:

## Set 2: Yahoo! Finance Data.

value.tr <- "^RLV" ## Russell 1000 Value index.

growth.tr <- "^RLG" ## Russell 1000 Growth index

size <- "^RUT" ## Russell 2000 index

market <- "^GSPC" ## S&P 500 index

intl.equity <- "EFA" ## ETF replicating MSCI EAFE

agg.bonds <- "AGG" ## ETF replicating Barclays Aggregate Bond Index


## Pull data from Yahoo! Finance

yahoo.symbols <- c(value.tr, growth.tr, size, market, intl.equity, agg.bonds)

index.data <- NULL

for (i in 1:length(yahoo.symbols)){
        symbol_price <- getSymbols(yahoo.symbols[i], 
                                   from="2003-01-02", 
                                   to="2020-03-31", 
                                   src = "yahoo", 
                                   auto.assign = FALSE)
        
        index.data <- cbind(index.data, to.monthly(symbol_price, OHLC = FALSE))
}

index.data <- na.omit(na.locf(Ad(index.data)))
colnames(index.data) <- c(value.tr, growth.tr, size, market, intl.equity, agg.bonds)
index.data <- na.omit(CalculateReturns(index.data, method = "log"))

Lastly, we’ll load in the hedge fund specific indices courtesy of Credit Suisse (CS). Obtaining the indices requires a few extra steps, as the data needs to be manually downloaded to Excel from the Credit Suisse website and then loaded into R. Each index corresponds to a specific hedge fund strategy.

Credit Suisse Variables:

## Set 3: Credit Suisse Data

Credit_Suisse <- read_excel("C:/Users/Aric/Documents/Project First Light/Blog Posts/Multi Factor Monte Carlo 8-16-20/Credit Suisse Data.xlsx", 
                            col_types = c("date", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric",
                                          "numeric", "numeric", "numeric", "numeric", "numeric", "numeric"))

CS.dates <- as.yearmon(Credit_Suisse$Date, format = "%m-%Y")

Credit_Suisse <- xts(Credit_Suisse[2:13], order.by = CS.dates)

Finally, we combine the FRED, Yahoo! Finance, and Credit Suisse data into one tidy xts object.

## Combine index and FRED data

model.data <- merge(index.data, fred.data, join = "inner")

model.data <- merge(model.data, Credit_Suisse, join = "inner")

model.data[ ,7:10] <- model.data[ ,7:10]/100

Part B: Model Estimation

Recall that for the purpose of this case study we are “pretending” as though we only have data for AHF from January 2017 through March 2020 (i.e. the sample period). In reality we have data going back to January 2010. We will use the data in the sample period to calibrate the factor model and then compare the results from the simulation to the long-run risk and performance over the full period of January 2010 to March 2020.

Model estimation has two steps:

  1. Estimate a Factor Model: Using the common “short” history of asset and factor returns, compute a factor model with intercept α, factor betas β_j for j = 1…k, and residuals ε_t.
  2. Estimate Error Density: Use the residuals ε_t from the factor model to fit a suitable density function from which we can draw.

I have proposed 27 risk factors to explain the returns of AHF, but I don’t know ahead of time which of them form the best predictive set. It could be that some factors are irrelevant and reduce the explanatory power of the model. In order to select an optimal model, I use an Adjusted-R2-based best-subset selection algorithm available through the leaps package. leaps performs an exhaustive, regression-based search across the proposed variables and selects the model with the highest Adjusted-R2. The algorithm proposes the following 14-factor model with an Adjusted-R2 of .9918:

## ----------------------------------  FACTOR MODEL --------------------------------- ##

## Create a subset of model and portfolio data to be used for model calibration. We will use 39 months 
## for calibration (i.e. Jan 2017 to March 2020)

cal.data <- model.data["2017-01/"]

cal.port <- returns["2017-01/"]
colnames(cal.port) <- "AHF"

## Use "leaps" package to select the best model

cal.search <- leaps(x = cal.data, y = cal.port, int = TRUE, method = "adjr2", nbest = 14)

cal.variables <- which.max(cal.search$adjr2) ##Select model that produces the highest Adj. R-squared

cal.data <- cal.data[ ,as.vector(cal.search$which[cal.variables, ])] ##Use logical subset to extract data from model data

Now that we have selected our variables, we can estimate the calibrated factor model and see how it does.

## Estimate calibrated model

cal.model <- lm(cal.port~., data = cal.data)
summary(cal.model)

cal.fitted <- fitted(cal.model)
cal.residuals <- residuals(cal.model)

plot(cbind(cal.port, cal.fitted), main = "Calibration Realized v. Fitted Values", legend.loc = "bottomleft",
     col = c("black", "blue"))

hist(cal.residuals, main = "Histogram of Residuals", xlab = "Residuals", col = "light gray")

Based on the results of the regression, we observe that AHF is significantly exposed to traditional sources of risk. Specifically, AHF appears to trade equity and debt and may employ derivatives to either hedge or speculate.

Positive exposure to both the S&P 500 (GSPC) and MSCI EAFE (EFA) indices suggests that AHF trades global equity and has a long bias. The positive value for AGG further suggests they trade fixed income but may have a slight preference for Treasuries, based on the negative coefficient for the corporate bond total return index (corp.tr). The generally significant results for the various hedge fund strategies suggest that AHF employs a complex trading strategy and may use derivatives such as futures (note the highly significant value for the CS Managed Futures Index, MGND_FT). Futures may be used to either hedge positions or target access to a specific market.

The plot of AHF’s realized returns v. the fitted values from the model demonstrates a high degree of fit and explanatory power.

Part C: Simulation

Parametric and non-parametric Monte Carlo methods are both widely applied in empirical finance, but each presents challenges for estimation.

Parametric estimation of factor densities requires fitting a large multivariate, fat-tailed probability distribution, which in our specific case would contain 14 variables. Correlations can be notoriously unstable, and inaccurate estimation of the variance-covariance matrix would bias the distribution from which we draw the factor returns. This problem may be overcome by employing copula methods, but this adds to the complexity of the model. On balance, we would prefer to avoid parametric estimation if possible.

A potential alternative is non-parametric estimation. To conduct a non-parametric simulation, we could bootstrap the observed, discrete empirical distribution that assigns a probability of 1/T to each of the observed factor returns for t=1…T. This would serve as a proxy to the true density of factor returns and allow us to bypass the messy process of estimating the correlations. However, bootstrap resampling can result in the duplication of some values and the omission of others and while this may be appropriate for inference, it does not provide an obvious advantage in our application.
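For contrast, here is a minimal sketch of the bootstrap approach we are passing on, assuming T observed months of factor data: each resampled month is drawn with probability 1/T, so some months are duplicated while others never appear.

```r
## Minimal sketch of the non-parametric bootstrap we decided against.
## Each of the T observed months is drawn with probability 1/T, with replacement.
set.seed(1)
T.obs    <- 123
boot.idx <- sample(seq_len(T.obs), size = T.obs, replace = TRUE)
## factor.boot <- model.data[boot.idx, ]  # resampled factor history (hypothetical)
length(unique(boot.idx))  # < 123: some months repeat, others are omitted
```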

A more efficient method is simply to take the relatively long history of factor returns as given, compute the model-fitted return for each month, and add each of the residuals. Simply put, we have 123 months of factor returns (January 2010-March 2020) and 39 residuals (based on the calibration period which spans January 2017-March 2020). If we add each of the 39 residuals to each of the 123 fitted returns, we can produce 123×39 scenarios for the return of AHF (4797 observations in total). This large sample should be capable of providing us with good insight into the tails of AHF’s return distribution and has the advantage of utilizing all of the observed data.
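As an aside, the nested loop used in the simulation code below can also be written as a single vectorized call: outer() adds every residual to every fitted return in one step. A toy illustration (with made-up numbers):

```r
## The residual-addition scheme as one vectorized call: outer() adds every
## residual to every fitted return, producing the full scenario matrix.
fitted.toy    <- c(0.010, 0.020, -0.010)     # e.g. 3 months of fitted returns
residuals.toy <- c(0.002, -0.003)            # e.g. 2 calibration residuals
outer(fitted.toy, residuals.toy, FUN = "+")  # 3 x 2 matrix of simulated returns
## With the real objects this would be:
## sim <- outer(as.numeric(predict), as.numeric(cal.residuals), FUN = "+")
```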

The simulation proceeds as follows:

## -------------------------------- SIMULATION ------------------------------------- ##

## Rather than performing a bootstrap simulation, we will use the relatively long history of factor returns and model 
## residuals to compute the conditional returns. With ~10 years of factor data and 39 months of residuals we can simulate
## 4797 different monthly observations by separately adding a different residual for each month of factor data. 

## Step 1: Fit factor data to calibrated model.

predict <- predict(cal.model, model.data["2010-01/2020-03"])

predict.dates <- seq(as.Date(as.yearmon("2010-01", format  = "%Y-%m")), 
                     length = length(na.omit(predict)), 
                     by = "months")

predict <- as.xts(predict, order.by = predict.dates)


## Step 2: Extract and format residuals as XTS.

cal.res.dates <- seq(as.Date(as.yearmon("2017-01", format  = "%Y-%m")), 
                     length = length(na.omit(cal.residuals)), 
                     by = "months")

cal.residuals <- xts(as.numeric(cal.residuals), order.by = cal.res.dates)


## Step 3: Simulate.

simulation.final <- matrix(data = 0, ncol = ncol(predict)*nrow(cal.residuals), nrow = nrow(predict))

for (i in 1:nrow(predict)){
        
        for (j in 1:length(cal.residuals)){
                
                simulation.final[i,j] <- as.numeric(predict[i]) + as.numeric(cal.residuals[j])
        }
        
}

simulation.final <- as.xts(simulation.final, order.by = predict.dates)

Performance Analysis

Recall that when we introduced this exercise we pretended as though we only had the performance history of AHF from January 2017 through March 2020. Such a short history of performance alone provides only limited insight into the risk/return features of a fund manager over a relatively narrow window of market conditions. To address this shortcoming and provide a more accurate picture of performance we have proposed using Factor Model Monte Carlo (FMMC). The factor model was calibrated using the short, common history of factor and fund returns. The Monte Carlo experiment used factor returns over a longer horizon (January 2010 through March 2020) and the realized factor model residuals to construct 4797 simulated returns for AHF.

To evaluate the performance of our model we will focus on the results for the mean annual return and volatility as well as the venerable Sharpe and Sortino Ratios. Let’s see how we did.

1. Average Return

The table below depicts the mean (i.e. average) annual return for the factor model Monte Carlo (FMMC), full history of AHF (January 2010-March 2020) and the truncated/”observed” history (January 2017-March 2020):

FMMC     Full     Truncated
4.39%    4.30%    2.72%

Immediately we can see the improvement that the FMMC model offers over the Truncated period. The FMMC model is able to fully capture the return dynamic of AHF, while the Truncated return substantially underestimates the full-history mean.

2. Volatility

Accurate estimation of the mean alone cannot support the claim that our model is robust. Of equal importance is the volatility. The below table shows the annualized volatility (i.e. standard deviation) for each of the periods under consideration:

FMMC     Full     Truncated
8.53%    8.73%    8.68%

Both the FMMC and Truncated estimates slightly undershoot the realized volatility of AHF over the full period. However, both estimates are very close.

3. Sharpe Ratio

With the mean and volatility estimates in hand, we can now calculate the Sharpe Ratio. The Sharpe Ratio is calculated as follows:

Sharpe Ratio = (R_p - R_f) / σ_p

where R_p is the mean portfolio return, R_f is the risk-free rate, and σ_p is the standard deviation of portfolio returns. For most of the test period (January 2010-March 2020) the risk-free rate, as proxied by the yield on the 3-month T-Bill, was very close to 0%. For simplicity we will adopt 0% as the risk-free rate for our calculations. The below table shows the results:

FMMC    Full    Truncated
.516    .493    .313
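As a quick sanity check, with a 0% risk-free rate the annual Sharpe Ratio is simply the annualized mean divided by the annualized volatility. Plugging in the rounded figures from the tables above reproduces the ratios to within rounding:

```r
## Sanity check: Sharpe = (mean - risk-free) / volatility, with rf = 0%.
## Inputs are the rounded figures from the tables above, so the ratios can
## differ from the table in the third decimal place.
means <- c(FMMC = 0.0439, Full = 0.0430, Truncated = 0.0272)
vols  <- c(FMMC = 0.0853, Full = 0.0873, Truncated = 0.0868)
round(means / vols, 3)   # approximately .515, .493, .313
```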

The FMMC estimate shows dramatic improvement over the Truncated estimate of AHF’s Sharpe Ratio. This is not entirely surprising: we showed above that the Truncated estimate of the mean return was poor while the FMMC estimate was quite close, and this naturally feeds into the Sharpe Ratio. Again, though, the results show the utility of the FMMC approach.

4. Sortino Ratio

Finally, we turn to the Sortino Ratio. Sortino is similar to Sharpe but, instead of total volatility, it focuses on what is termed “downside volatility”: the standard deviation of returns below a stated threshold. Typically the threshold is set to 0%, the idea being that volatile positive returns are not considered “bad” because you are still making money, while volatile negative returns suggest an outsized chance of large losses. A higher ratio is considered better. The Sortino Ratio is calculated as follows:

Sortino Ratio = (R_p - T) / DD_T

where R_p is the mean portfolio return, T is the threshold (here 0%), and DD_T is the downside deviation: the standard deviation of returns falling below T. The table depicts the results:

FMMC    Full    Truncated
.226    .206    .117

The FMMC estimate is very close to the full period and accurately expresses the volatility of the downside returns. We see marked improvement over the Truncated estimate which is lower due to a combination of a lower average return and more downside volatility.
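To make the downside-volatility idea concrete, here is a minimal hand computation of the Sortino building block, downside deviation with a 0% threshold, on a toy return series (hypothetical numbers; PerformanceAnalytics computes a closely related quantity internally):

```r
## Toy sketch of downside deviation with a 0% threshold (hypothetical returns).
r        <- c(0.02, -0.01, 0.03, -0.04, 0.01)  # toy monthly returns
thresh   <- 0
downside <- pmin(r - thresh, 0)                # keep only below-threshold returns
dd       <- sqrt(mean(downside^2))             # downside deviation
sortino  <- mean(r - thresh) / dd              # Sortino ratio for the toy series
```

Only the two negative months contribute to dd, which is why volatile positive returns do not penalize the ratio.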

## ------------------------------- RISK ANALYSIS ----------------------------------- ##

## Step 1: Simulation Risk Statistics

## Means

sim.means <- apply(simulation.final, 2, mean)

hist(sim.means)

fmmc.mean <- mean(sim.means)

fmmc.mean.se <- sd(sim.means)

## Volatilities

sim.vols <- apply(simulation.final, 2, sd)

hist(sim.vols)

fmmc.vol <- mean(sim.vols)

fmmc.vol.se <- sd(sim.vols)

## Sharpe

sim.sharpe <- apply(simulation.final, 2, SharpeRatio, FUN = "StdDev")

hist(sim.sharpe)

fmmc.sharpe <- mean(sim.sharpe)

fmmc.sharpe.se <- sd(sim.sharpe)

## Sortino

sim.sortino <- apply(simulation.final, 2, SortinoRatio)

fmmc.sortino <- mean(sim.sortino)

fmmc.sortino.se <- sd(sim.sortino)


## Step 2: Full AHF Risk Statistics

## Means
full.mean <- mean(returns)

## Volatilities
full.sd <- sd(returns)

## Sharpe
full.sharpe <- SharpeRatio(returns, FUN = "StdDev", annualize = FALSE)

## Sortino
full.sortino <- SortinoRatio(returns)


## Step 3: Truncated AHF Risk Statistics (i.e. statistics using calibration period only).

## Means
truncated.mean <- mean(cal.port)

## Volatilities
truncated.sd <- sd(cal.port)

## Sharpe
truncated.sharpe <- SharpeRatio(cal.port, FUN = "StdDev", annualize = FALSE)

## Sortino
truncated.sortino <- SortinoRatio(cal.port)

Concluding Comments

Manager evaluation is one of the oldest and most common problems in investment finance. When the history of manager returns is short it can be difficult to assess the efficacy of the strategy which has ramifications for both fund managers and fund allocators.

In this post, we introduced Factor Model Monte Carlo (FMMC) as a possible solution to the short history problem and used the real world example of Aric’s Hedge Fund (AHF) to demonstrate its efficacy. By using a factor model and the common, short history of fund and factor returns, we estimated the exposure of AHF to different sources of economic and market risk. We were then able to simulate the returns of the AHF to construct a longer history of returns with the goal of gaining improved insight into the fund’s long term performance.

The results from the FMMC method showed dramatic improvement over using the short history of returns in isolation. Using the full history of returns for AHF as a comparison, we see that the FMMC method is able to very accurately model the return, volatility, Sharpe and Sortino Ratios of the fund. By comparison, the truncated history of returns severely underestimated the performance of AHF which would have the consequence of misleading investors.

Factor Model Monte Carlo has proven to be an effective technique for modeling risk and return for complex strategies and serves as a powerful addition to the fund analyst’s tool kit.

Until next time, thanks for reading!

Aric Lux.
