Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This post examines conditional heteroskedasticity models in the context of daily stock price data for Allied Irish Banks (AIB), specifically how to test for conditional heteroskedasticity in a series, how to approach model specification and estimation when time-varying volatility is present, and how to forecast with these models; all of this is done in R, with relevant R code provided at the bottom of the post.
Now follows a necessary disclaimer: This is not intended to be financial advice, and should not in any way be interpreted as such; this is purely an academic post, and the subtle irony of analysing price data on a bank that no longer trades on the ISEQ index should not go unnoticed by the reader. I am not responsible for any losses, financial or otherwise, incurred by anyone who dives head-first into stock/options trading with no understanding and/or appreciation of the financial risks involved, especially with any leveraged investments/products. By continuing to read this post you agree to release me of any legal liability and responsibility for your actions based on the content of this post.
Vanilla Black-Scholes-Merton option pricing formulae are based on underlying asset prices being driven by geometric Brownian motion (GBM), which implies log-normally distributed prices and normally distributed returns. The strict assumptions imposed by GBM are often a point of amusement by those looking in from outside of finance and economics, but it’s worth noting that the field does not end with this simple model. Forecasting implied volatility from pricing data is a point of interest in applied financial econometrics, and a class of models utilised for this purpose are conditional heteroskedasticity models. For those not familiar with this term, ‘heteroskedasticity’ simply means non-constant variance, where the case of constant variance called ‘homoskedasticity’.
If a stock,
Some other features of the random walk model without drift are worth noting:
Moving to an empirical example, below is a graph of Allied Irish Banks’ (AIB) daily stock price, from the beginning of 2003 to the end of 2010:
For those unfamiliar with this bank, it is essentially a government-owned entity of the Irish State, where government ownership occurred due to large losses on loans extended to property developers. Below is a graph of the log daily returns of the stock:
The marked drop in the series occurred on January 19, 2009; an astonishing 88% fall over one trading day. Volatility clustering is evident in the series, with relatively higher volatility from the middle of 2008 than the comparatively tranquil 2003 to 2008 period.
Our first task is to identify the correct (mean) model specification, here from the autoregressive moving average (ARMA) suite of models. An ARMA(p,q) model is given by
where I’ve included a drift term,
we specify the error term as
where
The GARCH(p,q) model is given by,
where we have simply added autoregressive terms to the conditional variance process. Low-order GARCH(p,q) models have been shown to fit data better than high-order ARCH(q) models. The GARCH(1,1) specification, for example, is frequently used in finance, and one can find this model discussed in popular options textbooks, such as John Hull’s Options, Futures, and Other Derivatives.
If ARCH effects are not present in a series, we would expect past squared values of the residuals to be insignificant covariates of
To identify the appropriate ARMA model, the autocorrelation function (ACF) and partial autocorrelation (PACF) of the log-returns series are given below.
We see significant sample lags out to 50 lags in both the ACF and PACF, a rather unfortunate fact that will hinder estimation of a parsimonious model. (The erudite reader might point out that this is a candidate for fractional integration, with the series exhibiting so-called long-memory, even after integer-order differencing.) It’s arbitrary, really, what model is chosen here, due to the indefinite form of the ACF and PACF graph above. However, the oscillatory nature of the ACF and positive to negative move from the first to second sample lag in the PACF are indicative of a pure AR model; though, again, any model can be justified given the ACF and PACF above. I have selected an AR(9) model, based on minimising information criteria and analysing the ACF of the resulting residuals.
With the mean model estimated to control for linear dependence, we now look to the squared residuals produced by fitting our AR(9) model. Two tests are commonly used to check for ARCH effects. The first is to estimate an autoregressive model of the squared residuals, and employ a test of joint insignificance for all the coefficients on lagged values in the model, i.e.,
The PACF of the squared residuals can be used to identify the order of an ARCH model, which is given below:
This implies a high-order ARCH(q) model, with significant sample lags at 19 and beyond. Hence, the GARCH model is preferred to estimating an AR(9)-ARCH(19) model. Computationally, this may be a non-trivial exercise and convergence may not occur depending on the suite of algorithms used. However, identifying the order of a GARCH model is essentially a guess-and-go process, with GARCH(1,1), GARCH(1,2), GARCH (2,2) (and higher) being plausible specifications. One could use information criteria here to determine the correct model specification, though some authors do caution on the exact meaning of these for GARCH processes.
Now that we have identified the presence of ARCH effects, and determined that GARCH is a preferable approach than pure ARCH, we proceed to estimate our ARMA-GARCH model. This can be easily achieved in the ‘fGarch’ package, which is part of the wider Rmetrics project. However, I will use ‘rgarch’, a relatively more flexible (beta) package available on R-Forge, that also allows for estimation of GARCH in mean models (GARCH-M) and asymmetric GARCH specifications. Whatever model one estimates, the standardised residuals produced by the estimated model should be white noise, if correctly specified.
Before I estimate the model, it’s interesting to note that the relatively more volatile period in the log-return series occurred when the stock price was falling. This can be incorporated by using a GARCH in mean, or GARCH-M, model, where the condicational variance appears in the mean equation; e.g.,
and
The
I estimate an AR(9)-EGARCH(4,2)-M model based on several factors. First, the stability of the model is undermined with higher-order speficications, noticeably so for any EGARCH(5,q). The coefficients and their respective standard errors are given in the tables below, the first table being the mean equation and the second being the variance equation. Nyblom’s parameter stability test yield a joint test statistic of 3.3267, so we cannot reject the null hypothesis of parameter stability at the 10, 5 or 1% significance levels.
Mean Equation:
Estimate | Standard Error | |
0.001166 | 0.003648 | |
0.043452 | 0.038624 | |
-0.063144 | 0.023096 | |
-0.012754 | 0.025271 | |
-0.026951 | 0.024755 | |
-0.064749 | 0.024209 | |
-0.024585 | 0.048999 | |
-0.028065 | 0.021314 | |
0.020022 | 0.037065 | |
0.024686 | 0.020654 | |
-0.101332 | 0.176526 |
Variance equation:
-0.097284 | 0.633358 | |
-0.136455 | 0.035170 | |
-0.087314 | 0.060027 | |
0.373658 | 0.131616 | |
0.407474 | 0.165395 | |
-0.541935 | 0.038899 | |
0.772521 | 0.003665 | |
0.585102 | 0.062753 | |
0.164900 | 0.020075 |
To analyse if the model is correctly specified, we first construct the standardised residuals,
Q-Statistics on Standardized Residuals:
Statistic | p-value | |
Q(10) | 8.361 | 0.5936 |
Q(15) | 11.348 | 0.7275 |
Q(20) | 20.644 | 0.4184 |
Q-Statistics on Standardized Squared Residuals:
Statistic | p-value | |
Q(10) | 13.42 | 0.2013 |
Q(15) | 19.26 | 0.2024 |
Q(20) | 22.78 | 0.2996 |
Below is an empirical density plot of the standardised residuals, which isn’t too far from what we would expect for correct model specification.
Below is the daily return series with 2 conditional standard deviations imposed on either side.
Forecasting is easily done with the rgarch package, using the ‘ugarchforecast’ function. If one re-estimates the model leaving out the last ten observations, evaluation of volatility forecasts are based on four mean loss functions: the mean square error (MSE); mean absolute deviation (MAD); QLIKE; and R2Log. I invite the reader to estimate another model of their choice and compare 10-day ahead forecasts with:
MSE | MAD | QLIKE | R2Log |
0.00021026 | 0.009247 | -4.0351 | 4.0524 |
One might prefer to use, say, weekly or monthly data rather than daily data, which is usually easier to work with, and higher-order models are rarely necessary with weekly and monthly frequencies.
R Code:
require(tseries) require(rgarch) require(urca) require(ggplot2) # a<-read.csv(file="AIB.csv") # AIB<-a$Close # AIBts<-ts(a$Close) # dates <- as.Date(a$Date, "%d/%m/%Y") # LAIBts<-log(AIBts) # retAIB<-diff(LAIBts) # df1<-data.frame(dates,AIB) # ### Plot AIB stock price: gg1.1<-ggplot(df1,aes(dates,AIB)) + xlab(NULL) + ylab("AIB Stock Price in Euro") gg1.2<-gg1.1+geom_line(colour="darkblue") + opts(title="Daily AIB Stock Price") # png(filename = "aib1.png", width = 580, height = 400, units = "px", pointsize = 12, bg = "white") gg1.2 dev.off() # ### Plot AIB stock price returns: dates2<-dates[2:length(dates)] ReturnsAIB<-as.numeric(retAIB) df2<-data.frame(dates2,ReturnsAIB) # gg2.1<-ggplot(df2,aes(dates2,ReturnsAIB)) + xlab(NULL) + ylab("Log Changes") gg2.2<-gg2.1+geom_line(colour="darkred") + opts(title="Daily AIB Stock Price Return") # png(filename = "aib2.png", width = 580, height = 400, units = "px", pointsize = 12, bg = "white") gg2.2 dev.off() # ### ACFs and PACFs png(filename = "acfpacf.png", width = 580, height = 700, units = "px", pointsize = 12, bg = "white") par(mfrow=c(2,1)) acf(retAIB, main="ACF of AIB Log Returns", lag = 50) pacf(retAIB, main="PACF of AIB Log Returns", lag = 50) dev.off() # ar9<-arima(retAIB, order=c(9,0,0)) acf(ar9$residuals) # ressq<-(ar9$residuals)^2 Box.test(ressq, lag = 8, type = "Ljung-Box") # png(filename = "pacfressq.png", width = 580, height = 350, units = "px", pointsize = 12, bg = "white") pacf(ressq, main="PACF of Squared Residuals", lag = 30) dev.off() # # retAIB2<-as.numeric(retAIB) # Note that the GARCH order is revered from what I have discussed above specm1 <- ugarchspec(variance.model=list(model="eGARCH", garchOrder=c(2,4), submodel = NULL), mean.model=list(armaOrder=c(9,0), include.mean=TRUE, garchInMean = TRUE)) fitm1 <- ugarchfit(data = retAIB2, spec = specm1) fitm1 # fittedmodel <- fitm1@fit sigma1<-fittedmodel$sigma # df2<-data.frame(dates2,ReturnsAIB,sigma1) # gg3.1<-ggplot(df2,aes(dates2)) + xlab(NULL) + ylab("Log Changes") gg3.2<-gg3.1+geom_line(aes(y = ReturnsAIB, colour="Log Returns")) + opts(title="Daily Log Return with 2 Conditional Standard Deviations") gg3.3<-gg3.2 + geom_line(aes(y = sigma1*2, colour="2 S.D.")) + geom_line(aes(y = sigma1*-2, colour="2 S.D.")) + scale_colour_hue("Series:") + opts(legend.position=c(.18,0.8)) # png(filename = "aib3.png", width = 580, height = 400, units = "px", pointsize = 12, bg = "white") gg3.3 dev.off() # fitm2 <- ugarchfit(data = retAIB2,out.sample = 10, spec = specm1) fitm2 pred <- ugarchforecast(fitm2, n.ahead = 10,n.roll = 1) pred.fpm <- fpm(pred) #
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.