The idea of this post is to give an empirical example of how Principal Component Analysis (PCA) can be applied in Finance, especially in the Fixed Income Market. Principal components are very useful for reducing data dimensionality and for giving a joint interpretation to a group of variables. For example, one could use PCA to try to extract a common trend from several variables.
This text is divided into two parts: the first is a quick review of the math behind PCA, and the second is an empirical example with American bonds.
What is PCA?
PCA is based on the eigenvectors and eigenvalues of the covariance (or correlation) matrix of the data. Writing this matrix as $\Sigma$, we can decompose it as $\Sigma = W \Lambda W^{\top}$, where the columns of $W$ are the eigenvectors (the factor loadings) and $\Lambda$ is a diagonal matrix with the eigenvalues in decreasing order, each one measuring the variance explained by the corresponding principal component. The components themselves are obtained by projecting the (standardized) data onto the eigenvectors, $P = XW$.
One of the most important applications of PCA is dimensionality reduction. Given that the first components explain almost all the variance of the data, one can discard the remaining dimensions once some criterion is met, commonly a threshold on the cumulative explained variance. We will see in the empirical example that only three dimensions are enough to deal with an originally thirteen-dimensional problem (thirteen American bonds). It is important to stress that PCA is highly sensitive to the scale of the data, so scaling and centring the variables is usually necessary.
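Although the text keeps the math review short, a minimal sketch of these mechanics (on a hypothetical random matrix X, purely for illustration) shows how the eigen decomposition relates to the components:

# illustration only: PCA 'by hand' on a hypothetical data matrix X
set.seed(1)
X = matrix(rnorm(200), ncol = 4)   # 50 observations of 4 variables
Z = scale(X)                       # centre and scale: PCA is sensitive to scale
e = eigen(cov(Z))                  # cov of standardized data = correlation matrix of X
e$values / sum(e$values)           # proportion of variance explained by each component
scores = Z %*% e$vectors           # the principal components (projections of the data)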
Application to the Fixed Income Market
The data used in this example can be downloaded here. It consists of annualized zero-coupon yields for the US, observed monthly from 1944 to 1992. This is the same data used in Alexander (2001) in the book Market Models. The maturities are 1, 3, 6, 9, 12 and 18 months and 2, 3, 4, 5, 7, 10 and 15 years.
yields = as.matrix(read.csv('yields.csv', dec = ',', sep = ';'))
colnames(yields) <- c(paste(c(1, 3, 6, 9, 12, 18), 'mth'),
                      paste(c(2, 3, 4, 5, 7, 10, 15), 'yr'))
matplot(yields, type = 'l', ylim = c(0, 18), ylab = 'rate', xlab = 'Time')
Although yields have a strong correlation structure, they are not good for modelling because they are not stationary, so we need to look for what is called a market invariant. Roughly speaking, we need some sort of iid behaviour, and yields do not have it. Following Meucci (2005), taking the first difference of the yields is good enough. For simplicity, we will call these first differences yield returns.
returns = diff(yields, 1)
plot(returns[, '1 mth'], main = 'yield return 1 month and 2 years', type = 'l', ylab = 'Return')
lines(returns[, '2 yr'], col = 2)
Now we can check for correlation in the yield return matrix. This correlation is quite intuitive: if the bond with one month to maturity starts to pay less, we expect some impact on the bond with three months to maturity. The bigger the difference between maturities, the lower the correlation. It is also a good sanity check, because if we cannot see correlation in the data, PCA will not help us reduce dimensionality.
options(digits = 2) # just for better visualisation
cor(returns) # correlation matrix of yield returns
##         1 mth 3 mth 6 mth 9 mth 12 mth 18 mth 2 yr 3 yr 4 yr 5 yr 7 yr
## 1 mth    1.00  0.79  0.73  0.69   0.66   0.63 0.60 0.54 0.49 0.48 0.44
## 3 mth    0.79  1.00  0.93  0.89   0.84   0.81 0.77 0.71 0.66 0.63 0.58
## 6 mth    0.73  0.93  1.00  0.97   0.93   0.91 0.88 0.82 0.77 0.75 0.69
## 9 mth    0.69  0.89  0.97  1.00   0.99   0.97 0.94 0.89 0.85 0.82 0.77
## 12 mth   0.66  0.84  0.93  0.99   1.00   0.98 0.94 0.91 0.86 0.84 0.78
## 18 mth   0.63  0.81  0.91  0.97   0.98   1.00 0.99 0.96 0.92 0.90 0.85
## 2 yr     0.60  0.77  0.88  0.94   0.94   0.99 1.00 0.97 0.93 0.92 0.87
## 3 yr     0.54  0.71  0.82  0.89   0.91   0.96 0.97 1.00 0.99 0.98 0.93
## 4 yr     0.49  0.66  0.77  0.85   0.86   0.92 0.93 0.99 1.00 0.99 0.94
## 5 yr     0.48  0.63  0.75  0.82   0.84   0.90 0.92 0.98 0.99 1.00 0.98
## 7 yr     0.44  0.58  0.69  0.77   0.78   0.85 0.87 0.93 0.94 0.98 1.00
## 10 yr    0.39  0.53  0.65  0.72   0.74   0.81 0.83 0.88 0.90 0.94 0.97
## 15 yr    0.31  0.45  0.56  0.62   0.64   0.70 0.72 0.77 0.77 0.81 0.84
##        10 yr 15 yr
## 1 mth   0.39  0.31
## 3 mth   0.53  0.45
## 6 mth   0.65  0.56
## 9 mth   0.72  0.62
## 12 mth  0.74  0.64
## 18 mth  0.81  0.70
## 2 yr    0.83  0.72
## 3 yr    0.88  0.77
## 4 yr    0.90  0.77
## 5 yr    0.94  0.81
## 7 yr    0.97  0.84
## 10 yr   1.00  0.94
## 15 yr   0.94  1.00
Using the function prcomp from the stats package, we can perform the PCA on the returns. Note that prcomp centres the data by default but does not scale it; since PCA is sensitive to the scale of the data, we set both center = TRUE and scale = TRUE explicitly to standardize the variables.
Once we have the components in decreasing order of explained variance, we can project the data onto a lower-dimensional space. It is important to look at the eigenvalues, which measure the variance explained by each component.
model = prcomp(returns, scale = TRUE, center = TRUE)
summary(model)
## Importance of components:
##                          PC1   PC2   PC3    PC4    PC5     PC6     PC7
## Standard deviation     3.257 1.204 0.635 0.5091 0.3684 0.25940 0.20749
## Proportion of Variance 0.816 0.111 0.031 0.0199 0.0104 0.00518 0.00331
## Cumulative Proportion  0.816 0.927 0.958 0.9783 0.9887 0.99391 0.99722
##                            PC8     PC9    PC10    PC11    PC12    PC13
## Standard deviation     0.18868 0.01381 0.01014 0.00967 0.00862 0.00729
## Proportion of Variance 0.00274 0.00001 0.00001 0.00001 0.00001 0.00000
## Cumulative Proportion  0.99996 0.99998 0.99998 0.99999 1.00000 1.00000

par(mfrow = c(1, 2))
barplot(model$sdev^2, main = 'Eigenvalues of each component')
barplot(cumsum(model$sdev^2)/sum(model$sdev^2),
        main = 'Cumulative Explained Variance', ylab = 'Variance Explained')
It is straightforward to see that the projection onto just three principal components explains almost 96% of the variance of the thirteen contracts! Looking at the factor loadings, it is possible to see how each one of them affects the yield returns.
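The loadings figure discussed below is not produced by the code above; a minimal sketch to generate something similar from model$rotation (the styling of the original figure is an assumption) is:

# plot the loadings of the first three principal components across the 13 maturities
matplot(model$rotation[, 1:3], type = 'l', lty = 1, lwd = 2,
        xlab = 'Maturity (1 mth to 15 yr)', ylab = 'Factor loading',
        main = 'First three factor loadings')
legend('bottomright', legend = c('PC1', 'PC2', 'PC3'), col = 1:3, lty = 1, lwd = 2)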
The figure above shows that the first factor loading is responsible for a level change, also called a parallel shift, in all bonds. The second factor loading produces a slope change, the steepening or flattening of the curve, and the third factor loading is responsible for the curvature effect, or convexity.
To recover the yield return with maturity $k$ at time $t$ we just need to calculate

$r_{t,k} \approx \bar{r}_k + s_k \left( w_{k,1}\, p_{t,1} + w_{k,2}\, p_{t,2} + w_{k,3}\, p_{t,3} \right)$

where $\bar{r}_k$ and $s_k$ are the sample mean and standard deviation used to standardize the returns of maturity $k$ (stored in model$center and model$scale), $w_{k,i}$ is the loading of maturity $k$ on the $i$-th principal component (column $i$ of model$rotation), and $p_{t,i}$ is the score of the $i$-th component at time $t$ (column $i$ of model$x).
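As a quick sanity check of this approximation (the snippet below is not part of the original code), we can rebuild the returns from the first three components and compare them with the observed ones:

# reconstruct the returns from the first three components
approx.std = model$x[, 1:3] %*% t(model$rotation[, 1:3])      # standardized reconstruction
approx.returns = sweep(approx.std, 2, model$scale, '*')       # undo the scaling
approx.returns = sweep(approx.returns, 2, model$center, '+')  # undo the centring
cor(approx.returns[, '10 yr'], returns[, '10 yr'])            # should be close to 1

The histograms below show the distributions of the first three component scores.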
par(mfrow = c(1, 3))
hist(model$x[, 1], breaks = 20, main = 'Distribution 1 component',
     xlab = paste('SD', round(model$sdev[1], 2)))
hist(model$x[, 2], breaks = 20, main = 'Distribution 2 component',
     xlab = paste('SD', round(model$sdev[2], 2)))
hist(model$x[, 3], breaks = 20, main = 'Distribution 3 component',
     xlab = paste('SD', round(model$sdev[3], 2)))
To forecast and carry out a risk analysis of the yield curve, one can now model the joint distribution of the factors and obtain simulated yield returns from them. In the code below, the first three factors are drawn as independent normals with variances equal to the component variances, mapped back to the thirteen maturities through the loadings (undoing the standardization), accumulated over a five-step horizon and added to the last observed yield curve.
require(mvtnorm)

n.sim = 10000
n.ahead = 5

simulated.returns = array(dim = c(n.ahead, 13, n.sim))
cumulative.returns = array(dim = c(n.ahead, 13, n.sim))

for (i in 1:n.sim) {
  # draw n.ahead independent observations of the first three factors;
  # their variances are the squared standard deviations of the components
  simulated.factors = rmvnorm(n.ahead, mean = rep(0, 3),
                              sigma = diag(model$sdev[1:3]^2))
  # map the factors back to the 13 maturities through the loadings
  std.returns = simulated.factors %*% t(model$rotation[, 1:3])
  # undo the standardization done by prcomp to get returns in the original units
  simulated.returns[,,i] = sweep(std.returns, 2, model$scale, '*')
  simulated.returns[,,i] = sweep(simulated.returns[,,i], 2, model$center, '+')
  cumulative.returns[,,i] = apply(simulated.returns[,,i], 2, cumsum)
}

simulated.yields = array(dim = c(n.ahead, 13, n.sim))
for (i in 1:n.sim) {
  for (u in 1:n.ahead) {
    # last observed yield curve plus the accumulated simulated returns
    simulated.yields[u,,i] = yields[nrow(yields), ] + cumulative.returns[u,,i]
  }
}

par(mfrow = c(1, 2))
plot(yields[570:587, 7], type = 'l', main = 'Yield returns 1 year',
     xlim = c(1, 25), ylim = c(0, 12), ylab = 'Yield')
for (i in 1:100) lines(c(rep(NA, 17), yields[587, 7], simulated.yields[, 7, i]), col = i)
plot(yields[570:587, 12], type = 'l', main = 'Yield returns 10 years',
     xlim = c(1, 25), ylim = c(0, 12), ylab = 'Yield')
for (i in 1:100) lines(c(rep(NA, 17), yields[587, 12], simulated.yields[, 12, i]), col = i)
And finally, the confidence intervals:
lower = matrix(ncol = ncol(yields), nrow = n.ahead)
upper = matrix(ncol = ncol(yields), nrow = n.ahead)
for (i in 1:ncol(yields)) {
  for (u in 1:n.ahead) {
    lower[u, i] = quantile(simulated.yields[u, i, ], probs = 0.025)
    upper[u, i] = quantile(simulated.yields[u, i, ], probs = 0.975)
  }
}
plot(yields[570:587, 7], type = 'l', main = 'Yield returns 1 year',
     xlim = c(1, 25), ylim = c(2, 10), ylab = 'Yield', xlab = '')
lines(c(rep(NA, 17), yields[587, 7], upper[, 7]), col = 2)
lines(c(rep(NA, 17), yields[587, 7], lower[, 7]), col = 2)
plot(yields[570:587, 12], type = 'l', main = 'Yield returns 10 years',
     xlim = c(1, 25), ylim = c(2, 10), ylab = 'Yield', xlab = '')
lines(c(rep(NA, 17), yields[587, 12], upper[, 12]), col = 2)
lines(c(rep(NA, 17), yields[587, 12], lower[, 12]), col = 2)