Understanding PCA 3 Factors of the Yield Curve using R code
[This article was first published on K & L Fintech Modeling, and kindly contributed to R-bloggers].
This post explains how to decompose movements in bond yields into three factors (level, slope, and curvature), following the work of Litterman and Scheinkman (1991). Using R functions for principal component analysis and eigen decomposition, we can see how much each of these factors contributes.
PCA 3 Factors of the Yield Curve
Litterman and Scheinkman (1991) use principal component analysis (PCA) and find that US bond returns are mainly determined by three factors: level, steepness, and curvature movements in the term structure. In this post, we redo the same analysis and interpret what it means.
Yield Curve
A yield curve is the cross-sectional relationship between maturities and yields at a given point in time. The following figure depicts a history of U.S. Treasury yield curves. The sample consists of monthly yield data from January 1972 to December 2000 at maturities of 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, and 120 months.
As the time series of yield curves above shows, the yield curve at each point in time differs in its overall level and in its long-short spread. Although harder to see, the relative height of the medium maturities also distinguishes one curve from another.
In other words, a cross-sectional yield curve can be represented and largely explained by two influential components (level and slope) and one non-negligible component (curvature). According to Litterman and Scheinkman (1991), these three factors explain 99% of the variability in the yield curve. This naturally leads to a 3-factor yield curve model, which is a compressed representation of the yield curve.
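Before turning to PCA, the three visual features can be approximated with simple hand-made measures. The following is a minimal sketch, not part of the original analysis: the yield numbers are made up, and the proxy definitions (average yield, long minus short, twice the medium yield minus the two ends) are common empirical conventions assumed here for illustration.

```r
# Hypothetical yield curve (in percent) at a few maturities (months);
# the numbers are invented for illustration only.
mat   <- c(3, 12, 24, 60, 120)
yield <- c(4.8, 5.1, 5.5, 5.8, 6.0)
names(yield) <- mat

# Rough empirical proxies (assumed conventions, not PCA estimates)
level     <- mean(yield)                                      # overall height
slope     <- unname(yield["120"] - yield["3"])                # long-short spread
curvature <- unname(2 * yield["24"] - yield["3"] - yield["120"])  # medium vs. ends

round(c(level = level, slope = slope, curvature = curvature), 2)
```

PCA replaces these hand-picked weights with weights estimated from the data.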
PCA
Because a yield curve with 17 maturities is difficult to interpret directly, we tried above to summarize it with roughly defined measures such as the overall level, the long-short spread, and the relative height of the medium maturities. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability while minimizing the loss of information. It creates new uncorrelated variables that successively maximize variance. Finding these new variables, the principal components, reduces to solving an eigenvalue/eigenvector problem.
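To see why variance maximization reduces to an eigenvalue problem, here is a short sketch. The notation is introduced here for illustration and is not from the original post: \(x\) is the vector of N yields, \(\Sigma\) is its covariance matrix, and \(w\) is a vector of weights with unit length.

\[
\max_{w} \; w^\top \Sigma w \quad \text{subject to} \quad w^\top w = 1
\]

The Lagrangian is \(L = w^\top \Sigma w - \lambda (w^\top w - 1)\), and setting its gradient with respect to \(w\) to zero gives

\[
\Sigma w = \lambda w, \qquad \text{so that} \qquad \operatorname{Var}(w^\top x) = w^\top \Sigma w = \lambda .
\]

The variance of the new variable is therefore maximized by the eigenvector with the largest eigenvalue. Each subsequent principal component solves the same problem with the additional constraint of being orthogonal to the previous ones, which yields the remaining eigenvectors in decreasing order of eigenvalue.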
Eigen decomposition
Eigen decomposition creates new uncorrelated variables (principal components), each of which is a linear combination (a weighted sum) of the original variables, while preserving the total variance of the original variables. The weights are given by an eigenvector, which indicates the direction and relative magnitude with which each original variable enters the new variable. Multiplying the (demeaned) original variables by an eigenvector therefore yields a new variable, and the variance of this new variable is the corresponding eigenvalue.
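These properties can be checked numerically. The sketch below uses simulated data (an assumption for illustration, not the yield dataset) and verifies that each principal component has variance equal to its eigenvalue, that the components are uncorrelated, and that the total variance is preserved.

```r
set.seed(1)
# Simulated data: 200 observations of 4 correlated variables (illustrative only)
X <- matrix(rnorm(200 * 4), 200, 4) %*% matrix(runif(16), 4, 4)

eig <- eigen(cov(X))                           # eigenvalues in decreasing order
Xc  <- scale(X, center = TRUE, scale = FALSE)  # demeaned data
pc  <- Xc %*% eig$vectors                      # principal components (scores)

# Variance of each principal component equals its eigenvalue
all.equal(unname(apply(pc, 2, var)), eig$values)

# Principal components are mutually uncorrelated (off-diagonals near 0)
max(abs(cov(pc)[upper.tri(cov(pc))]))

# Total variance is preserved: trace of covariance = sum of eigenvalues
all.equal(sum(diag(cov(X))), sum(eig$values))
```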
When the number of original variables is N, the number of principal components is also N. However, because the first 3 or 4 principal components typically explain most of the total variation in the original dataset, we keep only this small number of principal components as factors.
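The effect of keeping only a few components can be sketched with simulated data. Everything below is an assumption for illustration: the factor shapes in `B`, the factor volatilities, and the noise level are made up and are not estimates from the DRA data.

```r
set.seed(42)
mat <- c(3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60, 72, 84, 96, 108, 120)
n   <- 300
tau <- mat / 120

# Hypothetical loading shapes and simulated factors (not the real data)
B <- cbind(level = 1, slope = 1 - tau, curv = 4 * tau * (1 - tau))
f <- cbind(rnorm(n, 0, 2.0), rnorm(n, 0, 1.0), rnorm(n, 0, 0.5))
yield <- 6 + f %*% t(B) + matrix(rnorm(n * 17, 0, 0.05), n, 17)

# Cumulative share of total variance explained by the leading components
eig       <- eigen(cov(yield))
cum_share <- cumsum(eig$values) / sum(eig$values)
round(100 * cum_share[1:5], 2)

# Rank-3 approximation: project demeaned yields on the first 3 eigenvectors
k         <- 3
mu        <- colMeans(yield)
Xc        <- sweep(yield, 2, mu)                  # subtract column means
V         <- eig$vectors[, 1:k]
yield_hat <- sweep(Xc %*% V %*% t(V), 2, mu, "+") # add the means back
max(abs(yield - yield_hat))                       # largest approximation error
```

In this simulated setting the 17 yields are reproduced closely from only three numbers per date, which is the sense in which the factor model compresses the yield curve.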
R code
PCA can be performed with the princomp() R function, which takes the dataset as an argument. Because PCA is based on the eigen decomposition, the same job can also be done with the eigen() R function, which takes the covariance matrix of the dataset as an argument.
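The relationship between the two functions can be sketched on simulated data (made up for illustration). One detail worth knowing is that princomp() uses the divisor n for the covariance matrix while cov() uses n - 1, so the variances differ by a constant factor but the explained-variance shares and the loadings (up to sign) agree.

```r
set.seed(7)
# Simulated data: 100 observations of 3 correlated variables (illustrative only)
X <- matrix(rnorm(100 * 3), 100, 3) %*% matrix(c(1, 0.5, 0.2,
                                                 0, 1,   0.3,
                                                 0, 0,   1), 3, 3)
n <- nrow(X)

pca <- princomp(X)      # covariance computed with divisor n
eig <- eigen(cov(X))    # cov() uses divisor n - 1

# Variances differ only by the factor n / (n - 1)
all.equal(unname(pca$sdev^2) * n / (n - 1), eig$values)

# Explained-variance shares are identical
all.equal(unname(pca$sdev^2 / sum(pca$sdev^2)), eig$values / sum(eig$values))

# Loadings match the eigenvectors up to sign
L <- unclass(pca$loadings)
max(abs(abs(L) - abs(eig$vectors)))
```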
The following R code reads the DRA (2006) dataset and performs PCA on yield levels and on yield changes, using both the princomp() and eigen() R functions.
```r
#========================================================#
# Quantitative ALM, Financial Econometrics & Derivatives
# ML/DL using R, Python, Tensorflow by Sang-Heon Lee
#
# https://kiandlee.blogspot.com
#--------------------------------------------------------#
# PCA three factors of bond yields
#========================================================#

graphics.off(); rm(list = ls())

#-----------------------------------------------------------
# graph function
#-----------------------------------------------------------
# m.loadings : loadings matrix (maturities x 3 factors)
# var        : explained variance (%) of each factor, shown in the legend
# title      : plot title
# Note: uses the global maturity vector `mat` defined below.
f_draw_factor_loading <- function(m.loadings, var, title) {
    x11(width=16/3, height=5)
    matplot(mat, m.loadings, type="l", xaxt="n", lwd=5, lty=1,
            main=title, col=rainbow(3), ylim=c(-0.6, 0.6),
            xlab="Maturity (month)", ylab="Factor loadings")
    legend("bottomright", col=rainbow(3), lty=1, lwd=3,
           legend=paste0(c("1st PC-","2nd PC-","3rd PC-"), var))
    axis(1, mat, mat)
}

#===========================================================
# Load data
#===========================================================
# The file name contains a space, so the URL is encoded before reading.
fn <- "http://econweb.umd.edu/~webspace/aruoba/research/paper5/DRA Data.txt"
df <- read.delim(URLencode(fn), header=TRUE)

yield <- as.matrix(df[,2:18]/100)
ym    <- df[,1]   # year-month
mat   <- c(3,6,9,12,15,18,21,24,30,36,48,60,72,84,96,108,120)

#-----------------------------------------------------------
# Yield level PCA using princomp()
#-----------------------------------------------------------
pca <- princomp(yield)
f_draw_factor_loading(pca$loadings[,1:3],
    round(100*pca$sdev[1:3]^2/sum(pca$sdev^2),2),
    "Yield level PCA - princomp()")

#-----------------------------------------------------------
# Yield change PCA using princomp()
#-----------------------------------------------------------
pca <- princomp(diff(yield, 1))
f_draw_factor_loading(pca$loadings[,1:3],
    round(100*pca$sdev[1:3]^2/sum(pca$sdev^2),2),
    "Yield change PCA - princomp()")

#-----------------------------------------------------------
# Yield level PCA using eigen()
#-----------------------------------------------------------
eig <- eigen(cov(yield))
f_draw_factor_loading(eig$vectors[,1:3],
    round(100*eig$values[1:3]/sum(eig$values),2),
    "Yield level PCA - eigen()")

#-----------------------------------------------------------
# Yield change PCA using eigen()
#-----------------------------------------------------------
eig <- eigen(cov(diff(yield, 1)))
f_draw_factor_loading(eig$vectors[,1:3],
    round(100*eig$values[1:3]/sum(eig$values),2),
    "Yield change PCA - eigen()")
```
The following figures depict the factor loadings of the 1st, 2nd, and 3rd factors for the four cases. The sign of the factor loadings (eigenvectors) is irrelevant, since our focus is on explaining the total variation of the original variables with a small number of factors.
What matters for interpretation is the relative magnitude and direction of the loadings. Depending on the function or implementation used, the sign of the factor loadings can flip between + and -.
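If comparable plots across functions are desired, the signs can be fixed by a convention. The helper below, `normalize_sign()`, is hypothetical and not part of the original code: it flips each eigenvector so that its loadings sum to a positive number. The data are simulated for illustration.

```r
# Hypothetical helper: flip each column so that its loadings sum to a
# positive number. This changes only the presentation, not the fit.
normalize_sign <- function(V) {
  s <- sign(colSums(V))
  s[s == 0] <- 1              # leave a column untouched if its sum is exactly 0
  sweep(V, 2, s, "*")
}

set.seed(3)
# Simulated data: 150 observations of 4 correlated variables (illustrative only)
X  <- matrix(rnorm(150 * 4), 150, 4) %*% matrix(runif(16), 4, 4)
V1 <- eigen(cov(X))$vectors
V2 <- -V1                     # same decomposition with all signs flipped

# After normalization both versions coincide
all.equal(normalize_sign(V1), normalize_sign(V2))
```

Other conventions work equally well, for example making the largest absolute loading of each factor positive; the choice is a matter of presentation.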
We can see that the factor loadings for the yield level data are the same (up to sign) from the two functions, and the same holds for the yield change data. The legend in each figure shows the ratio of each factor's variance to the total variance. Each variance is an eigenvalue, and the total variance is the sum of the eigenvalues.
In sum, yield levels and yield changes are mainly explained by level, slope, and curvature factors. The level factor has nearly uniform factor loadings. The slope factor loadings are higher at shorter maturities. The curvature factor loadings are higher at medium maturities and lower at shorter and longer maturities. These results are consistent with our visual inspection.
Concluding Remarks
This post uses the princomp() and eigen() R functions to calculate and interpret the factors of bond yields and their loadings. We find that term structure movements are explained quite well by three factors.
Reference
Litterman, R. and J. Scheinkman (1991), Common Factors Affecting Bond Returns. Journal of Fixed Income 1(1), 54-61.