Setup up the inverse of additive relationship matrix in R
[This article was first published on Gregor Gorjanc (gg), and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Additive genetic covariance between individuals is one of the key concepts in (quantitative) genetics. When doing the prediction of additive genetic values for pedigree members, we need the inverse of the so called numerator relationship matrix (NRM) or simply A. Matrix A has off-diagonal entries equal to numerator of Wright‘s relationship coefficient and diagonal elements equal to 1 + inbreeding coefficient. I have blogged before about setting up such inverse in R using routine from the ASReml-R program or importing the inverse from the CFC program. However, this is not the only way to “skin this cat” in R. I am aware of the following attempts to provide this feature in R for various things (the list is probably incomplete and I would grateful if you point me to other implementations):
- pedigree R package has function makeA() and makeAinv() with obvious meanings; there is also calcG() if you have a lot of marker data instead of pedigree information; there are also some other very handy functions calcInbreeding(), orderPed(), trimPed(), etc.
- pedigreemm R package does not have direct implementation to get A inverse, but has all the needed ingredients, which makes the package even more interesting
- MCMCglmm R package has function inverseA() which works with pedigree or phlyo objects; there are also handy functions such as prunePed(), rbv(), sm2asreml(), etc.
- kinship and kinship2 R packages have function kinship() to setup kinship matrix, which is equal to the half of A; there are also nice functions for plotting pedigrees etc. (see also here)
- see also a series of R scripts for relationship matrices
As I described before, the interesting thing is that setting up inverse of A is easier and cheaper than setting up A and inverting it. This is very important for large applications. This is an old result using the following matrix theory. We can decompose symmetric positive definite matrix as A = LU = LL‘ (Cholesky decomposition) or as A = LDU = LDL‘ (Generalized Cholesky decomposition), where L (U) is lower (upper) triangular, and D is diagonal matrix. Note that L and U in previous two equations are not the same thing (L from Cholesky is not equal to L from Generalized Cholesky decomposition)! Sorry for sloppy notation. In order to confuse you even more note that Henderson usually wrote A = TDT‘. We can even do A = LSSU, where S diagonal is equal to the square root of D diagonal. This can get us back to A = LU = LL‘ as LSSU = LSSL‘ = LSS‘L‘ = LS(LS)’ = L‘L (again be ware of sloppy notation)! The inverse rule says that inv(A) = inv(LDU) = inv(U) inv(D) inv(L) = inv(L)’ inv(D) inv(L) = inv(L)’ inv(S) inv(S) inv(L). I thank to Martin Maechler for pointing out to the last (obviously) bit to me. In Henderson’s notation this would be inv(A) = inv(T)’ inv(D) inv(T) = inv(T)’ inv(S) inv(S) inv(T) Uf … The important bit is that with NRM (aka A) inv(L) has nice simple structure – it shows the directed graph of additive genetic values in pedigree, while inv(D) tells us about the precision (inverse variance) of additive genetic values given the additive genetic values of parents and therefore depends on knowledge of parents and their inbreeding (the more they are inbred less variation can we expect in their progeny). Both inv(L) and inv(D) are easier to setup.
Packages MCMCglmm and pedigree give us inv(A) directly (we can also get inv(D) in MCMCglmm), but pedigreemm enables us to play around with the above matrix algebra and graph theory. First we need a small example pedigree. Bellow is an example with 10 members and there is also some inbreeding and some individuals have both, one, or no parents known. It is hard to see inbreeding directly from the table, but we will improve that later (see also here).
ped <- data.frame( id=c( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10), fid=c( NA, NA, 2, 2, 4, 2, 5, 5, NA, 8), mid=c( NA, NA, 1, NA, 3, 3, 6, 6, NA, 9))
Now we will create an object of a pedigree class and show the A = U'U stuff:
## install.packages(pkgs="pedigreemm") library(package="pedigreemm") ped2 <- with(ped, pedigree(sire=fid, dam=mid, label=id)) U <- relfactor(ped2) A <- crossprod(U) round(U, digits=2) ## 10 x 10 sparse Matrix of class "dtCMatrix" ## [1,] 1 . 0.50 . 0.25 0.25 0.25 0.25 . 0.12 ## [2,] . 1 0.50 0.50 0.50 0.75 0.62 0.62 . 0.31 ## [3,] . . 0.71 . 0.35 0.35 0.35 0.35 . 0.18 ## [4,] . . . 0.87 0.43 . 0.22 0.22 . 0.11 ## [5,] . . . . 0.71 . 0.35 0.35 . 0.18 ## [6,] . . . . . 0.71 0.35 0.35 . 0.18 ## [7,] . . . . . . 0.64 . . . ## [8,] . . . . . . . 0.64 . 0.32 ## [9,] . . . . . . . . 1 0.50 ## [10,] . . . . . . . . . 0.66 ## To check U - chol(A) round(A, digits=2) ## 10 x 10 sparse Matrix of class "dsCMatrix" ## [1,] 1.00 . 0.50 . 0.25 0.25 0.25 0.25 . 0.12 ## [2,] . 1.00 0.50 0.50 0.50 0.75 0.62 0.62 . 0.31 ## [3,] 0.50 0.50 1.00 0.25 0.62 0.75 0.69 0.69 . 0.34 ## [4,] . 0.50 0.25 1.00 0.62 0.38 0.50 0.50 . 0.25 ## [5,] 0.25 0.50 0.62 0.62 1.12 0.56 0.84 0.84 . 0.42 ## [6,] 0.25 0.75 0.75 0.38 0.56 1.25 0.91 0.91 . 0.45 ## [7,] 0.25 0.62 0.69 0.50 0.84 0.91 1.28 0.88 . 0.44 ## [8,] 0.25 0.62 0.69 0.50 0.84 0.91 0.88 1.28 . 0.64 ## [9,] . . . . . . . . 1.0 0.50 ## [10,] 0.12 0.31 0.34 0.25 0.42 0.45 0.44 0.64 0.5 1.00
Note that pedigreemm package uses Matrix classes in order to store only what we need to store, e.g., matrix U is triangular (t in "dtCMatrix") and matrix A is symmetric (s in "dsCMatrix"). To show the generalized Cholesky A = LDU (or using Henderson notation A = TDT') we use gchol() from the bdsmatrix R package. Matrix T shows the "flow" of genes in pedigree.
## install.packages(pkgs="bdsmatrix") library(package="bdsmatrix") tmp <- gchol(as.matrix(A)) D <- diag(tmp) (T <- as(as.matrix(tmp), "dtCMatrix")) ## 10 x 10 sparse Matrix of class "dtCMatrix" ## [1,] 1.000 . . . . . . . . . ## [2,] . 1.0000 . . . . . . . . ## [3,] 0.500 0.5000 1.00 . . . . . . . ## [4,] . 0.5000 . 1.000 . . . . . . ## [5,] 0.250 0.5000 0.50 0.500 1.00 . . . . . ## [6,] 0.250 0.7500 0.50 . . 1.00 . . . . ## [7,] 0.250 0.6250 0.50 0.250 0.50 0.50 1 . . . ## [8,] 0.250 0.6250 0.50 0.250 0.50 0.50 . 1.0 . . ## [9,] . . . . . . . . 1.0 . ## [10,] 0.125 0.3125 0.25 0.125 0.25 0.25 . 0.5 0.5 1 ## To check L <- T %*% diag(sqrt(DInv)) L - t(U)
Now the A inverse part (inv(A) = inv(T)' inv(D) inv(T) = inv(T)' inv(S) inv(S) inv(T) using Henderson's notation, note that ). The nice thing is that pedigreemm authors provided functions to get inv(T) and D.
(TInv <- as(ped2, "sparseMatrix")) ## 10 x 10 sparse Matrix of class "dtCMatrix" (unitriangular) ## 1 1.0 . . . . . . . . . ## 2 . 1.0 . . . . . . . . ## 3 -0.5 -0.5 1.0 . . . . . . . ## 4 . -0.5 . 1.0 . . . . . . ## 5 . . -0.5 -0.5 1.0 . . . . . ## 6 . -0.5 -0.5 . . 1.0 . . . . ## 7 . . . . -0.5 -0.5 1 . . . ## 8 . . . . -0.5 -0.5 . 1.0 . . ## 9 . . . . . . . . 1.0 . ## 10 . . . . . . . -0.5 -0.5 1 round(DInv <- Diagonal(x=1/Dmat(ped2)), digits=2) ## 10 x 10 diagonal matrix of class "ddiMatrix" ## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] ## [1,] 1 . . . . . . . . . ## [2,] . 1 . . . . . . . . ## [3,] . . 2 . . . . . . . ## [4,] . . . 1.33 . . . . . . ## [5,] . . . . 2 . . . . . ## [6,] . . . . . 2 . . . . ## [7,] . . . . . . 2.46 . . . ## [8,] . . . . . . . 2.46 . . ## [9,] . . . . . . . . 1 . ## [10,] . . . . . . . . . 2.33 round(t(TInv) %*% DInv %*% TInv, digits=2) ## 10 x 10 sparse Matrix of class "dgCMatrix" ## ... round(crossprod(sqrt(DInv) %*% TInv), digits=2) ## 10 x 10 sparse Matrix of class "dsCMatrix" ## [1,] 1.5 0.50 -1.0 . . . . . . . ## [2,] 0.5 2.33 -0.5 -0.67 . -1.00 . . . . ## [3,] -1.0 -0.50 3.0 0.50 -1.00 -1.00 . . . . ## [4,] . -0.67 0.5 1.83 -1.00 . . . . . ## [5,] . . -1.0 -1.00 3.23 1.23 -1.23 -1.23 . . ## [6,] . -1.00 -1.0 . 1.23 3.23 -1.23 -1.23 . . ## [7,] . . . . -1.23 -1.23 2.46 . . . ## [8,] . . . . -1.23 -1.23 . 3.04 0.58 -1.16 ## [9,] . . . . . . . 0.58 1.58 -1.16 ## [10,] . . . . . . . -1.16 -1.16 2.33 ## To check solve(A) - crossprod(sqrt(DInv) %*% TInv)
The second method (using crossprod) is preferred as it leads directly to symmetric matrix (dsCMatrix), which stores only upper or lower triangle. And make sure you do not do crossprod(TInv %*% sqrt(DInv)) as it is the wrong order of matrices.
As promised we will display (plot) pedigree by use of conversion functions of matrix objects to graph objects using the following code. Two examples are provided using the graph and igraph packages. The former does a very good job on this example, but otherwise igraph seems to have much nicer support for editing etc.
## source("http://www.bioconductor.org/biocLite.R") ## biocLite(pkgs=c("graph", "Rgraphviz")) library(package="graph") library(package="Rgraphviz") g <- as(t(TInv), "graph") plot(g)
## install.packages(pkgs="igraph") library(package="igraph") i <- igraph.from.graphNEL(graphNEL=g) V(i)$label <- 1:10 plot(i, layout=layout.kamada.kawai) ## tkplot(i)
To leave a comment for the author, please follow the link and comment on their blog: Gregor Gorjanc (gg).
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.