Site icon R-bloggers

Tensor Algebra: Efficient Operations on Multidimensional Arrays with R

[This article was first published on Scientific Memo, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
 Multidimensional arrays are ubiquitous. Any complex problem having multivariate observables would easily generate a need to represent corresponding data in multidimensional arrays. Most of the practitioners would choose to apply operations on these objects with an index notation.  For example a 3-D array would have 3 indices A[i, j, k] or A[i][j][k] depending on the syntax. However there are languages which takes array operations as their native component, for example APL. More familiar examples are matrix operations in MATLAB, Fortran 90 and alike. Of course this concept is noting more than vectorisation.

Mathematical objects called tensors can be used to represent multidimensional objects. In reality a scalar is rank 0 tensor,  so scalar is the simplest tensor. Rank of a tensor can be thought as objects spatial dimension.  Let’s consider matrix multiplication, where there are two indices, tensor of rank 2. If we have two matrices $A_{ij}$ and $B_{ij}$. We can multiply them with index notation as follows using an obscure language R with a naive algorithm:
< !-- HTML generated using hilite.me -->
> set.seed(42)
> A <- matrix(rnorm(4), 2, 2)
> B <- matrix(rnorm(4), 2, 2)
> C <- matrix(0, 2, 2)
> for(i in 1:2) {
+  for(j in 1:2) {
+   for(k in 1:2) {
+     C[i,j] = C[i,j] + A[i, k] * B[k,j]
+    }
+   }
+ }
> C
          [,1]       [,2]
[1,]  6.413566 -2.5178879
[2,] -2.249789  0.4908884
> A %*% B
           [,1]       [,2]
[1,]  0.5156982  2.0378605
[2,] -0.2954518 -0.9134599

You should have noticed that we introduce a third index. That is called a dummy index which we actually do the sum. This approach is technically called Einstein summation rule. So we represent the multiplication as follows: $C_{ij} = \sum_{k=1}^{m} A_{ik} B_{kj}$. Normally sum is dropped for short notation. It is quite intuitive from this notation that number of columns of A should be equal to number of rows in B to perform the sum consistently.

There are software packages that can do symbolic and numerical tensor algebra . Here we demonstrate an R package called tensorA developed by Professor van den Boogaart of TU Freiberg. We can repeat the matrix multiplication by using this package as follows.

< !-- HTML generated using hilite.me -->
> library("tensorA")
> set.seed(42)
> At <- to.tensor(rnorm(4), c(i=2, k=2))
> Bt <- to.tensor(rnorm(4), c(k=2, j=2))
> At %e% Bt
      j
i            [,1]       [,2]
  [1,]  0.5156982  2.0378605
  [2,] -0.2954518 -0.9134599
attr(,"class")
[1] "tensor" "matrix"

Note that here $\%e\%$ implies tensor multiplication with Einstein summation rule. This is pretty trivial example but if you imagine higher dimensional objects, tracking indices would be cumbersome so multi-linear algebra makes life easy.

In conclusion,  I think,  using tensor arithmetic for multidimensional arrays looks more compacts and efficient (~ 2-3 times).  A discussion related to this appeared in R help list.

To leave a comment for the author, please follow the link and comment on their blog: Scientific Memo.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.