R speeds up when the Basic Linear Algebra Subprograms (BLAS) library it uses is well tuned. The reference BLAS that comes with R and Ubuntu isn’t very fast. On my machine, it takes about 9 minutes to run a well-known R benchmarking script. If I use ATLAS, an optimized BLAS that can be easily installed, the same script takes about 3.5 minutes. If I use OpenBLAS, another optimized BLAS that is equally easy to install, the same script takes about 2 minutes. That’s a pretty big improvement!
In this post, I’ll show you how to install ATLAS and OpenBLAS, demonstrate how you can switch between them, and let you pick which you would like to use based on benchmark results. Before we get started, one quick shout out to Felix Riedel: thanks for encouraging me to look at OpenBLAS instead of ATLAS in your comment on my previous post.
Installing additional BLAS libraries on Ubuntu
For Ubuntu, there are currently three different BLAS options that can be easily chosen: “libblas” the reference BLAS, “libatlas” the ATLAS BLAS, and “libopenblas” the OpenBLAS. Their package names are:
$ apt-cache search libblas
libblas-dev - Basic Linear Algebra Subroutines 3, static library
libblas-doc - Basic Linear Algebra Subroutines 3, documentation
libblas3gf - Basic Linear Algebra Reference implementations, shared library
libatlas-base-dev - Automatically Tuned Linear Algebra Software, generic static
libatlas3gf-base - Automatically Tuned Linear Algebra Software, generic shared
libblas-test - Basic Linear Algebra Subroutines 3, testing programs
libopenblas-base - Optimized BLAS (linear algebra) library based on GotoBLAS2
libopenblas-dev - Optimized BLAS (linear algebra) library based on GotoBLAS2
Since libblas already comes with Ubuntu, we only need to install the other two for our tests. (NOTE: In the following command, delete ‘libatlas3gf-base’ if you don’t want to experiment with ATLAS.)
$ sudo apt-get install libopenblas-base libatlas3gf-base
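Before moving on, it can help to confirm that the packages really landed. The following is a minimal sketch, assuming a Debian-style system where `dpkg-query` is available; the function names `is_installed` and `missing_packages` are my own, not part of any tool.

```shell
# Succeeds only if dpkg reports the package as fully installed.
# (is_installed is a hypothetical helper name.)
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q '^install ok installed'
}

# Print the name of every package in the argument list that is not installed.
# (missing_packages is a hypothetical helper name.)
missing_packages() {
  for pkg in "$@"; do
    is_installed "$pkg" || echo "$pkg"
  done
}
```

Running `missing_packages libopenblas-base libatlas3gf-base` should print nothing once both packages from the install command above are in place.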
Switching between BLAS libraries
Now we can switch between the different BLAS options that are installed:
$ sudo update-alternatives --config libblas.so.3gf
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
* 0            /usr/lib/openblas-base/libopenblas.so.0     40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3gf    35        manual mode
  2            /usr/lib/libblas/libblas.so.3gf             10        manual mode
  3            /usr/lib/openblas-base/libopenblas.so.0     40        manual mode

Press enter to keep the current choice[*], or type selection number:
I typed 3, so running the command again now shows choice 3 (OpenBLAS) as selected:
$ sudo update-alternatives --config libblas.so.3gf
There are 3 choices for the alternative libblas.so.3gf (providing /usr/lib/libblas.so.3gf).

  Selection    Path                                       Priority   Status
------------------------------------------------------------
  0            /usr/lib/openblas-base/libopenblas.so.0     40        auto mode
  1            /usr/lib/atlas-base/atlas/libblas.so.3gf    35        manual mode
  2            /usr/lib/libblas/libblas.so.3gf             10        manual mode
* 3            /usr/lib/openblas-base/libopenblas.so.0     40        manual mode
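If you switch back and forth a lot, the interactive prompt gets tedious. Here is a rough sketch of a non-interactive version using the `--set` option of update-alternatives. The short labels (openblas, atlas, reference) and the function names are hypothetical; the paths are the ones from the listing above and may differ on your system.

```shell
# Map a short label to a BLAS path from the update-alternatives listing.
# The labels are hypothetical; adjust the paths to match your own listing.
blas_path() {
  case "$1" in
    openblas)  echo /usr/lib/openblas-base/libopenblas.so.0 ;;
    atlas)     echo /usr/lib/atlas-base/atlas/libblas.so.3gf ;;
    reference) echo /usr/lib/libblas/libblas.so.3gf ;;
    *)         echo "unknown BLAS: $1" >&2; return 1 ;;
  esac
}

# Select a BLAS without the interactive prompt (needs sudo).
use_blas() {
  target=$(blas_path "$1") || return 1
  sudo update-alternatives --set libblas.so.3gf "$target"
}
```

For example, `use_blas atlas` would switch to ATLAS, and `use_blas openblas` would switch back.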
And we can pull the same trick to choose between LAPACK implementations. From the output we can see that OpenBLAS does not provide a new LAPACK implementation, but ATLAS does:
$ sudo update-alternatives --config liblapack.so.3gf
There are 2 choices for the alternative liblapack.so.3gf (providing /usr/lib/liblapack.so.3gf).

  Selection    Path                                         Priority   Status
------------------------------------------------------------
* 0            /usr/lib/atlas-base/atlas/liblapack.so.3gf    35        auto mode
  1            /usr/lib/atlas-base/atlas/liblapack.so.3gf    35        manual mode
  2            /usr/lib/lapack/liblapack.so.3gf              10        manual mode
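So we will do nothing in this case, since OpenBLAS is supposed to use the reference LAPACK implementation, which should already be the one selected. (If your listing marks an ATLAS entry as the current choice instead, pick the /usr/lib/lapack/liblapack.so.3gf entry.)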
Checking that R is using the right BLAS
Now we can check that everything is working by starting R in a new terminal:
$ R

R version 3.0.1 (2013-05-16) -- "Good Sport"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
...snip...
Type 'q()' to quit R.

>
Great. Let’s see if R is using the BLAS and LAPACK libraries we selected. To do so, we open another terminal so that we can run a few more shell commands. First, we find the PID of the R process we just started. Your output will look something like this:
$ ps aux | grep exec/R
1000     18065  0.4  1.0 200204 87568 pts/1    Sl+  09:00   0:00 /usr/lib/R/bin/exec/R
root     19250  0.0  0.0   9396   916 pts/0    S+   09:03   0:00 grep --color=auto exec/R
The PID is the number in the second column of the ‘/usr/lib/R/bin/exec/R’ line (18065 here). To see which BLAS and LAPACK libraries are loaded with that R session, we use the “list open files” command:
$ lsof -p 18065 | grep 'blas\|lapack'
R       18065 nathanvan  mem    REG    8,1   9342808 12857980 /usr/lib/lapack/liblapack.so.3gf.0
R       18065 nathanvan  mem    REG    8,1  19493200 13640678 /usr/lib/openblas-base/libopenblas.so.0
As expected, the R session is using the reference LAPACK (/usr/lib/lapack/liblapack.so.3gf.0) and OpenBLAS (/usr/lib/openblas-base/libopenblas.so.0).
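The two-terminal check above can be wrapped into a small helper. This is only a sketch: it assumes `pgrep` and `lsof` are installed and that R runs as `/usr/lib/R/bin/exec/R` as in my `ps` output. The function names are hypothetical.

```shell
# Read lsof output on stdin and print only the file paths (last field)
# of lines that mention blas or lapack.
blas_libs_from_lsof() {
  awk '/blas|lapack/ { print $NF }'
}

# For every running R process, list the BLAS/LAPACK libraries it has mapped.
# pgrep never reports itself, so there is no stray "grep" line to skip.
r_blas_libs() {
  for pid in $(pgrep -f 'exec/R'); do
    echo "PID $pid:"
    lsof -p "$pid" | blas_libs_from_lsof
  done
}
```

With an R session open, calling `r_blas_libs` from another terminal should print the same two library paths that the manual `lsof` command showed.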
Testing the different BLAS/LAPACK combinations
I used Simon Urbanek’s most recent benchmark script. To follow along, first download it to your current working directory:
$ curl http://r.research.att.com/benchmarks/R-benchmark-25.R -O
And then run it:
$ cat R-benchmark-25.R | time R --slave
Loading required package: Matrix
Loading required package: lattice
Loading required package: SuppDists
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
  there is no package called ‘SuppDists’
...snip...
Oops. I don’t have the SuppDists package installed. I can easily install it via Michael Rutter’s Ubuntu PPA:
$ sudo apt-get install r-cran-suppdists
Now Simon’s script works wonderfully. Here is the full output:
$ cat R-benchmark-25.R | time R --slave
Loading required package: Matrix
Loading required package: lattice
Loading required package: SuppDists
Warning messages:
1: In remove("a", "b") : object 'a' not found
2: In remove("a", "b") : object 'b' not found

   R Benchmark 2.5
   ===============
Number of times each test is run__________________________:  3

   I. Matrix calculation
   ---------------------
Creation, transp., deformation of a 2500x2500 matrix (sec):  1.36566666666667
2400x2400 normal distributed random matrix ^1000____ (sec):  0.959
Sorting of 7,000,000 random values__________________ (sec):  1.061
2800x2800 cross-product matrix (b = a' * a)_________ (sec):  1.777
Linear regr. over a 3000x3000 matrix (c = a \ b')___ (sec):  1.00866666666667
                      --------------------------------------------
                 Trimmed geom. mean (2 extremes eliminated):  1.13484335940626

   II. Matrix functions
   --------------------
FFT over 2,400,000 random values____________________ (sec):  0.566999999999998
Eigenvalues of a 640x640 random matrix______________ (sec):  1.379
Determinant of a 2500x2500 random matrix____________ (sec):  1.69
Cholesky decomposition of a 3000x3000 matrix________ (sec):  1.51366666666667
Inverse of a 1600x1600 random matrix________________ (sec):  1.40766666666667
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.43229160585452

   III. Programmation
   ------------------
3,500,000 Fibonacci numbers calculation (vector calc)(sec):  1.10533333333333
Creation of a 3000x3000 Hilbert matrix (matrix calc) (sec):  1.169
Grand common divisors of 400,000 pairs (recursion)__ (sec):  2.267
Creation of a 500x500 Toeplitz matrix (loops)_______ (sec):  1.213
Escoufier's method on a 45x45 matrix (mixed)________ (sec):  1.32600000000001
                      --------------------------------------------
                Trimmed geom. mean (2 extremes eliminated):  1.23425893178325

Total time for all 15 tests_________________________ (sec):  19.809
Overall mean (sum of I, II and III trimmed means/3)_ (sec):  1.26122106386747
                      --- End of test ---

134.75user 16.06system 1:50.08elapsed 137%CPU (0avgtext+0avgdata 1949744maxresident)k
448inputs+0outputs (3major+1265968minor)pagefaults 0swaps
The elapsed time at the very bottom is the part that we care about. With OpenBLAS and the reference LAPACK, the script took 1 minute and 50 seconds to run. By changing the selections with update-alternatives, we can test R with ATLAS (3:21) or R with the reference BLAS (9:13). On my machine, OpenBLAS is the clear winner.
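To put numbers on "clear winner", the elapsed times can be turned into speedup ratios. The snippet below is a small sketch; `to_seconds` and `speedup` are hypothetical helper names, and it assumes elapsed times in the M:SS or M:SS.ss form that `time` printed for me (runs longer than an hour would need extra handling).

```shell
# Convert an elapsed time such as 9:13 or 1:50.08 to whole seconds.
# Assumes M:SS or M:SS.ss; fractional seconds are truncated.
to_seconds() {
  echo "$1" | awk -F: '{ printf "%d\n", $1 * 60 + $2 }'
}

# Print how many times faster the second run is than the first.
speedup() {
  awk -v base="$(to_seconds "$1")" -v new="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", base / new }'
}

speedup 9:13 1:50.08   # reference BLAS vs OpenBLAS, prints 5.0
speedup 9:13 3:21      # reference BLAS vs ATLAS, prints 2.8
```

So on my machine OpenBLAS comes out roughly five times faster than the reference BLAS on this benchmark, and ATLAS a bit under three times faster.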
Give it a shot yourself. If you find something different, let me know.