#10: Compacting your Shared Libraries, After The Build
Welcome to the tenth post in the rarely ranting R recommendations series, or R4 for short. A few days ago we showed how to tell the linker to strip shared libraries. As discussed in that post, there are two options. One can set up ~/.R/Makevars to pass the strip-debug option to the linker. Alternatively, one can adjust src/Makevars in the package itself with a bit of Makefile magic.
Of course, there is a third way: just run strip --strip-debug over all the shared libraries after the build. As the path is standardized, and the shell does proper globbing, we can just do
$ strip --strip-debug /usr/local/lib/R/site-library/*/libs/*.so
using a double-wildcard to get all packages (in that R package directory) and all their shared libraries. Users on macOS probably want .dylib on the end, users on Windows want another computer as usual (just kidding: use .dll). Either may have to adjust the path, which is left as an exercise to the reader.
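For paths other than the one above, a small shell function can take the library directory and the file extension as arguments. This is only a sketch: the name strip_r_libs, its defaults, and the STRIP override are made up here for illustration, and it assumes a strip that understands --strip-debug (as the GNU binutils one does).

```shell
# strip_r_libs is a hypothetical helper, not part of R or binutils.
# Usage: strip_r_libs [LIBDIR] [EXT]
#   LIBDIR  R package library (default: the path used in this post)
#   EXT     shared library extension without the dot (default: so)
strip_r_libs() {
    libdir="${1:-/usr/local/lib/R/site-library}"
    ext="${2:-so}"
    # STRIP can be overridden, e.g. STRIP=echo for a dry run
    strip_cmd="${STRIP:-strip}"
    for f in "$libdir"/*/libs/*."$ext"; do
        # an unmatched glob stays literal, so skip anything that is not a file
        [ -f "$f" ] || continue
        "$strip_cmd" --strip-debug "$f"
    done
}
```

Calling STRIP=echo strip_r_libs first prints the commands that would run without touching any file, which is a cheap way to check that the path and extension are right.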
The impact can be Yuge as illustrated in the following dotplot:
This illustration is in response to a mailing list post. Last week, someone claimed on r-help that tidyverse would not install on Ubuntu 17.04. And this is of course patently false as many of us build and test on Ubuntu and related Linux systems, Travis runs on it, CRAN tests them etc pp. That poor user had somehow messed up their default gcc version. Anyway: I fired up a Docker container, installed r-base-core plus three required -dev packages (for xml2, openssl, and curl) and ran a single install.packages("tidyverse"). In a nutshell, following the launch of Docker for an Ubuntu 17.04 container, it was just
$ apt-get update
$ apt-get install r-base libcurl4-openssl-dev libssl-dev libxml2-dev
$ apt-get install mg                     # a tiny editor
$ mg /etc/R/Rprofile.site                # to add a default CRAN repo
$ R -e 'install.packages("tidyverse")'
which not only worked (as expected) but also installed a whopping fifty-one packages (!!) of which twenty-six contain a shared library. A useful little trick is to run du with proper options to total, summarize, and use human units which reveals that these libraries occupy seventy-eight megabytes:
root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
4.3M    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
2.3M    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
144K    /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
204K    /usr/local/lib/R/site-library/curl/libs/curl.so
328K    /usr/local/lib/R/site-library/digest/libs/digest.so
33M     /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
36K     /usr/local/lib/R/site-library/glue/libs/glue.so
3.2M    /usr/local/lib/R/site-library/haven/libs/haven.so
272K    /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
52K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
64K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
16K     /usr/local/lib/R/site-library/mime/libs/mime.so
124K    /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
372K    /usr/local/lib/R/site-library/openssl/libs/openssl.so
772K    /usr/local/lib/R/site-library/plyr/libs/plyr.so
92K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
13M     /usr/local/lib/R/site-library/readr/libs/readr.so
4.7M    /usr/local/lib/R/site-library/readxl/libs/readxl.so
1.2M    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
160K    /usr/local/lib/R/site-library/rlang/libs/rlang.so
928K    /usr/local/lib/R/site-library/scales/libs/scales.so
4.9M    /usr/local/lib/R/site-library/stringi/libs/stringi.so
1.3M    /usr/local/lib/R/site-library/tibble/libs/tibble.so
2.0M    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
1.2M    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
4.7M    /usr/local/lib/R/site-library/xml2/libs/xml2.so
78M     total
root@de443801b3fc:/#
Looks like dplyr wins this one at thirty-three megabytes just for its shared library.
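To rank the libraries rather than eyeball the list, the same glob can be fed to du in kilobytes and sorted numerically. A small sketch, assuming the usual du, sort and head utilities; largest_r_libs is a hypothetical name, and the default path is simply the one used in this post.

```shell
# largest_r_libs is a hypothetical helper for illustration.
# Usage: largest_r_libs [LIBDIR] [N]
# Prints the N largest shared libraries (size in KiB, then path), largest first.
largest_r_libs() {
    libdir="${1:-/usr/local/lib/R/site-library}"
    n="${2:-5}"
    # -k reports sizes in 1024-byte units so that sort -n can compare them
    du -k "$libdir"/*/libs/*.so 2>/dev/null | sort -rn | head -n "$n"
}
```

Plain kilobytes are used instead of -h because mixed suffixes such as 772K and 1.2M do not sort correctly as numbers.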
But with a single stroke of strip we can reduce all this down a lot:
root@de443801b3fc:/# strip --strip-debug /usr/local/lib/R/site-library/*/libs/*so
root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
440K    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
220K    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
52K     /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
56K     /usr/local/lib/R/site-library/curl/libs/curl.so
120K    /usr/local/lib/R/site-library/digest/libs/digest.so
2.5M    /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
16K     /usr/local/lib/R/site-library/glue/libs/glue.so
404K    /usr/local/lib/R/site-library/haven/libs/haven.so
76K     /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
20K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
24K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
8.0K    /usr/local/lib/R/site-library/mime/libs/mime.so
52K     /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
84K     /usr/local/lib/R/site-library/openssl/libs/openssl.so
76K     /usr/local/lib/R/site-library/plyr/libs/plyr.so
32K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
648K    /usr/local/lib/R/site-library/readr/libs/readr.so
400K    /usr/local/lib/R/site-library/readxl/libs/readxl.so
128K    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
56K     /usr/local/lib/R/site-library/rlang/libs/rlang.so
100K    /usr/local/lib/R/site-library/scales/libs/scales.so
496K    /usr/local/lib/R/site-library/stringi/libs/stringi.so
124K    /usr/local/lib/R/site-library/tibble/libs/tibble.so
164K    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
104K    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
344K    /usr/local/lib/R/site-library/xml2/libs/xml2.so
6.6M    total
root@de443801b3fc:/#
Down to six point six megabytes. Not bad for one command. The chart visualizes the respective reductions. Clearly, C++ packages (and their template use) lead to more debugging symbols than plain old C code. But once stripped, the size differences are not that large.
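To put a number on "not bad", the totals reported by du give the relative saving directly. A quick sketch with awk; pct_saved is a hypothetical name, and the inputs are the rounded figures from the two listings above, so the result is approximate.

```shell
# pct_saved BEFORE AFTER: percentage by which AFTER is smaller than BEFORE.
pct_saved() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", 100 * (1 - after / before) }'
}

pct_saved 78 6.6    # all twenty-six libraries, sizes in MB
pct_saved 33 2.5    # dplyr.so alone
```

This prints 91.5 and 92.4, so roughly nine tenths of the installed shared library footprint was debugging information, both overall and for dplyr on its own.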
And just to be plain, what we showed previously in post #9 does the same, only already at installation stage. The effects are not cumulative.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.