Site icon R-bloggers

#10: Compacting your Shared Libraries, After The Build

[This article was first published on Thinking inside the box , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Welcome to the tenth post in the rarely ranting R recommendations series, or R4 for short. A few days ago we showed how to tell the linker to strip shared libraries. As discussed in the post, there are two options. One can either set up ~/.R/Makevars by passing the strip-debug option to the linker. Alternatively, one can adjust src/Makevars in the package itself with a bit a Makefile magic.

Of course, there is a third way: just run strip --strip-debug over all the shared libraries after the build. As the path is standardized, and the shell does proper globbing, we can just do

$ strip --strip-debug /usr/local/lib/R/site-library/*/libs/*.so

using a double-wildcard to get all packages (in that R package directory) and all their shared libraries. Users on macOS probably want .dylib on the end, users on Windows want another computer as usual (just kidding: use .dll). Either may have to adjust the path which is left as an exercise to the reader.

The impact can be Yuge as illustrated in the following dotplot:

This illustration is in response to a mailing list post. Last week, someone claimed on r-help that tidyverse would not install on Ubuntu 17.04. And this is of course patently false as many of us build and test on Ubuntu and related Linux systems, Travis runs on it, CRAN tests them etc pp. That poor user had somehow messed up their default gcc version. Anyway: I fired up a Docker container, installed r-base-core plus three required -dev packages (for xml2, openssl, and curl) and ran a single install.packages("tidyverse"). In a nutshell, following the launch of Docker for an Ubuntu 17.04 container, it was just

$ apt-get update
$ apt-get install r-base libcurl4-openssl-dev libssl-dev libxml2-dev
$ apt-get install mg          # a tiny editor
$ mg /etc/R/Rprofile.site     # to add a default CRAN repo
$ R -e 'install.packages("tidyverse")'

which not only worked (as expected) but also installed a whopping fifty-one packages (!!) of which twenty-six contain a shared library. A useful little trick is to run du with proper options to total, summarize, and use human units which reveals that these libraries occupy seventy-eight megabytes:

root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
4.3M    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
2.3M    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
144K    /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
204K    /usr/local/lib/R/site-library/curl/libs/curl.so
328K    /usr/local/lib/R/site-library/digest/libs/digest.so
33M     /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
36K     /usr/local/lib/R/site-library/glue/libs/glue.so
3.2M    /usr/local/lib/R/site-library/haven/libs/haven.so
272K    /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
52K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
64K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
16K     /usr/local/lib/R/site-library/mime/libs/mime.so
124K    /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
372K    /usr/local/lib/R/site-library/openssl/libs/openssl.so
772K    /usr/local/lib/R/site-library/plyr/libs/plyr.so
92K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
13M     /usr/local/lib/R/site-library/readr/libs/readr.so
4.7M    /usr/local/lib/R/site-library/readxl/libs/readxl.so
1.2M    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
160K    /usr/local/lib/R/site-library/rlang/libs/rlang.so
928K    /usr/local/lib/R/site-library/scales/libs/scales.so
4.9M    /usr/local/lib/R/site-library/stringi/libs/stringi.so
1.3M    /usr/local/lib/R/site-library/tibble/libs/tibble.so
2.0M    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
1.2M    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
4.7M    /usr/local/lib/R/site-library/xml2/libs/xml2.so
78M     total
root@de443801b3fc:/# 

Looks like dplyr wins this one at thirty-three megabytes just for its shared library.

But with a single stroke of strip we can reduce all this down a lot:

root@de443801b3fc:/# strip --strip-debug /usr/local/lib/R/site-library/*/libs/*so
root@de443801b3fc:/# du -csh /usr/local/lib/R/site-library/*/libs/*so
440K    /usr/local/lib/R/site-library/Rcpp/libs/Rcpp.so
220K    /usr/local/lib/R/site-library/bindrcpp/libs/bindrcpp.so
52K     /usr/local/lib/R/site-library/colorspace/libs/colorspace.so
56K     /usr/local/lib/R/site-library/curl/libs/curl.so
120K    /usr/local/lib/R/site-library/digest/libs/digest.so
2.5M    /usr/local/lib/R/site-library/dplyr/libs/dplyr.so
16K     /usr/local/lib/R/site-library/glue/libs/glue.so
404K    /usr/local/lib/R/site-library/haven/libs/haven.so
76K     /usr/local/lib/R/site-library/jsonlite/libs/jsonlite.so
20K     /usr/local/lib/R/site-library/lazyeval/libs/lazyeval.so
24K     /usr/local/lib/R/site-library/lubridate/libs/lubridate.so
8.0K    /usr/local/lib/R/site-library/mime/libs/mime.so
52K     /usr/local/lib/R/site-library/mnormt/libs/mnormt.so
84K     /usr/local/lib/R/site-library/openssl/libs/openssl.so
76K     /usr/local/lib/R/site-library/plyr/libs/plyr.so
32K     /usr/local/lib/R/site-library/purrr/libs/purrr.so
648K    /usr/local/lib/R/site-library/readr/libs/readr.so
400K    /usr/local/lib/R/site-library/readxl/libs/readxl.so
128K    /usr/local/lib/R/site-library/reshape2/libs/reshape2.so
56K     /usr/local/lib/R/site-library/rlang/libs/rlang.so
100K    /usr/local/lib/R/site-library/scales/libs/scales.so
496K    /usr/local/lib/R/site-library/stringi/libs/stringi.so
124K    /usr/local/lib/R/site-library/tibble/libs/tibble.so
164K    /usr/local/lib/R/site-library/tidyr/libs/tidyr.so
104K    /usr/local/lib/R/site-library/tidyselect/libs/tidyselect.so
344K    /usr/local/lib/R/site-library/xml2/libs/xml2.so
6.6M    total
root@de443801b3fc:/#

Down to six point six megabytes. Not bad for one command. The chart visualizes the respective reductions. Clearly, C++ packages (and their template use) lead to more debugging symbols than plain old C code. But once stripped, the size differences are not that large.

And just to be plain, what we showed previously in post #9 does the same, only already at installation stage. The effects are not cumulative.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.