Coping with varying `gcc` versions and capabilities in R packages
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The problem
I have a package called strex
which is for string manipulation. In this package, I want to take advantage of the regex capabilities of C++11. The reason for this is that in strex
, I find myself needing to do a calculation like
x <- list(c("1,000", "2,000,000"), c("1", "50", "3,455")) lapply(x, function(x) as.numeric(stringr::str_replace_all(x, ",", ""))) #> [[1]] #> [1] 1e+03 2e+06 #> #> [[2]] #> [1] 1 50 3455
A lapply
like this can be done faster in C++11, so I’d like to have that speedup in my package. The problem is, this requires the regex capabilities of C++11, which are only supported in gcc
>= 4.9. Many people are using an older gcc
, e.g. the still popular Ubuntu 14.04 is on gcc
4.8. If these people tried to install the strex
which relied on C++11 regex, they’d get a compile error.
The hope
I wanted to provide the faster option to those with a capable gcc
and the slower lapply
option (which isn’t painfully slow, just a bit slower) to those with an old gcc
. This should all happen inside a seamless install.packages()
call; the user needn’t be bored by all of this.
The solution
Figuring out which gcc
version the user has
The configure
step of package installation needed to do different things depending on the gcc
version. Kevin Ushey’s configure
package (https://github.com/kevinushey/configure) allows you to use R to configure R packages (normally you have to use shell commands). This was a saviour. To get the gcc
version, I used the processx
package (so I had to add it to Imports
in DESCRIPTION
) to execute the shell command gcc -v
.
gcc_version <- function() { out <- tryCatch(processx::run("gcc", "-v", stderr_to_stdout = TRUE), error = function(cnd) list(stdout = "")) out <- stringr::str_match(out$stdout, "gcc version (\\d+(?:\\.\\d+)*)")[1, 2] if (!is.na(out)) out <- numeric_version(out) out }
This returns the gcc
version if gcc
is installed and NA
otherwise. Then, the statement !is.na(gcc_version()) && gcc_version() < "4.9"
returns TRUE
if the user’s gcc
does not support C++11 regex and FALSE
otherwise.
Dealing with the gcc
version
I decided that the default code in the package would be for people with an up to date gcc
and that the configure
step would make alterations to the code for people with an old gcc
. Hence, for people with an old gcc
, configure
needed to remove all of the C++ code that required C++ regex and then replace the body of the R function which .Call()
ed that (now removed) C++ code with R code that performed the same function. It took a long time (many days) and a lot of testing on Travis but this was the right strategy and now strex
is installing beautifully with new and old gcc
s.
The code
There’s a little too much code to walk through the steps in this blog post (and the steps are specific to this package), but if you’re curious as to how this was done, first familiarize yourself with Kevin Ushey’s amazing configure
package and then read the configuration steps in strex
at https://github.com/rorynolan/strex. This includes useful functions like file_replace_R_fun()
to change the body of an R function in a file, file_remove_C_fun()
to remove a C/C++ function from a file and file_remove_matching_lines()
to remove certain lines from a file.
Conclusion
This post is intended to give people an idea of how to deal with this type of problem. If you are struggling with this problem, feel free to contact me; I’m happy to share the limited knowledge that I have.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.