Improving the C++ Code Quality of an Rcpp Package
Use case
The R package development ecosystem includes packages such as {lintr} and {styler} that can help to check code style, and to format R code.
However, these packages cannot lint or style the C++ code of {Rcpp} packages. This could leave the C++ code of an Rcpp package less clean than the R code, increasing the technical debt already associated with using two languages.
In Epiverse-TRACE, we encounter this issue with {finalsize}, and we anticipate the same issue with further epidemic modelling packages that we seek to develop or adapt, such as {fluEvidenceSynthesis}.
Our use-case is not unique, of course, and other projects could have their own solutions. One such, from which we have borrowed some ideas, is the Apache Arrow project, whose R package also uses a C++ backend (via {cpp11} rather than {Rcpp}).
Choice of C++ linters
C++ linters such as clang-tidy stumble when dealing with C++ code in src/, as the clang toolchain attempts to compile it. This does not work for Rcpp packages, as the Rcpp.h header cannot be found; this linking is handled by {Rcpp}.
Fortunately, other C++ linters and code checking tools are available and can be used safely with Rcpp packages.
We have chosen to use cpplint and cppcheck for {finalsize}.
Cpplint
cpplint is a tool that checks whether C/C++ files follow Google’s C++ style guide. cpplint is easy to install across platforms, and does not error when it cannot find Rcpp.h.
Importantly, cpplint can be instructed to not lint the autogenerated RcppExports.cpp file, which follows a different style.
To lint all other .cpp files, we simply run cpplint from the terminal.
cpplint --exclude="src/RcppExports.cpp" src/*.cpp
Cppcheck
cppcheck is a static code analysis tool that aims to “have very few false positives”. This is especially useful for the non-standard organisation of Rcpp projects compared to C++ projects.
cppcheck can also be run locally and instructed to ignore the autogenerated RcppExports.cpp file, while still flagging issues with style.
cppcheck -i src/RcppExports.cpp --enable=style --error-exitcode=1 src
Here, the --enable=style option lets cppcheck flag issues with style, acting as a second linter. This enables the performance and portability flags as well. (We have not found any difference when using --enable=warning instead.)
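To make the performance checks more concrete, here is a minimal sketch of the kind of code they are meant to catch. The function names are hypothetical and not taken from {finalsize}, and the message id mentioned in the comment (passedByValue) is our assumption about how cppcheck labels this pattern; the exact report may differ between cppcheck versions.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical example: the vector is taken by value, so every call copies it.
// Assumption: cppcheck's performance checks report this pattern (message id
// passedByValue in the versions we are aware of).
double mean_by_value(std::vector<double> x) {
  assert(!x.empty());  // the mean of an empty vector is undefined
  return std::accumulate(x.begin(), x.end(), 0.0) /
         static_cast<double>(x.size());
}

// Same computation, but the vector is read through a const reference,
// so no copy is made.
double mean_by_const_ref(const std::vector<double> &x) {
  assert(!x.empty());
  return std::accumulate(x.begin(), x.end(), 0.0) /
         static_cast<double>(x.size());
}
```

Both functions return the same result; only the second avoids the copy, which matters when the vector is large or the function is called often.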
Enabling all checks (--enable=all) would flag two specific issues for {Rcpp} packages: (1) the Rcpp*.h headers not being found (of the class missingIncludeSystem), and (2) the solver functions not being used by any other C++ function (unusedFunction).
These extra options should be avoided in {Rcpp} packages, as the linking is handled for us, and the functions are indeed used later — just not by other C++ functions.
The --error-exitcode=1 argument makes cppcheck exit with status 1 when it finds an issue. A non-zero exit status conventionally signals failure, so a CI job running this command will fail.
Adding C++ linting to CI workflows
Both cpplint and cppcheck can be easily added to continuous integration workflows. In Epiverse-TRACE, we use GitHub Actions. The C++ lint workflow we have implemented looks like this:
on:
  push:
    paths: "src/**"
  pull_request:
    branches:
      - "*"

name: Cpp-lint-check

jobs:
  cpplint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v2
      - run: pip install cpplint
      - run: cpplint --quiet --exclude="src/RcppExports.cpp" src/*.cpp

  cppcheck:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: sudo apt-get install cppcheck
      - run: cppcheck -i src/RcppExports.cpp --quiet --enable=warning --error-exitcode=1 .
The workflow is triggered when there are changes to files in src/, and on all pull requests.
Formatting C++ code
C++ code can be automatically formatted to avoid linter errors. An especially useful tool is clang-format. Our code is styled to follow the Google C++ style guide using:
# replace .cpp with .h to format headers
clang-format -i -style=google src/*.cpp
However, this also formats the autogenerated RcppExports.cpp file. It can be extra work to repeatedly undo this change and keep the original formatting, but clang-format does not provide an easy inline way to ignore this file.
Instead, clang-format can be passed all files except RcppExports.cpp to style using some simple shell commands. In smaller projects, it might be worth
find src -name "*.cpp" ! -name "RcppExports.cpp" -exec clang-format -style=google -i {} \;
Turning off linting and formatting
There are cases in which we might want to turn linting and formatting off. This might be when the linter does not agree with valid C++ code required in the project, or when the linters and stylers do not agree with each other. These tools are developed separately by large software projects, each with its own internal requirements and its own solutions to issues encountered in its work: clang-format comes from LLVM (even when run with -style=google), and cpplint from Google’s work.
Linter-enforced paradigms
Sometimes, a linter or styler enforces not only a style but also the use of certain programming paradigms. For example, cpplint warns against passing function arguments by non-const reference, and prefers that they be passed as pointers or as constant references (const int &value).
int some_function(int &value) {
  /* operations modifying value */
  return value;
}
Passing the argument as a const reference would not serve the needs of this function, which modifies value, and passing by non-const reference is a valid strategy when we don’t want to get into the details of using pointers. (Note that this is typically an issue when large objects such as custom classes or structs are passed to a function multiple times.)
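The three options discussed here can be put side by side in a small sketch. The function names are hypothetical and only serve the illustration; whether the first form triggers a warning depends on whether the installed cpplint version still enforces this rule.

```cpp
#include <cassert>

// Non-const reference: the caller's variable is modified in place.
// Versions of cpplint that enforce the rule described above warn here.
int increment_by_reference(int &value) {
  value += 1;
  return value;
}

// Pointer form that the linter prefers: the call site has to write &x,
// which makes the modification visible to the reader.
int increment_by_pointer(int *value) {
  assert(value != nullptr);  // a pointer, unlike a reference, can be null
  *value += 1;
  return *value;
}

// Const reference: accepted by the linter, but the argument cannot be
// modified, so this form only suits read-only use.
int plus_one(const int &value) { return value + 1; }
```

Calling increment_by_reference(x) and increment_by_pointer(&x) both change x, while plus_one(x) leaves it untouched. The pointer version needs the extra null check, which is part of the detail the reference version lets us avoid.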
Similarly, cpplint will throw a warning about accessing variables using std::move, which is something we encounter in the Newton solver in {finalsize}. While this usage is not technically wrong for such a simple use case, the linter is right to cautiously throw a warning nonetheless.
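For readers unfamiliar with std::move, the following is a hypothetical illustration of a simple use. It is not the {finalsize} solver code, and the names SolverState and make_state are invented for this sketch.

```cpp
#include <cassert>
#include <utility>  // std::move
#include <vector>

// Hypothetical container for a solver's current guess (not from {finalsize}).
struct SolverState {
  std::vector<double> guess;
};

// Takes ownership of the initial guess and moves it into the state,
// rather than copying the vector a second time.
SolverState make_state(std::vector<double> initial_guess) {
  assert(!initial_guess.empty());  // a solver needs at least one value
  SolverState state;
  state.guess = std::move(initial_guess);
  return state;
}
```

After the move, initial_guess should not be read again inside make_state, which is the kind of subtlety that justifies a cautious linter.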
Linter-styler disagreement
One example of linter-styler disagreement is the use of BOOST_FOREACH from the Boost libraries as an alternative to for loops. clang-format will insist on adding two spaces before the opening bracket: BOOST_FOREACH (). cpplint will insist on removing one space.
cpplint and clang-format also disagree on the order of header inclusions, especially when both local and system headers are included.
Disabling checks on code chunks
Either of these cases could require disabling linting or formatting on some part of the code. cpplint can be turned off for a particular line by adding the comment // NOLINT to that line. Multiple lines can be protected from linting as well.
// NOLINTBEGIN
<some C++ code here>
// NOLINTEND
Alternatively, clang-format can be instructed to ignore chunks of code using comment messages too.
// clang-format off
<some C++ code here>
// clang-format on
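Here is a minimal sketch of both kinds of comment in a compilable file. The matrix and the function names are hypothetical; the point is only to show where the markers go.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Matrix3 = std::array<std::array<int, 3>, 3>;

// clang-format off
// Hand-aligned layout that we want clang-format to leave as written.
const Matrix3 kIdentity = {{
    {{1, 0, 0}},
    {{0, 1, 0}},
    {{0, 0, 1}},
}};
// clang-format on

// Returns the i-th diagonal entry of a 3x3 matrix.
int diagonal_entry(const Matrix3 &m, std::size_t i) {
  assert(i < m.size());  // guard against out-of-range access
  return m[i][i];
}

// The trailing comment excludes this single line from cpplint.
void scale_in_place(int &value, int factor) {  // NOLINT
  value *= factor;
}
```

Keeping the protected regions as small as possible means the rest of the file is still linted and formatted as usual.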
Linter options for future packages
{finalsize} is a relatively simple {Rcpp} package, with no C/C++ headers, and no C++ tests. However, future Epiverse-TRACE packages could be more similar to {fluEvidenceSynthesis}, and will have header files, and could also have C++ unit tests via the catch framework.
cpplint will demand that all local headers be prefixed with their directory (src/), but this would cause the code to break as {Rcpp} looks for a subdirectory called src/src/. This check can be turned off by passing the filter option --filter="-build/include_subdir" to cpplint. Alternatively, we could place headers in a subdirectory such as inst/include.
Both cpplint and cppcheck can be instructed to ignore C++ test files using the catch testing framework provided by {testthat}. This prevents errors due to the specialised syntax provided by {testthat} in testthat.h, such as context.
# for cpplint, add an extra exclude statement
cpplint <...> --exclude="src/test*.cpp" src/*.cpp
# for cppcheck, suppress checks on test files
cppcheck <...> --suppress=*:src/test_*.cpp src
Conclusion
It is actually somewhat surprising that there does not seem to be a canonical linter for C++ code in {Rcpp} packages. The methods laid out here are an initial implementation developed for use with the {finalsize} package, and the considerations here are a starting point. We shall be continuously evaluating how we ensure the quality of our C++ code as we encounter more use cases while developing future Epiverse-TRACE packages.