Access attribute_hidden Functions in R Packages

[This article was first published on BioStatMatt, and kindly contributed to R-bloggers.]

Maybe the title should have been prefixed with “Don’t…”

The source code of R is littered with “attribute_hidden” declarations. These declarations attempt to ensure that the variable or function may only be accessed by code in the core R distribution, and not by R extension packages. Generally there is a good reason for this. For example, there is no good reason why R extension packages should call do_scan, the C level wrapper for the R function scan. If package code needs to use scan, it should call it with R code.

Package developers should consult with the R development community when they want to access attribute_hidden functions; there may be alternatives. It would also be much more elegant and useful to convince an R core developer to simply modify the attribute_hidden declaration or open up access to an API. In the meantime, we may want to use other methods to access attribute_hidden functions at the C level. For instance, not every attribute_hidden function or variable has an R level counterpart. Another reason to access attribute_hidden symbols is to partially expose the R connections API, which is currently not accessible to R extension packages. However, comments in the R source code hint that the connections API may be opened to R packages at some point in the future (hopefully).

Ideally, the R source code or R CMD check mechanism would be modified to prevent the sort of trickery presented below. I hope this post will contribute to the tireless efforts of the R core development team in improving R.

The rest of this post describes how attribute_hidden symbols may be accessed on the Linux, BSD, and Mac OS X platforms in a fairly portable manner. It may also be possible to extend this approach to Windows. Those who just want a package that demonstrates the method can look at example_1.0.tar.gz.

DISCLAIMER: It is possible to use this method with packages that pass R CMD check. However, accessing attribute_hidden symbols circumvents an R safety mechanism and should not be used in production packages.

The attribute_hidden declaration is defined in multiple places in the R source code, including src/include/Defn.h:

#define attribute_hidden __attribute__ ((visibility ("hidden")))

The macro expands to a GCC attribute, a compiler extension that affects how the symbol may be accessed. From the GCC documentation, the “hidden” attribute has the following meaning:

Hidden visibility indicates that the symbol will not be
placed into the dynamic symbol table, so no other “module”
(executable or shared library) can reference it directly.
….
Note that hidden symbols, while they cannot be referenced
directly by other modules, can be referenced indirectly via
function pointers

Clearly, attribute_hidden functions may be accessed using function pointers, which requires knowing the address where the function is loaded in memory when R is executed. In general, this information is not accessible to R packages. However, this information is contained in the R executable file. The trick is to extract the address of the function we want to use, and then construct a function pointer to use it.
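
The point made in the GCC documentation can be illustrated with a small, self-contained C sketch. The names hidden_add, get_hidden_add, and call_through_pointer are made up for this illustration and do not appear in R. The sketch assumes a GCC-compatible compiler. It only shows that a hidden function remains callable through a pointer; it does not show how to obtain such a pointer from another module.

```c
#include <assert.h>
#include <stddef.h>

/* Hidden visibility: the symbol is kept out of the dynamic symbol table,
   so code in another shared object cannot link against the name. */
__attribute__((visibility("hidden")))
int hidden_add(int a, int b)
{
    return a + b;
}

/* Pointer type matching the signature of hidden_add. */
typedef int (*add_fn)(int, int);

/* An ordinary (default visibility) accessor in the same module.
   It hands out the address of the hidden function. */
add_fn get_hidden_add(void)
{
    return hidden_add;
}

/* Calls the hidden function indirectly, using only the pointer. */
int call_through_pointer(int a, int b)
{
    add_fn f = get_hidden_add();
    assert(f != NULL);
    return f(a, b);
}
```

R offers no accessor like get_hidden_add for its hidden functions, which is why the address has to be found some other way.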

Suppose we want to use the attribute_hidden function getConnection_no_err, defined in the file src/main/connections.c:

attribute_hidden Rconnection getConnection_no_err(int n);

The first step is to find the address of this function in memory when R is executed. As mentioned above, this information is contained in the R executable file, usually located at $(R_HOME)/bin/exec/R. The objdump program in the binutils package may be used to extract it:

$ objdump -t `R RHOME`/bin/exec/R | grep getConnection_no_err
0000000000509eb0 l     F .text	0000000000000024              .hidden getConnection_no_err

In this example, objdump outputs several pieces of information associated with the function getConnection_no_err, the first of which is the (hex) address where the function will be loaded in memory when R is executed. We can isolate this bit of information with an additional command:

$ objdump -t `R RHOME`/bin/exec/R | grep getConnection_no_err | awk '{print $1}'
0000000000509eb0
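
For readers who would rather do the extraction step in C than in awk, here is a rough sketch of parsing the first field of one line of objdump -t output. The function name parse_symbol_address is hypothetical and is not part of the example package; only strtoull from the C standard library is used. It assumes the line has the layout shown above, with the hexadecimal address first.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse the leading hexadecimal field of one `objdump -t` line.
   On success, store the value in *addr and return 0.
   Return -1 if the line does not start with a usable address. */
int parse_symbol_address(const char *line, uintptr_t *addr)
{
    char *end = NULL;
    unsigned long long value;

    if (line == NULL || addr == NULL)
        return -1;

    errno = 0;
    value = strtoull(line, &end, 16);
    if (end == line || errno == ERANGE)      /* no hex digits, or overflow */
        return -1;
    if (*end != ' ' && *end != '\t')         /* the address must end the first field */
        return -1;
    if (value == 0 || value > UINTPTR_MAX)   /* reject zero and values too wide for a pointer */
        return -1;

    *addr = (uintptr_t) value;
    return 0;
}
```

Rejecting a zero address is a design choice in this sketch: it lets a caller treat "symbol not found" and "malformed line" the same way.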

The next step is to construct a function pointer to which we can assign this address, and thereby call the function. In our package source code, we would use the following declarations:

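#include <Rinternals.h>
...
/* Rconnection is not declared in the public headers; treat it as opaque */
typedef struct Rconn *Rconnection;
/* pointer to a function taking an int and returning an Rconnection */
typedef Rconnection (*FUNP)(int);
/* address taken from the objdump output above */
FUNP getConnection_no_err = (FUNP) 0x0000000000509eb0;
...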

and then we could call the function with a statement like

...
Rconnection con = (*getConnection_no_err)(0);
...
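
Hard-coding the address in the source is fragile, since it changes with every build of R. One possible arrangement, sketched below, is to pass the address in as a preprocessor macro and to treat a missing address as a null pointer. GETCONN_ADDR, getconn_from_address, and lookup_getConnection_no_err are hypothetical names chosen for this sketch. The opaque Rconnection typedef is an assumption that stands in for the private header, so the block compiles without R installed.

```c
#include <stdint.h>

/* Opaque handle: struct Rconn is left undeclared in this sketch. */
typedef struct Rconn *Rconnection;
typedef Rconnection (*getconn_fn)(int);

/* GETCONN_ADDR is a hypothetical macro that a configure script could
   supply, e.g. -DGETCONN_ADDR=0x0000000000509eb0. Zero means "not found". */
#ifndef GETCONN_ADDR
#define GETCONN_ADDR 0
#endif

/* Turn a raw address into a typed function pointer.
   An address of zero yields a null pointer. */
getconn_fn getconn_from_address(uintptr_t addr)
{
    return addr ? (getconn_fn) addr : (getconn_fn) 0;
}

/* Return the pointer for the build-time address, or null if none was given. */
getconn_fn lookup_getConnection_no_err(void)
{
    return getconn_from_address((uintptr_t) GETCONN_ADDR);
}
```

A caller would check the result against a null pointer before calling through it, and could raise an R error instead of crashing when no address was supplied.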

However, in this particular case, we wouldn’t be able to do much with the Rconnection pointer other than pass it to another function. To get any useful information about the connection, we would first need to copy the struct Rconn declaration from src/include/Rconnections.h into our package source code. Copying a private declaration is not considered “good” programming practice, but Rconnections.h is not a public header, and until it is made public there is little alternative.

This is the gist of how attribute_hidden symbols may be accessed in package code. There are various tricks that may be used to automate collecting the symbol address from the R executable. For those interested, I have prepared a small R extension package, example_1.0.tar.gz, containing a single function get_mode that is passed a connection description and returns the mode (e.g. "rw") by accessing the internal Rconnection pointer. For example:

> library(example)
> get_mode("stdin")
[1] "r"

Pay special attention to the configure.ac file. This is where most of the work of finding the address for getConnection_no_err occurs. Also, note that this package passes R CMD check under R-2.11.0.

The trick of accessing attribute_hidden functions is not a “robust” method, as we might say in statistics. That is, it is easily broken. If objdump, grep, or awk is not installed, the installation will fail. The address read from the file is also only valid if the executable is loaded at its link-time address, which is not the case for a position-independent executable. If an incorrect address is found, installation may fail, or get_mode may cause a segmentation fault or other odd behavior. Please leave feedback if you try it out.
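
One way to guard against a wrong address, which the example package does not do, is to compare an exported symbol's address in the file with its address in the running process. If the two differ, the binary was relocated, and the hidden symbol's address has to be shifted by the same amount. The sketch below is only an illustration under the assumption that both symbols come from the same binary image; the function names are hypothetical.

```c
#include <stdint.h>

/* Returns nonzero if the reference symbol sits at the same address
   at run time as in the file, i.e. the image was not relocated. */
int address_is_unrelocated(uintptr_t static_ref, uintptr_t runtime_ref)
{
    return static_ref == runtime_ref;
}

/* static_hidden: address of the hidden symbol as printed by objdump
   static_ref:    address of an exported symbol from the same file,
                  as printed by objdump
   runtime_ref:   address of that exported symbol in the running process
   Returns the hidden symbol's address adjusted by the load offset. */
uintptr_t rebase_address(uintptr_t static_hidden,
                         uintptr_t static_ref,
                         uintptr_t runtime_ref)
{
    /* Unsigned arithmetic wraps, so this also works for a negative shift. */
    uintptr_t load_offset = runtime_ref - static_ref;
    return static_hidden + load_offset;
}
```

When address_is_unrelocated returns nonzero, rebase_address returns static_hidden unchanged, so the adjustment is harmless in the case described in this post.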
