Using R — Packaging a C library in 15 minutes
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Yes, this post condenses 50+ hours of learning into a 15 minute tutorial. Read ‘em and weep. (That is, you read while I weep.)
OK. For the last week I’ve been learning how to call C code as documented in previous posts: Calling C code “Hello World!”, .Call(“hello”) and Calling C code with Rcpp. Now comes that task that puts this hard won knowledge to use — providing an R interface to a library of C code as part of a package. You are encouraged to have Writing R Extensions open in another window and use search whenever you see something you want more information about. In an effort to keep this exercise to 15 minutes we will stick to the task at hand and not do a whole lot of explaining. It is assumed that you already have the required compilers installed.
1) Download and test your C library
We’ll be working with the “libmseed” library that reads binary seismic data. Download the .gz file from the IRIS DMC Softare Library and then compile and test it with the example code. (Read the README files on your own time. Right now we have work to do.)
$ cd ~/Downloads $ wget http://www.iris.edu/pub/programs/libmseed-2.7.tar.gz --2012-11-16 16:18:11-- http://www.iris.edu/pub/programs/libmseed-2.7.tar.gz Resolving www.iris.edu... 128.95.166.129 Connecting to www.iris.edu|128.95.166.129|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 166186 (162K) [application/x-gzip] Saving to: “libmseed-2.7.tar.gz” ... $ tar -xzf libmseed-2.7.tar.gz $ cd libmseed $ make cc -c -o fileutils.o fileutils.c cc -c -o genutils.o genutils.c cc -c -o gswap.o gswap.c cc -c -o lmplatform.o lmplatform.c cc -c -o lookup.o lookup.c cc -c -o msrutils.o msrutils.c cc -c -o pack.o pack.c cc -c -o packdata.o packdata.c cc -c -o traceutils.o traceutils.c cc -c -o tracelist.o tracelist.c cc -c -o parseutils.o parseutils.c cc -c -o unpack.o unpack.c cc -c -o unpackdata.o unpackdata.c cc -c -o selection.o selection.c cc -c -o logging.o logging.c rm -f libmseed.a ar -csq libmseed.a fileutils.o genutils.o gswap.o lmplatform.o lookup.o msrutils.o pack.o packdata.o traceutils.o tracelist.o parseutils.o unpack.o unpackdata.o selection.o logging.o $ cd example $ make cc -I.. -c msview.c cc -I.. -o msview msview.o -L.. -lmseed cc -I.. -c msrepack.c cc -I.. -o msrepack msrepack.o -L.. -lmseed $ ./msview test.mseed IU_COLA_00_LHZ, 000001, M, 512, 112 samples, 1 Hz, 2010,058,06:50:00.069539 ... IU_COLA_00_LHZ, 000001, M, 512, 27 samples, 1 Hz, 2010,058,07:59:33.069538 $
[02:00 minutes]
Congratulations are in order. (No, not for you! For the smart folks who wrote libmseed!) 😉
2) Set up your package structure
Yes, we’re going to do that now. You could take incremental steps and practice writing an R callable version of example/msview.c and then try to compile it with R CMD SHLIB, then give up and write a Makefile that contains all of the R CMD SHLIB flags. But I’m here to tell you that most of that will happen auto-magically if you set up a package first and let R worry about the details.
Here is the minimal package structure you need. We’ll create a new directory named “mseed” and copy all of the “libmseed” code into it:
$ cd ~ $ mkdir mseed $ cd mseed $ mkdir demo $ mkdir inst $ mkdir inst/extdata $ mkdir R $ mkdir src $ cp -r ~/Downloads/libmseed ./src
[02:40]
(Curiosity might lead a person to search through your already open copy of “Writing R Extensions” to figure out what “extdata” is used for. In the interest of time, please do that later.)
3) Add Makevars to your mseed/src/ directory
You need to educate R that you are going to compile code that depends upon the libmseed.a library. The easiest way to do this is by creating a file named mseed/src/Makevars::
# See Section 1.2.1 "Using 'Makevars'" of Writing R Extensions # cran.r-project.org/doc/manuals/R-exts.pdf PKG_CFLAGS= PKG_CPPFLAGS=-Ilibmseed PKG_LIBS=-Llibmseed -lmseed $(SHLIB): libmseed/libmseed.a libmseed/libmseed.a: @(cd libmseed && $(MAKE) static CC="$(CC)" CFLAGS="$(CFLAGS)")
[03:20]
4) Convert example C code to R callable routine.
This step assumes that you already know everything learned in the .Call(“hello”) post. I took the code in msview.c and whittled it down as much as possible before converting it into an R callable function that spits out, once again, “Hello World!”. Here’s what I ended up with. Copy this code to mseed/src/test_hello_mseed.c
/*************************************************************************** R callable function ***************************************************************************/ #include <R.h> #include <Rdefines.h> #include <libmseed.h> SEXP test_hello_mseed () { SEXP result; static flag verbose = 0; static int reclen = -1; MSRecord *msr = 0; int dataflag = 0; int64_t totalrecs = 0; int64_t totalsamps = 0; int retcode; char *inputfile = "mseed/src/libmseed/example/test.mseed"; /* Loop over the input file */ while ( (retcode = ms_readmsr (&msr, inputfile, reclen, NULL, NULL, 1, dataflag, verbose)) == MS_NOERROR ) { totalrecs++; totalsamps += msr->samplecnt; printf ("Hello World!\n"); } if ( retcode != MS_ENDOFFILE ) ms_log (2, "Cannot read %s: %s\n", inputfile, ms_errorstr(retcode)); /* Make sure everything is cleaned up */ ms_readmsr (&msr, NULL, 0, NULL, NULL, 0, 0, 0); PROTECT(result = NEW_INTEGER(2)); INTEGER(result)[0] = (int) totalrecs; INTEGER(result)[1] = (int) totalsamps; UNPROTECT(1); return(result); }
[04:30]
You may have noticed that we hardcoded the location of our data file. Not really following best practices here but we’re just trying to do the minimum to get it all to work. We just need to make sure we’re in the directory above mseed/ when we invoke this function.
5) Add wrapper code, NAMESPACE and DESCRIPTION
As described in .Call(“hello”) we need to add a wrapper function for our C code. Create mseed/R/mseedWrappers.R with the following content:
# wrapper function to invoke test_hello_mseed test_hello_mseed <- function() { result <- .Call("test_hello_mseed") return(result) }
Now we need to educate R about the namespace associated with our package. Create mseed/NAMESPACE with the following:
# Import required packages # none so far # Load dynamic libraries (shared object files) useDynLib(mseed) # Export functions defined in R code export("test_hello_mseed")
When R compiles everything, the test_hello_mseed() function written in C will be part of the mseed shared object library.
The last step needed for package compilation is the mseed/DESCRIPTION file:
Package: mseed Type: Package Title: Test wrapper for libseed Version: 0.0-0 Date: 2012-12-12 Author: Jonathan Callahan <[email protected]> Maintainer: Jonathan Callahan <[email protected]> Description: A package that demonstrates how to wrap a C library using the seismic data Mini-SEED library as an example (http://www.iris.edu/software/libraries/). License: GPL (>= 2) Depends: R (>= 2.14) Collate: mseedWrappers.R
[06:50]
6) Wave your wand and watch the magic!
Assuming you have everything in place and have copy-pasted well the following should work if you position yourself above the mseed/ directory. (Always start with R CMD REMOVE mseed to get rid of any previous version.)
$ R CMD REMOVE mseed Removing from library ‘/usr/lib/R/library’ Updating HTML index of packages in '.Library' Making packages.html ... done $ R CMD build mseed * checking for file ‘mseed/DESCRIPTION’ ... OK * preparing ‘mseed’: * checking DESCRIPTION meta-information ... OK * cleaning src * checking for LF line-endings in source and make files * checking for empty or unneeded directories * building ‘mseed_0.0-0.tar.gz’ $ R CMD INSTALL mseed * installing to library ‘/usr/lib/R/library’ * installing *source* package ‘mseed’ ... ** libs gcc -m32 -std=gnu99 -I/usr/include/R -Ilibmseed -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c test_hello_mseed.c -o test_hello_mseed.o make[1]: Entering directory `/home/jonathan/mseed/src/libmseed' gcc -m32 -std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m32 -march=i686 -mtune=atom -fasynchronous-unwind-tables -c -o fileutils.o fileutils.c ... ** building package indices ... ** testing if installed package can be loaded * DONE (mseed) Making packages.html ... done $ R --vanilla ... > library(mseed) > test_hello_mseed() Hello World! ... Hello World! [1] 36 4200 >
[08:20]
It’s a miracle!!!
Congratulations are in order. Go have a coffee or a beer or whatever to kill the rest of your 15 minutes … unless …
Unless, that is, you want to tackle actually communicating data between R and this C library. That will take a few more minutes
7) Start with a C code example that reads data from a buffer
Presumably, we’re packaging our library so that R can pass objects to C code and get objects back. That means that our C code better know how to read data from a buffer. We probably want to start out with a C code example that does this. In the real world you will either have to write the buffer example yourself or ask a friend, preferably the author of the package, to help out. I started with msviewmemory.c. Better make sure it works:
$ cd mseed/src/libmseed/example $ wget http://mazamascience.com/Published/msviewmemory.c --2012-11-16 16:42:11-- http://mazamascience.com/Published/msviewmemory.c ... $ make msviewmemory cc -I.. -L.. msviewmemory.c -lmseed -o msviewmemory $ ./msviewmemory test.mseed IU_COLA_00_LHZ, 000001, M, 512, 112 samples, 1 Hz, 2010,058,06:50:00.069539 ...
[09:30]
So far so good.
8) Whittle this code down to an R callable routine
You can try and figure out what the code below does later. We’re at risk of going over time so go back to the directory above mseed and just copy and paste the following code into mseed/src/test_hello_mseed_mem.c:
/*************************************************************************** R callable function to pass raw data to a C test function ***************************************************************************/ #include <R.h> #include <Rdefines.h> #include <stdio.h> #include <sys/stat.h> #include <libmseed.h> SEXP test_hello_mseed_mem (SEXP buffer) { int bufferLength; char *bufferPtr; SEXP result; PROTECT(buffer = AS_RAW(buffer)); bufferPtr = RAW_POINTER(buffer); bufferLength = LENGTH(buffer); for (int i=0; i<10; i++) { Rprintf("buf[%d]=0x%02x\n",i,bufferPtr[i]); } Rprintf("bufferLength = %d\n",bufferLength); long long int bsize = (long long int) bufferLength; long long int boffset = 0; static flag verbose = 0; static int reclen = -1; MSRecord *msr = 0; int dataflag = 0; int64_t totalrecs = 0; int64_t totalsamps = 0; /* Loop over the buffer */ boffset = 0; while ( boffset < bsize ) { if ( msr_parse (bufferPtr+boffset, bsize-boffset, &msr, reclen, dataflag, verbose) ) { if ( verbose ) ms_log (2, "Error parsing record at offset %lld\n", boffset); boffset += 256; } else { totalrecs++; totalsamps += msr->samplecnt; printf ("Hello World!\n"); boffset += msr->reclen; } } /* Make sure everything is cleaned up */ ms_readmsr (&msr, NULL, 0, NULL, NULL, 0, 0, 0); PROTECT(result = NEW_INTEGER(3)); INTEGER(result)[0] = (int) bufferLength; INTEGER(result)[1] = (int) totalrecs; INTEGER(result)[2] = (int) totalsamps; UNPROTECT(2); return(result); }
[11:10]
9) Add a wrapper, external data and a demo
Add this to mseed/R/mseedWrappers.R:
# wrapper function to invoke test_hello_mseed_mem test_hello_mseed_mem <- function(buffer) { result <- .Call("test_hello_mseed_mem",buffer) return(result) }
Of course, we’ll also have to add one more line specifying this new function to mseed/NAMESPACE:
export("test_hello_mseed_mem")
Make a copy of test.mseed so that it is found in the normal place for package “external data”:
$ cp mseed/src/libmseed/example/test.mseed ./mseed/inst/extdata/
Now add the following demonstration script as mseed/demo/hello.R:
# Demonstrate successful processing of a miniseed file library(mseed) # Read in a mseed file as raw bytes mseedFile <- system.file("extdata", "test.mseed", package="mseed") rawData <- readBin(mseedFile, raw(), n=1e+6) # How many bites do we have? length(rawData) # Let's look at the first 10 rawData[1:10] # Now pass the data to test_hello_mseed_mem() test_hello_mseed_mem(rawData)
and a file named mseed/demo/00Index:
hello read mseed file and pass it to a C test function
[12:55]
10) Racing to the finish line
$ R CMD REMOVE mseed ... $ R CMD build mseed ... $ R CMD INSTALL mseed_0.0-0.tar.gz ... ]$ R --vanilla ... > library(mseed) > demo(package="mseed") > demo(hello) ... Hello World! [1] 18432 36 4200 >
[Smack that timer!] Done.
So, how is your time? Did you keep up the pace and finish in under 15 minutes? Or did you get distracted trying to understand what you were doing and go over time.
You should feel good about your accomplishment in either case if you got this far. (I certainly did!) Compiling a C library for inclusion with R is not for the faint of heart. I hope that this baby-steps example makes it seem like it’s at least possible if perhaps not easy.
Best of luck extending R!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.