Correct Datetime / POSIXct behaviour for R and kdb+
[This article was first published on Thinking inside the box , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We have started to look into kdb+ as a possible high-performance column-store backend. Kx offers free trials, and so I have played with this for a day or two: the general system, data loads and dumps, and in particular the interface to R. Based on the few files (one C source with interface code, one R file to access the C code, one object file to link against, one header file and a simple Makefile), it took just a couple of minutes to turn this into a proper CRAN-style R package.
Anyway, the reason for this post was that the R / kdb+ glue code works well … but not for datetimes. I really like to be able to pass date/time objects natively between systems as easily as, say, numbers or strings (and see e.g. my Rcpp package for doing this with R and C++) and I was a bit annoyed when the millisecond timestamps didn’t move smoothly. Turns out that the basic converter function in the code had a number of problems: it converted to integer, only covered a single scalar rather than vectorised mode, and erroneously reduced a reference count. A better version, in my view, is as follows:
```c
static SEXP from_datetime_kobject(K x) {
    SEXP result;
    int i, length = x->n;
    if (scalar(x)) {
        result = PROTECT(allocVector(REALSXP, 1));
        /* fractional days since 2000-01-01 -> seconds since 1970-01-01 */
        REAL(result)[0] = (kF(x)[0] + 10957) * 86400;
    } else {
        result = PROTECT(allocVector(REALSXP, length));
        for (i = 0; i < length; i++) {
            REAL(result)[i] = (kF(x)[i] + 10957) * 86400;
        }
    }
    /* tag the doubles with the date/time class attribute for R */
    SEXP datetimeclass = PROTECT(allocVector(STRSXP, 2));
    SET_STRING_ELT(datetimeclass, 0, mkChar("POSIXt"));
    SET_STRING_ELT(datetimeclass, 1, mkChar("POSIXct"));
    setAttrib(result, R_ClassSymbol, datetimeclass);
    UNPROTECT(2);
    return result;
}
```

This deals with vectors as well as scalars, converts kdb+'s 'fractional days since Jan 1, 2000' to the Unix standard of seconds since the epoch (including the R extension of fractional seconds) and, just as importantly, sets the class attributes to POSIXt POSIXct as needed by R. With that, a simple select max datetime from table does just that, and vectors of timestamped records of trades or quotes or whatever also come into R with proper POSIXct behaviour. Note that it needs TZ to be set to UTC, though, or you get a timezone offset you may not want.
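To make the arithmetic concrete, here is a minimal standalone sketch of the same conversion outside of R and kdb+. The helper names `kdb_datetime_to_unix` and `format_utc` are hypothetical and not part of the interface code; the only facts taken from the converter are the offset of 10957 days between 1970-01-01 and 2000-01-01 and the 86400 seconds per day. Formatting goes through `gmtime`, which mirrors the effect of having TZ set to UTC.

```c
#include <stddef.h>
#include <time.h>

/* Days from the Unix epoch (1970-01-01) to the kdb+ epoch (2000-01-01). */
#define KDB_EPOCH_OFFSET_DAYS 10957
#define SECONDS_PER_DAY 86400

/* Hypothetical helper: fractional days since 2000-01-01
   -> fractional seconds since 1970-01-01. */
double kdb_datetime_to_unix(double days)
{
    return (days + KDB_EPOCH_OFFSET_DAYS) * SECONDS_PER_DAY;
}

/* Hypothetical helper: write the whole-second part as a UTC timestamp.
   Returns the number of characters written, or 0 on failure. */
size_t format_utc(double unix_secs, char *buf, size_t len)
{
    time_t whole = (time_t) unix_secs;   /* drops the fractional seconds */
    struct tm *utc = gmtime(&whole);     /* UTC, independent of the TZ setting */
    if (utc == NULL)
        return 0;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", utc);
}
```

For example, a kdb+ value of 3653.5 (noon on the 3653rd day after Jan 1, 2000) maps to 1262347200 seconds, which `format_utc` renders as 2010-01-01 12:00:00. Swapping `gmtime` for `localtime` would shift the printed time by the local offset, which is the effect the TZ note above warns about.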
To leave a comment for the author, please follow the link and comment on their blog: Thinking inside the box .