subset vectors in Rcpp11
At the instigation of @kevin_ushey, who already did something similar for Rcpp, we've been adding subsetting behavior to Rcpp11.
The idea is: given a vector y and a vector x, we want to give meaning to y[x].
The first legitimate question is what kinds of x we want to allow. This has been discussed since January. So far, we've settled on allowing x to be an integer, logical, or character vector. The main source of anxiety here is the typical Cornelian dilemma: do we use 0-based or 1-based indices?
We decided to use 0-based indices, as this is what we already do when x is a scalar int, and this is C++: indexing starts at 0.
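To make the 0-based convention concrete, here is a minimal sketch in plain C++, with std::vector standing in for Rcpp11 vectors. The helper `subset_by_index` is hypothetical and only illustrates the indexing rule; it is not how Rcpp11 implements y[x].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Rcpp11: eager subsetting with 0-based indices.
std::vector<double> subset_by_index(const std::vector<double>& y,
                                    const std::vector<int>& x) {
    std::vector<double> out;
    out.reserve(x.size());
    for (int i : x) {
        // 0-based: valid indices run from 0 to y.size() - 1.
        assert(i >= 0 && static_cast<std::size_t>(i) < y.size());
        out.push_back(y[static_cast<std::size_t>(i)]);
    }
    return out;
}
```

With y holding 10, 20, 30, 40, calling `subset_by_index(y, {0, 2})` picks the first and third elements, where the same call in R syntax would need indices 1 and 3.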
rhs use
Given that, here is a first example:
    NumericVector y = sqrt( seq_len(10) ) ;
    IntegerVector x {0,1,2} ;
    NumericVector res = y[x] ;
    // [1] 1.000000 1.414214 1.732051
The way we implemented this, y[x] does not return a NumericVector right away; that would have been too easy. Instead, it gives us a lovely sugar expression.
    NumericVector y = sqrt( seq_len(10) ) ;
    IntegerVector x {0,1,2} ;
    auto res = y[x] ;
    Rprintf( "type(res) = %s\n", DEMANGLE(decltype(res)) ) ;
    // type(res) = Rcpp::SubsetProxy<Rcpp::Vector<14, Rcpp::PreserveStorage>, int, Rcpp::Vector<13, Rcpp::PreserveStorage> >
    return res ;
    // [1] 1.000000 1.414214 1.732051
This matters because we don't need to materialize the data too early; we can pass it to any sugar function:
    NumericVector y = sqrt( seq_len(10) ) ;
    IntegerVector x {0,1,2} ;
    auto res = sapply( y[x], [](double x){ return x*x; }) ;
    Rprintf( "type(res) = %s\n", DEMANGLE(decltype(res)) ) ;
    // type(res) = Rcpp::sugar::Sapply<double, Rcpp::SubsetProxy<Rcpp::Vector<14, Rcpp::PreserveStorage>, int, Rcpp::Vector<13, Rcpp::PreserveStorage> >, test()::$_0>
    return res ;
    // [1] 1 2 3
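For readers who want to see the lazy idea without Rcpp11 at hand, here is a rough sketch in plain C++. The names `LazySubset`, `LazyMap`, `lazy_map` and `materialize` are all hypothetical stand-ins; the real SubsetProxy and sugar classes are more general, and this is only one way such an expression could be modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lazy subset expression; not Rcpp11's SubsetProxy.
// It stores references only and reads source[index[i]] on demand.
struct LazySubset {
    const std::vector<double>& source;
    const std::vector<int>& index;

    std::size_t size() const { return index.size(); }

    double operator[](std::size_t i) const {
        int k = index[i];
        assert(k >= 0 && static_cast<std::size_t>(k) < source.size());
        return source[static_cast<std::size_t>(k)];
    }
};

// Hypothetical lazy map over any expression exposing size() and operator[].
template <typename Expr, typename Fun>
struct LazyMap {
    Expr expr;
    Fun fun;

    std::size_t size() const { return expr.size(); }
    double operator[](std::size_t i) const { return fun(expr[i]); }
};

template <typename Expr, typename Fun>
LazyMap<Expr, Fun> lazy_map(Expr expr, Fun fun) {
    return LazyMap<Expr, Fun>{expr, fun};
}

// The only place where a new vector is allocated and filled.
template <typename Expr>
std::vector<double> materialize(const Expr& e) {
    std::vector<double> out(e.size());
    for (std::size_t i = 0; i < e.size(); ++i) out[i] = e[i];
    return out;
}
```

Composing `lazy_map(LazySubset{y, x}, f)` builds a small object and copies no data; elements are read and transformed only when `materialize` walks the expression. Because `LazySubset` holds references, y and x must outlive it in this sketch.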
x may also be a sugar expression; it does not need to be a materialized vector. For example:
    NumericVector y = sqrt( seq_len(10) ) ;
    auto res = sapply( y[seq(0, 4)], [](double x){ return x*x; }) ;
    Rprintf( "type(res) = %s\n", DEMANGLE(decltype(res)) ) ;
    // type(res) = Rcpp::sugar::Sapply<double, Rcpp::SubsetProxy<Rcpp::Vector<14, Rcpp::PreserveStorage>, int, Rcpp::sugar::Seq>, test()::$_0>
    return res ;
    // [1] 1 2 3 4 5
It can also be a logical or character expression, for example y[ y < 2.0 ] ...
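As an illustration of what logical subsetting such as y[ y < 2.0 ] means, here is a small eager sketch in plain C++. Both `less_than` and `subset_by_mask` are hypothetical helpers written for this post, with std::vector standing in for Rcpp11 vectors; the Rcpp11 version would be lazy and go through sugar.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: element-wise comparison, mimicking the sugar expression y < bound.
std::vector<bool> less_than(const std::vector<double>& y, double bound) {
    std::vector<bool> mask(y.size());
    for (std::size_t i = 0; i < y.size(); ++i) mask[i] = y[i] < bound;
    return mask;
}

// Hypothetical helper: keep y[i] wherever mask[i] is true.
std::vector<double> subset_by_mask(const std::vector<double>& y,
                                   const std::vector<bool>& mask) {
    assert(mask.size() == y.size());
    std::vector<double> out;
    for (std::size_t i = 0; i < y.size(); ++i) {
        if (mask[i]) out.push_back(y[i]);
    }
    return out;
}
```

Unlike integer subsetting, the length of the result is not known up front: it depends on how many entries of the mask are true.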
lhs use
In addition to being a sugar expression that knows how to apply itself to a vector, the object created by y[x] may also be used on the lhs of an assignment.
For example:
    NumericVector y = sqrt( seq_len(10) ) ;
    IntegerVector x {0,1,2} ;
    y[x] = - y[x] ;
    return y ;
    // [1] -1.000000 -1.414214 -1.732051 2.000000 2.236068 2.449490 2.645751
    // [8] 2.828427 3.000000 3.162278
And of course, it handles sugar:
    NumericVector y = sqrt( seq_len(10) ) ;
    IntegerVector x {0,1,2} ;
    y[2*x] = - y[x] ;
    return y ;
    // [1] -1.000000 1.414214 -1.414214 2.000000 1.414214 2.449490 2.645751
    // [8] 2.828427 3.000000 3.162278
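The last output is worth a second look: y[4] ends up at 1.414214, which is minus the value y[2] holds after it has been overwritten, rather than -1.732051. That is consistent with the assignment being evaluated element by element, with no temporary copy of the right-hand side. The sketch below, in plain C++ with std::vector standing in for Rcpp11 vectors and two hypothetical helper functions, contrasts that behavior with a snapshot-based one. It is an illustration only, not Rcpp11's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers, not Rcpp11 code: both perform y[lhs[i]] = -y[rhs[i]].

// Writes element by element, reading from y while it is being modified.
void assign_negated_in_place(std::vector<double>& y,
                             const std::vector<int>& lhs,
                             const std::vector<int>& rhs) {
    assert(lhs.size() == rhs.size());
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        std::size_t to = static_cast<std::size_t>(lhs[i]);
        std::size_t from = static_cast<std::size_t>(rhs[i]);
        assert(to < y.size() && from < y.size());
        y[to] = -y[from];  // may read a value written earlier in this loop
    }
}

// Reads every right-hand value from a snapshot taken before any write.
void assign_negated_from_copy(std::vector<double>& y,
                              const std::vector<int>& lhs,
                              const std::vector<int>& rhs) {
    assert(lhs.size() == rhs.size());
    const std::vector<double> snapshot = y;
    for (std::size_t i = 0; i < lhs.size(); ++i) {
        std::size_t to = static_cast<std::size_t>(lhs[i]);
        std::size_t from = static_cast<std::size_t>(rhs[i]);
        assert(to < y.size() && from < y.size());
        y[to] = -snapshot[from];
    }
}
```

With y = 1, 2, 3, 4, 5, lhs = 0, 2, 4 and rhs = 0, 1, 2, the in-place version leaves y[4] at 2 (it reads the already negated y[2]), while the snapshot version leaves it at -3. The two only differ when the indices on both sides overlap.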
Although the feature has been discussed for a few months, it is pretty new, so things might change. Actually, I came up with a few ideas while writing this post.