Vector Subsetting in Rcpp
Rcpp 0.11.1 introduced flexible subsetting for Rcpp vectors. Subsetting is implemented for the Rcpp vector types through the [ operator, and is intended to mimic R’s [ operator in most cases.
We diverge from R’s subsetting semantics in a few important ways:
- For integer and numeric index vectors, subsetting uses 0-based indexing rather than 1-based indexing.
- We throw an error if an index is out of bounds, rather than returning an NA value.
- We require a logical index vector to have the same length as the vector being subset, thus avoiding bugs that can occur when a logical vector is recycled for a subset operation.
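The three rules above can be modeled in plain C++ without Rcpp. The sketch below is only an illustration of the semantics, not Rcpp's implementation: the helper names subset_by_index and subset_by_mask are hypothetical, std::vector stands in for the Rcpp vector types, and rejecting negative indices is a choice made for this sketch rather than something stated in this post.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins, not part of Rcpp: they model the subsetting
// rules described above using std::vector.

// Rules 1 and 2: indices are 0-based, and an out-of-bounds index is an
// error instead of yielding NA. (Assumption of this sketch: negative
// indices are also treated as out of bounds.)
std::vector<double> subset_by_index(const std::vector<double>& x,
                                    const std::vector<int>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (int i : idx) {
        if (i < 0 || static_cast<std::size_t>(i) >= x.size()) {
            throw std::out_of_range("index out of bounds");
        }
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Rule 3: a logical mask must have the same length as the vector being
// subset; it is never recycled.
std::vector<double> subset_by_mask(const std::vector<double>& x,
                                   const std::vector<bool>& mask) {
    if (mask.size() != x.size()) {
        throw std::invalid_argument("logical index must match vector length");
    }
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (mask[i]) {
            out.push_back(x[i]);
        }
    }
    return out;
}
```

With a three-element input, indices {0, 2} select the first and third elements, an index of 3 throws, and a two-element mask is rejected instead of being recycled.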
Some examples are showcased below:
    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    NumericVector positives(NumericVector x) {
        return x[x > 0];
    }

    // [[Rcpp::export]]
    List first_three(List x) {
        IntegerVector idx = IntegerVector::create(0, 1, 2);
        return x[idx];
    }

    // [[Rcpp::export]]
    List with_names(List x, CharacterVector y) {
        return x[y];
    }

    x <- -5:5
    positives(x)

    [1] 1 2 3 4 5

    l <- as.list(1:10)
    first_three(l)

    [[1]]
    [1] 1

    [[2]]
    [1] 2

    [[3]]
    [1] 3

    l <- setNames(l, letters[1:10])
    with_names(l, c("a", "e", "g"))

    $a
    [1] 1

    $e
    [1] 5

    $g
    [1] 7
Most excitingly, the subset mechanism is quite flexible and works well with Rcpp sugar. For example:
    #include <Rcpp.h>
    using namespace Rcpp;

    // [[Rcpp::export]]
    NumericVector in_range(NumericVector x, double low, double high) {
        // Parentheses make the grouping explicit; C++ already parses
        // x > low & x < high this way.
        return x[(x > low) & (x < high)];
    }

    // [[Rcpp::export]]
    NumericVector no_na(NumericVector x) {
        return x[ !is_na(x) ];
    }

    // Helper for charvecs(): true if the element is a character vector.
    bool is_character(SEXP x) {
        return TYPEOF(x) == STRSXP;
    }

    // [[Rcpp::export]]
    List charvecs(List x) {
        return x[ sapply(x, is_character) ];
    }

    set.seed(123)
    x <- rnorm(5)
    in_range(x, -1, 1)

    [1] -0.56048 -0.23018  0.07051  0.12929

    no_na( c(1, 2, NA, 4, NaN, 10) )

    [1]  1  2  4 10

    l <- list(1, 2, "a", "b", TRUE)
    charvecs(l)

    [[1]]
    [1] "a"

    [[2]]
    [1] "b"
These subsets can also be quite fast:
    library(microbenchmark)
    R_in_range <- function(x, low, high) {
        return(x[x > low & x < high])
    }
    x <- rnorm(1E5)
    identical( R_in_range(x, -1, 1), in_range(x, -1, 1) )

    [1] TRUE

    microbenchmark( times=5,
        R_in_range(x, -1, 1),
        in_range(x, -1, 1)
    )

    Unit: milliseconds
                     expr   min    lq median    uq   max neval
     R_in_range(x, -1, 1) 8.168 8.556   9.02 9.073 9.223     5
       in_range(x, -1, 1) 5.210 5.424   5.48 5.507 6.233     5

    R_no_na <- function(x) {
        return( x[!is.na(x)] )
    }
    x[sample(1E5, 1E4)] <- NA
    identical(no_na(x), R_no_na(x))

    [1] TRUE

    microbenchmark( times=5,
        R_no_na(x),
        no_na(x)
    )

    Unit: milliseconds
           expr   min    lq median   uq   max neval
     R_no_na(x) 3.958 3.960  4.019 4.02 4.458     5
       no_na(x) 1.891 1.936  1.961 2.02 2.755     5
We hope users of Rcpp will find the new subset semantics fast, flexible, and useful throughout their projects.