Rcpp 0.11.1 introduced flexible subsetting for Rcpp vectors. Subsetting is
implemented for the Rcpp vector types through the [ operator, and is intended to
mimic R’s [ operator in most cases.
We diverge from R’s subsetting semantics in a few important ways:
- For integer and numeric vectors, 0-based indexing is performed, rather than 1-based indexing, for subsets.
- We throw an error if an index is out of bounds, rather than returning an NA value.
- We require logical subsetting to be with vectors of the same length, thus avoiding bugs that can occur when a logical vector is recycled for a subset operation.
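To make the three rules above concrete, here is a minimal sketch in plain standard C++. It is not Rcpp's implementation and does not use Rcpp at all: the helper names `subset_by_index` and `subset_by_mask`, the exception types, and the choice to treat negative indices as out of bounds are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of Rcpp): subset by 0-based integer indices.
// An out-of-bounds index raises an error instead of producing a missing value.
// Treating negative indices as out of bounds is an assumption of this sketch.
std::vector<double> subset_by_index(const std::vector<double>& x,
                                    const std::vector<int>& idx) {
    std::vector<double> out;
    out.reserve(idx.size());
    for (int i : idx) {
        if (i < 0 || static_cast<std::size_t>(i) >= x.size()) {
            throw std::out_of_range("index out of bounds");
        }
        out.push_back(x[static_cast<std::size_t>(i)]);
    }
    return out;
}

// Hypothetical helper (not part of Rcpp): subset by a logical mask.
// The mask must have the same length as the vector; it is never recycled.
std::vector<double> subset_by_mask(const std::vector<double>& x,
                                   const std::vector<bool>& mask) {
    if (mask.size() != x.size()) {
        throw std::invalid_argument("mask and vector must have the same length");
    }
    std::vector<double> out;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (mask[i]) {
            out.push_back(x[i]);
        }
    }
    return out;
}
```

With a three-element vector, indices 0 and 2 select the first and third elements. An index of 3, or a mask with only one element, raises an exception rather than yielding NA or being recycled as it would in R.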
Some examples are showcased below:
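#include <Rcpp.h>
using namespace Rcpp;

// Logical subsetting: keep only the strictly positive elements.
// [[Rcpp::export]]
NumericVector positives(NumericVector x) {
    return x[x > 0];
}

// Integer subsetting: indices are 0-based, so 0, 1, 2 are the first three.
// [[Rcpp::export]]
List first_three(List x) {
    IntegerVector idx = IntegerVector::create(0, 1, 2);
    return x[idx];
}

// Character subsetting: select list elements by name.
// [[Rcpp::export]]
List with_names(List x, CharacterVector y) {
    return x[y];
}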
x <- -5:5
positives(x)
[1] 1 2 3 4 5
l <- as.list(1:10)
first_three(l)
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
l <- setNames(l, letters[1:10])
with_names(l, c("a", "e", "g"))
$a
[1] 1
$e
[1] 5
$g
[1] 7
Most excitingly, the subset mechanism is quite flexible and works well with Rcpp sugar. For example:
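#include <Rcpp.h>
using namespace Rcpp;

// Combine two sugar comparisons into a single logical subset.
// [[Rcpp::export]]
NumericVector in_range(NumericVector x, double low, double high) {
    return x[(x > low) & (x < high)];
}

// Drop missing values (both NA and NaN).
// [[Rcpp::export]]
NumericVector no_na(NumericVector x) {
    return x[ !is_na(x) ];
}

// Helper (not exported): true when x is a character vector.
bool is_character(SEXP x) {
    return TYPEOF(x) == STRSXP;
}

// Keep only the list elements that are character vectors.
// [[Rcpp::export]]
List charvecs(List x) {
    return x[ sapply(x, is_character) ];
}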
set.seed(123)
x <- rnorm(5)
in_range(x, -1, 1)
[1] -0.56048 -0.23018 0.07051 0.12929
no_na( c(1, 2, NA, 4, NaN, 10) )
[1] 1 2 4 10
l <- list(1, 2, "a", "b", TRUE)
charvecs(l)
[[1]]
[1] "a"
[[2]]
[1] "b"
These subsets can also be quite fast:
library(microbenchmark)
R_in_range <- function(x, low, high) {
return(x[x > low & x < high])
}
x <- rnorm(1E5)
identical( R_in_range(x, -1, 1), in_range(x, -1, 1) )
[1] TRUE
microbenchmark( times=5,
R_in_range(x, -1, 1),
in_range(x, -1, 1)
)
Unit: milliseconds
                 expr   min    lq median    uq   max neval
 R_in_range(x, -1, 1) 8.168 8.556   9.02 9.073 9.223     5
   in_range(x, -1, 1) 5.210 5.424   5.48 5.507 6.233     5
R_no_na <- function(x) {
return( x[!is.na(x)] )
}
x[sample(1E5, 1E4)] <- NA
identical(no_na(x), R_no_na(x))
[1] TRUE
microbenchmark( times=5,
R_no_na(x),
no_na(x)
)
Unit: milliseconds
       expr   min    lq median   uq   max neval
 R_no_na(x) 3.958 3.960  4.019 4.02 4.458     5
   no_na(x) 1.891 1.936  1.961 2.02 2.755     5
We hope users of Rcpp will find the new subset semantics fast, flexible, and useful throughout their projects.
