Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In this post, I will talk about the ifelse function, whose behaviour is easily misunderstood, as pointed out in my latest question on SO. I will try to show how it can be used, and misused. We will also check whether it is as fast as we could expect from a vectorized base R function.
How can it be used?
The first example comes directly from the R documentation:
x <- c(6:-4)
sqrt(x)  #- gives warning
## Warning in sqrt(x): NaNs produced
##  [1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000 0.000000      NaN      NaN
## [10]      NaN      NaN
sqrt(ifelse(x >= 0, x, NA))  # no warning
##  [1] 2.449490 2.236068 2.000000 1.732051 1.414214 1.000000 0.000000       NA       NA
## [10]       NA       NA
So, it can be used, for instance, to handle special cases, in a vectorized, succinct way.
The second example comes from the vignette of Rcpp Sugar:
foo <- function(x, y) {
  ifelse(x < y, x*x, -(y*y))
}
foo(1:5, 5:1)
## [1]  1  4 -9 -4 -1
So, it can be used to construct a vector, by doing an element-wise comparison of two vectors, and specifying a custom output for each comparison.
One last example, just for fun:
(a <- matrix(1:9, 3, 3))
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
ifelse(a %% 2 == 0, a, 0)
##      [,1] [,2] [,3]
## [1,]    0    4    0
## [2,]    2    0    8
## [3,]    0    6    0
How can it be misused?
I think many people assume they can use ifelse as a shorter way of writing an if-then-else statement (this is a mistake I made myself). For example, I used:
legend.pos <- ifelse(is.top, ifelse(is.right, "topright", "topleft"), ifelse(is.right, "bottomright", "bottomleft"))
instead of:
if (is.top) {
  if (is.right) {
    legend.pos <- "topright"
  } else {
    legend.pos <- "topleft"
  }
} else {
  if (is.right) {
    legend.pos <- "bottomright"
  } else {
    legend.pos <- "bottomleft"
  }
}
That works, but this doesn’t:
ifelse(FALSE, 0, 1:5)
## [1] 1
Indeed, if you read the R documentation carefully, you will see that ifelse returns a value with the same shape (length and attributes) as the condition, here of length 1, so only the first element of 1:5 is returned.
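To make the shape rule concrete, here is a small sketch (the variable names r1, r2 and d are only illustrative). It shows that the length of the result follows the condition, and that attributes such as the Date class of the two branches are not carried over:

```r
# The condition decides the length of the result.
r1 <- ifelse(FALSE, 0, 1:5)          # length-1 condition: one value comes back
r2 <- ifelse(rep(FALSE, 5), 0, 1:5)  # length-5 condition: five values come back
length(r1)
length(r2)

# Attributes come from the condition too, so the Date class of the
# branches is dropped and a plain number is returned.
d <- ifelse(TRUE, as.Date("2020-01-01"), as.Date("2020-12-31"))
class(d)
```

So whenever the condition is a single TRUE or FALSE, a plain if/else is usually what is wanted.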
If you really want to use a more succinct notation, you could use
`if`(FALSE, 0, 1:5)
## [1] 1 2 3 4 5
If you’re not familiar with this notation, I suggest you read the chapter about functions in the book Advanced R.
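Since if returns a value in R, the legend example can also be written compactly without ifelse. The sketch below is one possible way to do it; the values given to is.top and is.right are only illustrative, and legend.pos2 is a hypothetical name for the second variant:

```r
is.top   <- TRUE   # illustrative values, not from the original post
is.right <- FALSE

# if/else used as an expression: the chosen branch is the value
legend.pos <- if (is.top) {
  if (is.right) "topright" else "topleft"
} else {
  if (is.right) "bottomright" else "bottomleft"
}

# Same result, built from the two halves of the keyword
legend.pos2 <- paste0(if (is.top) "top" else "bottom",
                      if (is.right) "right" else "left")
```

Both variants evaluate only the branch that is selected, and both expect a condition of length one.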
Benchmarks
Reimplementing ‘abs’
abs2 <- function(x) {
  ifelse(x < 0, -x, x)
}
abs2(-5:5)
##  [1] 5 4 3 2 1 0 1 2 3 4 5
library(microbenchmark)
x <- rnorm(1e4)
print(microbenchmark(
  abs(x),
  abs2(x)
))
## Unit: microseconds
##     expr     min       lq       mean   median      uq       max neval
##   abs(x)   3.973   5.2975   36.19779   6.9530   9.271  1613.386   100
##  abs2(x) 496.299 523.9450 1595.51016 549.7695 634.859 80076.957   100
Comparing with C++
Consider the Rcpp Sugar example again. Here are four ways to compute it: the foo function defined above, plus the three below.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector fooRcpp(const NumericVector& x, const NumericVector& y) {
  int n = x.size();
  NumericVector res(n);
  double x_, y_;
  for (int i = 0; i < n; i++) {
    x_ = x[i];
    y_ = y[i];
    if (x_ < y_) {
      res[i] = x_*x_;
    } else {
      res[i] = -(y_*y_);
    }
  }
  return res;
}

fooRcpp(1:5, 5:1)
## [1]  1  4 -9 -4 -1

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector fooRcppSugar(const NumericVector& x, const NumericVector& y) {
  return ifelse(x < y, x*x, -(y*y));
}

fooRcppSugar(1:5, 5:1)
## [1]  1  4 -9 -4 -1

foo2 <- function(x, y) {
  cond <- (x < y)
  cond * x^2 - (1 - cond) * y^2
}
foo2(1:5, 5:1)
## [1]  1  4 -9 -4 -1

x <- rnorm(1e4)
y <- rnorm(1e4)
print(microbenchmark(
  foo(x, y),
  foo2(x, y),
  fooRcpp(x, y),
  fooRcppSugar(x, y)
))
## Unit: microseconds
##                expr     min       lq      mean  median       uq      max neval
##           foo(x, y) 510.535 542.6510 872.23474 563.510 716.9680 2439.447   100
##          foo2(x, y)  71.183  75.1560 147.17468  83.765  93.8635 1977.250   100
##       fooRcpp(x, y)  40.393  44.6970  63.59186  47.676  51.1535 1468.038   100
##  fooRcppSugar(x, y) 138.394 141.3745 179.16429 142.533 161.4045 1575.972   100
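Even though it is a vectorized base R function, ifelse is known to be slow. In the benchmarks above, its median time is roughly 80 times that of abs, and about 4 to 12 times that of the three alternatives to foo.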
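Another base R pattern that avoids ifelse is to fill the result with one branch and then overwrite the positions where the condition holds. The function below, with the hypothetical name foo3, is only a sketch: it assumes x and y have the same length and contain no missing values, and it was not part of the benchmark above, so no timing is claimed for it.

```r
# Reference implementation from the post, repeated so the block runs alone
foo <- function(x, y) {
  ifelse(x < y, x*x, -(y*y))
}

# Sketch: compute the "no" branch everywhere, then overwrite with the
# "yes" branch where x < y.
# Assumes length(x) == length(y) and no NA values.
foo3 <- function(x, y) {
  res <- -(y * y)
  lt <- x < y
  res[lt] <- x[lt]^2
  res
}

all.equal(foo3(1:5, 5:1), foo(1:5, 5:1))
```

To see how it compares on your machine, you could add foo3(x, y) to the microbenchmark call.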
Conclusion
Be careful when you use the ifelse function. Moreover, if you call it many times, be aware that it isn’t very fast, and that there are at least three faster alternatives.