if … else and ifelse
Let’s make this a quick and quite basic one. There is this incredibly useful function in R called ifelse(). It’s basically a vectorized version of the if … else control structure that every programming language has in one form or another. ifelse() has, in my view, two major advantages over if … else:
- It’s super fast.
- It’s more convenient to use.
The basic idea is that you have a vector of values and want a second vector that holds a specific value for each element, depending on whether that element meets some condition. An example follows below. First, let’s load the {rbenchmark} package to measure the speed benefits.
library(rbenchmark)
Now, the toy example: I am creating a vector of half a million random normally distributed values. For each of these values, I want to know whether the value is below or above zero.
x <- rnorm(500000)
ifelse() is used as ifelse(<TEST>, <OUTCOME IF TRUE>, <OUTCOME IF FALSE>), so we need three arguments. My test is x < 0 and I want to have the string "negative" in y whenever the corresponding value in x is smaller than zero. If this is not the case, then y should have a "positive" in this position. ifelse() only needs one line of code for this.
benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative", "positive")
})$user.self
## [1] 5.88
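To see what the call returns element by element, here is a minimal sketch on a four-element vector. The names x_small and y_small are only for illustration and are not part of the benchmark.

```r
# Illustrative input; not the benchmark vector from the post.
x_small <- c(-1.5, 0.3, 2, -0.1)

# ifelse() evaluates the test for every element and picks the
# matching outcome position by position.
y_small <- ifelse(x_small < 0, "negative", "positive")

print(y_small)
## [1] "negative" "positive" "positive" "negative"
```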
We could also solve this with a for loop. But, as you can see, this takes approx. 3 times as long.
benchmark(replications = 50, {
  y <- c()
  for (i in x) {
    # append the label for the current value to the end of y
    if (i < 0) {
      y[length(y) + 1] <- "negative"
    } else {
      y[length(y) + 1] <- "positive"
    }
  }
})$user.self
## [1] 16.938
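Part of the loop's cost probably comes from growing y one element at a time. Here is a sketch of a loop that allocates the result vector up front instead. I have not benchmarked this variant, so whether and how much it helps is an assumption you should check with benchmark() on your own machine. The names x_demo and y_loop, the seed, and the smaller vector size are illustrative.

```r
set.seed(1)                      # illustrative seed, only for reproducibility
x_demo <- rnorm(1000)            # smaller vector than the one used in the post

# Allocate the full result vector once instead of appending to it.
y_loop <- character(length(x_demo))

for (i in seq_along(x_demo)) {
  if (x_demo[i] < 0) {
    y_loop[i] <- "negative"
  } else {
    y_loop[i] <- "positive"
  }
}

# The loop and ifelse() should agree on every element.
same_result <- identical(y_loop, ifelse(x_demo < 0, "negative", "positive"))
print(same_result)
```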
The same is true for an sapply() version. sapply() even consistently takes a little longer than a for loop in this case, to my surprise.
benchmark(replications = 50, {
  y <- sapply(x, USE.NAMES = FALSE, FUN = function(i) {
    if (i < 0) {
      "negative"
    } else {
      "positive"
    }
  })
})$user.self
## [1] 20.423
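A closely related variant is vapply(), which does the same element-by-element work but asks you to declare what each call returns. The following is only a sketch with illustrative names (x_demo, y_vapply). It is not part of the timings above and I make no claim about its speed.

```r
x_demo <- c(-0.7, 1.2, 0.4)      # illustrative input

# FUN.VALUE = character(1) declares that each call returns one string;
# vapply() raises an error if a result has a different type or length.
y_vapply <- vapply(
  x_demo,
  FUN = function(i) if (i < 0) "negative" else "positive",
  FUN.VALUE = character(1),
  USE.NAMES = FALSE
)

print(y_vapply)
```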
It’s highly unlikely that rnorm() produces a value of exactly zero. But we could also check for this by simply nesting calls to ifelse(). If you want to do this, you simply add another ifelse() in the “FALSE” part of the previous ifelse() as I did below. In this little toy example, this nested test is still considerably faster than the for or sapply() versions of the single test.
benchmark(replications = 50, {
  y <- ifelse(x < 0, "negative",
              ifelse(x > 0, "positive", "exactly zero"))
})$user.self
## [1] 12.197
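Since rnorm() will almost never hit zero, the third outcome is easier to see on a hand-made vector. This is a minimal sketch with illustrative names (x_small, y_nested) that includes an exact zero on purpose.

```r
# Illustrative input with one value on each side of zero and one exact zero.
x_small <- c(-2, 0, 3)

# The inner ifelse() only decides the elements for which the outer test is FALSE.
y_nested <- ifelse(x_small < 0, "negative",
                   ifelse(x_small > 0, "positive", "exactly zero"))

# Yields "negative", "exactly zero", "positive", in that order.
print(y_nested)
```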