Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
One of the many confusing things about R is that it has two kinds of conditional expression: there’s `ifelse()`, and there’s the `if` statement. It’s important to know which one to use, as I found when trying to write a conditional expression that chose between lists.
The first thing to appreciate is that `if` can be used as a conditional expression as well as a conditional statement. Probably most programmers use it as a statement, like this:
```r
> greet_or_leave <- 'GREET'
> if ( greet_or_leave == 'GREET' ) cat('HELLO') else cat('GOODBYE')
HELLO
```

But you can equally well use it as an expression:

```r
> greeting <- if ( greet_or_leave == 'GREET' ) 'HELLO' else 'GOODBYE'
> greeting
[1] "HELLO"
```
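Because `if` yields a value, it can also sit directly inside a function call. Here is a small base-R sketch of that. The names `message_length` and `farewell` are made up for illustration. It also shows one wrinkle: with no `else` branch, a false condition gives `NULL`.

```r
greet_or_leave <- 'GREET'

# The value of the if-expression is passed straight to nchar()
message_length <- nchar(if (greet_or_leave == 'GREET') 'HELLO' else 'GOODBYE')
print(message_length)   # 5

# With no else branch, a FALSE condition makes the expression evaluate to NULL
farewell <- if (greet_or_leave == 'LEAVE') 'GOODBYE'
print(is.null(farewell))   # TRUE
```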
The expression form is what I’m interested in here. How does it compare with `ifelse()`?
For simple uses, they seem to do the same thing:
```r
> ifelse( TRUE, 1, 2 )
[1] 1
> if ( TRUE ) 1 else 2
[1] 1
> ifelse( FALSE, 1, 2 )
[1] 2
> if ( FALSE ) 1 else 2
[1] 2
```
But this equivalence breaks down when you ask them to return a list rather than a scalar. `ifelse()` returns only the first element of the list. To return the whole list, you have to use `if`:
```r
> ifelse( TRUE, list(a=1,b=2), list(a=1,b=2) )
[[1]]
[1] 1

> if ( TRUE ) list(a=1,b=2) else list(a=1,b=2)
$a
[1] 1

$b
[1] 2
```
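The underlying reason is that `ifelse()` is a vectorised selector. It works element by element through `test`, taking the matching element from `yes` or `no`. The quick base-R check below uses toy lists of my own. It shows `ifelse()` mixing elements from the two lists, and dropping their names, once `test` has more than one element:

```r
# Element-wise selection over an atomic vector
picked <- ifelse(c(TRUE, FALSE, TRUE), 1, 2)
print(picked)   # 1 2 1

# The same element-wise rule applied to two lists:
# element 1 comes from the first list, element 2 from the second
mixed <- ifelse(c(TRUE, FALSE), list(a = 1, b = 2), list(a = 10, b = 20))
str(mixed)      # an unnamed list holding 1 and 20
```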
This bit me when I was using `recode()` from the Tidyverse (it lives in the dplyr package). This function takes a vector and translates each element by looking it up among the name-replacement pairs supplied as its remaining arguments. Thus, if `codes` is `c( 'a', 'b', 'c' )`, the call `recode( codes, a=1, b=2, c=3 )` returns `c(1,2,3)`.
I wanted a version of `recode()` which takes all the replacements in one argument. I implemented it by using `!!!` to splice these into the call, as demonstrated under the “Capturing multiple variables” section of “Programming with dplyr”:

```r
library(dplyr)   # recode() comes from dplyr

recode_with_list <- function( x, other_args ) {
  recode( x, !!! other_args )
}
```

So the call `recode_with_list( codes, list( a=1, b=2, c=3 ) )` also returns `c(1,2,3)`.
I used this when translating data about households in our economic model. Each household has a numeric field indicating its region, and we need to convert this to a meaningful string such as “London”, “Scotland”, or “North East”. That’s easy to do with `recode_with_list()` and a translation list mapping codes to region names. Unfortunately, different data sets use different coding conventions, so I needed a conditional to select between translation lists. Initially, I did this with `ifelse()`, like this:

```r
translation_list_1 <- list( '1000'='London', '1001'='Scotland', '1002'='North East' )
translation_list_2 <- list( '1'='London', '2'='Scotland', '3'='North East' )

dataset <- tribble( ~id, ~region_codes
                  , 1, 1000
                  , 2, 1001
                  , 3, 1000
                  )

dataset_follows_convention_1 <- TRUE

dataset$regions <- recode_with_list( dataset$region_codes
                                   , ifelse( dataset_follows_convention_1
                                           , translation_list_1
                                           , translation_list_2
                                           )
                                   )
```
But I found that `recode_with_list()` complained “Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively”. This must have been because `ifelse()` was returning only one list element, and stripping it of its name. After a bit of thought and experimenting, I realised that I could rewrite the call as:

```r
dataset$regions <- recode_with_list( dataset$region_codes
                                   , if ( dataset_follows_convention_1 ) translation_list_1 else translation_list_2
                                   )
```
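To see what each version actually hands to `recode_with_list()`, it helps to evaluate the two conditionals on their own. This base-R sketch repeats the translation lists from above and needs no Tidyverse packages:

```r
translation_list_1 <- list('1000' = 'London', '1001' = 'Scotland', '1002' = 'North East')
translation_list_2 <- list('1' = 'London', '2' = 'Scotland', '3' = 'North East')
dataset_follows_convention_1 <- TRUE

# ifelse(): one element (because the test has one element), and no names
via_ifelse <- ifelse(dataset_follows_convention_1, translation_list_1, translation_list_2)
str(via_ifelse)

# if: the whole list, names included
via_if <- if (dataset_follows_convention_1) translation_list_1 else translation_list_2
str(via_if)
```

Without the names, there are no region codes left for the replacement values to be matched against.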
This worked, but why didn’t `ifelse()`? The documentation says that `ifelse(test, yes, no)` returns a value with the same shape as `test`, filled with elements selected from either `yes` or `no` depending on whether the corresponding element of `test` is `TRUE` or `FALSE`. The “same shape” part is what matters, because it means that `test` determines how many elements `ifelse(test, yes, no)` returns. In my case, `dataset_follows_convention_1` had only one element (R scalars are, in reality, single-element vectors), so `ifelse()` returned only one element, taken from `translation_list_1` or `translation_list_2` depending on the test.
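The “same shape” rule covers more than length. The documentation describes the result as having the same length and attributes (including dimensions) as `test`, while `yes` and `no` are recycled to the length of `test`. A small base-R sketch with made-up inputs:

```r
# Names come from test, not from yes or no
named_test <- c(first = TRUE, second = FALSE, third = TRUE)
res_named <- ifelse(named_test, 1:3, 4:6)
print(res_named)    # first = 1, second = 5, third = 3

# A matrix test gives a matrix result
matrix_test <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
res_matrix <- ifelse(matrix_test, 1, 0)
print(res_matrix)   # a 2 x 2 identity matrix

# yes and no are recycled to the length of test
recycled <- ifelse(c(TRUE, FALSE, TRUE, FALSE), c(1, 2), c(10, 20))
print(recycled)     # 1 20 1 20
```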
You can see the influence of the “shape” below. As `test` gets longer, so does the result:

```r
> ifelse( TRUE, translation_list_1, translation_list_2 )
[[1]]
[1] "London"

> ifelse( c(TRUE,FALSE), translation_list_1, translation_list_2 )
[[1]]
[1] "London"

[[2]]
[1] "Scotland"

> ifelse( c(TRUE,FALSE,TRUE), translation_list_1, translation_list_2 )
[[1]]
[1] "London"

[[2]]
[1] "Scotland"

[[3]]
[1] "North East"
```
I’m not the only person to have been bitten by this, as “Ryogi”’s Stack Overflow question “if-else vs ifelse with lists” shows. There are probably other things to beware of too. I notice that the results just above have lost the names from my lists. Moreover, the documentation warns that `if(test) yes else no` is much more efficient and often much preferable to `ifelse(test, yes, no)` whenever `test` has length 1. That’s presumably because `ifelse()` wastes time selecting and discarding elements. Indeed, there’s also a warning that “Sometimes it is better to use a construction such as `(tmp <- yes; tmp[!test] <- no[!test]; tmp)`, possibly extended to handle missing values in `test`”.
This is not good. The whole point of a high-level language is to provide notations that enable you to express your problem clearly and concisely. It’s the language’s responsibility to compile them into efficient code, not the programmer’s. R designers, please note.
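For completeness, here is that suggested construction wrapped up as a function. It is only a sketch under my own assumptions: the name `select_keeping_yes` is made up, it assumes `test`, `yes` and `no` all have the same length, and it adds the simplest kind of missing-value handling (an `NA` in `test` gives `NA`). Unlike `ifelse()`, it keeps the names of `yes`. Note the braces: R won’t accept several semicolon-separated statements inside plain parentheses.

```r
# Hypothetical helper based on the construction suggested in ?ifelse.
# Assumes test, yes and no all have the same length.
select_keeping_yes <- function(test, yes, no) {
  tmp <- yes                          # start from yes, so its names survive
  take_no <- !is.na(test) & !test     # positions where test is FALSE
  tmp[take_no] <- no[take_no]         # overwrite those with the matching elements of no
  tmp[is.na(test)] <- NA              # missing test values give missing results
  tmp
}

yes <- c(a = 1, b = 2, c = 3)
no  <- c(a = 10, b = 20, c = 30)

select_keeping_yes(c(TRUE, FALSE, TRUE), yes, no)   # a = 1, b = 20, c = 3
ifelse(c(TRUE, FALSE, TRUE), yes, no)               # 1 20 3, names dropped
select_keeping_yes(c(TRUE, NA, FALSE), yes, no)     # a = 1, b = NA, c = 30
```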