Think of `&&` as a stricter `&`
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In programming languages, we find logical operators for and
and or. In fact, Python uses the actual words and
and or
for these operators.
# Python via the reticulate package x = True y = False x and y #> False x or y #> True
In Javascript, we see &&
for and and ||
for or instead.
// Javascript via `engine = "node"` in knitr let x = true; let y = false; console.log(x && y); console.log(x || y); // false // true
In R, we have two versions of logical and (&
and &&
) and logical
or (|
and ||
). What’s going on?
Documentation in help("Logic",
package = "base")
provides the
following:
&
and&&
indicate logical AND and|
and||
indicate logical OR. The shorter form performs elementwise comparisons in much the same way as arithmetic operators. The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined. The longer form is appropriate for programming control-flow and typically preferred inif
clauses.
Let’s unpack this paragraph
The shorter operators are vectorized
The crucial difference is that the shorter versions (&
, |
) are
vectorized. Given two vectors, they will apply logical and/or on
pairs of elements from each vector. In the example below, ttff[1]
is
and-ed with tftf[1]
, ttff[2]
is and-ed with tftf[2]
, and so
on. This vectorization is a pretty important feature, for example, when
we are comparing columns in a dataframe in order to filter rows.
# Returns something of length four ttff <- c(TRUE, TRUE, FALSE, FALSE) tftf <- c(TRUE, FALSE, TRUE, FALSE) ttff & tftf #> [1] TRUE FALSE FALSE FALSE ttff | tftf #> [1] TRUE TRUE TRUE FALSE
In contrast, &&
and ||
only work on scalars (length-one values).
They return just one element. In this example, they look at ttff[1]
and tftf[1]
.
# Returns something of length one ttff && tftf #> [1] TRUE ttff || tftf #> [1] TRUE
To help remember the distinction, think of the longer versions (&&
,
||
) as stricter forms of the logical operators. They don’t just care
about truthiness or falsiness, but they also care about length. The
extra and/or characters are there because the operators are
extra, if you will.
Okay, that was the point of this post: to describe the difference between the short and long operators and introduce the intuition that the longer forms are stricter. The rest of this post will dig into some other oddities and notes about and and or.
Short circuit evaluation
You may have noticed a strange or unclear detail in the documentation.
The longer form evaluates left to right examining only the first element of each vector. Evaluation proceeds only until the result is determined.
This part is describing the semantics of short-circuit evaluation. Here are two facts about and and or:
- x and y is false when either x or y is false. Therefore, if x is false, we don’t need to look at y at all.
- x or y is true when either x or y is true. Therefore, if x is true, we don’t need to look at y at all.
R will not evaluate the second operand for &&
and ||
if it can learn
the answer from the first operand. Thus, short-circuit evaluation will
ignore the stop()
calls in the examples below.
FALSE && stop("this is an error") #> [1] FALSE TRUE || stop("this is an error") #> [1] TRUE # Short-circuiting doesn't apply. Need the second operand. TRUE && stop("this is an error") #> Error in eval(expr, envir, enclos): this is an error FALSE || stop("this is an error") #> Error in eval(expr, envir, enclos): this is an error
The NULL-or-default pattern
Logically, the short-circuit evaluation for ||
is equivalent to a
particular kind of if statement:
if_x_then_x_else_y <- function(x, y) { if (x) x else y } if_x_then_x_else_y(TRUE, FALSE) #> [1] TRUE # short-circuited if_x_then_x_else_y(TRUE, stop("this is an error")) #> [1] TRUE
In languages that treat undefined values as falsy and defined values
as truthy, this if (x) x else y
behavior is sometimes used as an idiom
to set a default, backup value for a variable. In the Javascript code below,
the undefined variable name
is treated as falsy, the
string "I don't know your name"
is treated as truthy, so the or
returns the second string. In other words, “the first truthy value is
returned”.
let name; let fallback = name || "I don't know your name!"; console.log(name); console.log(fallback); // undefined // I don't know your name!
(This pattern, incidentally, appears to have earned its own operator in
Javascript with the “nullish coalescing operator”
??
.)
Why do I mention this programming idiom from Javascript? Because setting
a default for missing values is pretty useful, and this syntax is pretty
nice. The tidyverse provides a nullish coalescing
operator,
inspired by Ruby’s ||
operator.
library(purrr) 1 %||% 2 #> [1] 1 NULL %||% 2 #> [1] 2 # Not exactly or-like. It just cares about NULL-ness. FALSE %||% 2 #> [1] FALSE # See the source code `%||%` #> function (x, y) #> { #> if (is_null(x)) #> y #> else x #> } #> <bytecode: 0x000000001f08c228> #> <environment: namespace:rlang>
if()
statements want the stricter operators
Recall the following from the documentation:
The longer form is appropriate for programming control-flow and typically preferred in
if
clauses.
if()
statements are not vectorized. (See
ifelse()
instead.) if()
statements complain when they see a vector:
if (ttff | tftf) { "We are going to get a warning." } #> Warning in if (ttff | tftf) {: the condition has length > 1 and only the first #> element will be used #> [1] "We are going to get a warning."
The idea behind the documentation is that because &
and |
return
vectors and because if()
only likes scalars, we should not use the
shorter forms in if()
statements. They provide the wrong output for
if()
. But that does not mean the following code with the stricter
or operator is correct.
if (ttff || tftf) { "No warning but this code is not right." } #> [1] "No warning but this code is not right."
Although the code here does not raise any warnings, it reduces all of
the information in ttff
and tftf
into just ttff[1]
and
tftf[1]
. Those values likely will not be appropriate for the
programming task at hand, so we should provide the scalars ourselves.
all()
and any()
can apply and and or down a vector
Because I am talking about and and or and about creating logical
scalars, I want to advertise two particular functions that can reduce a
logical vector into a scalar. all()
is TRUE
when all of the elements are TRUE
. For a vector, it would be
like replacing the commas in c(TRUE, TRUE, FALSE)
with &
s.
any()
provides the analogous
down-vector behavior for |
. any()
is TRUE
when any of the elements
in the vector are TRUE
(and not NA
—more on that later).
all(c(TRUE, TRUE, TRUE)) #> [1] TRUE c(TRUE & TRUE & TRUE) #> [1] TRUE all(c(TRUE, FALSE, TRUE)) #> [1] FALSE c(TRUE & FALSE & TRUE) #> [1] FALSE c(TRUE | FALSE | TRUE) #> [1] TRUE # The input can be scalars or vectors all(TRUE, TRUE, TRUE, c(TRUE, FALSE)) #> [1] FALSE any(FALSE, FALSE, FALSE, c(FALSE, TRUE)) #> [1] TRUE # These appear not to short circuit any(TRUE, stop("this is an error")) #> Error in eval(expr, envir, enclos): this is an error all(FALSE, stop("this is an error")) #> Error in eval(expr, envir, enclos): this is an error
We can make the strict operators even stricter
Recall the following unsettling example where just ttff[1]
and
tftf[1]
are considered in the if()
statement.
if (ttff || tftf) { "No warning but this code is not right." } #> [1] "No warning but this code is not right."
It would be nice to rule out this behavior outright and make this
behavior illegal. In fact, we can make our code stricter by setting the
system environment variable _R_CHECK_LENGTH_1_LOGIC2_
. Once set, using
the strict forms on inputs longer than 1 will throw an error.
Sys.setenv("_R_CHECK_LENGTH_1_LOGIC2_" = "TRUE") ttff && tftf #> Error in ttff && tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)' ttff || tftf #> Error in ttff || tftf: 'length(x) = 4 > 1' in coercion to 'logical(1)' # Default behavior Sys.unsetenv("_R_CHECK_LENGTH_1_LOGIC2_") ttff && tftf #> [1] TRUE ttff || tftf #> [1] TRUE
A related environment variable is _R_CHECK_LENGTH_1_CONDITION_
that
turn vectors inside of if()
into errors instead of warnings.
Sys.setenv("_R_CHECK_LENGTH_1_CONDITION_" = "TRUE") if (ttff | tftf) { "This code will not even be seen." } #> Error in if (ttff | tftf) {: the condition has length > 1 # Default behavior Sys.unsetenv("_R_CHECK_LENGTH_1_CONDITION_") if (ttff | tftf) { "We are going to get a warning." } #> Warning in if (ttff | tftf) {: the condition has length > 1 and only the first #> element will be used #> [1] "We are going to get a warning."
To make project code more robust, one might consider setting these inside of a .Renviron file or using them with dotenv. (But if I remember correctly, these checks apply to package code so you get errors for legal-but-dodgy R code in other people’s packages.)
NA
s infect other values
All of the above examples conveniently avoided NA
s. These are missing
values that infect other logical values, turning them into NA
s. For this
section, I want to briefly highlight some behaviors of NA
s and some
functions that can help us work around them.
For and, an NA
with any non-FALSE
value is NA
. For or, an NA
with any non-TRUE
value is NA
. That’s a funny sentence, but it
reflects the case where we can infer the answer without seeing the NA
value. TRUE | NA
(and NA | TRUE
) returns TRUE
because it would
return TRUE
if the NA
was actually a TRUE
or FALSE
. The same
holds for FALSE & NA
(and NA & FALSE
) returning FALSE
. If we
un-missing-ed the NA
into TRUE
or FALSE
, the statement would still
be FALSE
.
tfnn <- c(TRUE, FALSE, NA, NA) nntf <- c(NA, NA, TRUE, FALSE) tfnn & nntf #> [1] NA FALSE NA FALSE tfnn | nntf #> [1] TRUE NA TRUE NA # Infecting in all() and any() TRUE && NA #> [1] NA FALSE || NA #> [1] NA all(TRUE, TRUE, NA) #> [1] NA any(FALSE, FALSE, NA) #> [1] NA # These return FALSE and TRUE because they would return TRUE and FALSE # regardless of whether the NA was a TRUE or a FALSE. FALSE && NA #> [1] FALSE TRUE || NA #> [1] TRUE all(NA, FALSE, FALSE) #> [1] FALSE any(TRUE, FALSE, NA) #> [1] TRUE
This uhh complicates things, so how do I check if what I have is TRUE
or FALSE
? isTRUE()
and
isFALSE()
provide direct tests of
whether the input is the scalar TRUE
or the scalar FALSE
.
c(isTRUE(TRUE), isTRUE(FALSE), isTRUE(NA)) #> [1] TRUE FALSE FALSE c(isFALSE(TRUE), isFALSE(FALSE), isFALSE(NA)) #> [1] FALSE TRUE FALSE
The documentation notes that
if(isTRUE(cond))
may be preferable toif(cond)
because ofNA
s
so isTRUE()
is something we run into packaged code.
isTRUE()
and isFALSE()
are not vectorized, but we can check elements
in a vector are TRUE
and only TRUE
in a few different ways.
# Make a new function Vectorize(FUN = isTRUE)(tfnn) #> [1] TRUE FALSE FALSE FALSE # Apply the function on a vector vapply(X = tfnn, FUN = isTRUE, FUN.VALUE = logical(1)) #> [1] TRUE FALSE FALSE FALSE # Use table-lookup or set operations tfnn %in% TRUE #> [1] TRUE FALSE FALSE FALSE is.element(el = tfnn, set = TRUE) #> [1] TRUE FALSE FALSE FALSE
I think that’s just about every useful thing I want to say about and
and or in R. But just remember, &&
and ||
are longer operators because
they are stricter.
Last knitted on 2021-07-01. Source code on GitHub.1
-
sessioninfo::session_info() #> - Session info --------------------------------------------------------------- #> setting value #> version R version 4.1.0 (2021-05-18) #> os Windows 10 x64 #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United States.1252 #> ctype English_United States.1252 #> tz America/Chicago #> date 2021-07-01 #> #> - Packages ------------------------------------------------------------------- #> package * version date lib source #> cli 2.5.0 2021-04-26 [1] CRAN (R 4.1.0) #> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0) #> git2r 0.28.0 2021-01-10 [1] CRAN (R 4.1.0) #> here 1.0.1 2020-12-13 [1] CRAN (R 4.1.0) #> jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.1.0) #> knitr * 1.33 2021-04-24 [1] CRAN (R 4.1.0) #> lattice 0.20-44 2021-05-02 [1] CRAN (R 4.1.0) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.0) #> Matrix 1.3-4 2021-06-01 [1] CRAN (R 4.1.0) #> png 0.1-7 2013-12-03 [1] CRAN (R 4.1.0) #> purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.1.0) #> ragg 1.1.3 2021-06-09 [1] CRAN (R 4.1.0) #> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.1.0) #> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.1.0) #> reticulate 1.20 2021-05-03 [1] CRAN (R 4.1.0) #> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.0) #> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.0) #> stringi 1.6.1 2021-05-10 [1] CRAN (R 4.1.0) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0) #> systemfonts 1.0.2 2021-05-11 [1] CRAN (R 4.1.0) #> textshaping 0.3.5 2021-06-09 [1] CRAN (R 4.1.0) #> withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.0) #> xfun 0.24 2021-06-15 [1] CRAN (R 4.1.0) #> #> [1] C:/Users/Tristan/Documents/R/win-library/4.1 #> [2] C:/Program Files/R/R-4.1.0/library
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.