Two common mistakes with the colon operator in R
R has a colon operator that makes it easy to define a sequence of integers. For example, the code 1:10 generates a vector consisting of the integers from 1 to 10 (inclusive). However, the colon operator is not without its pitfalls! I will highlight two common mistakes here.
First, imagine that you have a variable n which has value 5. What do you think the following code prints out?
for (i in 1:n+1) print(i)
My first instinct is that it should print out the numbers 1, 2, …, 6 (inclusive), with one number on each line. Wrong! Instead, this is the output we get:
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
What is going on here? The problem is one of operator precedence. Just as multiplication and division come before addition and subtraction in arithmetic, in R the colon operator (:) comes before addition (+). Hence, the code written above is interpreted as
for (i in (1:n)+1) print(i)
which is why the numbers 2 to 6 are printed out instead of the numbers 1 to 6. If we want to print the numbers 1 to n+1 inclusive, add parentheses to enforce the correct order of evaluation:
for (i in 1:(n+1)) print(i)
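To see the precedence rule without a loop, it can help to evaluate the expressions directly. The sketch below assumes n is 5, as in the example above; the variable names are only illustrative, and seq_len() is shown as one possible alternative that sidesteps the precedence question altogether.

```r
n <- 5

# `:` binds more tightly than `+`, so this is (1:n) + 1
shifted <- 1:n + 1
print(shifted)

# Parentheses make the upper bound n + 1
extended <- 1:(n + 1)
print(extended)

# seq_len() takes the length as an ordinary function argument,
# so there is no operator precedence to worry about
also_extended <- seq_len(n + 1)
print(also_extended)
```

The first vector has five elements starting at 2, while the other two have six elements starting at 1.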
Let’s move on to the second common mistake. Let’s say I have a vector vec and I want to print its elements one by one. The first instinct of most of us would be to write something like this:
for (i in 1:length(vec)) print(vec[i])
This works most of the time, but not all the time. Consider what happens when vec is an empty vector:
vec <- c()
for (i in 1:length(vec)) print(vec[i])
NULL
NULL
What happened here? The problem is that the colon operator can return a descending sequence of integers! In the code above, length(vec) has value 0, so 1:length(vec) is the same as c(1, 0). The loop therefore runs twice and prints out vec[1] and vec[0], which are both NULL (c() returns NULL, and indexing NULL gives NULL).
To avoid this problem, use the seq_along function instead:
for (i in seq_along(vec)) print(vec[i])
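A quick way to convince yourself of the difference is to compare the two index sequences for an empty vector and count how often each loop body runs. This is only an illustrative sketch; the counter variables colon_iterations and safe_iterations are introduced here for the demonstration.

```r
vec <- c()

# The colon operator counts down from 1 to 0
print(1:length(vec))

# seq_along() returns a zero-length integer vector instead
print(seq_along(vec))

# Count how many times each loop body runs
colon_iterations <- 0
for (i in 1:length(vec)) colon_iterations <- colon_iterations + 1

safe_iterations <- 0
for (i in seq_along(vec)) safe_iterations <- safe_iterations + 1

print(colon_iterations)  # 2
print(safe_iterations)   # 0
```

If you only have a length rather than the vector itself, seq_len(length(vec)) behaves the same way as seq_along(vec) here.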
You may think that this is not really a big problem; after all, it only fails when we have an empty vector, right? Well, there are two responses to that. First, you don’t want your code to ever do anything unintended. In this case the mistake was easy to catch, but it could just as well be buried three levels deep in code that is thousands of lines long, where it is not so easy to catch anymore! Second, this mistake crops up more easily when you don’t start from the first element of the vector.
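As a sketch of that second point, suppose a loop is meant to start from the second element, and the vector happens to have only one element. The one-element vector and the seq_along(vec)[-1] idiom below are illustrative choices for this demonstration, not the only way to guard against the problem.

```r
vec <- c(10)

# length(vec) is 1, so 2:length(vec) counts down from 2 to 1
risky <- 2:length(vec)
print(risky)

# Dropping the first index from seq_along() leaves nothing to loop over
safe <- seq_along(vec)[-1]
print(safe)

# Runs twice: vec[2] is out of range and yields NA, then vec[1] is printed
for (i in risky) print(vec[i])

# The body never runs
for (i in safe) print(vec[i])
```

For vectors with two or more elements, both versions produce the same indices, which is exactly why the colon version can go unnoticed for a long time.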