How to write the first for loop in R
In this tutorial we will look at how to write a basic for loop in R. It is aimed at beginners; if you’re not yet familiar with the basic syntax of the R language, we recommend that you first work through an introductory R tutorial.
Conceptually, a loop is a way to repeat a sequence of instructions under certain conditions. Loops allow you to automate parts of your code that need to be repeated. Sounds weird? No worries, it will become clearer once we start working with some examples below.
Before you dive into writing loops in R, there is one important thing you should know. When surfing the web you’ll often read that you should avoid loops in R. Why? Well, that’s because R supports vectorization: many functions and operators work on a whole vector at once, and the repetition happens in fast compiled code rather than in R itself. Simply put, this allows for much faster calculations. For example, a loop that processes a vector element by element is usually slower than the equivalent vectorized expression. Functions from the apply family, such as lapply and sapply, are a popular alternative to explicit loops too; they are often more concise, although not necessarily faster than a well-written for loop. Nevertheless, as a beginner in R, it is good to have a basic understanding of loops and how to write them. If you want to learn more about the concepts of vectorization in R, this is a good read.
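To make the vectorization point concrete, here is a minimal sketch that squares a few numbers twice: once with a loop and once with a vectorized expression. The data and the names x, squares_loop and squares_vec are made up for illustration, and the loop syntax itself is explained in the next section.

```r
# Illustrative data; the variable names are hypothetical.
x <- c(1, 2, 3, 4, 5)

# Loop version: fill a preallocated result one element at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the ^ operator works on the whole vector at once.
squares_vec <- x^2

print(squares_loop)
print(squares_vec)
```

Both versions give the same result; the vectorized one is shorter and, on large vectors, typically faster.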
Writing a simple for loop in R
Let’s get back to the conceptual meaning of a loop. Suppose you want to do several printouts of the following form: The year is [year] where [year] is equal to 2010, 2011, up to 2015. You can do this as follows:
print(paste("The year is", 2010))
"The year is 2010"
print(paste("The year is", 2011))
"The year is 2011"
print(paste("The year is", 2012))
"The year is 2012"
print(paste("The year is", 2013))
"The year is 2013"
print(paste("The year is", 2014))
"The year is 2014"
print(paste("The year is", 2015))
"The year is 2015"
You immediately see this is rather tedious: you repeat the same code chunk over and over. This violates the DRY principle, well known across programming languages: Don’t Repeat Yourself. In this case, by making use of a for loop in R, you can automate the repetitive part:
for (year in c(2010,2011,2012,2013,2014,2015)){
  print(paste("The year is", year))
}
"The year is 2010"
"The year is 2011"
"The year is 2012"
"The year is 2013"
"The year is 2014"
"The year is 2015"
The best way to understand what is going on in the for loop is by reading it as follows: “For each year that is in the sequence c(2010,2011,2012,2013,2014,2015), you execute the code chunk print(paste("The year is", year))”. Once the for loop has executed the code chunk for every year in the vector, the loop stops and execution continues with the first instruction after the loop block.
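The flow described above can be checked with a small sketch. The extra print calls after the loop are additions for illustration only; they show that execution continues after the loop block and that the loop variable keeps the last value it was given.

```r
for (year in c(2010, 2011, 2012, 2013, 2014, 2015)) {
  print(paste("The year is", year))
}

# This line runs only after the loop has finished all six iterations.
print("Loop finished")

# The loop variable still exists and holds the last value it was assigned.
print(year)
```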
See how we did that? By using a for loop you only need to write down your code chunk once (instead of six times). The for loop then sets the variable (year in this case) to each provided value in turn (the different years we provided) and runs the statement once for each of them. You can simplify the code even more: c(2010,2011,2012,2013,2014,2015) can also be written as 2010:2015; this creates the same sequence of values:
for (year in 2010:2015){
  print(paste("The year is", year))
}
"The year is 2010"
"The year is 2011"
"The year is 2012"
"The year is 2013"
"The year is 2014"
"The year is 2015"
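As a small aside, the two ways of writing the sequence can be compared directly. This sketch uses the hypothetical names years_c and years_colon. The values are the same and so is the printed text; the only difference is the storage type, which does not matter for this loop.

```r
years_c <- c(2010, 2011, 2012, 2013, 2014, 2015)
years_colon <- 2010:2015

# The values are equal element by element.
print(all(years_c == years_colon))

# The storage types differ: c() with plain numbers gives "numeric",
# while the colon operator gives "integer" here.
print(class(years_c))
print(class(years_colon))

# The pasted messages are identical, so both loops print the same lines.
print(identical(paste("The year is", years_c),
                paste("The year is", years_colon)))
```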
As a last note on the for loop in R: in this case we made use of the variable year, but in fact any valid variable name could be used here. For example, you could have used i, a commonly used variable in for loops that stands for index:
for (i in 2010:2015){
  print(paste("The year is", i))
}
"The year is 2010"
"The year is 2011"
"The year is 2012"
"The year is 2013"
"The year is 2014"
"The year is 2015"
This produces the exact same output. So you can really name the variable any way you want, but your code is more understandable if you use meaningful names.
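The name i fits best when the variable really is an index, that is, a position in a vector. A minimal sketch of that pattern, assuming a vector called years and a result vector called labels (both names are illustrative), could look like this:

```r
years <- 2010:2015

# Preallocate a character vector with one slot per year.
labels <- character(length(years))

# seq_along(years) yields the positions 1, 2, ..., 6.
for (i in seq_along(years)) {
  labels[i] <- paste("Year number", i, "is", years[i])
}

print(labels)
```

Here i runs over positions rather than over the years themselves, so years[i] is used to look up the value at each position.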
Using next
Let’s have a look at a more mathematical example. Suppose you need to print all odd numbers between 1 and 10, while the even numbers should not be printed. In that case your loop would look like this:
for (i in 1:10) {
  if (!i %% 2){
    next
  }
  print(i)
}
1
3
5
7
9
Notice the introduction of the next statement. Let’s explore the meaning of this statement by walking through this loop together:
The loop runs once for each value of i in 1:10; once all ten values have been used, the loop stops. In each iteration, we need to check whether the value of i is odd. The expression i %% 2 gives the remainder of i divided by 2 (that’s why we use the modulo operator %%). If the remainder is non-zero, i is odd: !i %% 2 evaluates to FALSE, so we don’t enter the if statement, we execute the print function and loop back. If the remainder is zero, i is even: the if condition evaluates to TRUE and we enter the conditional. Here we see the next statement, which causes the loop to jump straight to the next value of i in 1:10, thereby ignoring the instructions that follow (so the print(i)).
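The condition !i %% 2 is compact but easy to misread, because %% is evaluated before !. The sketch below is one possible rewrite, not the tutorial's own code: it spells the test out as i %% 2 == 0 and collects the odd numbers in a vector named odd_numbers (a hypothetical name) so the result can be inspected afterwards.

```r
# %% binds more tightly than !, so !i %% 2 means !(i %% 2).
print(!4 %% 2)  # remainder 0, so the condition is TRUE and next would run
print(!3 %% 2)  # remainder 1, so the condition is FALSE and print would run

# Same loop with an explicit comparison; the odd numbers are collected
# instead of printed. Growing a vector like this is fine for ten values.
odd_numbers <- integer(0)
for (i in 1:10) {
  if (i %% 2 == 0) {
    next  # skip the rest of this iteration for even numbers
  }
  odd_numbers <- c(odd_numbers, i)
}

print(odd_numbers)
```

For a task this simple, a vectorized expression such as (1:10)[(1:10) %% 2 != 0] selects the same numbers without a loop.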
Closing remarks
In this short tutorial you got acquainted with the for loop in R. While vectorized solutions are often preferable to loops in R, it still remains valuable to have this knowledge in your skillset. It helps you understand underlying principles, and when prototyping, a loop solution is easy to code and read. In case you want to learn more on loops, you can always check this R tutorial.