Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Welcome back to my R for SEO series. We’re in the home stretch now, with part seven. Today, we’re going to be looking at different ways that we can run functions or commands over a series of elements using the various kinds of loops that exist in R.
If you’ve followed along so far, or you’ve tried some experimentation of your own, you’ve probably encountered loops and applys along the way. I know early on in my R journey, it very much seemed like pot luck as to which apply I should use, or whether a loop was easier, so hopefully today’s piece will start to clear that up for you a little.
I know that most programming courses cover these elements earlier, but for me, it really didn’t click until I’d learned more about the other areas we’ve covered in this series, so that’s why I’ve placed it here.
As always, if you’ve found this useful, please give it a share on your social networks and please sign up to my free email updates to be alerted when I drop my next article.
What Is A Loop?
A loop in R is more or less what it sounds like – a command that keeps running some code until a certain condition stops it.
There are two main types that we’ll look at today: the for loop and the while loop.
Before we jump into how they work, let’s look at what the two different loops do and are used for.
- The for loop: The for loop is the most commonly used one in R and works great if you have a defined vector or dataset that you want your commands to run over, or you know how many times you want to run it
- The while loop: The while loop keeps running for as long as a certain condition is met, whether that’s a certain value, a number of iterations or even a timeframe. It’s very useful when you don’t know in advance how many times your code needs to run
The For Loop In R
If you’re familiar with other programming languages like Python, the humble for loop will already be in your arsenal, and it’s no less powerful in R.
Let’s put a really simple for loop together below, running through our Google Search Console data from the last few pieces (the tutorial is in part 2). We’re going to use this loop to count the number of keywords which have 20 or more impressions.
First we want to create an object called kwCount and we’re going to set its value to zero, like so:
kwCount <- 0
Now to create our loop:
for (val in gsc$Impressions) {
  if (val >= 20) kwCount = kwCount + 1
}
OK, let’s break it down.
The For Loop In R Explained
There is a lot that you can do with loops, and this is only a really basic example, but they all follow the same general process.
- for (val in gsc$Impressions){: We’re starting our loop with “for” and saying that for every value in gsc$Impressions, we want to run the command within our braces. There are a number of different ways that loops can be used, and I’ll show a couple more along the way, but the anatomy is always similar
- if(val >= 20) kwCount = kwCount+1}: As we saw in our R if statements tutorial, our braces incorporate our commands. In this simple example, we’re using a small if statement, saying that if the value is greater than or equal to 20, we add 1 to our kwCount object
So as you can see, this is a very simple for loop using R. As in our R functions piece, we don’t strictly need a loop for this, but it’s a simple way to show you the anatomy.
If you now type kwCount in your console, you’ll see the total number of your Google Search Console queries that have 20 or more impressions.
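Since we don’t strictly need a loop for this count, here’s a minimal sketch of the loop next to a vectorised one-liner. The small gsc dataframe is a made-up stand-in for your own Search Console export, and its column names are assumptions based on the code in this series.

```r
# Hypothetical stand-in for the Search Console export used in this series
gsc <- data.frame(
  Top.queries = c("r for seo", "r loops", "apply in r", "seo automation", "gsc api"),
  Impressions = c(120, 8, 20, 45, 3),
  stringsAsFactors = FALSE
)

# Loop version of the count
kwCount <- 0
for (val in gsc$Impressions) {
  if (val >= 20) kwCount <- kwCount + 1
}

# Vectorised version: the comparison gives TRUE/FALSE for every row,
# and sum() counts the TRUEs
kwCountVec <- sum(gsc$Impressions >= 20)

print(kwCount)
print(kwCountVec)
```

Both approaches should give you the same number; the loop is just easier to read when you’re learning the anatomy.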
Other For Loop Methods
In our previous example, we used a simple val in method for our for loop, but there are many others.
Personally, I find myself using loops on lists or vectors a lot more, so I find myself using some alternatives along the way. Let’s take a look at some of them:
Using For Loops On A List
Let’s use a for loop to create a subset of our Google Search Console that only incorporates rows which have 20 or more impressions. Again, a loop is overkill here, but it’s a good example.
First, we want to create our dataframe to host our data. Let’s call that kw20.
kw20 <- data.frame()
Now our for loop is as follows:
for (i in seq_along(gsc$Impressions)) {
  if (gsc$Impressions[i] >= 20) {
    kw20 <- rbind(kw20, gsc[i, ])
  }
}
Fairly self-explanatory, isn’t it? But, as always, let’s break it down:
- for (i in seq_along(gsc$Impressions)){: As before, we’re invoking our for loop, but there’s a key difference here. We’re using “i in seq_along(gsc$Impressions)”, which means “for each position (i) along our Impressions column, we want to do what is in our braces”. Note that we point seq_along at the column rather than at gsc itself: a dataframe is a list of columns, so seq_along(gsc) would count columns, not rows
- if (gsc$Impressions[i] >= 20){: We’re using an if statement to check whether the Impressions value at the position we’ve looped to (i) is greater than or equal to 20 and, if so, to do our next action in the braces
- kw20 <- rbind(kw20, gsc[i, ])}}: Let’s bring it home with our action. Our loop will take a row (i) that matches the condition we set in the previous command and, using rbind, will make it part of our kw20 frame
Obviously, if we were just looking to subset our dataset according to these conditions, we’d just use subset as we saw in part 1, and using this loop with rbind within would be very inefficient, but I hope it gives you a good example of how you can use the for loop across a list.
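To show that the loop and subset really do land in the same place, here’s a rough sketch that runs both on a tiny made-up gsc frame (the data and column names are assumptions, not my real export) and compares the results.

```r
# Hypothetical stand-in for the Search Console export
gsc <- data.frame(
  Top.queries = c("r for seo", "r loops", "apply in r", "seo automation", "gsc api"),
  Impressions = c(120, 8, 20, 45, 3),
  stringsAsFactors = FALSE
)

# Loop version, indexing along the Impressions column so every row is visited
kw20 <- data.frame()
for (i in seq_along(gsc$Impressions)) {
  if (gsc$Impressions[i] >= 20) {
    kw20 <- rbind(kw20, gsc[i, ])
  }
}

# subset() version: one line, no growing dataframe
kw20Subset <- subset(gsc, Impressions >= 20)

# Both should contain the same queries in the same order
print(kw20$Top.queries)
print(kw20Subset$Top.queries)
```

The loop rebuilds kw20 every time it finds a match, which is where the inefficiency comes from on larger datasets.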
To see it in action, you can see my Bulk Resizing Images in R post, which features loops quite a bit.
Using For Loops On A Dataframe
Similar to our list above, let’s create a loop that subsets a dataframe if impressions are equal to or less than 20.
It’s not too dissimilar, but for the sake of this exercise, I’m going to use seq_len instead of seq_along. seq_len(nrow(gsc)) counts from 1 to the number of rows in our frame, which makes it the more commonly-used approach for iterating over dataframes.
kwL20 <- data.frame()
for (i in seq_len(nrow(gsc))) {
  if (gsc$Impressions[i] <= 20) {
    kwL20 <- rbind(kwL20, gsc[i, ])
  }
}
As you can see, it’s almost exactly the same as on our list, aside from that I’ve changed our output dataframe to kwL20 (keywords with 20 or fewer impressions) and used seq_len(nrow(gsc)) for our frame. This gives the same row positions as running seq_along on a single column, but is a little more explicit that we’re looping over rows. I’ve also set the impressions volume to be less than or equal to 20 for this exercise.
So there we have it. An introduction to the for loop in R. While these are simple examples, I hope it’s given you an idea of how they can work and be used in your SEO work. Next up, let’s have a look at the while loop.
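If you want to see what each of these sequence helpers actually returns on a dataframe, here’s a small sketch. The gsc frame and the variable names are made up for illustration.

```r
# Hypothetical stand-in: 5 rows, 2 columns
gsc <- data.frame(
  Top.queries = c("r for seo", "r loops", "apply in r", "seo automation", "gsc api"),
  Impressions = c(120, 8, 20, 45, 3),
  stringsAsFactors = FALSE
)

# On a whole dataframe, seq_along() counts columns
alongFrame <- seq_along(gsc)               # 1 2

# On a single column, seq_along() counts rows
alongColumn <- seq_along(gsc$Impressions)  # 1 2 3 4 5

# seq_len(nrow()) also counts rows, and says so explicitly
lenRows <- seq_len(nrow(gsc))              # 1 2 3 4 5

# Both helpers cope with an empty frame, where 1:nrow() would not
emptyGsc <- gsc[0, ]
safeEmpty <- seq_len(nrow(emptyGsc))       # integer(0), so a loop runs zero times
riskyEmpty <- 1:nrow(emptyGsc)             # 1 0, so a loop would run twice

print(alongFrame)
print(lenRows)
```

That last pair is the main reason I’d reach for seq_len or seq_along rather than 1:nrow(gsc) when there’s any chance the data could be empty.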
While Loops In R
Where the for loop executes our code across every item in our dataset, the while loop is driven by a condition instead. A while loop in R will keep executing its commands for as long as a condition is met, and will stop when that condition is no longer true.
While loops are great for automation. I’ve used them in the past to run real-time API data from Salesforce into my environment during a specific timeframe, for example. Again though, very simple to create and execute.
Let’s do a very simple while loop in R, looking at our Google Search Console dataset once again.
A While Loop To Find Keywords With 20 Or More Impressions
Again, this is a bit of a case of a sledgehammer to crack a nut, but hopefully this simple example will give you some ideas of where and how you can use a while loop in your day to day SEO work with R.
kw_df <- data.frame(Query = character())
index <- 1
while (index <= nrow(gsc)) {
  if (gsc$Impressions[index] >= 20) {
    kw_df <- rbind(kw_df, data.frame(Query = gsc$Top.queries[index]))
  }
  index <- index + 1
}
Again, by this point in your R journey, this might look pretty simple. But let’s break it down anyway.
The While Loop Explained
Let’s dig into how this while loop works.
- kw_df <- data.frame(Query = character()): This will be familiar to you by now, but we’re creating a new dataframe called kw_df and setting the Query column to be character
- index <- 1: We’re creating a numerical object called index to match our loop against, starting it with 1
- while (index <= nrow(gsc)) {: Now onto our loop. We’re starting with a while command rather than for, and then instructing R that while the value of our index object is less than or equal to the number of rows in our gsc dataframe, to execute our code
- if (gsc$Impressions[index] >= 20) {: There’s our if statement again. In this case, our condition is that our Impressions value at the row number defined in our index is greater than, or equal to 20, to execute our code
- kw_df <- rbind(kw_df, data.frame(Query = gsc$Top.queries[index]))}: As with our for loop, we’re rbinding our query that matches the relevant row from our index if our if statement is true, and adding it to kw_df
- index <- index + 1}: This is the important part that moves our loop along. At the end of each pass, we add 1 to the value in our index, so the next pass looks at the next row. Once the number in index is larger than the number of rows in our dataset, the loop will stop; without this line, the condition would never become false and the loop would run forever
That’s a very simple introduction to while loops. There’s an awful lot that you can do with these, and I’m sometimes a little guilty of using them when I should use a more elegant solution because I’m in a hurry. Try them yourself – while loops in R have a lot of applications to SEO work.
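Where a while loop really earns its keep is when you don’t know up front how many passes you’ll need. As a rough sketch, here’s one that keeps adding your biggest queries until they account for 80% of total impressions. The data, the 80% target and the variable names are all assumptions for illustration.

```r
# Hypothetical stand-in for the Search Console export
gsc <- data.frame(
  Top.queries = c("r for seo", "r loops", "apply in r", "seo automation", "gsc api"),
  Impressions = c(120, 8, 20, 45, 3),
  stringsAsFactors = FALSE
)

# Sort so the biggest queries come first
gsc <- gsc[order(gsc$Impressions, decreasing = TRUE), ]

target <- 0.8 * sum(gsc$Impressions)
runningTotal <- 0
index <- 0

# Keep adding queries until we reach the target.
# The second check stops us running past the last row.
while (runningTotal < target && index < nrow(gsc)) {
  index <- index + 1
  runningTotal <- runningTotal + gsc$Impressions[index]
}

topQueries <- gsc$Top.queries[seq_len(index)]
print(topQueries)
```

With a for loop you’d have to visit every row; here the loop stops as soon as the condition is satisfied.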
Break & Next Conditions In R Loops
The break and next conditions in loops are commands that either stop the loop dead once that condition is met or simply move to the next iteration based on the output.
I don’t generally use these too much, if I’m honest, aside from using them as a crude form of error handling if I’m in a hurry, but they’re worth knowing. Let’s take a look at the break condition within a repeat loop.
Break Conditions
Break conditions are more or less exactly what they sound like – a condition under which a loop will break, or stop.
I don’t really use break conditions too much, largely because it’s quite rare I use repeat loops, unless they’re within a specific function and the loop is required for something. However, repeat loops are the best way to demonstrate the break condition in action.
Here’s a really simple repeat loop with a break condition that will take an object called repVal with a value of one, repeatedly printing that value and then adding 1 to the object each time. And then once we hit 10 repetitions of that loop, the break condition comes in and stops it.
Let’s have a look at the code and then we’ll break it down.
repVal <- 1
repeat {
  print(repVal)
  repVal <- repVal + 1
  if (repVal > 10) {
    break
  }
}
If this runs properly, you’ll see the numbers 1 to 10 printed in your console, one per line.
Simple, right? Let’s see how it works.
The Repeat Loop & Break Condition Explained
Here’s that phrase again: let’s break it down.
- repVal <- 1: We’re creating our repVal object and assigning it a value of 1
- repeat{: This is the type of loop that we’re using, similar to the for and while loops
- print(repVal): Our loop will print the value of repVal on a continual cycle as long as our loop runs
- repVal <- repVal + 1: Now we’re saying to add 1 to our repVal object every time the loop runs
- if(repVal > 10){: There’s our if statement again. Nice and simple here, we’re just seeing if the value of repVal is greater than ten
- break}}: And finally, our break condition. Essentially, once our if statement becomes true (repVal has become greater than ten), it triggers our break condition and stops the loop
And that’s how a break condition can be used in a repeat loop. It can be used in any of the other types of loops as well, and it can be a handy way to stop a loop once a certain condition is met.
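As a quick sketch of break in a for loop rather than a repeat loop, here’s one that stops as soon as it finds the first query with fewer than 10 impressions. The data, the threshold and the helper variables are made up for illustration.

```r
# Hypothetical stand-in for the Search Console export
gsc <- data.frame(
  Top.queries = c("r for seo", "r loops", "apply in r", "seo automation", "gsc api"),
  Impressions = c(120, 8, 20, 45, 3),
  stringsAsFactors = FALSE
)

firstLowQuery <- NA_character_  # stays NA if nothing matches
rowsChecked <- 0

for (i in seq_len(nrow(gsc))) {
  rowsChecked <- rowsChecked + 1
  if (gsc$Impressions[i] < 10) {
    firstLowQuery <- gsc$Top.queries[i]
    break  # we have what we need, so stop looping
  }
}

print(firstLowQuery)
print(rowsChecked)
```

Because of the break, the loop only looks at as many rows as it needs to rather than working through the whole frame.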
Now let’s take a look at next conditions.
Next Conditions In R Loops
The next condition is one I do use a little bit more regularly. It essentially skips to the next iteration of our loop when a certain condition is met. For example, I sometimes use it for skipping empty outputs from APIs: if there’s no data returned for a certain keyword or page, I don’t want an error, I just want it to skip to the next one.
Let’s use another simple example, with a for loop this time.
for (val in 1:10) {
  if (val == 5) {
    next
  }
  print(val)
}
Again, nice and simple and if you’ve followed along so far, you should be able to figure out what’s happening here, but let’s run through it anyway.
The Next Condition Explained
Shall we see how this example works?
- for (val in 1:10){: In this example, we’re going to use a for loop to cycle our commands through the values of one to ten
- if (val == 5){: You’re seeing why if statements are so fundamental to programming now, right? In this case, our if statement is checking to see if we’ve looped to a point where our val is exactly equal to five
- next}: If our if statement turns out to be true, and our val does exactly equal five, we skip the rest of the loop body and move on to the next number in our val series
- print(val): And finally, as long as we’ve not triggered our next condition, we print the current value of val
If this all runs correctly, we should see the numbers 1 to 10 printed in our R console, with 5 skipped.
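To tie next back to the empty API output situation I mentioned, here’s a minimal sketch. The apiResults list is a made-up stand-in for whatever your API calls return, not the output of any real package.

```r
# Hypothetical API responses, one per keyword.
# Some came back with nothing, some with an empty table.
apiResults <- list(
  "r for seo"  = data.frame(Clicks = 12, Impressions = 120),
  "r loops"    = NULL,
  "apply in r" = data.frame(Clicks = numeric(), Impressions = numeric()),
  "gsc api"    = data.frame(Clicks = 1, Impressions = 3)
)

processed <- character()

for (kw in names(apiResults)) {
  result <- apiResults[[kw]]

  # Nothing usable for this keyword, so skip to the next one
  if (is.null(result) || nrow(result) == 0) {
    next
  }

  # Only keywords with data get this far
  processed <- c(processed, kw)
}

print(processed)
```

The is.null check comes first on purpose: if the result is NULL, R never gets as far as asking for its row count.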
So as you can see, loops in R are quite simple and give you a really good way to iterate a command over a dataset, but there’s some controversy about when, or indeed if, you should ever use them in R.
The Loop Vs Apply Debate
The word is that using loops in R is dirty code. That they’re slow, that you’re using more code than you should need, that using the apply family is just better.
Personally, as someone that came to R from learning bits of a bunch of different languages and who is doing more with Python and Julia these days, loops have always made sense to me and been something of a go-to (as you’ll see in my other R posts), but I have come to appreciate the various apply methods available as well.
In my experience and through researching, it seems that loops in R being slow is something of a fallacy. When a loop does crawl, the usual culprit is growing an object on every pass, as our rbind examples above do, rather than the loop itself. That said, you do often end up writing more code than you would with one of the apply methods. Conversely, I’ve always found loops to be very reliable, whereas I sometimes have to take a couple of extra steps to get an apply working.
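As a small taste of the trade-off, here’s a sketch of the same job done with a loop and with vapply, one of the apply family. The impressions vector is made up, and I’m creating the loop’s output vector at full length before the loop starts, which avoids growing it on every pass.

```r
# Hypothetical impressions figures
impressions <- c(120, 8, 20, 45, 3)

# Loop version: create the output at its full length first, then fill it in
flagsLoop <- logical(length(impressions))
for (i in seq_along(impressions)) {
  flagsLoop[i] <- impressions[i] >= 20
}

# apply-family version: one line, with the expected output type declared
flagsApply <- vapply(impressions, function(x) x >= 20, logical(1))

print(flagsLoop)
print(identical(flagsLoop, flagsApply))
```

More typing for the loop, but the two give the same answer.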
Still, apply methods are great, and you’ll be using them a lot as you go through your R journey, and they’ll be the subject of my next post.
Wrapping Up
I promise, I did try to make this one a bit shorter than other pieces, but there’s what you need to know about using loops in R, covering the for loop, while loop, repeat loop and the break and next conditions. Try them yourself and I hope you find them useful.
Join me in the next piece, where I’ll be covering the various apply methods that you can use.
Until next time.
Our Code From Today
# For Loop
kwCount <- 0
for (val in gsc$Impressions) {
  if (val >= 20) kwCount = kwCount + 1
}

## On A List
kw20 <- data.frame()
# seq_along() runs along the Impressions column, so every row is visited
for (i in seq_along(gsc$Impressions)) {
  if (gsc$Impressions[i] >= 20) {
    kw20 <- rbind(kw20, gsc[i, ])
  }
}

## On A Dataframe
kwL20 <- data.frame()
for (i in seq_len(nrow(gsc))) {
  if (gsc$Impressions[i] <= 20) {
    kwL20 <- rbind(kwL20, gsc[i, ])
  }
}

# While Loop
kw_df <- data.frame(Query = character())
index <- 1
while (index <= nrow(gsc)) {
  if (gsc$Impressions[index] >= 20) {
    kw_df <- rbind(kw_df, data.frame(Query = gsc$Top.queries[index]))
  }
  index <- index + 1
}

# Break & Next Conditions
## Repeat Loop With Break Condition
repVal <- 1
repeat {
  print(repVal)
  repVal <- repVal + 1
  if (repVal > 10) {
    break
  }
}

## For Loop With Next Condition
for (val in 1:10) {
  if (val == 5) {
    next
  }
  print(val)
}
This post was written by Ben Johnston on Ben Johnston