
Iterating over the lines of a data.frame with purrr

[This article was first published on rstats-tips.net, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

I sometimes have a function that takes some parameters and returns a data.frame as a result. I also have a data.frame in which each row is one set of parameters. So I'd like to apply the function to each row of the parameter data.frame and rbind the resulting data.frames.

There are several ways to do it. Let’s have a look:

The function …

So let’s build a simple function we can use

my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  # Build a single row; generated_text is `text` pasted together `repeated` times
  row <- data.frame(
    repeated       = repeated,
    text           = text,
    number_rows    = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = "")
  )
  # Stack `number_rows` copies of that row
  return(do.call("rbind", replicate(number_rows, row, simplify = FALSE)))
}

my_function(3, "Hello ", 4)
##   repeated   text number_rows     generated_text
## 1        3 Hello            4 Hello Hello Hello 
## 2        3 Hello            4 Hello Hello Hello 
## 3        3 Hello            4 Hello Hello Hello 
## 4        3 Hello            4 Hello Hello Hello

So this function takes three arguments and returns a data.frame. The number of rows of the result is given by the last parameter, number_rows.

… and its parameters

So now we have several tuples of parameters. Each tuple is a row of our parameter data.frame:

options(tidyverse.quiet = TRUE)
library(tidyverse, warn.conflicts = FALSE)
parameters <- tribble(
  ~repeated, ~text, ~number_rows,
  1, "one", 3,
  2, "two", 2,
  3, "three", 1
) %>% 
  as.data.frame()

parameters
##   repeated  text number_rows
## 1        1   one           3
## 2        2   two           2
## 3        3 three           1

So now we want to apply our function three times, one time for each row of the data.frame parameters.

Iterating with …

There are several ways to iterate.

… a for-loop

The most common way in programming is a for-loop:

# initialize result
result <- data.frame(
  repeated       = numeric(0),
  text           = character(0),
  number_rows    = numeric(0),
  generated_text = character(0),
  stringsAsFactors = FALSE
)

# iterate over the rows; length() of a data.frame would count its columns
for (i in seq_len(nrow(parameters))) {
  result <- rbind(result,
                  my_function(parameters[i,1], parameters[i,2], parameters[i,3]))
}

result
##   repeated  text number_rows  generated_text
## 1        1   one           3             one
## 2        1   one           3             one
## 3        1   one           3             one
## 4        2   two           2          twotwo
## 5        2   two           2          twotwo
## 6        3 three           1 threethreethree

That’s very ugly: you have to initialize the result data.frame, and it’s slow, because rbind() copies the growing result on every iteration. Whenever you want to use a for-loop in R, step back and think about using something else.
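If a for-loop is still preferred, most of the cost can be avoided by collecting the pieces in a list and binding them once at the end. The following is a minimal sketch of that idea, not part of the original approach; the name pieces is made up for illustration, and my_function and parameters are repeated (with data.frame() standing in for tribble()) so that the block runs on its own.

```r
# Same function as in the article, repeated so the sketch runs on its own
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated, text = text, number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = ""),
    stringsAsFactors = FALSE
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# Same parameter sets as in the article, built with base R only
parameters <- data.frame(
  repeated    = c(1, 2, 3),
  text        = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Pre-allocate one list slot per parameter row (hypothetical name: pieces)
pieces <- vector("list", nrow(parameters))
for (i in seq_len(nrow(parameters))) {
  pieces[[i]] <- my_function(parameters[i, 1], parameters[i, 2], parameters[i, 3])
}

# Bind all pieces in a single step instead of growing a data.frame
result <- do.call(rbind, pieces)
result
```

No empty result data.frame has to be declared up front this way, because the column names and types come from the pieces themselves.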

… lapply()

Instead of for-loops you should use one of the apply functions, here lapply(). But lapply() iterates over the elements of a list, and a data.frame is a list of its columns, not of its rows.
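A quick way to see the column-wise nature of a data.frame is to hand it to lapply() directly. This small sketch rebuilds the parameter data.frame with base R so that it runs on its own; column_classes is a name made up for illustration.

```r
# Same parameter sets as in the article, built with base R only
parameters <- data.frame(
  repeated    = c(1, 2, 3),
  text        = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# lapply() visits one column at a time, so the result has one entry per column
column_classes <- lapply(parameters, class)
names(column_classes)

# Element 2 of the list is the whole text column, not the second row
parameters[[2]]
```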

So we need to split the data.frame parameters into a list row-wise using split(). Then we can apply my_function to each element and use do.call(rbind, x) to merge the results into one data.frame.

do.call(rbind,
        # split() yields one single-row data.frame per row of parameters
        lapply(split(parameters, seq_len(nrow(parameters))),
               function(x) my_function(x[[1]], x[[2]], x[[3]]))
)
##     repeated  text number_rows  generated_text
## 1.1        1   one           3             one
## 1.2        1   one           3             one
## 1.3        1   one           3             one
## 2.1        2   two           2          twotwo
## 2.2        2   two           2          twotwo
## 3          3 three           1 threethreethree
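As a variation on the call above, base R's Map() can walk the three columns in parallel, which avoids split() altogether. This is a sketch under the same setup, not something from the original post; pieces is a made-up name, and the function and data are repeated so that the block runs on its own.

```r
# Same function as in the article, repeated so the sketch runs on its own
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated, text = text, number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = ""),
    stringsAsFactors = FALSE
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# Same parameter sets as in the article, built with base R only
parameters <- data.frame(
  repeated    = c(1, 2, 3),
  text        = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# Map() calls my_function once per position, taking one value from each column
pieces <- Map(my_function,
              parameters$repeated, parameters$text, parameters$number_rows)

# Bind the per-row results into one data.frame
result <- do.call(rbind, pieces)
result
```

Because the list returned by Map() is unnamed here, the row names are not prefixed with the list names the way 1.1, 1.2 and so on are in the split() version.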

That’s a lot more R-like. But the winner is:

… pmap_dfr() out of the purrr-package

The most elegant way I know of is purrr’s pmap_dfr():

pmap_dfr(parameters, my_function)
##   repeated  text number_rows  generated_text
## 1        1   one           3             one
## 2        1   one           3             one
## 3        1   one           3             one
## 4        2   two           2          twotwo
## 5        2   two           2          twotwo
## 6        3 three           1 threethreethree
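To make explicit what happens for a single row, here is a base-R sketch of one such call. It is only an illustration of passing a row as named arguments, not purrr's actual implementation; parameters_mixed, args_row and one_result are names made up for this example.

```r
# Same function as in the article, repeated so the sketch runs on its own
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated, text = text, number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = ""),
    stringsAsFactors = FALSE
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# The article's parameter sets, with the columns deliberately out of order
parameters_mixed <- data.frame(
  text        = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  repeated    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Second row as a named list: text = "two", number_rows = 2, repeated = 2
args_row <- as.list(parameters_mixed[2, ])

# do.call() passes the list elements as named arguments,
# so each value reaches the argument with the same name
one_result <- do.call(my_function, args_row)
one_result
```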

pmap_dfr() matches the column names of the data.frame to the argument names of the function, so the order of the columns in the parameter data.frame doesn’t matter:

# Mix the parameter columns
parameters_mixed_columns <- parameters %>% 
  select(text, number_rows, repeated)

# pmap_dfr still works as wanted
pmap_dfr(parameters_mixed_columns, my_function)
##   repeated  text number_rows  generated_text
## 1        1   one           3             one
## 2        1   one           3             one
## 3        1   one           3             one
## 4        2   two           2          twotwo
## 5        2   two           2          twotwo
## 6        3 three           1 threethreethree
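In purrr 1.0.0 and later, pmap_dfr() is marked as superseded in favour of combining pmap() with list_rbind(). A minimal sketch of that spelling, assuming purrr 1.0.0 or newer is installed; pieces is a made-up name, and the function and data are repeated so that the block runs on its own.

```r
library(purrr)

# Same function as in the article, repeated so the sketch runs on its own
my_function <- function(repeated = 1, text = "a", number_rows = 2) {
  row <- data.frame(
    repeated = repeated, text = text, number_rows = number_rows,
    generated_text = paste(replicate(repeated, text), collapse = ""),
    stringsAsFactors = FALSE
  )
  do.call("rbind", replicate(number_rows, row, simplify = FALSE))
}

# Same parameter sets as in the article, built with base R only
parameters <- data.frame(
  repeated    = c(1, 2, 3),
  text        = c("one", "two", "three"),
  number_rows = c(3, 2, 1),
  stringsAsFactors = FALSE
)

# pmap() returns a list with one data.frame per parameter row ...
pieces <- pmap(parameters, my_function)

# ... and list_rbind() stacks them row-wise into a single data.frame
result <- list_rbind(pieces)
result
```

The matching of column names to argument names happens in pmap(), so it works the same way as with pmap_dfr().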

To leave a comment for the author, please follow the link and comment on their blog: rstats-tips.net.
