Skip errors in R loops by not writing loops
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
You probably have encountered situations similar to this one:
result = vector("list", length(some_numbers)) for(i in seq_along(some_numbers)){ result[[i]] = some_function(some_numbers[[i]]) } print(result)
First I initialize result
, an empty list of size equal to the length of some_numbers
which will contains the results of applying some_function()
to each element of some_numbers
. Then, using a for loop, I apply the function. This is what I get back:
NaNs producedError in sqrt(x) : non-numeric argument to mathematical function
Let’s take a look at some_numbers
and some_function()
:
print(some_numbers) ## [[1]] ## [1] -1.9 ## ## [[2]] ## [1] 20 ## ## [[3]] ## [1] "-88" ## ## [[4]] ## [1] -42 some_function ## function(x){ ## if(x == 0) res = 0 ## if(x < 0) res = -sqrt(-x) ## if(x > 0) res = sqrt(x) ## return(res) ## }
So the function simply returns the square root of x
(or minus the square root of -x
if x
is negative), but the number in third position of the list some_numbers
is actually a character. This type of mistakes can commonly happen. The result
list looks like this:
print(result) [[1]] [1] -1.378405 [[2]] [1] 4.472136 [[3]] NULL [[4]] NULL
As you see, even though the fourth element could have been computed, the error made the whole loop stop. In such a simple example, you could correct this and then run your function. But what if the list you want to apply your function to is very long and the computation take a very, very long time? Perhaps you simply want to skip these errors and get back to them later. One way of doing that is using tryCatch()
:
result = vector("list", length(some_numbers)) for(i in seq_along(some_numbers)){ result[[i]] = tryCatch(some_function(some_numbers[[i]]), error = function(e) paste("something wrong here")) } print(result) ## [[1]] ## [1] -1.378405 ## ## [[2]] ## [1] 4.472136 ## ## [[3]] ## [1] "something wrong here" ## ## [[4]] ## [1] -6.480741
This works, but it’s verbose and easy to mess up. My advice here is that if you want to skip errors in loops you don’t write loops! This is quite easy with the purrr
package:
library(purrr) result = map(some_numbers, some_function)
There’s several advantages here already; no need to initialize an empty structure to hold your result, and no need to think about indices, which can sometimes get confusing. This however does not work either; there’s still the problem that we have a character inside some_numbers
:
Error in sqrt(x) : non-numeric argument to mathematical function
However, purrr
contains some very amazing functions for error handling, safely()
and possibly()
. Let’s try possibly()
first:
possibly_some_function = possibly(some_function, otherwise = "something wrong here")
possibly()
takes a function as argument as well as otherwise
; this is where you define a return value in case something is wrong. possibly()
then returns a new function that skips errors:
result = map(some_numbers, possibly_some_function) print(result) ## [[1]] ## [1] -1.378405 ## ## [[2]] ## [1] 4.472136 ## ## [[3]] ## [1] "something wrong here" ## ## [[4]] ## [1] -6.480741
When you use possibly()
on a function, you’re politely telling R “would you kindly apply the function wherever possible, and if not, tell me where there was an issue”. What about safely()
?
safely_some_function = safely(some_function) result = map(some_numbers, safely_some_function) str(result) ## List of 4 ## $ :List of 2 ## ..$ result: num -1.38 ## ..$ error : NULL ## $ :List of 2 ## ..$ result: num 4.47 ## ..$ error : NULL ## $ :List of 2 ## ..$ result: NULL ## ..$ error :List of 2 ## .. ..$ message: chr "invalid argument to unary operator" ## .. ..$ call : language -x ## .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition" ## $ :List of 2 ## ..$ result: num -6.48 ## ..$ error : NULL
The major difference with possibly()
is that safely()
returns a more complex object: it returns a list of lists. There are as many lists as there are elements in some_numbers
. Let’s take a look at the first one:
print(result[[1]]) ## $result ## [1] -1.378405 ## ## $error ## NULL
result[[1]]
is a list with a result
and an error
. If there was no error, we get a value in result
and NULL
in error
. If there was an error
, this is what we see:
print(result[[3]]) ## $result ## NULL ## ## $error ## <simpleError in -x: invalid argument to unary operator>
Because lists of lists are not easy to handle, I like to use possibly()
, but if you use safely()
you might want to know about transpose()
, which is another function from purrr
:
result2 = transpose(result) str(result2) ## List of 2 ## $ result:List of 4 ## ..$ : num -1.38 ## ..$ : num 4.47 ## ..$ : NULL ## ..$ : num -6.48 ## $ error :List of 4 ## ..$ : NULL ## ..$ : NULL ## ..$ :List of 2 ## .. ..$ message: chr "invalid argument to unary operator" ## .. ..$ call : language -x ## .. ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition" ## ..$ : NULL
result2
is now a list of two lists: a result
list holding all the results, and an error
list holding all the error message. You can get results with:
result2$result ## [[1]] ## [1] -1.378405 ## ## [[2]] ## [1] 4.472136 ## ## [[3]] ## NULL ## ## [[4]] ## [1] -6.480741
I hope you enjoyed this blog post, and that these functions will make your life easier!
Don’t hesitate to follow us on twitter @rdata_lu and to subscribe to our youtube channel.
You can also contact us if you have any comments or suggestions. See you for the next post!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.