Learning number 1: make functions fail early
When writing your own functions, avoid conversion of types without warning. For example, this function only works on characters:
my_nchar <- function(x, result = 0){
  if(x == ""){
    result
  } else {
    result <- result + 1
    split_x <- strsplit(x, split = "")[[1]]
    my_nchar(paste0(split_x[-1], collapse = ""), result)
  }
}

my_nchar("100000000")
## [1] 9

my_nchar(100000000)
Error in strsplit(x, split = "") : non-character argument
It may be tempting to write functions that accept a lot of different types of inputs, because it seems convenient and you’re a lazy ding-dong:
my_nchar2 <- function(x, result = 0){
  # What could go wrong?
  x <- as.character(x)
  if(x == ""){
    result
  } else {
    result <- result + 1
    split_x <- strsplit(x, split = "")[[1]]
    my_nchar2(paste0(split_x[-1], collapse = ""), result)
  }
}
You should avoid doing this, because it can have unforeseen consequences:
my_nchar2(10000000)
## [1] 5
If you think that this example is far-fetched, you’d be surprised to learn that this is exactly what nchar(), the built-in function to count characters, does. Compare this:
nchar("10000000")
## [1] 8
to this:
nchar(10000000)
## [1] 5
(thanks to @cararthompson for pointing this out on twitter)
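The surprising 5 comes from the conversion step: the number is turned into text first, and R picks the shorter scientific notation for it. The sketch below shows this; using format() with scientific = FALSE is only one possible workaround, and it assumes that the fixed notation is what you actually want to count.

```r
x <- 10000000

# The default conversion picks the shorter scientific notation
as_text <- as.character(x)
as_text
## [1] "1e+07"
nchar(as_text)
## [1] 5

# Assumption: the fixed notation is the one you want to count
fixed_text <- format(x, scientific = FALSE)
fixed_text
## [1] "10000000"
nchar(fixed_text)
## [1] 8
```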
You can also add guards to be extra safe:
my_nchar2 <- function(x, result = 0){
  if(!isTRUE(is.character(x))){
    stop(paste0("x should be of type 'character', but is of type '", typeof(x), "' instead."))
  } else if(x == ""){
    result
  } else {
    result <- result + 1
    split_x <- strsplit(x, split = "")[[1]]
    my_nchar2(paste0(split_x[-1], collapse = ""), result)
  }
}

my_nchar2("10000000")
## [1] 8
Compare to this:
my_nchar2(10000000)
Error in my_nchar2(1e+07) : x should be of type 'character', but is of type 'double' instead.
Now this doesn’t really help here, because our function is already safe (it only handles characters, since strsplit() only handles characters), but in other situations this could be helpful (and at least we customized the error message). Since it can be quite tedious to write all these if...else... statements, you might want to take a look at purrr::safely() (and purrr::possibly()), the {maybe} package, or the {typed} package, or even my package for that matter.
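If you would rather stay in base R, stopifnot() is another way to write guards without a chain of if...else... statements. This is only a minimal sketch: safe_nchar() is a made-up name for illustration, and the named-condition form used for the custom messages assumes R 4.0.0 or later.

```r
# safe_nchar() is a hypothetical name used only for this illustration
safe_nchar <- function(x){
  # Each name is the error message shown when its condition is not TRUE
  stopifnot(
    "x should be of type 'character'" = is.character(x),
    "x should have length 1" = length(x) == 1
  )
  nchar(x)
}

safe_nchar("10000000")
## [1] 8

# Capture the error message instead of stopping the script
msg <- tryCatch(safe_nchar(10000000), error = function(e) conditionMessage(e))
msg
## [1] "x should be of type 'character'"
```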
Learning number 2: Make your functions referentially transparent (and as pure as possible)
Any variable used by a function should be one of its parameters. Don’t do this:
f <- function(x){ x + y }
This function has only one parameter, x, and so depends on a y defined outside of its scope. This function is unpredictable, because the result it provides depends on the value of y. See what happens:
f(10)
## [1] 20
f(10)
## [1] 10
I called f twice with 10 and got two different results (because I changed the value of y without showing you). In very long scripts, having functions like this depending on values in the global environment is a recipe for disaster. It’s better to make this function referentially transparent; some very complicated words to describe a very simple concept:
f <- function(x, y){ x + y }
Just give f a second parameter, and you’re good to go.
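To see the difference side by side, here is a small sketch where the hidden change to y is made explicit. The names f_global and f_explicit are made up for this illustration.

```r
y <- 10

f_global <- function(x){ x + y }       # silently depends on the global y
f_explicit <- function(x, y){ x + y }  # everything it needs is a parameter

first_call <- f_global(10)
first_call
## [1] 20

y <- 0  # somewhere else in a long script, y gets overwritten

second_call <- f_global(10)
second_call
## [1] 10

# Same inputs, same output, whatever happens to the global y
explicit_call <- f_explicit(10, 10)
explicit_call
## [1] 20
```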
Something else your functions shouldn’t do is change stuff outside of their scope:
f <- function(x, y){ result <<- x + y }
Let’s take a look at the variables in the global environment before calling f:
ls()
## [1] "f"         "my_nchar"  "my_nchar2" "view"      "view_xl"   "y"
Now let’s call it:
f(1, 2)
And let’s have a good look at the global environment again:
ls()
## [1] "f"         "my_nchar"  "my_nchar2" "result"    "view"      "view_xl"
## [7] "y"
We now see that result has been defined in the global environment:
result
## [1] 3
Just like before, if your functions change stuff outside their scope, this is a recipe for disaster. You have to be very careful and know exactly what you’re doing if you want to use <<-.
So it’s better to write your function like this, and call it like this:
f <- function(x, y){ x + y }
result <- f(1, 2)
Learning number 3: make your functions do one thing
Try to write small functions that do just one thing. This makes them easier to document, test and simply wrap your head around. You can then pipe your functions one after the other to get stuff done:
a |> f() |> g() |> h()
Of course, you have to make sure that the output of f() is of the correct type, so that g() then knows how to handle it. In some cases, you really need a function to do several things to get the output you want. In that case, still write small functions to handle every aspect of the whole algorithm, and then write a function that calls each function. And if needed, you can even provide functions as arguments to other functions:
h <- function(x, y, f, g){ f(x) + g(y) }
This makes h() a higher-order function.
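As a rough illustration of both ideas, here is a sketch with three tiny functions chained with the pipe, followed by a call to h() with two functions as arguments. The helpers trim_text(), lower_text() and count_words() are invented for this example, and the native pipe assumes R 4.1 or later.

```r
# Hypothetical helpers, each doing exactly one thing
trim_text <- function(x){ trimws(x) }
lower_text <- function(x){ tolower(x) }
count_words <- function(x){ lengths(strsplit(x, split = " +")) }

# Each output is a character string the next function can handle
n_words <- "  Make Your Functions Do One Thing  " |>
  trim_text() |>
  lower_text() |>
  count_words()
n_words
## [1] 6

# h() from above, called with sqrt() and abs() as its function arguments
h <- function(x, y, f, g){ f(x) + g(y) }
h_result <- h(16, -3, sqrt, abs)
h_result
## [1] 7
```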
Learning number 4: use higher-order functions to abstract loops away
Loops are hard to write. Higher-order functions are really cool though:
Reduce(`+`, 1:100)
## [1] 5050
Reduce() is a higher-order function that takes a function (here +) and a list of inputs compatible with the function. It combines the first two elements, then combines that result with the third element, and so on. So Reduce() performs this operation:
Reduce(`+`, 1:100)
# 1 + 2 = 3
# 3 + 3 = 6
# 6 + 4 = 10
# ...
# 4950 + 100 = 5050
This avoids having to write a loop, which can go wrong for many reasons (typos, checking input types, depending on variables in the global environment… basically anything I mentioned already).
There’s also purrr::reduce() if you prefer the tidyverse ecosystem. Higher-order functions are super flexible; all that matters is that the function you give to reduce() knows what to do with the elements in the list.
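To make that flexibility a bit more concrete, here is a small sketch using base Reduce(). The accumulate = TRUE argument keeps every intermediate result, and the second call passes an anonymous function instead of +; the variable names are just for illustration.

```r
# Keep every intermediate result of the fold
running_totals <- Reduce(`+`, 1:5, accumulate = TRUE)
running_totals
## [1]  1  3  6 10 15

# Any two-argument function works: acc is the result so far,
# ch is the next element, here glued in front to reverse the order
reversed <- Reduce(function(acc, ch){ paste0(ch, acc) }, c("a", "b", "c"))
reversed
## [1] "cba"
```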
Another higher-order function you should know about is purrr::map() (or lapply() if you prefer base functions):
purrr::map(list(mtcars, iris), nrow)
## [[1]]
## [1] 32
##
## [[2]]
## [1] 150
This loops a function (here nrow()) over a list of whatevers (here data frames). Super flexible once again.
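For the base R route, a minimal sketch could look like this. lapply() returns a list just like purrr::map(), and vapply() is an option when you want a plain vector and want R to check the type of each result for you.

```r
# Same idea with base R: the result is a list
row_counts_list <- lapply(list(mtcars, iris), nrow)

# vapply() fails early if nrow() ever returns something
# that is not a single integer
row_counts <- vapply(list(mtcars, iris), nrow, FUN.VALUE = integer(1))
row_counts
## [1]  32 150
```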
(Optional) Learning number 5: use recursion to avoid loops further
The following function calls itself and reverses a string:
rev_char <- function(x){
  # Fail early on anything that is not a character
  if(!is.character(x)){
    stop(paste0("x should be of type 'character', but is of type '", typeof(x), "' instead."))
  }
  if(x == ""){
    ""
  } else {
    split_x <- strsplit(x, split = "")[[1]]
    len_x <- length(split_x)
    # Last character first, then recurse on everything but the last character
    paste0(split_x[len_x], rev_char(paste0(split_x[-len_x], collapse = "")))
  }
}

rev_char("abc")
## [1] "cba"
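For comparison, here is a sketch of the same idea written with a plain loop. rev_char_loop() is a made-up name, and this is just one way to write it.

```r
# rev_char_loop() is a hypothetical, loop-based counterpart
rev_char_loop <- function(x){
  if(!is.character(x)){
    stop(paste0("x should be of type 'character', but is of type '", typeof(x), "' instead."))
  }
  chars <- strsplit(x, split = "")[[1]]
  result <- ""
  for(ch in chars){
    # Put each new character in front of what has been built so far
    result <- paste0(ch, result)
  }
  result
}

rev_char_loop("abc")
## [1] "cba"
```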
I say that this is optional, because while it might sometimes be easier to use recursion to define a function, this is not always the case, and (in the case of R) recursion runs slower than using a loop. If you’re interested in learning more about map() and reduce(), I wrote several blog posts on them here, here and here and some youtube videos as well:
- https://www.youtube.com/watch?v=3xIKZbZKCWQ
- https://www.youtube.com/watch?v=WjtXc4OXZuk
- https://www.youtube.com/watch?v=vxaKamox_CQ
- https://www.youtube.com/watch?v=H3ao7LzcvW8
- https://www.youtube.com/watch?v=vtxb1j0aqJM
- https://www.youtube.com/watch?v=F2U-l3IcCtc
- https://www.youtube.com/watch?v=gVW9KfkJIrQ
- https://www.youtube.com/watch?v=FanU60pjmt0
- https://www.youtube.com/watch?v=DERMZi3Ck20