Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
By Chris Campbell – Senior Consultant, UK.
One of the most important objects in R are functions. Functions are a type of object that perform most of the actions we make in R. A quick refresher on objects.
> # We can create objects and view objects > > i <- 1 > n <- 5 > n [1] 5 > i [1] 1 > > # We can use functions to create objects and view those objects > > x <- seq(3, n) > x [1] 3 4 5 > > > out <- vector(length = n, mode = "numeric") > out [1] 0 0 0 0 0
We can use square brackets to assign to elements of a vector.
> # Since i is 1 we can assign a value, say 1, > # to position 1 of our vector > > out[ i] <- 1 > i <- 2 > out[ i] <- 1 > out [1] 1 1 0 0 0
The function for allows us to perform the same action lots of times. It has a slightly unusual syntax where we define an object on the left, the word in in the middle, and a vector on the right. We can use the function for to set an object, such as i, to take each value of a vector in turn. The action we want repeated goes in curly brackets immediately after.
> # In this for loop, i takes values 3 then 4 then 5. > > for (i in seq(3, n)) { + + out[ i] <- out[i - 1] + out[i - 2] + + } > out [1] 1 1 2 3 5 > # We have calculated the first 5 terms of the Fibonacci series
We can use the function function to turn this code snippet into reuseable code that we could use to calculate the first 5, 6, etc. terms of the Fibonacci series.
> # To define a function we specify the arguments in round brackets, > # then the code to be executed in curly brackets. > # We can use return to clarify which object is the output of the function. > > fibonacci <- function(n) { + + out <- vector(length = n, mode = "numeric") + + out[1:2] <- 1 + + for (i in seq(3, n)) { + + out[ i] <- out[i - 1] + out[i - 2] + } + + return(out) + } > fibonacci(n = 5) [1] 1 1 2 3 5 > fibonacci(n = 6) [1] 1 1 2 3 5 8
When we create any form of analytical code it’s a great idea to check that it works. In this case we know, or can look up, the first five terms of the Fibonacci sequence, so we can ask if the output of our function is the same.
> # test 1 - check first five terms > test01 <- fibonacci(n = 5) > all(test01 == c(1, 1, 2, 3, 5)) [1] TRUE
If we want reusable code we should try to think of all of the reasonable ways it might be used (or misused!).
> # Sometimes arguments will not be handled by the code > > fibonacci(n = 2) Error in out[ i] <- out[i - 1] + out[i - 2] : replacement has length zero
Errors are a good thing; they let us know that the code is not behaving as we are expecting. We can use the debug function to follow the progress of our code.
> # Debug the function > > debug(fibonacci) > > fibonacci(n = 2) debugging in: fibonacci(n = 2) debug at #1: { out <- vector(length = n, mode = "numeric") out[1:2] <- 1 for (i in seq(3, n)) { out[ i] <- out[i - 1] + out[i - 2] } return(out) } Browse[2]> debug at #3: out <- vector(length = n, mode = "numeric") Browse[2]> debug at #5: out[1:2] <- 1 Browse[2]> debug at #7: for (i in seq(3, n)) { out[ i] <- out[i - 1] + out[i - 2] } Browse[2]> debug at #7: i Browse[2]> debug at #9: out[ i] <- out[i - 1] + out[i - 2] Browse[2]> debug at #7: i Browse[2]> debug at #9: out[ i] <- out[i - 1] + out[i - 2] Browse[2]> Error in out[ i] <- out[i - 1] + out[i - 2] : replacement has length zero
In debug mode every line of the code is printed before it is evaluated. We can hit <enter> (or n) to execute the line and see the next line. The debug function adds a flag to the function that will cause it to enter debug mode every time it runs. When a function runs, objects are not created in the global environment; instead there is a temporary environment where the calculations are performed, and all objects created are discarded when the evaluation has finished.
> # The function ls prints objects in the global environment > > ls() [1] "fibonacci" "i" "n" "out" "test01" "x" > > fibonacci(n = 2) debugging in: fibonacci(n = 2) debug at #1: { out <- vector(length = n, mode = "numeric") out[1:2] <- 1 for (i in seq(3, n)) { out[ i] <- out[i - 1] + out[i - 2] } return(out) } Browse[2]> n debug at #3: out <- vector(length = n, mode = "numeric") Browse[2]> n debug at #5: out[1:2] <- 1 Browse[2]> ls() # Objects within the function environment [1] "n" "out" Browse[2]> out [1] 0 0 Browse[2]> get("n") # Note that we need to use get to fetch n [1] 2
We can use c to finish the “context” such as the current loop.
Browse[2]> c Error in out[ i] <- out[i - 1] + out[i - 2] : replacement has length zero
Instead we can use Q to quit the current function evaluation without completing it.
Browse[2]> Q >
Having had a look at our code we can see that we haven’t accounted for cases where n is less than 3. We can now update the code to take into account these requirements.
> # Initialize the output as NA, > # then work out the Fibonacci series for large n, > # with special actions when n is 1 or 2 > > fibonacci <- function(n) { + + out <- NA + + if (n > 2) { + + out <- vector(length = n, mode = "numeric") + + out[1:2] <- 1 + + for (i in seq(3, n)) { + + out[ i] <- out[i - 1] + out[i - 2] + } + } + + if (n == 2) { out <- c(1, 1) } + + if (n == 1) { out <- 1 } + + return(out) + }
We can now test our new code.
> # First, run our existing test(s) > # test 1 - check first five terms > test01 <- fibonacci(n = 5) > all(test01 == c(1, 1, 2, 3, 5)) [1] TRUE > # Great, it passed > # Now write new tests to check the functionality we added > > # test 2 - small n > test02 <- fibonacci(n = 2) > all(test02 == c(1, 1)) [1] TRUE > > # test 3 - negative n > test03 <- fibonacci(n = -1) > is.na(test03) [1] TRUE
We are now comfortable that we have coverage of a reasonable range of functionality. If we now make changes to the code, we can be sure that the original functionality is preserved. For example, we want to speed up the code for calculating large numbers terms.
> # The system.time function measures how long an evaluation takes > > system.time(fibonacci(n = 1e6)) user system elapsed 3.05 0.00 3.07
So we implement a new algorithm for calculating terms.
> # Binet's Fibonacci formula > > fibonacci <- function(n) { + + out <- NA + + if (n > 0) { + + nn <- seq_len(n) + + phi <- (1 + 5^(1/2))/2 + + out <- (phi^nn - (-phi)^(-nn))/5^(1/2) + + } + + return(out) + } > > fibonacci(n = 6) [1] 1 1 2 3 5 8 > > system.time(fibonacci(n = 1e6)) user system elapsed 1.34 0.03 1.38
We can check that our new method has not changed the behaviour of our function.
> # test 1 - check first five terms > test01 <- fibonacci(n = 5) > all(test01 == c(1, 1, 2, 3, 5)) [1] FALSE > > # test 2 - small n > test02 <- fibonacci(n = 2) > all(test02 == c(1, 1)) [1] TRUE > > # test 3 - negative n > test03 <- fibonacci(n = -1) > is.na(test03) [1] TRUE
Oh no! Our first test failed. We now need to debug our function to work out why.
> test01 [1] 1 1 2 3 5 > > # Are these integers? > > test01 %% 1 [1] 0.000000e+00 0.000000e+00 0.000000e+00 4.440892e-16 [5] 8.881784e-16
In this case we can see what happened without debugging the function. Since our original code was able to work with integers (for a while at least) we did not need to worry about the precision of doubles. We now need to decide how we are going to handle these values. One approach could be to modify the behaviour of the function. Another could be to modify the behaviour of the test.
This process of writing tests for code during function writing and debugging is called unit testing. There are several packages that make this process more streamlined. They comprise of functions that check the code, and a system of running the entire suite automatically. Package RUnit contains functions such as checkEquals to compare function outputs and predefined values. Package testthat contains equivalent functions such as expect_equal. Both packages include utilities for generating test reports that can show you at a glance where problems occurred following a change to a suite of code.
Normally we have functions that call other functions.
> # A function to calculate Pythagorean Triples > # Koshy, Thomas (2007), p. 581, ISBN 0-12-372487-2 > > getTriangle <- function(N = 1) { + + fib <- fibonacci(n = N + 3) + + ta <- fib[N] * fib[N + 3] + + tb <- 2 * fib[N + 1] * fib[N + 2] + + tc <- fib[N + 1]^2 + fib[N + 2]^2 + + return(c(ta, tb, tc)) + } > > getTriangle(N = 1) [1] 3 4 5 > getTriangle(N = 2) [1] 5 12 13 > getTriangle(N = 3) [1] 16 30 34
If we have an unknown error, we can work out where it happened with traceback. The stack trace of the last error allows us to see that the error in getTriangle occurred during the evaluation of the fibonacci function. We can now debug fibonacci to work out how best to handle this error.
> getTriangle(N = NA) Error in if (n > 0) { : missing value where TRUE/FALSE needed > traceback() 2: fibonacci(n = N + 3) at #3 1: getTriangle(N = NA)
There are a lot of great tools within R for debugging and creating robust code. But everyone who codes at the command line should think about what steps they are taking to get the right answer. One of the great advantages of scripting over spreadsheet analysis is that the components of the calculation are easier to verify. Make the most of these tools and you’ll get the correct answer every time.
Well, almost all of the time!
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.