Testing in R
Have you ever wondered how to test your code in R?
Do you think that it is hard to test your code in R?
R has its roots in S, a language created before object-oriented programming was popularized and long before today's newer languages were invented. So testing sometimes takes a back seat in R for more reasons than the traditional software development excuses for not testing. It is erroneously believed to be hard to test code in R, to set up a modern test framework, or to work in a test-first (test-driven) manner. In truth, one can establish solid tests with a little planning and practice! In fact, anyone who writes code in R is already pretty good at testing their code, whether they know it or not! If you are using R, more than likely you are producing some sort of statistical model or data analysis output. You inherently have to test your output throughout the process, whether by inspecting (i.e. testing) the statistical model fit, the graphical output of a distribution, a box plot, etc.
Hey wait – that’s not the type of testing I meant!
Ok, ok, but my point is that testing is not a foreign concept in R. In fact, it is the entire basis of analysis. So let’s put the “software/engineering testing” spin on the question. Testing in the most basic sense is easy in any language, and R is no exception. For the rest of this post we’ll consider segments of code in functions as an easy way to discuss and call discrete blocks of code. Good, reusable R code uses a lot of functions, so this is a great place to start testing. And I’m assuming if you are thinking about testing your code you are probably planning to use it more than once and likely share it with others.
Basic testing steps for a function (that is already written) are as follows:
1. Determine the values of inputs you would expect to have passed to the function and what should be returned
2. Determine the types or values of inputs you do not expect to be passed to the function and what should happen when the function is called with each of those inputs
3. Call the function with a sample of expected values, and check the returned values
4. Call the function with several examples of each incorrect or unexpected input, and check the returned values
That’s it! Let’s walk through an example. We will use the following function for discussion:
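my_function <- function(input) {
  # Reject anything that is not an integer or numeric value
  if (!class(input) %in% c("integer", "numeric")) {
    stop("Invalid Input. Values should be integer or numeric")
  }
  result <- NA
  if (input %in% c(1:5)) {
    # Whole numbers from 1 to 5 are the expected inputs
    result <- input * 2
  } else if (input < 0) {
    # Negative values are flagged with NaN
    result <- NaN
  }
  # Everything else (0, values above 5, non-whole numbers) falls through as NA
  return(result)
}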
Following these steps on the above example function:
1. Integer values from 1 to 5 (inclusive) are the expected inputs, and each should return twice the input
2. Unexpected inputs and what should happen for each:
   - Negative values should return NaN
   - 0 and any other non-negative value outside the whole numbers 1 to 5 (such as 10 or 2.5) should return NA
   - Character values should cause a stop error
   - Logical (boolean) values should cause a stop error
3. Calls with a sample of expected values:

> my_function(1)
[1] 2
> my_function(3e0)
[1] 6
> my_function(5)
[1] 10

4. Calls with several examples of each unexpected input:

> my_function(0)
[1] NA
> my_function(10)
[1] NA
> my_function(2.5)
[1] NA
> my_function(10000.0)
[1] NA
> my_function(-1)
[1] NaN
> my_function(-1.465)
[1] NaN
> my_function("fred")
Error in my_function("fred") :
  Invalid Input. Values should be integer or numeric
> my_function("5")
Error in my_function("5") :
  Invalid Input. Values should be integer or numeric
> my_function(TRUE)
Error in my_function(TRUE) :
  Invalid Input. Values should be integer or numeric
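The manual checks from steps 3 and 4 can also be written as assertions, so that a failure stops the script instead of relying on someone reading the console. The following is a minimal sketch using only base R (stopifnot, try, inherits). The example function is repeated so the block runs on its own, and the helper name throws_error is made up for this illustration.

```r
# The example function from the post, repeated so this block is self-contained.
my_function <- function(input) {
  if (!class(input) %in% c("integer", "numeric")) {
    stop("Invalid Input. Values should be integer or numeric")
  }
  result <- NA
  if (input %in% c(1:5)) {
    result <- input * 2
  } else if (input < 0) {
    result <- NaN
  }
  return(result)
}

# Hypothetical helper (not from the post): TRUE if evaluating `expr` stops
# with an error. `expr` is evaluated lazily, inside try().
throws_error <- function(expr) {
  inherits(try(expr, silent = TRUE), "try-error")
}

# Step 3: expected inputs return twice the input.
stopifnot(
  my_function(1) == 2,
  my_function(3e0) == 6,
  my_function(5) == 10
)

# Step 4: unexpected numeric inputs return NA or NaN.
# is.na() is TRUE for both NA and NaN, so is.nan() is used to tell them apart.
stopifnot(
  is.na(my_function(0)), !is.nan(my_function(0)),
  is.na(my_function(10)), !is.nan(my_function(10)),
  is.na(my_function(2.5)), !is.nan(my_function(2.5)),
  is.nan(my_function(-1)),
  is.nan(my_function(-1.465))
)

# Step 4, continued: non-numeric inputs must stop with an error.
stopifnot(
  throws_error(my_function("fred")),
  throws_error(my_function("5")),
  throws_error(my_function(TRUE))
)
```

If every assertion holds, the block runs silently; the first failing assertion stops execution and names the expression that was not TRUE.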
I call this basic testing because the idea is to ensure that you receive a correct value (expected behavior) for valid inputs and an appropriate response for invalid values. This error or invalid return value will depend on your function's use case - it may be entirely appropriate for your function to throw an error and stop program execution when an invalid value is encountered. You need to ensure that you handle the entire spectrum of possible invalid inputs so that none get past your validation steps and can mislead the function users by returning an inappropriate or unexpected value. You will likely discover some cases you haven't already handled and have to fix up your function during this testing - that's OK and part of the process!
To help yourself and your future function callers, give any error the function throws an explanatory message. You will see above that stop is called with not only "Invalid Input" but also an explanation of what the values should be. This strategy is common when validating function inputs in scripts, and it is a great practice in R as well.
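Because the message is part of what callers rely on, it can be checked like any other output. Here is a small sketch, assuming base R's tryCatch and conditionMessage. Both check_input (a simplified stand-in for the input check, using is.numeric) and error_message_of are hypothetical names introduced only for this example.

```r
# Hypothetical validation function standing in for the input check in the post.
check_input <- function(input) {
  if (!is.numeric(input)) {
    stop("Invalid Input. Values should be integer or numeric")
  }
  invisible(TRUE)
}

# Hypothetical helper: returns the message of the error raised by `expr`,
# or NA_character_ if no error was raised.
error_message_of <- function(expr) {
  tryCatch(
    {
      expr            # forces evaluation inside tryCatch
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
}

msg <- error_message_of(check_input("fred"))

# The message should say what went wrong and what was expected instead.
stopifnot(
  grepl("Invalid Input", msg, fixed = TRUE),
  grepl("integer or numeric", msg, fixed = TRUE),
  is.na(error_message_of(check_input(3)))
)
```

Matching on a fragment of the message rather than the whole string keeps the check from breaking when the wording is polished later.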
Wait - that was TOO easy.
No - that was exactly what you need to do to ensure basic behavior and robustness. If you only perform this basic step-by-step set of manual tests for all of your reusable functions, you will have tested your R code more than most people I've worked with!
But, this test example is still very manual, and honestly, it clutters up your code and output. Let's put it into a simple framework for testing that you can reuse as you make code changes to ensure your function always passes these basic tests.
# ---------------------------------------------------------------------------------------------
# Tests a function for correct output results. The function does not need to be vectorized.
#
# Returns: a character vector of problems found with the results or NA if there are no issues
#
# Note: This function will run all tests and return a vector of character string errors for
# the entire set of tests, not just the first error
# ---------------------------------------------------------------------------------------------
test_a_function <- function(tested_function,   # function to be tested
                            valid_in,          # one or more valid input values as a vector
                            valid_out,         # the matching valid output values as a vector
                            na_in = c(),       # function inputs that should return NA
                            nan_in = c(),      # function inputs that should return NaN
                            warning_in = c(),  # function inputs that should return a warning
                            error_in = c()) {  # function inputs that should cause a stop
  ... download the code file below (removed for brevity)
}
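Since the body of the helper is elided here, the following is only a rough sketch of how a helper with the same arguments could work; it is not the author's implementation. The name sketch_test_a_function and its internal helpers are invented for this illustration, and it uses base R's tryCatch and all.equal. One assumption worth noting: c("fred", "5", TRUE) coerces TRUE to the string "TRUE", so the usage below passes error_in as a list to keep the logical value intact.

```r
# The example function from the post, repeated so this block runs on its own.
my_function <- function(input) {
  if (!class(input) %in% c("integer", "numeric")) {
    stop("Invalid Input. Values should be integer or numeric")
  }
  result <- NA
  if (input %in% c(1:5)) {
    result <- input * 2
  } else if (input < 0) {
    result <- NaN
  }
  return(result)
}

# Hypothetical stand-in for the elided helper; not the author's code.
sketch_test_a_function <- function(tested_function, valid_in, valid_out,
                                   na_in = c(), nan_in = c(),
                                   warning_in = c(), error_in = c()) {
  problems <- character(0)

  # Call the function once and record whether it returned, warned, or stopped.
  run_one <- function(x) {
    tryCatch(
      list(status = "value", value = tested_function(x)),
      warning = function(w) list(status = "warning", value = NULL),
      error = function(e) list(status = "error", value = NULL)
    )
  }
  label <- function(x) paste(deparse(x), collapse = "")
  is_na_only <- function(v) length(v) == 1 && is.na(v) && !is.nan(v)
  is_nan_only <- function(v) length(v) == 1 && is.nan(v)

  # Valid inputs must return the matching expected output.
  for (i in seq_along(valid_in)) {
    res <- run_one(valid_in[[i]])
    if (res$status != "value" ||
        !isTRUE(all.equal(res$value, valid_out[[i]]))) {
      problems <- c(problems,
                    paste("Unexpected result for valid input", label(valid_in[[i]])))
    }
  }
  # Inputs that must return NA (and not NaN, which is.na() also accepts).
  for (x in as.list(na_in)) {
    res <- run_one(x)
    if (res$status != "value" || !is_na_only(res$value)) {
      problems <- c(problems, paste("Expected NA for input", label(x)))
    }
  }
  # Inputs that must return NaN.
  for (x in as.list(nan_in)) {
    res <- run_one(x)
    if (res$status != "value" || !is_nan_only(res$value)) {
      problems <- c(problems, paste("Expected NaN for input", label(x)))
    }
  }
  # Inputs that must raise a warning.
  for (x in as.list(warning_in)) {
    if (run_one(x)$status != "warning") {
      problems <- c(problems, paste("Expected a warning for input", label(x)))
    }
  }
  # Inputs that must stop with an error.
  for (x in as.list(error_in)) {
    if (run_one(x)$status != "error") {
      problems <- c(problems, paste("Expected an error for input", label(x)))
    }
  }

  # Every check runs; all problems are reported together, or NA if none.
  if (length(problems) == 0) NA else problems
}

issues <- sketch_test_a_function(
  my_function,
  valid_in  = c(1, 3e0, 5),
  valid_out = c(2, 6, 10),
  na_in     = c(0, 10, 2.5, 10000.0),
  nan_in    = c(-1, -1.465),
  error_in  = list("fred", "5", TRUE)  # list() keeps TRUE as a logical
)
print(issues)
```

Collecting problems in a vector instead of stopping at the first one matches the note in the header comment: one run reports everything that is wrong.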
And here is the same set of tests run above in steps 3 and 4, this time using the helper function:
> test_a_function(my_function,
+                 valid_in = c(1, 3e0, 5),
+                 valid_out = c(2, 6, 10),
+                 na_in = c(0, 10, 2.5, 10000.0),
+                 nan_in = c(-1, -1.465),
+                 error_in = c("fred", "5", TRUE))
[1] NA
Now I (or you, if you download the code!) can run the function's tests in one clean call like the one above, right after the function is created. If the call returns anything other than NA, I will know that something I did broke it! Voila: a simple, effective test framework to get you started testing in R. Don't get me wrong - this is just the START, but if you adopt this simple framework then you will be ahead of the pack on robustness and reliability for your R code.