This material was presented to a meeting of KIND (Knowledge and Information Network) in April this year.
checks
- What assumptions are you making about your data? (structure, names, types etc.)
- function arguments
- what users will and won’t do
tests
Describe what you expect your functions to do, and how they should behave with regard to user inputs.
Checks : assertions
Tests : expectations
Let’s write a simple function that prints the name of a council area:
choose_council <- function(x){
  out <- paste("chosen council is", x)
  out
}
Now let’s try it out
choose_council("Highland")
[1] "chosen council is Highland"
choose_council("Argyll and Bute")
[1] "chosen council is Argyll and Bute"
choose_council("Bob")
[1] "chosen council is Bob"
choose_council(1)
[1] "chosen council is 1"
choose_council("Argyll & Bute")
[1] "chosen council is Argyll & Bute"
We can see the function works, but…
Base R functions
From the help:
match.arg matches a character arg against a table of candidate values as specified by choices.
To put that more simply, to use the function, we need to pass an argument, and a vector of possible choices. The function will then check that argument against the choices to see if there is a match.
Let’s assume we only want to print Highland and Argyll and Bute. How can we use match.arg?
choose_council <- function(council = c("Highland", "Argyll and Bute")){
  council <- match.arg(council)
  out <- paste("chosen council is", council)
  return(out)
}

choose_council("Highland")
[1] "chosen council is Highland"
choose_council("Argyll and Bute")
[1] "chosen council is Argyll and Bute"
choose_council("Bob")
Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"
choose_council(1)
Error in match.arg(council): 'arg' must be NULL or a character vector
choose_council("Argyll & Bute")
Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"
If no value is supplied, match.arg uses the first element of the choices.
choose_council() # match.arg uses default arguments
[1] "chosen council is Highland"
Partial matching is also possible – you can be lazy and only type the first few letters of your argument.
This is OK for this very simple example, but not for real-life code – certainly not any code where you care about the results.
(As an aside, if you regularly use T or F instead of TRUE and FALSE – you need to sort your life out)
This works, but .. careful now!
choose_council("A") # partial matching - can be risky
[1] "chosen council is Argyll and Bute"
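If partial matching is a concern, one option is to skip match.arg and compare the argument against the allowed values exactly. The sketch below is one way to do that in base R. The name choose_council_exact and the wording of the error are made up for illustration.

```r
# Hypothetical variant of choose_council() that only accepts exact matches
choose_council_exact <- function(council) {
  choices <- c("Highland", "Argyll and Bute")
  # Stop unless a single character value was supplied
  stopifnot(is.character(council), length(council) == 1)
  # %in% compares whole strings, so "A" or "High" will not match
  if (!council %in% choices) {
    stop("council should be one of: ", paste(choices, collapse = ", "))
  }
  paste("chosen council is", council)
}

choose_council_exact("Highland")

# "A" now raises an error instead of silently becoming "Argyll and Bute"
res <- try(choose_council_exact("A"), silent = TRUE)
inherits(res, "try-error")
```

The trade-off is that users must type the full name, which is usually what you want in code where the results matter.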
stopifnot
We saw that our function didn’t work when we supplied a number.
choose_council(1)
In this case, match.arg has its own checks in the background. But we can provide our own. We want to stop the function if a non-character argument is provided.
We use stopifnot to trigger immediately if a non-character argument is passed.
If a character argument is passed, we use the choices argument of match.arg to validate that this is an acceptable value.
choose_council <- function(council){
  stopifnot(is.character(council))
  council <- match.arg(council, choices = c("Highland", "Argyll and Bute"))
  out <- paste("chosen council is", council)
  return(out)
}

choose_council(1)
Error in choose_council(1): is.character(council) is not TRUE
choose_council("Argyll & Bute")
Error in match.arg(council, choices = c("Highland", "Argyll and Bute")): 'arg' should be one of "Highland", "Argyll and Bute"
Yikes.
We can add friendlier messages
choose_council <- function(council){
  stopifnot("council must be character" = is.character(council))
  council <- match.arg(council, choices = c("Highland", "Argyll and Bute"))
  out <- paste("chosen council is", council)
  return(out)
}
Partial matching works as before
choose_council("High")
[1] "chosen council is Highland"
But now we get a slightly more readable error message
choose_council(1)
Error in choose_council(1): council must be character
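Named conditions in stopifnot need R 4.0.0 or later. Several can be listed in one call, and the first one that fails supplies the message. A small sketch of that idea follows; check_council_arg is a hypothetical helper and the extra conditions are illustrative rather than part of the original function.

```r
# Hypothetical helper: several named conditions in one stopifnot() call
check_council_arg <- function(council) {
  stopifnot(
    "council must be character" = is.character(council),
    "council must be a single value" = length(council) == 1,
    "council must not be NA" = !is.na(council)
  )
  invisible(council)
}

# Capture the message rather than halting the script
msg <- tryCatch(
  check_council_arg(c("Highland", "Fife")),
  error = function(e) conditionMessage(e)
)
msg
```

Conditions are evaluated in order, so the cheaper and more general checks are best placed first.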
chi_check()
See phsmethods
What is a CHI number? The Community Health Index number is used in Scotland to uniquely identify patients.
What needs to be checked?
- Does it contain only numeric characters?
- Is it ten digits in length?
- Do the first six digits denote a valid date?
- Is the checksum digit correct?
We can deal with the first three quite quickly with the {checkmate} package
checkmate
“Virtually every standard type of user error when passing arguments into function can be caught with a simple, readable line which produces an informative error message.
A substantial part of the package was written in C to minimize any worries about execution time overhead.”
example CHI
x <- "0101011237"
is this a character vector?
library(checkmate)
check_class(x, "character")
checkClass(x, "character")
[1] TRUE
[1] TRUE
check_class and checkClass are exactly the same; simply choose whether you prefer snake_case or camelCase.
Functions beginning with check return either TRUE (as above) or the error message as a string.
check_class(x, "integer")
[1] "Must inherit from class 'integer', but has class 'character'"
Functions beginning with assert either throw an error, or return the checked object invisibly:
assert_class(x, "integer")
Error in eval(expr, envir, enclos): Assertion on 'x' failed: Must inherit from class 'integer', but has class 'character'.
assert_class(x, "character")
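checkmate also has a third family of functions beginning with test, which return a plain TRUE or FALSE. The sketch below, which assumes checkmate is installed, puts the three families side by side on the same input. The variable names are only for illustration.

```r
library(checkmate)

x <- "0101011237"

# check_*: TRUE, or the failure message as a string
res_check <- check_class(x, "integer")

# test_*: a single TRUE or FALSE, handy inside if()
res_test <- test_class(x, "integer")

# assert_*: throws an error on failure; caught here so the script keeps running
res_assert <- tryCatch(
  assert_class(x, "integer"),
  error = function(e) conditionMessage(e)
)

# On success, assert_* returns the checked object invisibly
same <- assert_class(x, "character")
identical(same, x)
```

A rough rule of thumb: assert at the top of a function, test when branching, and check when you want to collect or report the message yourself.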
Going back to the CHI example, we can use check_character for a more fine-grained series of checks
check_character(x, n.chars = 10, pattern = "\\d{10}") # 10 chars, numeric only
[1] TRUE

x2 <- "010101123A"
x3 <- c(x, x2, NA)
x4 <- c(x, NA)

check_character(x2, n.chars = 10, pattern = "[^A-Z]{10}")
check_character(x, n.chars = 10, pattern = "[^A-Z]{10}")
[1] "Must comply to pattern '[^A-Z]{10}'"
[1] TRUE

# final version
check_character(x, min.len = 1, n.chars = 10, any.missing = FALSE, pattern = "\\d{10}")

vals <- c(x, x2, x3, x4)
cat(vals)
0101011237 010101123A 0101011237 010101123A NA 0101011237 NA

purrr::map_chr(vals, check_character, min.len = 1, n.chars = 10, any.missing = FALSE, pattern = "\\d{10}")
[1] "TRUE"                                "Must comply to pattern '\\d{10}'"
[3] "TRUE"                                "Must comply to pattern '\\d{10}'"
[5] "Contains missing values (element 1)" "TRUE"
[7] "Contains missing values (element 1)"

# are first 6 elements a Date?
date_val <- substr(x,1,6)
cat(date_val)
010101

checkDate(as.Date(strptime(date_val,"%d%m%y", "UTC")), lower = "1900-01-01", upper = Sys.Date(), any.missing = FALSE, min.len = 1L)
[1] TRUE
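The date check relies on strptime returning NA for an impossible date, which checkDate then rejects because any.missing = FALSE. A base R sketch of that parsing step is below; chi_date is a hypothetical helper, and the second and third CHI strings are invented inputs. Note that %y is a two-digit year, which R reads as 1969 to 2068, so the century has to be guessed.

```r
# Hypothetical helper: parse the first six characters of a CHI as ddmmyy
chi_date <- function(chi) {
  as.Date(strptime(substr(chi, 1, 6), format = "%d%m%y", tz = "UTC"))
}

chi_date("0101011237")  # 1 January, with year 01 read as 2001
chi_date("3102011237")  # 31 February does not exist, so this is NA
chi_date("0101501237")  # year 50 is read as 2050, not 1950
```

The last case suggests that a check with upper = Sys.Date() would reject anyone born between 1950 and 1968 unless the century is handled separately, so it may be worth adjusting parsed dates that fall in the future before checking them.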
Combine checks with the assert function
main_check <- function(x){
  assert(
    check_character(x, min.len = 1, n.chars = 10, any.missing = FALSE, pattern = "\\d{10}"),
    checkDate(as.Date(strptime(substr(x,1,6),"%d%m%y", "UTC")), lower = "1900-01-01", upper = Sys.Date(), any.missing = FALSE, min.len = 1L),
    combine = "and")
}

out <- main_check(x)
out
[1] TRUE
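main_check covers the first three questions but not the checksum digit. The sketch below shows one way that might be tested in base R. It assumes a modulus 11 scheme: weight the first nine digits by 10 down to 2, sum them, and compare 11 minus the remainder (with 11 treated as 0) to the tenth digit. That scheme is an assumption of this sketch rather than something stated in this post, and chi_checksum_ok is a hypothetical name; phsmethods::chi_check is the maintained implementation.

```r
# Hypothetical helper: TRUE if the tenth digit matches a modulus 11 checksum.
# Assumes chi is a single string of exactly ten digits (checked beforehand).
chi_checksum_ok <- function(chi) {
  digits <- as.integer(strsplit(chi, "")[[1]])
  weighted_sum <- sum(digits[1:9] * 10:2)
  expected <- 11 - (weighted_sum %% 11)
  if (expected == 11) expected <- 0  # a remainder of 0 maps to check digit 0
  # expected == 10 can never equal a single digit, so such numbers fail
  expected == digits[10]
}

chi_checksum_ok("0101011237")  # weighted sum 37, remainder 4, 11 - 4 = 7: TRUE
chi_checksum_ok("0101011238")  # last digit altered: FALSE
```

Because the helper assumes ten digits, it is best run only after the character and pattern checks have passed.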
for the lazy
qassert: built-in data types
qassertr: lists and data frames
qassert(x,"S+[10,11)") # character, vector length of at least 1, lower bound 10 and less than 11
qassert(x,"S+[10,10]") # also works, between 10 and 10 (inclusive)
# note difference in closing brackets
# character denoted by `s`
# no missing values denoted by UPPER CASE
# exact string length of 10 denoted by the bounds, e.g. [10,10]
testing
We can also use {tinytest} for some checks
tinytest::expect_inherits(x, "character")
----- PASSED : <-->
 call| tinytest::expect_inherits(x, "character")
Normally we’d list some expectations. Here’s a useless function that adds 2 to a given numerical value
add_two <- function(x) {
  if (is.character(x)) {
    stop("You've passed a character vector.\nGonnae no' dae that? \nIt should be an integer or double")
  }
  checkmate::assert_count(x)
  checkmate::assert_integerish(x)
  # stop if x contains missing values
  stopifnot(!checkmate::anyMissing(x))
  x <- x + 2
  message("ya wee beauty!")
  return(x)
}

library(tinytest)
using("checkmate")

# test add_two works
expect_equal(1 + 2, add_two(1))
ya wee beauty!
----- PASSED : <-->
 call| expect_equal(1 + 2, add_two(1))

add_two("one")
Error in add_two("one"): You've passed a character vector.
Gonnae no' dae that?
It should be an integer or double

expect_error(add_two("one"))
----- PASSED : <-->
 call| expect_error(add_two("one"))
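Expectations can also pin down the error text and any message a function emits. A short sketch follows, assuming tinytest is installed. add_two_simple is a cut-down, hypothetical stand-in for add_two that uses only base R checks, so the example runs without checkmate.

```r
library(tinytest)

# Cut-down stand-in for add_two(), base R checks only
add_two_simple <- function(x) {
  stopifnot(
    "x must be numeric" = is.numeric(x),
    "x must not be missing" = !anyNA(x)
  )
  message("ya wee beauty!")
  x + 2
}

# suppressMessages() keeps the success message out of the test log
res_value <- expect_equal(suppressMessages(add_two_simple(1)), 3)

# pattern is a regular expression matched against the error text
res_error <- expect_error(add_two_simple("one"), pattern = "must be numeric")

# the message itself can be an expectation too
res_msg <- expect_message(add_two_simple(1), pattern = "wee beauty")
```

Matching on a pattern means the test fails if the function errors for the wrong reason, which a bare expect_error would let through.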
See also
- defensive programming, covered in this excellent text by Gillespie and Lovelace
- purrr (use of possibly/safely)