
checks and {tiny}testing – a quick primer

[This article was first published on Data By John, and kindly contributed to R-bloggers.]

This material was presented to a meeting of KIND (Knowledge and Information Network) in April this year.

checks

tests

Describe what you expect your functions to do, and how they should behave with regard to user inputs.

Checks : assertions
Tests : expectations

Let’s write a simple function that prints the name of a council area:

choose_council <- function(x){
  out <- paste("chosen council is", x)
  out
}

Now let’s try it out:

choose_council("Highland")

[1] "chosen council is Highland"

choose_council("Argyll and Bute")

[1] "chosen council is Argyll and Bute"

choose_council("Bob")

[1] "chosen council is Bob"

choose_council(1)

[1] "chosen council is 1"

choose_council("Argyll & Bute")

[1] "chosen council is Argyll & Bute"

We can see the function works, but it accepts anything at all, including values that are not council areas.

Base R functions

From the help:

match.arg matches a character arg against a table of candidate values as specified by choices.

To put that more simply, to use the function, we need to pass an argument, and a vector of possible choices. The function will then check that argument against the choices to see if there is a match.

Let’s assume we only want to accept "Highland" and "Argyll and Bute".

How can we use match.arg?

choose_council <- function(council = c("Highland", 
                                       "Argyll and Bute")){

  council <-  match.arg(council)
  out <- paste("chosen council is", council)
  return(out)
}

choose_council("Highland")

[1] "chosen council is Highland"

choose_council("Argyll and Bute")

[1] "chosen council is Argyll and Bute"

choose_council("Bob")

Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"

choose_council(1)

Error in match.arg(council): 'arg' must be NULL or a character vector

choose_council("Argyll & Bute")

Error in match.arg(council): 'arg' should be one of "Highland", "Argyll and Bute"

If no value is supplied, match.arg uses the first element of the default vector:

choose_council() # match.arg uses default arguments

[1] "chosen council is Highland"

Partial matching is also possible – you can be lazy and only type the first few letters of your argument. This is OK for this very simple example, but not for real-life code – certainly not any code where you care about the results. (As an aside, if you regularly use T or F instead of TRUE and FALSE – you need to sort your life out)

This works, but... careful now!

choose_council("A") # partial matching - can be risky

[1] "chosen council is Argyll and Bute"
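If partial matching is a worry, one option is to compare against the allowed values exactly. This is a minimal sketch in base R, not a drop-in replacement: `choose_council_exact` and `allowed` are names made up for this illustration.

```r
# Hypothetical variant of choose_council() that only accepts exact matches
choose_council_exact <- function(council) {
  allowed <- c("Highland", "Argyll and Bute")

  # a single, non-missing character value is required
  stopifnot(is.character(council), length(council) == 1, !is.na(council))

  # %in% compares whole strings, so "A" no longer matches "Argyll and Bute"
  if (!council %in% allowed) {
    stop("council should be one of: ", paste(allowed, collapse = ", "))
  }

  paste("chosen council is", council)
}

choose_council_exact("Highland")

# tryCatch() captures the error message so the script carries on
res <- tryCatch(choose_council_exact("A"),
                error = function(e) conditionMessage(e))
res
```

The trade-off is that you lose match.arg's habit of falling back to the first choice when nothing is supplied, so the caller always has to name a council.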

stopifnot

We saw that our function didn’t work when we supplied a number.

choose_council(1)

In this case, match.arg has its own checks in the background. But we can provide our own. We want to stop the function if a non-character argument is provided.

We use stopifnot to stop immediately if a non-character argument is passed.

If a character argument is passed, we use the choices argument of match.arg to validate that it is an acceptable value:

choose_council <- function(council){

  stopifnot(is.character(council))

  council <- match.arg(council,
                       choices = c("Highland",
                                   "Argyll and Bute"))

  out <- paste("chosen council is", council)
  return(out)
}

choose_council(1)

Error in choose_council(1): is.character(council) is not TRUE

choose_council("Argyll & Bute")

Error in match.arg(council, choices = c("Highland", "Argyll and Bute")): 'arg' should be one of "Highland", "Argyll and Bute"

Yikes.

We can add friendlier messages:

choose_council <- function(council){

  stopifnot("council must be character" = is.character(council))

  council <- match.arg(council,
                       choices = c("Highland",
                                   "Argyll and Bute"))

  out <- paste("chosen council is", council)
  return(out)
}

Partial matching works as before

choose_council("High")

[1] "chosen council is Highland"

But now we get a slightly more readable error message

choose_council(1)

Error in choose_council(1): council must be character
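The named-condition trick scales to several checks in one call. stopifnot evaluates them in order and reports the first one that fails. Here is a small sketch, assuming R 4.0.0 or later (where named conditions were introduced); `check_council_arg` is a hypothetical helper, not part of the original function.

```r
# Hypothetical helper: several named checks in one stopifnot() call
check_council_arg <- function(council) {
  stopifnot(
    "council must be character"      = is.character(council),
    "council must be a single value" = length(council) == 1,
    "council must not be missing"    = !is.na(council)
  )
  invisible(council)
}

# conditionMessage() pulls out just the text of the first failing check
msg_type <- tryCatch(check_council_arg(1),
                     error = function(e) conditionMessage(e))
msg_length <- tryCatch(check_council_arg(c("Highland", "Fife")),
                       error = function(e) conditionMessage(e))

msg_type
msg_length
```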

chi_check()

See phsmethods

What is a CHI number? The Community Health Index number is used in Scotland to uniquely identify patients.

What needs to be checked?

We can deal with the first three quite quickly with the {checkmate} package

checkmate

“Virtually every standard type of user error when passing arguments into function can be caught with a simple, readable line which produces an informative error message.

A substantial part of the package was written in C to minimize any worries about execution time overhead.”

example CHI

x <- "0101011237"

Is this a character vector?

check_class(x, "character")
checkClass(x, "character")

[1] TRUE

[1] TRUE

check_class and checkClass are exactly the same; simply choose whether you prefer snake_case or camelCase.

Functions beginning with check return either TRUE (as above) or the error message as a string:

check_class(x, "integer")

[1] "Must inherit from class 'integer', but has class 'character'"

Functions beginning with assert either throw an error, or return the checked object invisibly:

assert_class(x, "integer")

Error in eval(expr, envir, enclos): Assertion on 'x' failed: Must inherit from class 'integer', but has class 'character'.

assert_class(x, "character")
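To see the families side by side, here is a short sketch. It also uses test_class, which was not covered in the talk: as far as I know, the test_ functions in {checkmate} return a plain TRUE or FALSE with no message. Treat this as an illustration, not a full tour of the package.

```r
library(checkmate)

x <- "0101011237"

# check_*: TRUE on success, otherwise the message as a string
check_class(x, "integer")

# test_*: a plain TRUE or FALSE, handy inside if()
test_class(x, "integer")

# assert_*: throws on failure, returns x invisibly on success
y <- assert_class(x, "character")
identical(x, y)

# catching the assertion error so we can inspect it
err <- tryCatch(assert_class(x, "integer"), error = function(e) e)
inherits(err, "error")
```

Because assert_ hands the object back, it can sit at the top of a function without getting in the way of the rest of the code.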

Going back to the CHI example, we can use check_character for a more fine-grained series of checks:

check_character(x, n.chars = 10, pattern = "\\d{10}") # 10 chars, numeric only

[1] TRUE

x2 <- "010101123A"
x3 <- c(x, x2, NA)
x4 <- c(x, NA)

check_character(x2, n.chars = 10, pattern = "[^A-Z]{10}")
check_character(x, n.chars = 10, pattern = "[^A-Z]{10}")

[1] "Must comply to pattern '[^A-Z]{10}'"

[1] TRUE

# final version
check_character(x,
                min.len = 1,
                n.chars = 10,
                any.missing = FALSE,
                pattern = "\\d{10}")

vals <- c(x, x2, x3, x4)
cat(vals)

0101011237 010101123A 0101011237 010101123A NA 0101011237 NA

purrr::map_chr(vals,
               check_character,
               min.len = 1,
               n.chars = 10,
               any.missing = FALSE,
               pattern = "\\d{10}")

[1] "TRUE"                                "Must comply to pattern '\\d{10}'"   
[3] "TRUE"                                "Must comply to pattern '\\d{10}'"   
[5] "Contains missing values (element 1)" "TRUE"                               
[7] "Contains missing values (element 1)"
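If all you need is a TRUE/FALSE flag per element, base R can do a rough version of the same thing without mapping over the vector. This is only a sketch: `is_ten_digits` is a made-up helper, and it checks the format only, nothing else about the CHI.

```r
# Hypothetical helper: TRUE where a value is exactly ten digits
is_ten_digits <- function(chi) {
  # grepl() returns FALSE for NA, so missing values are flagged as invalid
  grepl("^[0-9]{10}$", chi)
}

vals <- c("0101011237", "010101123A", NA, "01010112")
ok <- is_ten_digits(vals)
ok

# keep only the values that passed
vals[ok]
```

You lose the informative messages that check_character gives you, so this suits filtering more than reporting back to a user.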

# are the first 6 characters a valid date?
date_val <- substr(x,1,6)

cat(date_val)

010101

checkDate(as.Date(strptime(date_val,"%d%m%y", "UTC")),
          lower = "1900-01-01",
          upper =  Sys.Date(),
          any.missing = FALSE,
          min.len = 1L)

[1] TRUE
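Two things are worth knowing about parsing ddmmyy this way. An impossible date comes back as NA, which is why any.missing = FALSE matters. And %y has to guess the century: R documents that 00 to 68 become 20xx and 69 to 99 become 19xx. The sketch below shows both; `chi_date` is a hypothetical helper and the CHI-like strings are invented for the example.

```r
# Hypothetical helper: parse the first six characters of a CHI as ddmmyy
chi_date <- function(chi) {
  as.Date(strptime(substr(chi, 1, 6), "%d%m%y", tz = "UTC"))
}

# an impossible date (31 February) parses to NA
bad <- chi_date("3102991234")
is.na(bad)

# two-digit years: 69 is read as 1969, 68 as 2068
format(chi_date("0101691234"), "%Y")
format(chi_date("0101681234"), "%Y")
```

So a date of birth in, say, 1950 would be read as 2050 and then fail the upper = Sys.Date() bound. A two-digit year alone cannot settle the century.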

Combine checks with the assert function:

main_check <- function(x){
  assert(check_character(x,
                         min.len = 1,
                         n.chars = 10,
                         any.missing = FALSE,
                         pattern = "\\d{10}"),
         checkDate(as.Date(strptime(substr(x,1,6),"%d%m%y", "UTC")),
                   lower = "1900-01-01",
                   upper =  Sys.Date(),
                   any.missing = FALSE,
                   min.len = 1L),
         combine = "and")
}

out <- main_check(x)
out

[1] TRUE
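It is worth seeing the failure path too. The sketch below repeats main_check so that it runs on its own, then feeds it a value with a trailing letter. I have not reproduced the exact wording of the error here, only shown how to catch and print it.

```r
library(checkmate)

# same checks as main_check() above, repeated so the sketch is self-contained
main_check <- function(x) {
  assert(
    check_character(x, min.len = 1, n.chars = 10,
                    any.missing = FALSE, pattern = "\\d{10}"),
    checkDate(as.Date(strptime(substr(x, 1, 6), "%d%m%y", "UTC")),
              lower = "1900-01-01", upper = Sys.Date(),
              any.missing = FALSE, min.len = 1L),
    combine = "and"
  )
}

# a trailing letter breaks the pattern check, so assert() raises an error
res <- tryCatch(main_check("010101123A"), error = function(e) e)
inherits(res, "error")
conditionMessage(res)
```

With combine = "and" every check has to pass; the default, combine = "or", only needs one of them to succeed.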

for the lazy

qassert(x,"S+[10,11)") # character, vector length of at least 1, lower bound 10 and less than 11
qassert(x,"S+[10,10]") # also works, between 10 and 10 (inclusive)
# note the difference in closing brackets: ")" excludes the bound, "]" includes it
# character is denoted by `s`
# no missing values is denoted by UPPER CASE
# vector length of at least 1 is denoted by `+`
# exact length of string 10 denoted by [10]

testing

We can also use {tinytest} for some checks:

tinytest::expect_inherits(x, "character")

----- PASSED      : <-->
 call| tinytest::expect_inherits(x, "character") 

Normally we’d list some expectations. Here’s a useless function that adds 2 to a given numerical value:

add_two <- function(x) {

  if (is.character(x)) {
    stop("You've passed a character vector.\nGonnae no' dae that? \nIt should be an integer or double")
  }

  # x must be a single, non-negative whole number
  checkmate::assert_count(x)
  checkmate::assert_integerish(x)
  # stop if x contains missing values
  stopifnot(!checkmate::anyMissing(x))

  x <- x + 2
  message("ya wee beauty!")
  return(x)

}

using("checkmate")
# test add_two works

expect_equal(1 + 2, add_two(1))

ya wee beauty!

----- PASSED      : <-->
 call| expect_equal(1 + 2, add_two(1)) 

add_two("one")

Error in add_two("one"): You've passed a character vector.
Gonnae no' dae that? 
It should be an integer or double

expect_error(add_two("one"))

----- PASSED      : <-->
 call| expect_error(add_two("one")) 
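A fuller list of expectations might look something like the sketch below. It uses a trimmed-down copy of add_two (with a shorter, made-up error message) so that it runs on its own, and it stores each result so you can inspect it. The choice of cases is mine, not from the talk.

```r
library(tinytest)

# trimmed-down copy of add_two() so the sketch is self-contained
add_two <- function(x) {
  if (is.character(x)) stop("x should be an integer or double")
  checkmate::assert_count(x)
  message("ya wee beauty!")
  x + 2
}

# each expectation returns a tinytest object that behaves like TRUE / FALSE
t1 <- expect_equal(add_two(1), 3)
t2 <- expect_error(add_two("one"), pattern = "integer or double")
t3 <- expect_error(add_two(-1))   # assert_count() rejects negative values
t4 <- expect_message(add_two(1), pattern = "beauty")

t2
```

In a package, expectations like these usually live in files under inst/tinytest/ and are run with tinytest::test_package() or tinytest::run_test_file().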
