
validate version 0.1.5 is out

[This article was first published on R – Mark van der Loo, and kindly contributed to R-bloggers].

A new version of the validate package for data validation was just accepted on CRAN and will be available on all mirrors in a few days.

The most important addition is that you can now reference the data set as a whole, using the “dot” syntax like so:

library(validate)
library(magrittr)  # provides the %>% pipe

iris %>% check_that(
    nrow(.) > 100
  , "Sepal.Width" %in% names(.)) %>%
  summary()

  rule items passes fails nNA error warning                  expression
1   V1     1      1     0   0 FALSE   FALSE               nrow(.) > 100
2   V2     1      1     0   0 FALSE   FALSE "Sepal.Width" %in% names(.)
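
The same data-set-level rules can also be stored in a validator object and reused, in the style of the validator/confront calls used further down. The following is a minimal sketch, not taken from the package documentation; the 50-row subset is a hypothetical data set chosen only so that the first rule fails.

```r
library(validate)

# Store the data-set-level rules once ...
v <- validator(
    nrow(.) > 100
  , "Sepal.Width" %in% names(.)
)

# ... and confront several data sets with them
cf_full  <- confront(iris, v)
cf_small <- confront(head(iris, 50), v)  # hypothetical too-small subset

# Number of failures per rule for each data set
summary(cf_full)$fails
summary(cf_small)$fails
```

Since iris has 150 rows, both rules should pass for the full data set, while the subset should fail only the row-count rule.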

It is now also possible to get a definite logical result when a rule evaluates to NA, by passing the na.value option.

dat = data.frame(x=c(1,NA,-1))
v = validator(x > 0)
values(confront(dat,v))
        V1
[1,]  TRUE
[2,]    NA
[3,] FALSE
values(confront(dat,v,na.value=FALSE))
        V1
[1,]  TRUE
[2,] FALSE
[3,] FALSE
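
One place where this matters is when the results are reduced to a single verdict. The sketch below uses hypothetical data with one missing value and no outright failures. It assumes that na.value = TRUE is accepted in the same way as the na.value = FALSE shown above.

```r
library(validate)

# Hypothetical data: one missing value, no outright failures
dat <- data.frame(x = c(1, NA, 2))
v <- validator(x > 0)

# Default: the NA propagates, so the overall verdict is undecided
strict_na <- all(values(confront(dat, v)))

# Count missing results as failures
na_fails <- all(values(confront(dat, v, na.value = FALSE)))

# Count missing results as passes (assumption: TRUE is a valid na.value)
na_passes <- all(values(confront(dat, v, na.value = TRUE)))
```

Here strict_na is NA, while the other two give a definite FALSE and TRUE. Which choice is appropriate depends on whether missing data should count against a record.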

A complete list of changes and bugfixes can be found in the NEWS file. Below I also include the changes in 0.1.4, since I did not write about that release before.

I will be talking about this package at the upcoming useR!2016 event, so join me if you’re interested!

version 0.1.5
– The ‘.’ is now used to reference the validated data set as a whole.
– Small change in output of ‘compare’ to match the table in van den Broek et al. (2013)

version 0.1.4
– ‘confront’ now emits a warning when a variable name conflicts with the name of a reference data set
– Deprecated ‘validate_reset’, in favour of the shorter ‘reset’ (use ‘validate::reset’ in case of ambiguity)
– Deprecated ‘validate_options’ in favour of the shorter ‘voptions’
– New option na.value with default value NA, controlling the output when a rule evaluates to NA.
– Added rules from the ESSnet on validation (deliverable 17) to automated tests.
– Added ‘grepl’ to allowed validation syntax (suggested by Dusan Sovic)
– Exported a few functions with keyword ‘internal’ for extensibility
– Bugfix: blocks sometimes reported wrong number of blocks (in case of a single connected block).
– Bugfix: macro expansion failed when macros were reused in other macros.
– Bugfix: certain nonlinear relations were recognized as linear
– Bugfix: rules that use (anonymous) function definitions raised error when printed.
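
As an illustration of the ‘grepl’ entry in the 0.1.4 list, here is a minimal sketch of a pattern-based rule. The id column and the pattern (one capital letter followed by two digits) are hypothetical and only serve as an example.

```r
library(validate)

# Hypothetical identifiers: only the first matches the expected format
dat <- data.frame(id = c("A01", "B2", "a03"), stringsAsFactors = FALSE)

# A rule that uses grepl as the validating function
v <- validator(grepl("^[A-Z][0-9]{2}$", id))

cf <- confront(dat, v)
values(cf)
```

Only "A01" should pass: "B2" has too few digits and "a03" starts with a lowercase letter.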

