By Andy Nicholls – Head of Consulting, UK
Introduction
Testing is a crucial component in ensuring that the correct analyses are deployed. However, it is often considered unglamorous: a poor relation in terms of the time and resources allocated to it when developing a package. But with the increasing popularity and commercial application of R, testing is a subject that is gaining significantly in importance.
At the time of writing there are 5987 packages on CRAN. Due to the nature of CRAN and the motivations of contributors, the quality of packages varies greatly. Some are very popular and well maintained; others are essentially inactive, with development having all but ceased. As the number of packages on CRAN continues to grow, determining which packages are fit for purpose in a commercial environment is becoming an increasingly difficult task. There have been numerous articles and blog posts on the subject of CRAN’s growth and the quality of R packages. In particular, Francis Smart’s R-bloggers post entitled Does R have too many packages? highlights five perceived concerns with the growing number of R packages. I would like to expand on one of these themes in particular, namely the “inconsistent quality of individual packages”.
There are many ways in which a package can be assessed for quality. Popularity is clearly one: if lots of people use it then it must be quite good! But popular packages also tend to have authors who actively develop them and fix bugs as users identify them, so development activity is another factor. Others include the length of time a package has existed; the package dependency tree and the number of reverse ‘Depends’, ‘Imports’ and ‘Suggests’; and the number of authors and their reputation. And finally there is testing. Francis briefly mentions testing in his post, noting that “testing is still largely left up to the authors and users”. In other words, there is no requirement for an author to write tests for their package, and often they don’t!
Testing
It is standard practice to test commercial software at both the unit and system level. In other words, tests are written both for the individual components of the software and for the software as a whole. Through Continuous Integration (CI), any change to the source code results in a rebuild of the package and a re-run of any unit tests. This is essentially what happens when a package is submitted to CRAN. However, there is no requirement for an R package to contain any kind of formal test structure. Below I have written a brief script to count how often R’s three unit testing packages, testthat, RUnit and svUnit, are referenced by other packages via the ‘Depends’, ‘Imports’, ‘LinkingTo’ or ‘Suggests’ categories available when building an R package.
# Current packages on CRAN
download.file("http://cran.R-project.org/web/packages/packages.rds",
              "packages.rds", mode = "wb")
cranPackages <- readRDS("packages.rds")
cranPackages <- cranPackages[!duplicated(cranPackages[, 1]), ]

# Unit testing packages
unitPackages <- c("testthat", "RUnit", "svUnit")
reverseDeps <- tools:::package_dependencies(packages = unitPackages,
                                            cranPackages,
                                            recursive = FALSE,
                                            reverse = TRUE,
                                            which = c("Depends", "Imports",
                                                      "LinkingTo", "Suggests"))
sapply(reverseDeps, length)
## testthat    RUnit   svUnit
##      352      116       11
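As a rough check on the figure quoted next, a minimal sketch (assuming no further filtering) that turns these counts into a proportion of CRAN; the union is taken since a package may reference more than one test framework:

# Proportion of CRAN packages referencing at least one recognised test framework
testedPackages <- unique(unlist(reverseDeps))
length(testedPackages) / nrow(cranPackages)
## roughly 0.08 at the time of writing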
These numbers equate to around 8% of R packages on CRAN containing any kind of recognised test framework. An author can also implement their own test framework, which would not be captured in the previous statistic. And during a Q&A session at the inaugural EARL Conference this year, a fellow audience member pointed out that you could consider the examples in the help documentation to be tests, since they must run successfully to pass an R CMD check. But I would argue that this only really tests that the code runs, not that it produces some expected output.
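To illustrate the distinction, consider a hypothetical help-file example versus an equivalent test; only the latter would catch a function that runs but returns the wrong value:

# A help-file example only has to run without error to pass R CMD check
mean(c(1, 2, 3))

# A test additionally asserts an expected output
stopifnot(mean(c(1, 2, 3)) == 2)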
Overall, the level of testing of R packages is very low. Further, having a test framework does not necessarily mean that every line of source code within a package has actually been tested. This is where the testCoverage package can help.
testCoverage
The idea of test coverage, i.e. the percentage of the source code that is exercised by the test framework, is not new; it has been around pretty much since the beginning of formal software development. However, until now it had not been implemented for R.
The basic idea is very simple. First, consider the following function, which forms our ‘source code’:
# sourceFile.R
absFun <- function(x){
  if( x < 0 ){
    -x
  } else if( x >= 0 ){
    x
  }
}
Now let’s imagine a very simple standalone unit test for the absFun function, which checks the result and prints a useful message if the test fails:
# testFile.R
result <- try(absFun(-5))
if(result != 5){
  message("absFun function failed negative value test")
}
Clearly this standalone unit test does not test what happens when x is zero or positive. In other words, it never hits the ‘else’ section of the code, and so the test framework does not ‘cover’ 100% of the source code. Most observers would conclude that it covers 50% of the source code. This coverage concept is the basis of Mango’s testCoverage package.
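For contrast, a second standalone test in the same style (a hypothetical ‘testFile2.R’, not part of the original example) would exercise the remaining branch and bring the coverage of absFun to 100%:

# testFile2.R -- hypothetical companion test for the non-negative branch
result <- try(absFun(5))
if(result != 5){
  message("absFun function failed positive value test")
}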
The testCoverage package makes use of the trace functionality in R. It does so as follows. First of all, it reads the source files within a package and replaces symbols within the code with unique identifiers, which are injected into a tracing function that reports each time a symbol is called/hit by your test framework. The first symbol at each level of the expression tree is traced, allowing the coverage of code branches to be checked.
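To illustrate the principle, here is a toy sketch of the instrumentation, not testCoverage’s actual implementation; the hits environment and the .hit() helper are invented for this example:

# A toy illustration of symbol tracing: each wrapped symbol records a hit
hits <- new.env()
.hit <- function(id, value){
  count <- if(exists(id, envir = hits, inherits = FALSE)) hits[[id]] else 0
  assign(id, count + 1, envir = hits)
  value
}

# A hand-instrumented version of absFun with one trace point per occurrence of x
absFunTraced <- function(x){
  if( .hit("x1", x < 0) ){
    .hit("x2", -x)
  } else if( .hit("x3", x >= 0) ){
    .hit("x4", x)
  }
}

absFunTraced(-5)  # run the negative-value test from 'testFile.R'
as.list(hits)     # x1 and x2 were hit; x3 and x4 were not: 2/4 = 50%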
Consider the earlier absFun example. The symbol x appears 4 times in our source file. It is hit twice by ‘testFile.R’ and therefore the reported coverage is 50% (2/4). Within the testCoverage package, the raw test coverage statistic is presented along with an interactive HTML report in which it is easy to see exactly which lines of the source code have been hit by the test framework and which have not (see below).
To make best use of testCoverage it is important to understand how the coverage statistic is calculated. If a plain ‘else’ had been used in this example (as opposed to an ‘else if’), then x would appear only 3 times and would be hit twice by the test framework. The reported coverage in this case would therefore be 67%, even though the source code is (arguably) functionally almost identical. What should be clear, however, is that if you hit all of your trace points then your package will score 100%, and if you have no tests it will score 0%.
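To make the 67% figure concrete, here is that alternative formulation (a hypothetical absFun2, written for illustration only):

# An alternative absFun using a plain 'else': x now appears only 3 times
absFun2 <- function(x){
  if( x < 0 ){
    -x
  } else {
    x
  }
}
# 'testFile.R' would hit 'x < 0' and '-x' but never the 'else' branch: 2/3, i.e. roughly 67%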
Road Map
So what is next for testCoverage? The package has been released publicly and currently lives on GitHub. It is by no means complete, however, and there is a clear roadmap of features that Mango would like to incorporate into the package. For example, testCoverage does not support S4 classes, and this is currently a development priority.
It should also be noted that the current implementation masks core functionality in R, and there is a start-up message which recommends restarting R after using testCoverage. We are investigating ways of running testCoverage that avoid this masking. If it can be avoided, this would be the point at which we would potentially look to release onto CRAN.
One of the key areas that we are looking at is integration with other services, such as Continuous Integration (CI) platforms. We would also like to expand the scope beyond R code to include C code called by R. This would involve inserting trace points into the C code and using modified versions of the .C or .Call functions to hand profiling data back to R, as traditional tools such as gcov do not integrate with the way R loads compiled C functions.
Conclusion
Testing is a vital aspect of software development which is often overlooked by R package authors. testCoverage provides a tool for assessing the level of test coverage within an R package. As the commercial adoption of R continues to increase, both the importance and the level of testing are also set to increase. Mango’s aim is that testCoverage becomes one of many tools and metrics that package developers use when building their packages, and one that users look to when assessing the quality of an R package. A testCoverage league table is perhaps not too far away!