Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Missing data hinders statistical analyses. Estimating missing values (imputation) prior to analysis is one way to deal with that. In some cases, however, the missing values need not be estimated at all, since they can be derived with certainty from other data that are present. The latest version of our package deducorrect can do this for numerical as well as for categorical data.
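As an example, consider a record with three numerical fields x, y and z that must obey the rules x + y = z, x ≥ 0, y ≥ 0 and z ≥ 0. If we’re given a record with values x = 1, z = 4 and y missing, the equality rule leaves only one possibility: y = 4 − 1 = 3, which also satisfies y ≥ 0. In R, with deducorrect: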
```r
> library(deducorrect)
Loading required package: editrules
> # define the rules
> E <- editmatrix(c(
+     "x + y == z",
+     "x >= 0", "y>=0", "z>=0"
+     )
+ )
> # some data:
> (dat <- data.frame(x=c(1,4),y=c(NA,NA),z=c(4,1)))
  x  y z
1 1 NA 4
2 4 NA 1
> # And now for the magic step: (deduImpute returns a
# 'deducorrect' object)
> imp <- deduImpute(E,dat)
> # the imputed data
> imp$corrected
  x  y z
1 1  3 4
2 4 NA 1
# a list of imputations performed
> imp$corrections
  row variable old new
1   1        y  NA   3
```
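To make the reasoning behind this output concrete, here is a minimal base-R sketch of deductive imputation for linear rules. It is not how deducorrect is implemented: the function `deduce_linear` is hypothetical, and it assumes the rules are supplied as matrices, with equalities `A %*% x == b` and inequalities `G %*% x >= h`. It solves the equalities for the missing values with a pseudoinverse, keeps only those values that are the same in every solution, and (as a simplification) imputes nothing if the result would break an inequality. Values that only inequalities pin down (such as x ≥ 0 together with x ≤ 0) are not handled.

```r
# Hypothetical sketch of deductive imputation for linear rules
# (not the deducorrect implementation; base R only):
#   equalities   A %*% x == b
#   inequalities G %*% x >= h
deduce_linear <- function(x, A, b, G, h, tol = 1e-8) {
  miss <- is.na(x)
  if (!any(miss)) return(x)

  # Move observed values to the right-hand side: Am %*% x_miss == rhs
  Am  <- A[, miss, drop = FALSE]
  rhs <- b - A[, !miss, drop = FALSE] %*% x[!miss]

  # Moore-Penrose pseudoinverse of Am via the SVD
  s       <- svd(Am)
  d_inv   <- ifelse(s$d > tol, 1 / s$d, 0)
  Am_pinv <- s$v %*% (d_inv * t(s$u))

  x0 <- Am_pinv %*% rhs
  # No exact solution: the record is inconsistent, impute nothing
  if (max(abs(Am %*% x0 - rhs)) > tol) return(x)

  # A missing variable is determined when its row of the
  # null-space projector is zero (it cannot vary between solutions)
  P <- diag(ncol(Am)) - Am_pinv %*% Am
  determined <- rowSums(abs(P)) < tol

  cand <- x
  cand[which(miss)[determined]] <- x0[determined]

  # Check the inequality rows whose variables are all known now
  known  <- !is.na(cand)
  usable <- rowSums(G[, !known, drop = FALSE] != 0) == 0
  lhs    <- G[usable, known, drop = FALSE] %*% cand[known]
  if (any(lhs < h[usable] - tol)) return(x)
  cand
}

# The rules from the example: x + y - z == 0 and x, y, z >= 0
A <- matrix(c(1, 1, -1), nrow = 1)
b <- 0
G <- diag(3)
h <- c(0, 0, 0)

deduce_linear(c(1, NA, 4), A, b, G, h)   # y = 3 is imputed
deduce_linear(c(4, NA, 1), A, b, G, h)   # y = -3 breaks y >= 0: stays NA
deduce_linear(c(NA, NA, 4), A, b, G, h)  # x and y not unique: both stay NA
```

The pseudoinverse route is chosen because it handles any number of equality rules and missing fields at once, not only the single-equation case shown here.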
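The deduImpute function only imputes what can be imputed consistently, taking all (in)equality rules into account. That is why y stays missing in the second record: the equality would force y = 1 − 4 = −3, which violates y ≥ 0, so no consistent value exists. Some of the lower-level (record-by-record) functionality is exported as well and, as mentioned above, deductive imputation also works for categorical data.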
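For categorical data the same principle applies: a missing field can be imputed when every completion of the record that satisfies the rules gives it the same value. The base-R sketch below illustrates this by brute-force enumeration, which is only practical for a handful of categories and is not necessarily how deducorrect works. The function `deduce_categorical`, the `domains` and `rules` arguments and the gender/pregnancy rule are all hypothetical.

```r
# Hypothetical sketch of deductive imputation for categorical data:
# enumerate every completion of the missing fields, keep the ones that
# satisfy all rules, and impute a field only if all of them agree on it.
deduce_categorical <- function(record, domains, rules) {
  miss <- names(record)[is.na(record)]
  if (length(miss) == 0) return(record)

  # All candidate completions of the missing fields
  grid <- expand.grid(domains[miss], stringsAsFactors = FALSE)
  ok <- vapply(seq_len(nrow(grid)), function(i) {
    cand <- record
    cand[miss] <- unlist(grid[i, , drop = FALSE])
    all(vapply(rules, function(rule) rule(cand), logical(1)))
  }, logical(1))
  feasible <- grid[ok, , drop = FALSE]

  # Impute only the fields that take a single value in every feasible completion
  for (v in miss) {
    vals <- unique(feasible[[v]])
    if (length(vals) == 1) record[[v]] <- vals
  }
  record
}

domains <- list(gender = c("male", "female"), pregnant = c("yes", "no"))
# Rule: a male cannot be pregnant
rules <- list(function(r) !(r[["gender"]] == "male" && r[["pregnant"]] == "yes"))

deduce_categorical(c(gender = NA, pregnant = "yes"), domains, rules)  # gender -> "female"
deduce_categorical(c(gender = NA, pregnant = "no"), domains, rules)   # gender stays NA
```

If no completion satisfies the rules, `feasible` is empty and nothing is imputed, mirroring the numerical case where an inconsistent record is left alone.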
There’s a lot more to say about deductive imputation. If you’re interested in the mathematical background or want to see more examples, please read our paper, which is included in the package as a vignette. Don’t hesitate to drop us a line with comments or suggestions, or if you find a little insect =:O.