Improving data quality with deducorrect
The deducorrect package implements methods to correct common errors in numerical data records. To detect errors, you first have to define the rules that your data must obey. For example, suppose you have a data.frame with three columns: profit, turnover, and cost, subject to the rules that all values must be positive, that the account balance cost + profit = turnover must hold, and that the profit-to-turnover ratio may not exceed 0.6 (a simple sanity check). The rules can be defined as follows:
library(deducorrect)  # depends on editrules, which provides editmatrix()
E <- editmatrix(c(
    "cost > 0",
    "profit > 0",
    "turnover > 0",
    "cost + profit == turnover",
    "0.6*turnover >= profit"
))
Now let’s look at some simple data.
dat <- data.frame(
    cost     = c(-100, 325, 326),
    profit   = c( 150, 457, 475),
    turnover = c( 250, 800, 800)
)
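To see which records break which rule before any correction is attempted, the five rules can be evaluated directly in base R. This is only a sketch for illustration: the names `rules` and `violations` are introduced here, and the check uses `eval(parse(...))` rather than any function from deducorrect.

```r
# The example records.
dat <- data.frame(
    cost     = c(-100, 325, 326),
    profit   = c( 150, 457, 475),
    turnover = c( 250, 800, 800)
)

# The same five rules, kept as plain R expressions (hypothetical helper vector).
rules <- c(
    "cost > 0",
    "profit > 0",
    "turnover > 0",
    "cost + profit == turnover",
    "0.6*turnover >= profit"
)

# Evaluate each rule with the columns of dat in scope; TRUE marks a violation.
violations <- sapply(rules, function(r) !eval(parse(text = r), dat))
rownames(violations) <- seq_len(nrow(dat))
print(violations)
```

Each row of the resulting logical matrix is a record and each column a rule. Record 1 violates the positivity rule on cost, and all three records violate the balance rule.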
Calling correctTypos with the rules and the data returns a list; its corrected element holds the adjusted records:
> (dat <- correctTypos(E,dat)$corrected)
  cost profit turnover
1  100    150      250
2  325    475      800
3  326    475      800
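The change in record 2 can be motivated by hand. Solving the balance rule for profit gives a candidate value, and comparing it with the recorded value digit by digit shows that the two differ only by a swap of neighbouring digits. The sketch below is a simplified base-R illustration of that reasoning, not the algorithm implemented in correctTypos; `is_adjacent_swap` is a hypothetical helper introduced here.

```r
# Hypothetical helper: TRUE if two numbers, written as digit strings, differ
# only by swapping one pair of neighbouring characters.
is_adjacent_swap <- function(a, b) {
    x <- strsplit(as.character(a), "")[[1]]
    y <- strsplit(as.character(b), "")[[1]]
    if (length(x) != length(y)) return(FALSE)
    d <- which(x != y)                      # positions where the strings differ
    length(d) == 2 && diff(d) == 1 &&       # exactly two neighbouring positions
        x[d[1]] == y[d[2]] && x[d[2]] == y[d[1]]
}

# Record 2 as originally entered.
cost <- 325; profit <- 457; turnover <- 800

# Candidate profit implied by the rule cost + profit == turnover.
candidate <- turnover - cost

# Accept the candidate only if it looks like a simple typo of the recorded value.
if (is_adjacent_swap(profit, candidate)) profit <- candidate
print(c(cost = cost, profit = profit, turnover = turnover))
```

The same test rejects record 3: the balance rule would suggest a cost of 325, which differs from the recorded 326 by a changed digit rather than a swap, so this sketch leaves that record alone.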