Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
lumberjack is a package that allows you to track (log) changes in data while an R script is running. This allows you to detect exactly which code had what effect on your data.
The only thing you have to do is
- Add one line of code at the top of your script, for example:
start_log(mydata, expression_logger$new(m = mean(salary, na.rm=TRUE)))
to follow the mean of the variable salary in mydata as mydata gets processed one line of code at a time.
- Run your script with
library(lumberjack)
run_file("myscript.R")
and your script will run as usual, except that the mean salary is tracked across the run, and (in this case) automatically written to a file. The package is extensible, so you can use one of the built-in loggers or write your own.
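To make the two steps concrete, here is a minimal sketch that builds a toy data set and script in a temporary directory and then runs the script through run_file. The file name mydata.csv, the salary values, and the two processing lines are invented for illustration; only start_log, expression_logger and run_file come from the package.

```r
library(lumberjack)

# Work in a throwaway directory so the demo leaves nothing behind.
workdir <- tempfile("lumberjack_demo")
dir.create(workdir)
oldwd <- setwd(workdir)

# Hypothetical input data with one missing salary.
write.csv(
  data.frame(id = 1:4, salary = c(1000, NA, 3000, 2500)),
  "mydata.csv", row.names = FALSE
)

# Hypothetical script: read the data, start logging, then change the data.
writeLines(c(
  'mydata <- read.csv("mydata.csv")',
  'start_log(mydata, expression_logger$new(m = mean(salary, na.rm = TRUE)))',
  'mydata$salary[is.na(mydata$salary)] <- 0',
  'mydata$salary <- mydata$salary * 1.1'
), "myscript.R")

# Run the script expression by expression; tracked data are logged after each one.
env <- run_file("myscript.R")

# Show what was written next to the script (the log file name is chosen by the logger).
print(list.files(workdir))

setwd(oldwd)
```

The processed data can be inspected afterwards as env$mydata, since run_file returns the environment in which the script was evaluated.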
If you want to know more, I highly recommend reading through my short paper on lumberjack that was recently accepted by the Journal of Statistical Software.
New release
Version 1.2.0 was accepted by CRAN on 8 May 2020. There are a few new features and fixes, some of them suggested by one of the JSS reviewers.
- The most important change is that loggers now know not only which expression is running, but also which file it came from and on which line. This means that loggers can report more precisely when a script did what to the data.
- The JSS paper is now included as a vignette.
Finally
Here’s a picture of a man, trying hard to look like a lumberjack while staring into the void.