Python style logging in R
We are increasingly using R in “operational” settings that require robust error handling and logging. In this post we describe a quick-and-dirty way to get Python-style multi-level log files by wrapping the futile.logger package.
Making Messy Data Pretty
Our real-world scenario involves R scripts that process raw smoke monitoring data that is updated hourly. The raw data comes from various instruments, set up by different agencies, and is transmitted over at least two satellites before eventually arriving on our computers.
Data can be missing, delayed or corrupted for a variety of reasons before it gets to us. And then our R scripts perform QC based on various columns available in the raw (aka “engineering level”) data.
Ultimately, the cleaned up data gets displayed in a user interface here.
No one needs to know all of the hard work that goes into cleaning and QC’ing the data that goes into this map. Our goal, after all, is to make it easier for stakeholders to make decisions without having to worry about the fussy details of raw data.
Logging When Things Go Wrong
However, neither the incoming data nor the R scripts we have written to process it are perfect. Things occasionally go wrong in a variety of ways and, when decision makers depend on the data being available, phone calls and text messages are next.
Buddhist teacher Pema Chödrön’s book When Things Fall Apart offers advice in several categories that, perhaps surprisingly, apply in this situation. We will focus on one:
- Using painful emotions to cultivate wisdom, compassion, and courage
- Communicating so as to encourage others to open up rather than shut down
- Practices for reversing habitual patterns
- Methods for working with chaotic situations
- Ways for creating effective social action
One of the methods for working with chaotic situations in operational software is to have lots and Lots and LOTS of logging.
Python has the brilliant “logging” module that allows us to quickly set up separate output files for log statements at different levels:
DEBUG, INFO, WARNING, ERROR, CRITICAL. Inside our operational code we only need to issue single calls like:
logger.info('Ingesting data file: %s', data_file)
and this information will automatically be written out to both the DEBUG and the INFO log files if we have set them up.
In our situation it makes sense to write out three log files for quick perusal by authorized personnel:
- ~_DEBUG.log — for programmers needing to debug what went wrong
- ~_INFO.log — for managers trying to understand the overall situation
- ~_ERROR.log — (hopefully zero length!) for programmers and managers
These files get written to a single directory every time a data processing script runs, with the ‘~’ replaced by the name of the processing script.
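Building those per-script paths is straightforward. Here is a minimal sketch, where the log directory and script name are placeholders rather than paths from our actual system:

```r
# Sketch: build per-script log file paths.
# LOG_DIR and scriptName are hypothetical placeholders.
LOG_DIR    <- "/home/logs"
scriptName <- "process_data"

DEBUG_LOG <- file.path(LOG_DIR, paste0(scriptName, "_DEBUG.log"))
INFO_LOG  <- file.path(LOG_DIR, paste0(scriptName, "_INFO.log"))
ERROR_LOG <- file.path(LOG_DIR, paste0(scriptName, "_ERROR.log"))
```

With these three variables defined up front, every run of a script writes to the same predictable locations.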
Wrapping futile.logger
Luckily, most of the functionality we need is provided by the futile.logger package (CRAN or GitHub). Unfortunately, this package does not currently support multiple “appenders” per “logger”, so we found ourselves initializing multiple named loggers to mimic the natural behavior of Python’s logging module:
# Only FATAL messages go to the console
dummy <- flog.threshold(FATAL)

# Set up new log files
if ( file.exists(DEBUG_LOG) ) dummy <- file.remove(DEBUG_LOG)
dummy <- flog.logger("debug", DEBUG, appender.file(DEBUG_LOG))
if ( file.exists(INFO_LOG) ) dummy <- file.remove(INFO_LOG)
dummy <- flog.logger("info", INFO, appender.file(INFO_LOG))
if ( file.exists(ERROR_LOG) ) dummy <- file.remove(ERROR_LOG)
dummy <- flog.logger("error", ERROR, appender.file(ERROR_LOG))

# Silence other warning messages
options(warn=-1) # -1=ignore, 0=save/print, 1=print, 2=error
Then, inside the code, we created logging and error handling blocks like this:
if ( class(result)[1] == "try-error" ) {
  err_msg <- geterrmessage()
  flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='debug')
  flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='info')
  flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='error')
  stop(err_msg)
}

...

flog.info('Using rgdal::writeOGR to create geojson file %s', filename, name='debug')
flog.info('Using rgdal::writeOGR to create geojson file %s', filename, name='info')
Yuck!
After writing just a few lines like this we tried a different approach and came up with a simple py_logging.R script that wraps futile.logger calls and gives us the Python-style behavior we want. Setup is a cinch:
source('py_logging.R') # python style logging

# Set up logging
logger.setup(debugLog, infoLog, errorLog)

# Silence other warning messages
options(warn=-1) # -1=ignore, 0=save/print, 1=print, 2=error
And logging is much more natural:
if ( class(result)[1] == "try-error" ) {
  err_msg <- geterrmessage()
  logger.error('Error creating SpatialPointsDataFrame: %s', err_msg)
  stop(err_msg)
}

...

logger.info('Using rgdal::writeOGR to create geojson file %s', filename)
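The wrapper itself need not be complicated. The sketch below shows one way logger.setup() and the logging calls might wrap futile.logger; it is an illustration of the approach, and the function bodies in our actual py_logging.R differ in the details:

```r
library(futile.logger)

# Sketch of a py_logging.R-style wrapper around futile.logger.
# Each level gets its own named logger with a file appender;
# every message is fanned out to all loggers whose level admits it.

logger.setup <- function(debugLog, infoLog, errorLog) {
  # Only FATAL messages go to the console
  invisible(flog.threshold(FATAL))
  invisible(flog.logger("debug", DEBUG, appender.file(debugLog)))
  invisible(flog.logger("info",  INFO,  appender.file(infoLog)))
  invisible(flog.logger("error", ERROR, appender.file(errorLog)))
}

logger.info <- function(msg, ...) {
  # INFO messages appear in both the DEBUG and INFO logs
  flog.info(msg, ..., name = "debug")
  flog.info(msg, ..., name = "info")
}

logger.error <- function(msg, ...) {
  # ERROR messages appear in all three logs
  flog.error(msg, ..., name = "debug")
  flog.error(msg, ..., name = "info")
  flog.error(msg, ..., name = "error")
}
```

The repetition from our earlier error handling block now lives in one place, so the operational scripts stay clean.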
Output from our logger looks great!
INFO [2016-08-24 10:11:54] Running process_data.R version 0.1.5
INFO [2016-08-24 10:11:54] Working on /home/data/location/datafile_1.RData
INFO [2016-08-24 10:11:55] Found 1232 data columns
INFO [2016-08-24 10:12:03] Working on /home/data/location/datafile_2.RData
INFO [2016-08-24 10:12:03] Found 103 data columns
INFO [2016-08-24 10:12:03] Working on /home/data/location/datafile_3.RData
INFO [2016-08-24 10:12:03] Found 44 data columns
At the end of the day, we can only hope we have succeeded at one or another of:
- Using painful emotions to cultivate wisdom, compassion, and courage
- Communicating so as to encourage others to open up rather than shut down
Best wishes for excellent logging!