How to reliably access network resources in R
It’s frustrating when an application unexpectedly dies due to a network timeout or the unavailability of a network resource. Veterans of distributed systems know not to rely on network-based resources, such as web services or databases, since they can be unpredictable. So what is a data scientist supposed to do when she must use these resources in her analysis or application?
When there is a true network partition, there’s not much you can do since these resources are inaccessible. Most of the time, though, the issue is a timeout due to network latency or an unresponsive server. In these situations, the problem is temporary. It would be nice to recover from the error without adding a bunch of logic that muddies up your model code. Recovery can be as simple as trying again, eventually failing if a resource is truly unavailable.
The new function ntry in lambda.tools 1.0.5 does just this: call a function up to n times, returning the result of the first successful call.
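To make the idea concrete, here is a minimal base R sketch of that behavior. It is an illustration only, not the lambda.tools implementation: the name retry_n and the wording of the final error message are assumptions made for this example.

```r
# Hypothetical helper for illustration; NOT the lambda.tools source.
# Calls f(i) with the attempt number i, up to n times, and returns the
# result of the first call that does not raise an error.
retry_n <- function(f, n) {
  stopifnot(is.function(f), n >= 1)
  last_error <- NULL
  for (i in seq_len(n)) {
    # Capture either the value or the error, so the loop can continue.
    result <- tryCatch(
      list(ok = TRUE, value = f(i)),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    last_error <- result$error
  }
  # Every attempt failed: report the last error seen.
  stop(sprintf("Failed after %d attempts: %s", n, conditionMessage(last_error)))
}
```

The tryCatch call sits inside the loop so that a failure on one attempt only ends that attempt, while the caller sees an error only once all n attempts are exhausted.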
Here’s an example of how it works. The following function simulates an unreliable resource that fails 75% of the time. Using ntry, the function will be tried over and over until it either succeeds or the limit is reached.
library(lambda.tools)
library(futile.logger)

fn <- function(i) {
  x <- sample(1:4, 1)
  flog.info("x = %s", x)
  if (x < 4) stop('stop') else x
}
Calling the function in isolation will most likely fail:
> fn()
INFO [2015-01-21 18:26:21] x = 2
Error in fn() : stop
This is similar to what happens with a timeout, where a call fails only some of the time. To get around this, a loop of some sort is normally introduced to try a few times until the call succeeds. With ntry it’s simply a matter of wrapping a function in a closure and specifying the number of tries.
> ntry(fn, 6)
INFO [2015-01-21 18:39:21] x = 2
INFO [2015-01-21 18:39:21] x = 4
[1] 4
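How much do the retries help? A little arithmetic gives a rough answer, assuming each attempt is independent and fails with probability 0.75, as in the simulated function. The helper p_success is a name introduced here for illustration.

```r
# Probability that at least one of n independent attempts succeeds,
# given that each attempt fails with probability p_fail.
p_success <- function(n, p_fail = 0.75) 1 - p_fail^n

round(p_success(c(1, 3, 6, 12)), 3)
# [1] 0.250 0.578 0.822 0.968
```

Under those assumptions, six tries lift the chance of success from 25% to about 82%. Real network failures are often correlated, so the actual gain will tend to be smaller.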
Here’s a real-world example using RPostgreSQL. In a single function, a connection is opened, the query executed, and the connection closed.
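library(RPostgreSQL)

# HOST, PORT, DATABASE, USER and PASS are assumed to be defined elsewhere.
db_execute_query <- function(query) {
  drv <- dbDriver("PostgreSQL")
  con <- dbConnect(drv, host=HOST, port=PORT, dbname=DATABASE, user=USER, password=PASS)
  # Register the cleanup only once the connection exists, so a failed
  # dbConnect is not followed by an attempt to disconnect a missing object.
  on.exit(dbDisconnect(con))
  dbGetQuery(con, statement=query)
}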
For this to work with ntry, I use the on.exit function to disconnect. Normally I’d use a tryCatch block, but since ntry will catch the error, I leave this code naked. The DB call is wrapped in a closure and passed to ntry, where the argument i is the attempt number. This is useful if you want to debug the call. The second parameter is simply the number of tries.
df <- ntry(function(i) db_execute_query(query), 3)
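As one way to put the attempt number to use, the sketch below builds a closure that logs each attempt and can pause before a retry. The helper with_attempt_log and its pause argument are hypothetical names introduced here. They are not part of lambda.tools.

```r
# Hypothetical helper for illustration; not part of lambda.tools.
# Wraps a zero-argument function f in a closure that takes the attempt
# number i, logs it, and optionally sleeps before any retry (i > 1).
with_attempt_log <- function(f, pause = 0) {
  function(i) {
    message(sprintf("attempt %d", i))
    if (i > 1 && pause > 0) Sys.sleep(pause)  # give the server time to recover
    f()
  }
}

# The closure is an ordinary function and can be called directly:
attempt <- with_attempt_log(function() "result")
attempt(1)
```

Assuming ntry passes the attempt number as described above, the database call could then be written as ntry(with_attempt_log(function() db_execute_query(query), pause = 1), 3).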
Access to the database is now a bit more resilient. To try it out yourself, install the latest version of lambda.tools via devtools.
library(devtools)
install_github('zatonovo/lambda.tools')