R-caching (and scheduling)
This is in preparation for running a custom Shiny server. We want to accelerate the server by using caching. In this post we take a look at a candidate caching package.
We'll explore the package DataCache. It is a very useful package; however, I found that for some reason the provided weather data example was not working. So I wanted to simulate a data feed using a scheduler, preferably within R. The R package tcltk2 provides a scheduler. It worked for me from the R command line; however, when running it in RStudio or with Rscript there is a small complication, which we will cover further below.
The data function here outputs the system time, using Sys.time(). When the value comes from the cache, it is the time at which the cache was last filled, so it can never be later than the current Sys.time(). In general
Current time >= Cached time.
Let’s look at the output first:
    No cached data found. Loading intial data...
    [1] "Current:2017-12-22 14:58:09.2|Cached:2017-12-22 14:58:09.2"
    [1] "Current:2017-12-22 14:58:09.5|Cached:2017-12-22 14:58:09.2"
    [1] "Current:2017-12-22 14:58:09.7|Cached:2017-12-22 14:58:09.2"
    [1] "Current:2017-12-22 14:58:09.9|Cached:2017-12-22 14:58:09.2"
    [1] "Current:2017-12-22 14:58:10.1|Cached:2017-12-22 14:58:09.2"
    Loading more recent data, returning lastest available.
    [1] "Current:2017-12-22 14:58:10.3|Cached:2017-12-22 14:58:09.2"
    [1] "Current:2017-12-22 14:58:10.5|Cached:2017-12-22 14:58:10.3"
    [1] "Current:2017-12-22 14:58:10.7|Cached:2017-12-22 14:58:10.3"
    [1] "Current:2017-12-22 14:58:10.9|Cached:2017-12-22 14:58:10.3"
    [1] "Current:2017-12-22 14:58:11.1|Cached:2017-12-22 14:58:10.3"
    Loading more recent data, returning lastest available.
    [1] "Current:2017-12-22 14:58:11.3|Cached:2017-12-22 14:58:10.3"
    [1] "Current:2017-12-22 14:58:11.5|Cached:2017-12-22 14:58:11.3"
    [1] "Current:2017-12-22 14:58:11.7|Cached:2017-12-22 14:58:11.3"
    [1] "Current:2017-12-22 14:58:11.9|Cached:2017-12-22 14:58:11.3"
    [1] "Current:2017-12-22 14:58:12.1|Cached:2017-12-22 14:58:11.3"
    Loading more recent data, returning lastest available.
    [1] "Current:2017-12-22 14:58:12.4|Cached:2017-12-22 14:58:11.3"
    [1] "Current:2017-12-22 14:58:12.6|Cached:2017-12-22 14:58:12.4"
    [1] "Current:2017-12-22 14:58:12.8|Cached:2017-12-22 14:58:12.4"
    [1] "Current:2017-12-22 14:58:13.0|Cached:2017-12-22 14:58:12.4"
    [1] "Current:2017-12-22 14:58:13.2|Cached:2017-12-22 14:58:12.4"
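So we can see that it works. The scheduler runs a cycle roughly every 200 ms, whereas the cached time is only refreshed once it is more than one second old, so a refresh is triggered about every 5 = 1000 ms / 200 ms cycles. Note that the call which triggers the refresh ("Loading more recent data, returning lastest available.") still prints the old cached value; the new value first appears one cycle later.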
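The five-cycle figure can be checked with a little arithmetic. The sketch below uses a hypothetical helper, ticks_until_refresh, that counts how many ticks pass before the cache age is strictly greater than the limit (the post's frequency function uses a strict ">"). The 210 ms value is an estimate read off the output above (about 4.0 s over 19 intervals), not a measured constant.

```r
# Hypothetical helper: smallest number of ticks k for which
# k * tick_ms is strictly greater than max_age_ms.
ticks_until_refresh <- function(tick_ms, max_age_ms) {
  floor(max_age_ms / tick_ms) + 1
}

# With ticks of exactly 200 ms the fifth tick is exactly 1000 ms old,
# which is not "> 1000", so the refresh would wait for the sixth tick.
ideal <- ticks_until_refresh(200, 1000)

# With a little scheduling overhead (assumed ~210 ms per tick),
# the fifth tick is already older than one second.
observed <- ticks_until_refresh(210, 1000)

print(c(ideal = ideal, observed = observed))
```

So the "every 5 cycles" pattern in the output is consistent with ticks that take slightly longer than the nominal 200 ms.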
Let’s discuss the code. We have three parts:
- Preparations: Loading packages, setting options:
    #!/usr/bin/env Rscript

    # load packages
    library(DataCache) # for the caching
    library(tcltk2)    # for the scheduler

    # set the resolution of printed time values
    # so instead of 2017-12-22 14:58:12 we now have 2017-12-22 14:58:12.4
    op <- options(digits.secs = 1)
- Define the functions for caching: the data feed and the custom frequency function:
    # define getTime function:
    datafeed_getTime = function(varName) {
      timeValue = Sys.time()
      out = list(timeValue)
      names(out) = paste0('Mycached.', varName)
      return(out)
    }

    # define custom frequency for cache updates
    # nMinutes already exists in the package DataCache, but we want faster updates for this test
    customFrequency_nSeconds <- function(seconds) {
      fun <- function(timestamp) {
        return(difftime(Sys.time(), timestamp, units = 'secs') > seconds)
      }
      return(fun)
    }

    varName1 = 'test1' # remark: the cached variable for this varName is Mycached.test1
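To see what these two helpers return, they can be called directly, without DataCache. The snippet below re-declares them so that it runs on its own. It only exercises the helpers; how data.cache calls them internally is not shown here.

```r
# Re-declared from the post so that this snippet is self-contained.
datafeed_getTime <- function(varName) {
  out <- list(Sys.time())
  names(out) <- paste0('Mycached.', varName)
  out
}
customFrequency_nSeconds <- function(seconds) {
  function(timestamp) {
    difftime(Sys.time(), timestamp, units = 'secs') > seconds
  }
}

# The data feed returns a one-element list whose name encodes varName.
feed <- datafeed_getTime('test1')
print(names(feed))

# The frequency helper returns a function that answers
# "is this timestamp older than `seconds`?".
is_stale <- customFrequency_nSeconds(1)
now <- Sys.time()
print(is_stale(now - 5))  # timestamp 5 seconds in the past
print(is_stale(now))      # timestamp taken just now
```

The first check reports a stale timestamp and the second a fresh one, which is the decision a cache needs in order to know when to reload.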
- Define the scheduler:
    tclTaskDelete(NULL) # delete all running tasks

    # run the block every 200 ms, 20 times in total
    tclTaskSchedule(200, {
      cache.timedata1 = data.cache(function() datafeed_getTime(varName1),
                                   cache.name = varName1,
                                   frequency = customFrequency_nSeconds(1))
      print(paste0('Current:', Sys.time(), '|Cached:', Mycached.test1))
    }, id = "ticktock_test1", redo = 20)
The final part is only necessary when the code is not run from the R command line, i.e., when using RStudio or Rscript. It keeps R busy for a few seconds, which the scheduler needs in order to work there. There are other ways to define schedulers, which are more robust but less readable than tclTaskSchedule; for simplicity's sake I chose tclTaskSchedule for this post.
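    # Start : special
    # This part is only necessary for the scheduler to run with Rscript or RStudio.
    # In the R command line it is not necessary.

    # function for keeping R busy until totalRunningTime (in seconds) has elapsed
    runFor = function(totalRunningTime) {
      startTime <- Sys.time()
      repeat {
        # compare in explicit seconds; a bare difftime chooses its own units
        if (difftime(Sys.time(), startTime, units = 'secs') > totalRunningTime) {
          break
        }
      }
    }
    runFor(totalRunningTime = 7) # totalRunningTime is in seconds
    # End : Special

    options(op) # restore the original options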
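For comparison, here is a minimal sketch of a scheduler that uses only base R: a blocking loop with Sys.sleep. The helper run_every is hypothetical (it is not part of tcltk2 or DataCache), and unlike tclTaskSchedule it occupies the R session until it finishes. It is meant as a simple way to drive a periodic task in a script, not as a replacement for the approach in this post.

```r
# Hypothetical helper: call `task` n times, pausing `interval` seconds
# between calls. The session is blocked while it runs.
run_every <- function(interval, n, task) {
  for (i in seq_len(n)) {
    task(i)
    if (i < n) Sys.sleep(interval)
  }
  invisible(n)
}

calls <- 0
run_every(0.05, 4, function(i) {
  calls <<- calls + 1  # count how often the task ran
  print(paste0('tick ', i, ' at ', format(Sys.time(), '%H:%M:%OS1')))
})
```

In the setting of this post, the body of the task would be the data.cache call and the print statement from the scheduler block.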
Final comment: The main hurdles to understanding the way DataCache works are these two points:
- data.cache expects a function. If we want more than one cache, we can distinguish them by a variable name, e.g. varName1, and wrap the datafeed_getTime(varName1) call in an anonymous function:
cache.timedata1 = data.cache(function() datafeed_getTime(varName1) , cache.name = varName1, frequency = customFrequency_nSeconds(1))
- That variable name is then used in datafeed_getTime to define the name under which the value is saved. This is done here:
names(out) = paste0('Mycached.' , varName)
Because we define varName1 = 'test1', the cached variable for this varName is Mycached.test1.
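The two points above can be mimicked with base R alone. The sketch below is an illustration, not DataCache's implementation: make_feed is a hypothetical stand-in for datafeed_getTime, and list2env is used to reproduce the visible effect that each named list element ends up as a variable of that name.

```r
# Hypothetical stand-in for datafeed_getTime from the post.
make_feed <- function(varName) {
  out <- list(Sys.time())
  names(out) <- paste0('Mycached.', varName)
  out
}

env <- new.env()  # separate environment, so nothing leaks into the workspace
for (v in c('test1', 'test2')) {
  # Wrap the call in an anonymous function, as one would for data.cache.
  loader <- function() make_feed(v)
  # Turn every named list element into a variable of the same name.
  list2env(loader(), envir = env)
}

print(sort(ls(env)))
```

Each variable name passed in yields its own Mycached.* variable, which is why one data feed function can serve several independent caches.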
So here is the entire code (for easy copy and pasting):
    #!/usr/bin/env Rscript

    library(DataCache) # for the caching
    library(tcltk2)    # for the scheduler

    # set the resolution of printed time values
    # so instead of 2017-12-22 14:58:12 we now have 2017-12-22 14:58:12.4
    op <- options(digits.secs = 1)

    # define getTime function:
    datafeed_getTime = function(varName) {
      timeValue = Sys.time()
      out = list(timeValue)
      names(out) = paste0('Mycached.', varName)
      return(out)
    }

    # define custom frequency for cache updates
    # nMinutes already exists in the package DataCache, but we want faster updates for this test
    customFrequency_nSeconds <- function(seconds) {
      fun <- function(timestamp) {
        return(difftime(Sys.time(), timestamp, units = 'secs') > seconds)
      }
      return(fun)
    }

    varName1 = 'test1' # remark: the cached variable for this varName is Mycached.test1

    tclTaskDelete(NULL) # delete all running tasks

    # run the block every 200 ms, 20 times in total
    tclTaskSchedule(200, {
      cache.timedata1 = data.cache(function() datafeed_getTime(varName1),
                                   cache.name = varName1,
                                   frequency = customFrequency_nSeconds(1))
      print(paste0('Current:', Sys.time(), '|Cached:', Mycached.test1))
    }, id = "ticktock_test1", redo = 20)

    # Start : special
    # This part is only necessary for the scheduler to run with Rscript or RStudio.
    # In the R command line it is not necessary.

    # function for keeping R busy until totalRunningTime (in seconds) has elapsed
    runFor = function(totalRunningTime) {
      startTime <- Sys.time()
      repeat {
        # compare in explicit seconds; a bare difftime chooses its own units
        if (difftime(Sys.time(), startTime, units = 'secs') > totalRunningTime) {
          break
        }
      }
    }
    runFor(totalRunningTime = 7) # totalRunningTime is in seconds
    # End : Special

    options(op) # restore the original options