Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In a recent blog post we discussed caching calls to the web offline, on your own computer. Just as you can cache data on your own computer, a data provider can cache it on their servers. Most of the data providers we work with do not offer caching. However, at least one does: EOL, the Encyclopedia of Life. EOL lets you set the amount of time (in seconds) for which a call is cached; repeat the same call within that time and you get the data back faster. We have a number of functions to interface with EOL in our taxize package.
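To make the idea of a time-limited cache concrete, here is a toy sketch in plain R. It is not how EOL implements caching on their servers; make_ttl_cache, cached_call and slow_lookup are hypothetical names invented for this illustration, and the "slow" work is just a short sleep.

```r
# Toy model of a time-limited (TTL) cache; every name here is hypothetical.
make_ttl_cache <- function() {
  store <- new.env(parent = emptyenv())
  function(key, ttl, fetch) {
    now <- as.numeric(Sys.time())
    if (exists(key, envir = store, inherits = FALSE)) {
      entry <- get(key, envir = store, inherits = FALSE)
      # Serve the stored value only while it has not expired.
      if (now < entry$expires) {
        return(list(value = entry$value, from_cache = TRUE))
      }
    }
    # Cache miss (or expired entry): do the real work and store it for `ttl` seconds.
    value <- fetch(key)
    assign(key, list(value = value, expires = now + ttl), envir = store)
    list(value = value, from_cache = FALSE)
  }
}

cached_call <- make_ttl_cache()

# Stand-in for the expensive work a data provider does for each query.
slow_lookup <- function(term) {
  Sys.sleep(0.2)
  toupper(term)
}

r1 <- cached_call("lion", ttl = 1, fetch = slow_lookup)  # miss: does the work
r2 <- cached_call("lion", ttl = 1, fetch = slow_lookup)  # hit: served from the cache
Sys.sleep(1.1)                                           # wait past the TTL
r3 <- cached_call("lion", ttl = 1, fetch = slow_lookup)  # miss again: entry expired
```

The three calls mirror the pattern we test against EOL below: an uncached call, a repeat within the cache window, and a repeat after the window has closed.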
Install and load taxize and ggplot2.
install.packages(c("taxize", "ggplot2"))
library(taxize)
library(ggplot2)
To easily visualize the benefit of using EOL's caching, let's define a function to:
- Make a call to the EOL API search service (via the eol_search function in taxize) with caching set to X seconds (which means the cached result will be available for X seconds). This first call caches the query on their servers. Note that in the eol_search calls below, we use the cache_ttl parameter to set the number of seconds to cache the request.
- The second call is done before X seconds pass, so it should be faster, as the first one was cached.
- Sleep for a period, a bit longer than the amount of time the call is cached.
- The third call occurs after the cached call should be gone on the EOL servers.
- Plot the times each request took.
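testcache <- function(terms, cache){
  # First call: nothing is cached yet, so this one also populates the cache
  first <- system.time( eol_search(terms=terms, cache_ttl = cache) )
  # Second call: made within the cache window
  second <- system.time( eol_search(terms=terms, cache_ttl = cache) )
  # Wait until the cached result should have expired
  Sys.sleep(cache+2)
  # Third call: made after the cache window
  third <- system.time( eol_search(terms=terms, cache_ttl = cache) )
  # Element 3 of system.time() output is the elapsed (wall clock) time
  df <- data.frame(labs=c('nocache','withcache','cachetimedout'),
                   vals=c(first[[3]], second[[3]], third[[3]]))
  # Fix the factor levels so the bars are plotted in call order
  df$labs <- factor(df$labs, levels = c('nocache','withcache','cachetimedout'))
  ggplot(df, aes(labs, vals)) +
    geom_bar(stat='identity') +
    theme_grey(base_size = 20) +
    ggtitle(sprintf("search term: '%s'\n", terms)) +
    labs(y='Time to get data\n', x='')
}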
Search for the term lion
testcache(terms = "lion", cache = 5)
Search for the term beetle
testcache(terms = "beetle", cache = 10)
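A single timing can be noisy, since network conditions vary from one request to the next. As a rough sketch, a small helper can time several consecutive calls to any function. The names time_calls and fake_request are hypothetical, and the workload is a short sleep so the sketch runs without a network connection.

```r
# Hypothetical helper: time `n` consecutive calls to `f`
# and return the elapsed (wall clock) seconds for each call.
time_calls <- function(f, n = 5) {
  vapply(seq_len(n), function(i) system.time(f())[["elapsed"]], numeric(1))
}

# Stand-in workload so the sketch runs offline.
fake_request <- function() Sys.sleep(0.05)

elapsed <- time_calls(fake_request, n = 3)
summary(elapsed)
```

Assuming eol_search behaves as it does in testcache above, something like time_calls(function() eol_search(terms = "lion", cache_ttl = 30)) should show one slower first call followed by faster cached ones.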
Caching works the same way with the eol_pages function. No other API services and associated functions in taxize support caching on the server side by the data provider. Of course you can do your own caching using knitr or other methods, some of which we discussed in an earlier post.
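As one example of doing your own caching, here is a minimal sketch that stores each result on disk with saveRDS and reads it back with readRDS on later calls. It is not the method from the earlier post; disk_cached and fake_search are hypothetical names, and fake_search stands in for a real API call so the sketch runs offline.

```r
# Hypothetical helper: cache the result of `fetch(key)` as an .rds file in `dir`.
disk_cached <- function(key, fetch, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Keep only characters that are safe in a file name.
  path <- file.path(dir, paste0(gsub("[^A-Za-z0-9_-]", "_", key), ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))  # cached copy found: skip the request
  }
  result <- fetch(key)
  saveRDS(result, path)
  result
}

# Stand-in for a real API call; counts how often it is actually invoked.
calls <- 0
fake_search <- function(term) {
  calls <<- calls + 1
  data.frame(term = term, hits = nchar(term))
}

cache_dir <- tempfile("api_cache")  # fresh temporary directory for this demo
a <- disk_cached("beetle", fake_search, dir = cache_dir)  # runs fake_search
b <- disk_cached("beetle", fake_search, dir = cache_dir)  # read from disk
```

Unlike the caching on EOL's servers, this sketch never expires its entries; you would delete the files, or compare file.mtime(path) against a time limit, to force a refresh.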