How much faster is calibration with parallel processing and/or R byte-code compilation?
The Svensson model is a six-parameter yield curve model derived from the Nelson-Siegel model in: Svensson, L. E. (1995). Estimating forward interest rates with the extended Nelson & Siegel method. Sveriges Riksbank Quarterly Review, 3(1):13-26.
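For reference, writing the six parameters as $(\beta_1, \beta_2, \beta_3, \beta_4, \tau_1, \tau_2)$, the Svensson zero rate at maturity $m$ (the form implemented in the NSS2 function below) is

$$
y(m) = \beta_1
+ \beta_2\,\frac{1 - e^{-m/\tau_1}}{m/\tau_1}
+ \beta_3\left(\frac{1 - e^{-m/\tau_1}}{m/\tau_1} - e^{-m/\tau_1}\right)
+ \beta_4\left(\frac{1 - e^{-m/\tau_2}}{m/\tau_2} - e^{-m/\tau_2}\right).
$$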
In this post, I compare the performance of the R package mcGlobaloptim when calibrating the model to observed (fictitious) rates.
The comparison is done in pure R, with sequential or parallel processing, and with or without byte-code compilation (briefly: a compact intermediate representation between source code and machine code, into which the source code is 'translated'; please refer to this article or this one).
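As a tiny, self-contained illustration (not part of the original post) of what byte-code compilation looks like in practice: cmpfun() from the compiler package returns a byte-code compiled copy of an R function that computes exactly the same results, typically faster for interpreted loops.

# minimal sketch: byte-code compiling a deliberately loop-heavy toy function
library(compiler)
f <- function(x) {
    s <- 0
    for (xi in x) s <- s + xi^2    # interpreted loop, on purpose
    s
}
fc <- cmpfun(f)                     # byte-code compiled copy of f
identical(f(1:100), fc(1:100))      # same result; fc() usually runs faster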
Here are the R packages required for this experiment:
# Support for simple parallel computing in R
require(snow)
# Global optimization using Monte Carlo and Quasi
# Monte Carlo simulation
require(mcGlobaloptim)
# The R Compiler Package
require(compiler)
# Sub microsecond accurate timing functions
require(microbenchmark)
# Benchmarking routine for R
require(rbenchmark)
The R code for the objective function (the function to be minimized) used here appears in: Gilli, Manfred and Schumann, Enrico, A Note on 'Good Starting Values' in Numerical Optimisation (June 3, 2010). Available at SSRN: http://ssrn.com/abstract=1620083 or http://dx.doi.org/10.2139/ssrn.1620083
### Calibrating the Nelson-Siegel-Svensson model ###

# Svensson model
NSS2 <- function(betaV, mats) {
    gam1 <- mats/betaV[5]
    gam2 <- mats/betaV[6]
    aux1 <- 1 - exp(-gam1)
    aux2 <- 1 - exp(-gam2)
    betaV[1] + betaV[2] * (aux1/gam1) + betaV[3] * (aux1/gam1 + aux1 - 1) +
        betaV[4] * (aux2/gam2 + aux2 - 1)
}

# True parameters
betaTRUE <- c(5, -2, 5, -5, 1, 3)

# Input maturities (in months, converted to years)
mats <- c(1, 3, 6, 9, 12, 15, 18, 21, 24, 30, 36, 48, 60,
          72, 84, 96, 108, 120)/12

# Corresponding yields to maturity
yM <- NSS2(betaTRUE, mats)
dataList <- list(yM = yM, mats = mats, model = NSS2)

# Bounds for the parameters' search
settings <- list(min = c(0, -15, -30, -30, 0, 3),
                 max = c(15, 30, 30, 30, 3, 6), d = 6)

# Objective function (sum of squared deviations from the observed yields)
OF <- function(betaV, dataList) {
    mats <- dataList$mats
    yM <- dataList$yM
    model <- dataList$model
    y <- model(betaV, mats)
    aux <- y - yM
    crossprod(aux)
}
A compiled version of the objective function is then defined, using the compiler package from L. Tierney (now part of base R):
# define compiled objective function
OFcmp <- cmpfun(OF)
OFcmp
## function(betaV, dataList) {
##     mats <- dataList$mats
##     yM <- dataList$yM
##     model <- dataList$model
##     y <- model(betaV, mats)
##     aux <- y - yM
##     crossprod(aux)
## }
## bytecode: 0x000000000dfa98b8
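As a quick sanity check (not part of the original benchmarks), a single evaluation of the compiled and uncompiled objective functions can already be timed with microbenchmark:

# sketch: time one evaluation of each version of the objective function
microbenchmark(OF(betaTRUE, dataList), OFcmp(betaTRUE, dataList), times = 1000)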
The 4 situations being tested are the following:
* Optimization with sequential processing and no byte-code compilation
* Optimization with sequential processing and byte-code compilation
* Optimization with parallel processing and no byte-code compilation
* Optimization with parallel processing and byte-code compilation
Thus, the performance of multiStartoptim will be tested through the 4 following functions:
OF_classic <- function(.n) {
    multiStartoptim(objectivefn = OF, data = dataList,
                    lower = settings$min, upper = settings$max,
                    method = "nlminb", nbtrials = .n,
                    typerunif = "sobol")
}

OF_cmp <- function(.n) {
    multiStartoptim(objectivefn = OFcmp, data = dataList,
                    lower = settings$min, upper = settings$max,
                    method = "nlminb", nbtrials = .n,
                    typerunif = "sobol")
}

OF_classic_parallel <- function(.n) {
    multiStartoptim(objectivefn = OF, data = dataList,
                    lower = settings$min, upper = settings$max,
                    method = "nlminb", nbtrials = .n,
                    typerunif = "sobol", nbclusters = 2)
}

OF_cmp_parallel <- function(.n) {
    multiStartoptim(objectivefn = OFcmp, data = dataList,
                    lower = settings$min, upper = settings$max,
                    method = "nlminb", nbtrials = .n,
                    typerunif = "sobol", nbclusters = 2)
}
First of all, it is important to notice that parallel processing is not necessarily faster than sequential processing when nbtrials is low:
nbtrials <- 5
benchmark1 <- benchmark(OF_classic(nbtrials), OF_classic_parallel(nbtrials),
                        columns = c("test", "replications", "elapsed", "relative"),
                        replications = 10, relative = "elapsed")
benchmark1
##                            test replications elapsed relative
## 1          OF_classic(nbtrials)           10    0.38     1.00
## 2 OF_classic_parallel(nbtrials)           10    7.39    19.45
Moreover, a little attention should be paid to the following points (see the short sketch after this list for the first one):
* the number of clusters should be chosen according to what is actually available on the machine;
* more clusters does not necessarily mean faster calibration.
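On the availability point, here is a minimal sketch (not in the original post) using the parallel package, shipped with base R, to check how many cores the machine exposes before choosing nbclusters:

# sketch: cap the number of clusters by the cores actually available
require(parallel)
ncores <- parallel::detectCores()     # logical cores reported by the machine
nbclusters <- min(2, ncores - 1)      # e.g. keep one core free for the R session
nbclusters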
Now, increasing the number of trials, we first verify that the results obtained in the 4 cases are all the same:
nbtrials <- 500
t1 <- OF_classic(nbtrials)
t2 <- OF_cmp(nbtrials)
t3 <- OF_classic_parallel(nbtrials)
t4 <- OF_cmp_parallel(nbtrials)
We have:
t1$par
## [1]  5 -2  5 -5  1  3
t1$objective
## [1] 2.795e-25
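As an extra check (not shown in the original post), the recovered parameters can be compared directly with the true ones:

# sketch: largest absolute deviation from the true parameter vector
max(abs(t1$par - betaTRUE))    # essentially zero here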
and:
# all.equal() compares only its first two arguments (the third would be taken
# as the tolerance), so the four objective values are compared pairwise
isTRUE(all.equal(t1$objective, t2$objective)) &&
    isTRUE(all.equal(t2$objective, t3$objective)) &&
    isTRUE(all.equal(t3$objective, t4$objective))
## [1] TRUE
And now, the final comparison of the 4 cases:
benchmark2 <- microbenchmark(OF_classic(nbtrials), OF_cmp(nbtrials),
                             OF_classic_parallel(nbtrials),
                             OF_cmp_parallel(nbtrials), times = 100)
print(benchmark2, unit = "ms", order = "median")
## Unit: milliseconds
##                 expr  min   lq median   uq  max
##      OF_cmp_parallel 2697 2753   2787 2883 3194
##  OF_classic_parallel 2849 2925   2971 3061 4121
##               OF_cmp 3712 3753   3809 3860 4430
##           OF_classic 4060 4121   4153 4228 4801
Here is how nicely ggplot2 displays it:
require(ggplot2)
plt <- ggplot2::qplot(y = time, data = benchmark2, colour = expr,
                      xlab = "execution date",
                      ylab = "execution time in milliseconds")
plt <- plt + ggplot2::scale_y_log10()
print(plt)
That being said, a sequential implementation in C might well be much faster than any of these pure R examples.
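For instance, here is a minimal sketch, assuming the Rcpp package (which is not used in the original post), of the same objective function written in C++ and exposed to R via cppFunction(); the name OFcpp and the flattened argument list are hypothetical choices made only for this illustration:

# sketch, assuming Rcpp and a working C++ toolchain are installed
require(Rcpp)
cppFunction('
double OFcpp(NumericVector betaV, NumericVector mats, NumericVector yM) {
    double sse = 0.0;
    for (int i = 0; i < mats.size(); i++) {
        double gam1 = mats[i] / betaV[4];    // betaV[5] in the 1-based R code
        double gam2 = mats[i] / betaV[5];    // betaV[6] in the 1-based R code
        double aux1 = 1.0 - exp(-gam1);
        double aux2 = 1.0 - exp(-gam2);
        double y = betaV[0] + betaV[1] * (aux1 / gam1) +
                   betaV[2] * (aux1 / gam1 + aux1 - 1.0) +
                   betaV[3] * (aux2 / gam2 + aux2 - 1.0);
        double diff = y - yM[i];
        sse += diff * diff;                  // sum of squared deviations
    }
    return sse;
}')
# same value as the pure R objective, up to floating-point rounding
OFcpp(betaTRUE, dataList$mats, dataList$yM)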