Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
On my former blog, I wrote a post about the Cantor expansion of a natural integer. This is a generalization of the well-known binary expansion. For example, the Cantor \((3,4,5)\)-expansion of a natural integer \(N\) is a triplet \[ (\epsilon_0, \epsilon_1, \epsilon_2) \in \{0,1,2\} \times \{0,1,2,3\} \times \{0,1,2,3,4\} \] such that \[ N = \epsilon_0 + \epsilon_1 \times 3 + \epsilon_2 \times (3\times 4). \] It exists for every natural number \(N\) lower than or equal to \(3 \times 4 \times 5 – 1 = 59\).
A binary expansion is a Cantor \((2, 2, \ldots, 2)\)-expansion.
In the article of my former blog, I explain how that can be used to transform a nested loop to a single loop.
To get a Cantor expansion, I described the greedy algorithm on my former blog. It is implemented as follows:
Cantor_greedy <- function(n, sizes) { l <- c(1, cumprod(sizes)) epsilon <- numeric(length(sizes)) while(n > 0){ k <- which.min(l <= n) e <- floor(n / l[k-1]) epsilon[k-1] <- e n <- n - e * l[k-1] } epsilon }
Later, on StackOverflow, a user opened a
question
entitled
Efficiently create random sample from expand.grid
output
without using expand.grid
. I immediately replied and suggested to use a Cantor expansion.
But another user, Robert Hacken, replied too, and he gave an answer seemingly different of mine at first glance, but in fact it was the same. Actually this user provided another way to derive a Cantor expansion of an integer:
mod2 <- function(x, y) { (x-1) %% y } Cantor_Hacken <- function(n, sizes) { p <- cumprod(c(1, head(sizes, -1))) mod2(ceiling((n + 1) / p), sizes) }
We can check these two ways give the same result:
Cantor_greedy(n = 29, sizes = c(3, 4, 5)) ## [1] 2 1 2 Cantor_Hacken(n = 29, sizes = c(3, 4, 5)) ## [1] 2 1 2
On StackOverflow, I compared the speed of the two R functions, and Hacken’s one is faster. I also compared the speed of Hacken’s R function and the Rcpp implementation of the greedy algorithm given in my former blog, and this one is faster. It is now time to compare two Rcpp implementations.
library(inline) # greedy src <- 'int n = as<int>(N); std::vector<int> s = as<std::vector<int>>(sizes); std::vector<int> epsilon(s.size()); std::vector<int>::iterator it; it = s.begin(); it = s.insert(it, 1); int G[s.size()]; std::partial_sum(s.begin(), s.end(), G, std::multiplies<int>()); int k; while(n > 0) { k = 1; while(G[k] <= n) { k++; } epsilon[k-1] = (int)n / G[k-1]; n = n % G[k-1]; } return wrap(epsilon);' greedy_Rcpp <- cxxfunction( signature(N = "integer", sizes = "integer"), body = src, plugin = "Rcpp" ) # Hacken src <- 'int n = as<int>(N); std::vector<int> s = as<std::vector<int>>(sizes); int l = s.size(); std::vector<int> epsilon(l); int p = 1; double t = n + 1; for(int i = 0; i < l; i++) { epsilon[i] = ((int)ceil(t / p) - 1) % s[i]; p *= s[i]; } return wrap(epsilon);' Hacken_Rcpp <- cxxfunction( signature(N = "integer", sizes = "integer"), body = src, plugin = "Rcpp" ) # check greedy_Rcpp(N = 29L, sizes = c(3L, 4L, 5L)) ## [1] 2 1 2 Hacken_Rcpp(N = 29L, sizes = c(3L, 4L, 5L)) ## [1] 2 1 2 # benchmark library(microbenchmark) sizes <- 2L:10L greedy <- function() { L <- vector("list", length = prod(sizes)) for(n in seq_len(prod(sizes))) { L[[n]] <- greedy_Rcpp(n, sizes) } } Hacken <- function() { L <- vector("list", length = prod(sizes)) for(n in seq_len(prod(sizes))) { L[[n]] <- Hacken_Rcpp(n, sizes) } } print( microbenchmark( greedy = greedy(), Hacken = Hacken(), times = 20L ), signif = 5 ) ## Unit: seconds ## expr min lq mean median uq max neval ## greedy 7.4789 8.8480 10.022 9.784 10.478 16.557 20 ## Hacken 7.5696 8.8942 10.689 10.011 12.404 15.998 20
Well, the speeds are rather similar. But that could depend on the values
of sizes
; we could perform finer comparisons.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.