R Inferno-ism: order is not rank

[This article was first published on Portfolio Probe » R language, and kindly contributed to R-bloggers].

Do not use order when you want rank.

Background

The update of “A comparison of some heuristic optimization methods” is due to the bug that Luca Scrucca spotted.

Actually, it is two bugs:

 

Problem

What I said in my code was (essentially):

ord <- order(x)

Now what I wanted was the order of the values in x: for each element, its position in the sorted vector.  What I got was the permutation of indices that would put x into sorted order.  Only under the rarest of circumstances are these the same.  But they sound oh so similar.

What I really wanted to say was:

ord <- rank(x, ties.method="first")

(But see below.)
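A small made-up vector shows the difference between the two. This is only an illustrative sketch: the values in x are invented for the example, and rnk is a name introduced here, not one from the original code.

```r
# Hypothetical four-element vector, chosen so that order and rank disagree
x <- c(30, 10, 40, 20)

# order: the indices that would sort x (the smallest value sits at index 2)
ord <- order(x)
print(ord)   # 2 4 1 3

# rank: the position each element of x would take in the sorted vector
rnk <- rank(x, ties.method = "first")
print(rnk)   # 3 1 4 2

# order is what you index x with to sort it ...
print(x[ord])         # 10 20 30 40

# ... while rank indexes the sorted vector to recover x
print(sort(x)[rnk])   # 30 10 40 20
```

Put another way, ord[1] answers "where is the smallest value?" while rnk[1] answers "how does the first value compare to the rest?".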

Timing

Using order in this case doesn’t get us where we want to go.  The advantage is that it gets us there really fast.  The rank function is much slower. (Timings in R version 2.15.0.)

> x10 <- runif(10)
> system.time(for(i in 1:1e4) order(x10))
   user  system elapsed 
   0.11    0.00    0.11 
> system.time(for(i in 1:1e4) rank(x10, ties.method="first"))
   user  system elapsed 
   1.22    0.00    1.34 
> x100 <- runif(100)
> system.time(for(i in 1:1e4) order(x100))
   user  system elapsed 
   0.14    0.00    0.17 
> system.time(for(i in 1:1e4) rank(x100, ties.method="first"))
   user  system elapsed 
   1.61    0.00    1.64 
> x1000 <- runif(1000)
> system.time(for(i in 1:1e4) order(x1000))
   user  system elapsed 
   1.14    0.02    1.15 
> system.time(for(i in 1:1e4) rank(x1000, ties.method="first"))
   user  system elapsed 
   3.76    0.00    3.82

rank is clearly slower than order. The whole point, though, is that these two commands give us different things.  The command order(order(x)) is another way to get what our rank command gives us.  Even though it is a bit kludgy, it can be significantly faster:

> system.time(for(i in 1:1e4) rank(x10, ties.method="first"))
   user  system elapsed 
   1.39    0.00    1.39 
> system.time(for(i in 1:1e4) order(order(x10)))
   user  system elapsed 
   0.23    0.00    0.24 
> system.time(for(i in 1:1e4) rank(x100, ties.method="first"))
   user  system elapsed 
   1.56    0.00    1.56 
> system.time(for(i in 1:1e4) order(order(x100)))
   user  system elapsed 
   0.36    0.00    0.38 
> system.time(for(i in 1:1e4) rank(x1000, ties.method="first"))
   user  system elapsed 
   3.94    0.00    4.00 
> system.time(for(i in 1:1e4) order(order(x1000)))
   user  system elapsed 
   2.17    0.00    2.17 
> x10000 <- runif(10000)
> system.time(for(i in 1:1e4) rank(x10000, ties.method="first"))
   user  system elapsed 
  34.88    0.00   35.01
> system.time(for(i in 1:1e4) order(order(x10000)))
   user  system elapsed 
  29.51    0.00   29.94
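The claim that order(order(x)) matches the rank command can be checked directly. The sketch below uses a made-up vector with ties and assumes there are no missing values, since the two functions treat NA differently by default; rank_via_order is a hypothetical helper name introduced for illustration.

```r
# Hypothetical helper: ranks computed by applying order twice
rank_via_order <- function(x) order(order(x))

# Invented vector containing ties, so that tie handling is exercised
x <- c(2, 1, 2, 1, 3)

r1 <- rank(x, ties.method = "first")
r2 <- rank_via_order(x)

print(r1)   # 3 1 4 2 5
print(r2)   # 3 1 4 2 5

# order leaves tied values in their original sequence, which is the same
# rule that ties.method = "first" applies, so the results agree
print(identical(r1, r2))   # TRUE
```

The inner order gives the sorting permutation and the outer order inverts it, which is why the result is a rank.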

 
