Site icon R-bloggers

Bubble sorting in R, C++ and Julia: code improvements and the R compiler

[This article was first published on NumberTheory » R stuff, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the past few months I have written posts about implementing the bubble sort algorithm in different languages. In the mean while I have gotten some feedback and suggestions regarding improvements to the implementation I made, see the end of the post for the new source code of the algorithms. These often had a quite profound effect on performance.

One of the best tips was to use the R compiler, i.e. the compiler package which is now part of the standard R distribution. This works by simply calling cmpfun:

function_compiled = cmpfun(function)

The following table presents the timings in microseconds of the different algorithms:

min         median          max 
                    bubble_sort 1506985.326   1571561.4795  1667417.009 
           bubble_sort_compiled  308573.445    322742.5950   350686.185 
          bubble_sort_recursive 1524428.777   1579060.6400  1692169.993 
 bubble_sort_recursive_compiled 1536280.034   1589214.5010  1660944.670 
                bubble_sort_cpp     791.354       821.7165     1170.818 
                     sort_julia     958.000       1009.000     1092.000 
                           sort     105.771       170.8530      454.844

The following things I find striking:

  1. Compiling the for/while loop based R solution benefits massively form the compiler package, increasing the speed almost 5 fold.
  2. Compiling the recursive R based solution does not yield any improvements.
  3. The C++ solution is obviously much faster than any R based solution, increasing the speed between 1900 and 400 times.
  4. The Julia based solution is almost as fast as the C++ solution, which is very impressive for a high-level interpreted language.
  5. The native R sort is almost 8 times faster than the fastest bubble sort in C++ and Julia, but sort might use a different algorithm.

R (recursive implementation). No improvements.

larger = function(pair) {
   if(pair[1] > pair[2]) return(TRUE) else return(FALSE)
}

swap_if_larger = function(pair) {
    if(larger(pair)) {
        return(rev(pair)) 
    } else {
        return(pair)
    }
}

swap_pass = function(vec) { 
    for(i in seq(1, length(vec)-1)) {
        vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)])
    }
    return(vec)
}

bubble_sort_recursive = function(vec) {
    new_vec = swap_pass(vec)
    if(isTRUE(all.equal(vec, new_vec))) { 
        return(new_vec) 
    } else {
        return(bubble_sort(new_vec))
    }
}

R (for/while loop implementation). This implementation was previously not present.

bubble_sort = function(vec) {
    no_passes = 0
    while(1) {
        no_swaps = 0
        for (j in 1 : (length(vec) - 1 - no_passes)) {
            if (vec[j] > vec[j + 1]) {
                s = vec[j]
                vec[j] = vec[j+1]
                vec[j+1] = s
                no_swaps = no_swaps + 1
            }
        }
        no_passes = no_passes + 1
        if(no_swaps == 0) break
    }
    vec
}

C++ (linked into R using Rcpp/inline). Precomputing vec.size() outside the loop improved performance by a factor of 2.

require(inline)  ## for cxxfunction()

src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in);
       double tmp = 0;
       int n = vec.size();
       int no_swaps;
       int passes;
       passes = 0;
       while(true) {
           no_swaps = 0;
           for (int i = 0; i < n - 1 - passes; ++i) {
               if(vec[i] > vec[i+1]) {
                   no_swaps++;
                   tmp = vec[i];
                   vec[i] = vec[i+1];
                   vec[i+1] = tmp;
               };
           };
           if(no_swaps == 0) break;
           passes++;
       };
       return(vec);'
bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp")

Julia. Subtly changing the definition of the for loop (1:(length(vec_in) - 1 - passes) vs [1:(length(vec_in) - 1 - passes)]) improved performance two fold.

function swap_vector(vec_in, passes)
    no_swaps = 0
    for vec_index in 1:(length(vec_in) - 1 - passes)
        if vec_in[vec_index] > vec_in[vec_index + 1]
            no_swaps += 1
            tmp = vec_in[vec_index]
            vec_in[vec_index] = vec_in[vec_index + 1]
            vec_in[vec_index + 1] = tmp
        end
    end
    return no_swaps
end

function sort_julia(vec_in)
    passes = 0
    while(true)
        no_swaps = swap_vector(vec_in, passes)
        if no_swaps == 0 
            break
        end
        passes += 1
    end
    return vec_in
end

To leave a comment for the author, please follow the link and comment on their blog: NumberTheory » R stuff.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.