Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In the past few months I have written posts about implementing the bubble sort algorithm in different languages. In the mean while I have gotten some feedback and suggestions regarding improvements to the implementation I made, see the end of the post for the new source code of the algorithms. These often had a quite profound effect on performance.
One of the best tips was to use the R compiler, i.e. the compiler
package which is now part of the standard R distribution. This works by simply calling cmpfun
:
function_compiled = cmpfun(function)
The following table presents the timings in microseconds of the different algorithms:
min median max bubble_sort 1506985.326 1571561.4795 1667417.009 bubble_sort_compiled 308573.445 322742.5950 350686.185 bubble_sort_recursive 1524428.777 1579060.6400 1692169.993 bubble_sort_recursive_compiled 1536280.034 1589214.5010 1660944.670 bubble_sort_cpp 791.354 821.7165 1170.818 sort_julia 958.000 1009.000 1092.000 sort 105.771 170.8530 454.844
The following things I find striking:
- Compiling the for/while loop based R solution benefits massively form the compiler package, increasing the speed almost 5 fold.
- Compiling the recursive R based solution does not yield any improvements.
- The C++ solution is obviously much faster than any R based solution, increasing the speed between 1900 and 400 times.
- The Julia based solution is almost as fast as the C++ solution, which is very impressive for a high-level interpreted language.
- The native R
sort
is almost 8 times faster than the fastest bubble sort in C++ and Julia, butsort
might use a different algorithm.
R (recursive implementation). No improvements.
larger = function(pair) { if(pair[1] > pair[2]) return(TRUE) else return(FALSE) } swap_if_larger = function(pair) { if(larger(pair)) { return(rev(pair)) } else { return(pair) } } swap_pass = function(vec) { for(i in seq(1, length(vec)-1)) { vec[i:(i+1)] = swap_if_larger(vec[i:(i+1)]) } return(vec) } bubble_sort_recursive = function(vec) { new_vec = swap_pass(vec) if(isTRUE(all.equal(vec, new_vec))) { return(new_vec) } else { return(bubble_sort(new_vec)) } }
R (for/while loop implementation). This implementation was previously not present.
bubble_sort = function(vec) { no_passes = 0 while(1) { no_swaps = 0 for (j in 1 : (length(vec) - 1 - no_passes)) { if (vec[j] > vec[j + 1]) { s = vec[j] vec[j] = vec[j+1] vec[j+1] = s no_swaps = no_swaps + 1 } } no_passes = no_passes + 1 if(no_swaps == 0) break } vec }
C++ (linked into R using Rcpp/inline). Precomputing vec.size()
outside the loop improved performance by a factor of 2.
require(inline) ## for cxxfunction() src = 'Rcpp::NumericVector vec = Rcpp::NumericVector(vec_in); double tmp = 0; int n = vec.size(); int no_swaps; int passes; passes = 0; while(true) { no_swaps = 0; for (int i = 0; i < n - 1 - passes; ++i) { if(vec[i] > vec[i+1]) { no_swaps++; tmp = vec[i]; vec[i] = vec[i+1]; vec[i+1] = tmp; }; }; if(no_swaps == 0) break; passes++; }; return(vec);' bubble_sort_cpp = cxxfunction(signature(vec_in = "numeric"), body=src, plugin="Rcpp")
Julia. Subtly changing the definition of the for loop (1:(length(vec_in) - 1 - passes)
vs [1:(length(vec_in) - 1 - passes)]
) improved performance two fold.
function swap_vector(vec_in, passes) no_swaps = 0 for vec_index in 1:(length(vec_in) - 1 - passes) if vec_in[vec_index] > vec_in[vec_index + 1] no_swaps += 1 tmp = vec_in[vec_index] vec_in[vec_index] = vec_in[vec_index + 1] vec_in[vec_index + 1] = tmp end end return no_swaps end function sort_julia(vec_in) passes = 0 while(true) no_swaps = swap_vector(vec_in, passes) if no_swaps == 0 break end passes += 1 end return vec_in end
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.