
hash-2.0.0

[This article was first published on Open Data Group » R, and kindly contributed to R-bloggers.]

The hash-2.0.0 package has been uploaded to CRAN. This version was developed in conjunction with R-2.11.0 and was refactored for performance. hash-2.0.0 requires R-2.10.0 or later and will not be supported on earlier versions of R. This requirement is a result of recent changes to the language itself.

Importantly, hash-2.0.0 breaks backward compatibility: code written with previous versions of the hash package is not guaranteed to work with this or future versions. This is due to changes made in order to achieve much higher performance. Assignments and look-ups are faster thanks to direct inheritance from environments, the stripping of non-essential customizations, and reliance on core and primitive functions.
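To make the idea of building on environments concrete, here is a minimal sketch of an R environment used directly as a key-value store. It uses base R only and is not the hash package's own API; the keys, values, and the `add_item` helper are made up for illustration.

```r
# Base R only: an environment used as a key-value store.
# hash = TRUE asks for a hashed environment (the default for new.env()).
e <- new.env(hash = TRUE)

# Assignment by key, three equivalent styles
assign("apple", 1, envir = e)
e[["banana"]] <- 2
e$cherry <- 3

# Look-up by key
get("apple", envir = e)
e[["banana"]]

# Membership test (inherits = FALSE keeps the search inside e)
exists("cherry", envir = e, inherits = FALSE)
exists("durian", envir = e, inherits = FALSE)

# List the keys currently stored
ls(e)

# Environments have reference semantics: a function can modify
# the store in place without returning a copy.
# add_item is a hypothetical helper, not part of any package.
add_item <- function(store, key, value) {
  store[[key]] <- value
  invisible(NULL)
}
add_item(e, "durian", 4)
length(ls(e))
```

Because an environment is modified in place rather than copied on assignment, anything that inherits from it gets the same behavior from R's core and primitive functions.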

Here is a summary of the major changes. ChangeLog and TODO track many technical details; here I will discuss only the more important ones:

Performance

Included in this version is a demo script that runs benchmarks (demo(hash-benchmarks)). One question that has been posed repeatedly, often in the context of look-ups, is: how does this compare to native R named lists and vectors? In other words, how much quicker is accessing a value in a hash / environment than in a list (or vector)? This is a difficult question, and the answer generally depends on the size of the hash or list. My rule of thumb is that look-ups are quicker on lists and vectors with fewer than about 500 elements. Beyond ~500 elements, hashes and environments greatly outperform lists, and the gap widens as the object grows. However, look-ups on all of these objects are very fast when the objects are small ( >120,000 / sec ). So unless you are doing many serial look-ups on small objects, hashes are likely the better option.
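For readers who want to try the comparison themselves, the following is a rough sketch of such a timing test. It is not the package's demo script: it uses base R only, the `compare_lookup` helper and the chosen sizes are assumptions for illustration, and the timings will vary by machine and R version.

```r
# Hypothetical helper: time `reps` random look-ups by name on a named
# list and on an environment holding the same n key-value pairs.
compare_lookup <- function(n, reps = 10000) {
  keys <- sprintf("key%07d", seq_len(n))

  # Named list holding 1..n
  lst <- setNames(as.list(seq_len(n)), keys)

  # Environment holding the same pairs
  env <- new.env(hash = TRUE)
  for (i in seq_len(n)) assign(keys[i], i, envir = env)

  # The same random probe sequence is used for both structures
  set.seed(1)
  probe <- sample(keys, reps, replace = TRUE)

  t_list <- system.time(for (k in probe) lst[[k]])[["elapsed"]]
  t_env  <- system.time(for (k in probe) env[[k]])[["elapsed"]]

  c(n = n, list_sec = t_list, env_sec = t_env)
}

# Sizes below, near, and well above the ~500-element rule of thumb
sizes <- c(10, 500, 10000)
results <- do.call(rbind, lapply(sizes, compare_lookup))
print(results)
```

Each row of `results` reports the elapsed seconds for the same set of look-ups on both structures, so the crossover point can be checked by adding more values to `sizes`.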

I have written previously about hashes in R [1] [2], and will continue to discuss the evolution of R hashes on this blog. Additionally, I will be speaking on this and related work at useR!2010 (July 20-23).
