The importance of being unoriginal (and befriending google)
[This article was first published on Life in Code, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In search of bin counts
I look at histograms and density functions of my data in R on a regular basis. I have some idea of the algorithms behind these, but I’ve never had any reason to go under the hood until now. Lately, I’ve been looking at using the bin counts for things like Shannon entropy, with the help of the very nice entropy package. I figured that binning and counting data would either be supported via a native, dedicated R package, or quite simple to code. Not finding the former, I set out to code the latter:

    myhist = function(x, dig=3) {
        ## note: for plain numerics, trunc() ignores `digits` and returns whole numbers
        x = trunc(x, digits=dig)
        ## x = round(x, digits=dig)
        ## bin values 0, 0.001, ..., 1 (for dig=3)
        aa = bb = seq(0, 1, 1/10^dig)
        ## count the values equal to each bin value
        for (ii in 1:length(aa)) {
            aa[ii] = sum(x == aa[ii])
        }
        return(cbind(bin=bb, dens=aa/length(x)))
    }

    ## random variates
    test = sort(runif(1e4))
    get1 = myhist(test)
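For context on why the bin counts matter here: once the counts are in hand, a plug-in estimate of Shannon entropy takes only a few lines of base R. This is a minimal sketch, not the entropy package's code; `shannon_entropy` is a hypothetical helper name, and it reports entropy in nats (natural log).

```r
## Shannon entropy (in nats) from a vector of bin counts.
## `shannon_entropy` is a hypothetical helper, not part of any package.
shannon_entropy = function(counts) {
    p = counts / sum(counts)   # empirical bin probabilities
    p = p[p > 0]               # empty bins contribute nothing (0 * log 0 is taken as 0)
    -sum(p * log(p))
}

## four equally full bins: entropy equals log(4)
h_uniform = shannon_entropy(c(25, 25, 25, 25))
## all data in a single bin: entropy is 0
h_point = shannon_entropy(c(100, 0, 0, 0))
```

The estimate depends on the bin width, which is why a fast and trustworthy way to get counts at a chosen precision is worth having.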
Trouble in paradise
Truncate the data to a specified precision, and count how many values are in each bin. Well, that is what I tried first.
Dear Google…
An hour of irritation and confusion later, I ask google and, small wonder, the second search result links to the ash package that contains said tool. And it runs somewhere between 100 and 1,000 times faster. It doesn’t return the bin boundaries by default, but it’s good enough for a quick-and-dirty empirical probability mass distribution.
To be fair, there’s something to be said for cooking up a simple solution to a simple problem, and then realizing that, for one reason or another, the problem isn’t quite as simple as one first thought. On the other hand, sometimes we just want answers. When that’s the case, asking google is a pretty good bet.
    ## their method
    require(ash)
    get2 = bin1(test, c(0,1), 1e3+1)$nc
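For anyone who still wants a hand-rolled version, here is one way the truncate-and-count idea might be written in base R. It sidesteps two likely stumbling blocks: `trunc()` does not truncate to a given number of decimals, and testing rounded doubles against `seq()` values with `==` can miss matches because of floating-point representation. This is a sketch under assumptions, not the ash implementation: `bincount` is a hypothetical name, the data are assumed to lie in [0, 1], and each bin is labelled by its lower edge.

```r
## Base-R sketch of fixed-precision binning on [0, 1].
## `bincount` is a hypothetical name; this is not how ash::bin1 is implemented.
bincount = function(x, dig = 3) {
    nbin = 10^dig
    ## scale and floor to get an integer bin index in 1..nbin,
    ## instead of comparing decimals with ==
    idx = floor(x * nbin) + 1
    idx = pmin(idx, nbin)                    # x == 1 goes into the last bin
    counts = tabulate(idx, nbins = nbin)     # vectorised count per index
    cbind(bin = (seq_len(nbin) - 1) / nbin,  # lower edge of each bin
          dens = counts / length(x))
}

set.seed(1)
test = runif(1e4)
get3 = bincount(test)
total = sum(get3[, "dens"])   # proportions over all bins should sum to 1
```

Because the counting is done by `tabulate()` on integer indices rather than by a loop of equality tests, it also avoids one pass over the data per bin.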