Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Benford’s law is nowadays extremely popular (see e.g. http://en.wikipedia.org/…). It is usually claimed that, for a given set data set, changing units does not affect the distribution of the first digit. Thus, it should be related to scale invariant distributions. Heuristically, scale (or unit) invariance means that the density of the measure
Now if
> (benford=log(1+1/(1:9))/log(10)) [1] 0.30103000 0.17609126 0.12493874 0.09691001 0.07918125 [6] 0.06694679 0.05799195 0.05115252 0.04575749 > names(benford)=1:9 > sum(benford) [1] 1 > barplot(benford,col="white",ylim=c(-.045,.3)) > abline(h=0)
To compute the empirical distribution from a sample, use the following function
> firstdigit=function(x){ + if(x>=1){x=as.numeric(substr(as.character(x),1,1)); zero=FALSE} + if(x<1){zero=TRUE} + while(zero==TRUE){ + x=x*10; zero=FALSE + if(trunc(x)==0){zero=TRUE} + } + return(trunc(x)) + }
and then
> Xd=sapply(X,firstdigit) > table(Xd)/1000
In Benford’s Law: An Empirical Investigation and a Novel Explanation, we can read
It is not a mathematical article, so do not expect any formal proof in this paper. At least, we can run monte carlo simulation, and see what’s going on if we generate samples from a lognormal distribution with variance
> set.seed(1) > s=1 > X=rlnorm(n=1000,0,s) > Xd=sapply(X,firstdigit) > table(Xd)/1000 Xd 1 2 3 4 5 6 7 8 9 0.288 0.172 0.121 0.086 0.075 0.072 0.073 0.053 0.060 > T=rbind(benford,-table(Xd)/1000) > barplot(T,col=c("red","white"),ylim=c(-.045,.3)) > abline(h=0)
Clearly, it not far away from Benford’s law. Perhaps a more formal test can be considered, for instance Pearson’s
> chisq.test(T,p=benford) Chi-squared test for given probabilities data: T X-squared = 10.9976, df = 8, p-value = 0.2018
So yes, Benford’s law is admissible ! Now, if we consider the case where
compared with the case where
It is possible to generate several samples (always the same size, here 1,000 observations), just change the variance parameter
> T=table(Xd) > T=T[as.character(1:9)] > T[is.na(T)]=0 > PVAL[i]=chisq.test(T,p=benford)$p.value
Boxplots of the
When
Note that it is also possible to compute the
> ks.test(PVAL[,s], "punif")$p.value
Indeed, if
Arthur Charpentier
Arthur Charpentier, professor in Montréal, in Actuarial Science. Former professor-assistant at ENSAE Paristech, associate professor at Ecole Polytechnique and assistant professor in Economics at Université de Rennes 1. Graduated from ENSAE, Master in Mathematical Economics (Paris Dauphine), PhD in Mathematics (KU Leuven), and Fellow of the French Institute of Actuaries.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.