
the random variable that was always less than its mean…

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Although this is far from a paradox once one realises why the phenomenon occurs, it took me a few lines to understand why the empirical average of a log-normal sample appears to be a biased estimator of its mean, and why the biased plug-in estimator does not appear to present any bias. The picture below compares two estimators of the mean exp(σ²/2) of a log-normal LN(0,σ²) distribution as σ² increases: blue stands for the empirical mean, while gold corresponds to the plug-in estimator exp(σ²/2) with σ² estimated from the log-sample. (The sample is of size 10⁶.)
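A minimal sketch of this experiment, written in Python with the standard library rather than in R. The function name `compare_estimators` is hypothetical, the log-sample is generated as σξ with ξ a standard Normal variate, and, since the location of log X is 0, σ² is estimated here by the mean of the squared log-sample (an assumption: the post does not say which estimator of σ² was used). It returns the empirical mean, the plug-in estimate and the true mean exp(σ²/2).

```python
import math
import random
import statistics


def compare_estimators(sigma, n, seed=None):
    """Compare two estimators of E[X] = exp(sigma**2 / 2) for X ~ LN(0, sigma**2).

    Returns (empirical_mean, plug_in, true_mean).
    Assumption: the location 0 of log(X) is known, so sigma**2 is estimated
    by the mean of the squared log-sample.
    """
    rng = random.Random(seed)
    # log-sample: sigma * xi, with xi a standard Normal variate
    logs = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
    # empirical mean of the log-normal sample X = exp(sigma * xi)
    empirical_mean = statistics.fmean(math.exp(y) for y in logs)
    # plug-in estimator exp(sigma_hat**2 / 2), sigma_hat**2 computed from the log-sample
    sigma2_hat = statistics.fmean(y * y for y in logs)
    plug_in = math.exp(sigma2_hat / 2)
    true_mean = math.exp(sigma ** 2 / 2)
    return empirical_mean, plug_in, true_mean


if __name__ == "__main__":
    for sigma in (1.0, 2.0, 4.0, 6.0, 8.0):
        emp, plug, true = compare_estimators(sigma, 10 ** 5, seed=42)
        print(f"sigma={sigma}: empirical/true={emp / true:.3f}, "
              f"plug-in/true={plug / true:.3f}")
```

With the post's sample size of 10⁶, each value of σ takes a few seconds in pure Python. As σ grows, the empirical/true ratio should typically drop well below 1 while the plug-in ratio stays near 1, although individual runs vary.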

The question came up on X validated and my first reaction was to doubt the implementation, whose outcome was so counter-intuitive. But then I thought about the representation of a log-normal variate as exp(σξ), where ξ is a standard Normal variate. Since E[X]=exp(σ²/2), X exceeds its mean exactly when σξ exceeds σ²/2, and when σ grows large enough it becomes nearly impossible for σξ to be larger than σ²/2. More precisely,

P(X>E[X])=P(σξ>σ²/2)=1-Φ(σ/2)

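where Φ denotes the standard Normal cdf, a probability that vanishes as σ grows and can thus be made arbitrarily small. The empirical average remains unbiased, but its distribution is so skewed that, for a fixed sample size, it almost always falls below exp(σ²/2): the rare extreme values that would compensate are very unlikely to appear in the sample. The plug-in estimator, by contrast, only depends on the well-behaved log-sample and hence concentrates around exp(σ²/2).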
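The closed form 1-Φ(σ/2) can be evaluated with the standard library through the identity 1-Φ(z) = erfc(z/√2)/2, and checked against a Monte Carlo frequency. Both function names below are hypothetical and σ > 0 is assumed; the Monte Carlo comparison is done on the log scale, σξ > σ²/2, to avoid overflowing exp for large σ.

```python
import math
import random


def prob_above_mean(sigma):
    """P(X > E[X]) for X ~ LN(0, sigma**2), i.e. 1 - Phi(sigma / 2).

    Uses 1 - Phi(z) = erfc(z / sqrt(2)) / 2, with Phi the standard Normal cdf.
    """
    return 0.5 * math.erfc(sigma / (2.0 * math.sqrt(2.0)))


def prob_above_mean_mc(sigma, n, seed=None):
    """Monte Carlo frequency of exp(sigma * xi) > exp(sigma**2 / 2) over n draws."""
    rng = random.Random(seed)
    threshold = sigma ** 2 / 2  # log of E[X]; compare on the log scale
    hits = sum(sigma * rng.gauss(0.0, 1.0) > threshold for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for sigma in (1.0, 2.0, 4.0, 8.0):
        print(sigma, prob_above_mean(sigma), prob_above_mean_mc(sigma, 10 ** 5, seed=0))
```

For instance, σ = 2 gives 1-Φ(1) ≈ 0.159, while σ = 8 gives 1-Φ(4) ≈ 3.2×10⁻⁵, so a sample of size 10⁶ then contains only about thirty observations above the mean.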


Filed under: Books, Kids, R, Statistics Tagged: cross validated, empirical cdf, Gumbel distribution, R, skewed distribution, Stack Exchange

