a brief on naked statistics

xi'an

9 years ago

[This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers.]

Over the last Sunday breakfast I went through Naked Statistics: Stripping the Dread from the Data. The first two pages managed to put me in a prejudiced mood for the rest of the book. To wit: the author starts with some math bashing (like, no one ever bothers to tell us about the uses of high school calculus!) either because he really feels like this or because it pays with the intended audience (like, we are on the same side, pal!), he then shows how he outsmarted his high school math teacher by spotting the exam was not possibly designed for his class and then another math teacher by just… re-inventing the steps leading to Zeno’s paradox (said Zeno of Elea not appearing in the credits of the book, to be sure) and sums it up with an NRA argument: “statistics is like a high-caliber weapon: helpful when used correctly” (p.xiv). Add to that a highly ethnocentric perspective that makes the book hardly readable for anyone outside the US, due to its absolute focus on all things American (exaggerating just a wee bit: who are Lebron James, Kim Kardashian, and Dan Rather?! what is Netflix?! why’s this Donald Rumsfeld guy quoted throughout the book?! how do they play baseball?! What do NBA, NHL, and SAT stand for?! &tc.), as best illustrated by the facts that it took Charles Wheelan three months to realise a (golf) laser measuring instrument he had received could be in another unit than feet, namely meters!, and that he considers paying 100 rupees for a chai (मसाला चाय) in India a cheap price when this amount roughly corresponds to the average daily salary there… Top the whole thing with the fact that the author has already written a Naked Economics and seemingly found gold. (I am desperate for the incoming Naked Paleopathology tome in the series!) And there you get me stuck with such a highly negative a priori about Naked Statistics that I could not shake it off for the rest of the book.

“This book will not make you a statistical expert (…) This book is not a textbook.” (p.xv)

With this warning in mind about my bias, let’s get on with what’s in this book. The above tells us what isn’t. To quote further from the author, the book “has been designed to introduce the statistical concepts with the most relevance to everyday life” (p.xv). Naked Statistics goes over the basic notions of statistics (mean, standard deviation, correlation, linear regression, testing, design, polling), gives a sprinkle of probability background (counting models and the central limit theorem, which Wheelan considers as part of statistics), and spends the remaining chapters warning the reader(s) about the possible misuses of models and statistical tools if implemented in the wrong situations or with the wrong type of data. (There are a few graphs, but they are not particularly inspiring.) All this is done with the minimum amount of maths formulae, mostly hidden in footnotes and appendices. (But then why add an extra formula for σ when one is given just before for σ²?!) Sometimes, the minimum is not enough, as demonstrated by the “formula for calculating the correlation coefficient” (p.61) which takes a whole page of text to get around this absurdity of not using maths symbols like Σ and concludes with the lame “I’ll wave my hands and let the computer do the work” (p.61)! Somewhat surprisingly, given the low-key nature of the book, it includes a final appendix on statistical software. From Excel, to SAS, Stata, and …R! While I am pleased at this inclusion, it sounds very much orthogonal to the purpose and the intended audience of Naked Statistics. I cannot fathom anyone reading the book and then immediately embarking upon writing R code without stopping by a statistics textbook or formal training. (Incidentally, the author reproduces the usual confusion between free and open source, p.259.)
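For the record, here is how little room the correlation coefficient takes once Σ is allowed. This is the standard Pearson definition written with the population standard deviation, and not a transcription of the book's appendix; the symbols n, x_i, y_i, x̄, ȳ, σ_x and σ_y are my own notation.

```latex
\[
  r \;=\; \frac{1}{n}\sum_{i=1}^{n}
          \frac{(x_i-\bar{x})}{\sigma_x}\,\frac{(y_i-\bar{y})}{\sigma_y},
  \qquad
  \sigma_x^2 \;=\; \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 ,
  \qquad
  \sigma_y^2 \;=\; \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2 .
\]
```

In words: put both variables in standard units, multiply them pairwise, and average the products. Taking the square root of σ² gives σ, which is also why a separate formula for σ is hardly needed.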

“Lest you hurl the book across the room again, I have put the formula in an appendix.” (p.159)

In the chapter about possible misuses of probability models (and statistics), Naked Statistics predictably takes the example of the “most irresponsible use of statistics”, namely the role of inappropriate VaR (value-at-risk) models in the 2008 crisis. Somewhat inevitably, Nassim Taleb’s The Black Swan: The Impact of the Highly Improbable slides in to dispense its wisdom. (As an aside, in this chapter, I tried to make sense of the mindboggling sentence “My mother has had three holes in one” (p.99) and could not. Until I found on Google it was a golfing expression…) While the book abounds in reasonable examples on the misuse of statistics, I am not convinced this is the most relevant one, esp. because those unrealistic models were so very rarely based on any data.

“Regression analysis is the hydrogen bomb of the statistics arsenal.” (p.213)

One chapter that proved useful to me was Chapter 5: “Don’t buy the extended warranty on your $99 printer”. Indeed, on the same afternoon I read Naked Statistics, I went to buy an electrical appliance and could bring the book as an argument to refuse this extra-warranty the seller was really eager to let me “benefit from”, for a mere 1/6th of the cost..! (I also liked the trick of Schlitz beer to apparently induce [some] Bud drinkers into switching to this competitor brand.) On the other hand, while I could not spot any statistical blunder from my cursory breakfast read, I did not agree with Wheelan calling “Delma Kinney, a fifty-year-old Atlanta man [who] won $1 million in an instant lottery game in 2008 and another $1 million in an instant lottery game in 2011” (p.9) a “statistical anomaly”. The probability of “1 in 25 trillion” advanced right after is a typical example of the wrong type of conditioning: as explained in several posts on the ‘Og, such occurrences are bound to happen, sometime, somewhere, if most likely not to Mr Delma Kinney!
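To make the conditioning point concrete, here is a rough sketch in Python. All the numbers are hypothetical and mine, not the book's: I read the 25 trillion figure as the square of a 1-in-5-million chance per ticket, and I make up a population of regular players (N_PLAYERS) who each buy N_PLAYS tickets over the years. The code contrasts the chance that one named person wins on two named tickets with the chance that someone, somewhere in that population, wins at least twice.

```python
import math


def prob_two_or_more_wins(p_win: float, n_plays: int) -> float:
    """Binomial probability that one player wins at least twice in n_plays independent plays."""
    p_zero = (1.0 - p_win) ** n_plays
    p_one = n_plays * p_win * (1.0 - p_win) ** (n_plays - 1)
    return 1.0 - p_zero - p_one


def prob_someone_wins_twice(p_win: float, n_plays: int, n_players: int) -> float:
    """Probability that at least one of n_players independent players wins twice or more."""
    q = prob_two_or_more_wins(p_win, n_plays)
    # 1 - (1 - q)^n_players, computed via log1p/expm1 to stay accurate for tiny q
    return -math.expm1(n_players * math.log1p(-q))


# Hypothetical inputs, chosen for illustration only (not taken from the book).
P_WIN = 1 / 5_000_000      # assumed chance of a top prize on one ticket
N_PLAYS = 500              # assumed number of tickets bought by a regular player
N_PLAYERS = 10_000_000     # assumed number of such regular players

# "This person, on these two tickets": the figure obtained by squaring P_WIN.
naive = P_WIN ** 2

# "Anyone, on any two of their tickets": the event actually reported in the news.
somewhere = prob_someone_wins_twice(P_WIN, N_PLAYS, N_PLAYERS)

print(f"named player, two named tickets: {naive:.3g}")
print(f"some player, any two tickets:    {somewhere:.3g}")
print(f"ratio:                           {somewhere / naive:.3g}")
```

Under these made-up inputs the second probability is larger than the first by many orders of magnitude, and it only grows once several games, states, and years are pooled together, which is the sense in which such double wins are bound to happen to someone.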

“The resulting performances will be closer to the mean.” (p.106)

In conclusion, while I do not see much specific appeal in Naked Statistics, I reckon this is one of many books pointing out the possible misuses of statistics to the general public and bringing some awareness as to how to re-analyse and debunk (with the proper amount of training) statistics found in the news. In that respect, it does not differ so much (in spirit) from How to Lie with Statistics, even though the current book has more current real-life examples. It may be used in the classroom if needed.

Filed under: Books, R, Statistics, University life Tagged: book review, general public, How to Lie with Statistics, India, introductory textbooks, masala chai, Naked Economics, Naked Statistics, Zen, Zeno’s paradox
