Coding, GUIs and Statistical Rituals

[This article was first published on Confounded by Confounding » R, and kindly contributed to R-bloggers].

I was recently inspired to comment on this blog post, which asks whether R is a cure for ‘mindless statistics’. Anyone who’s familiar with statistics as used in applied fields like epidemiology, sociology, or the social sciences generally will be familiar with the idea of a ‘statistical ritual’. Rather than think about the proper statistical approach to each question, the researcher follows the formula or cookbook they learned in class. They apply what they know mindlessly, which gives rise to phrases like “When all you have is a t-test, everything looks like a comparison of normally distributed samples”.

Only a little better is letting the software do it for you. JMP or SPSS, for example, will “helpfully” decide what test to apply based on the data you’ve told it to use. But it’s a guess, and it takes place in a black box. That’s…bad.

The blogger above asked if R is the cure for this kind of thing. First, points to him for turning one of R’s greatest weaknesses, a steep learning curve, into a strength. The core of the idea is that by specifying everything, you’ll have to think about what you’re asking the software to do. You’ll choose the proper task, because you must choose.

It’s a promising concept, but I think the answer is “no”. It might help, but it’s not a cure. I’ll talk about a language I use more, which could have the same claim: SAS.

Consider the following code:

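proc phreg data=work.dataset;
  /* t = follow-up time; outcome = 0 marks a censored observation */
  model t*outcome(0) = treat covariate treat*covariate / risklimits ties=efron;
  /* print only the model info, fit statistics, and parameter estimates tables */
  ods select modelinfo fitstatistics parameterestimates;
  title2 "PH regression results interaction";
run;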

Simple code for a Cox proportional hazards model with a censored outcome, a treatment, a covariate, and an interaction between treatment and the covariate. It seems like I’ve done a great deal of thinking: I have to specify how ties in the data are handled, which output tables I want, and so on. But really, as long as I have this snippet saved, I can plug in new variables and datasets whenever I have a “new” problem I think it suits. And if you Googled this code, you could too, without really knowing what “ties=efron” means, or how many PROC PHREG options I left at their defaults.
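For the curious, here is roughly what that one option controls. When several subjects fail at the same recorded time, the Cox partial likelihood needs a rule for the denominator: the Breslow approximation charges every tied event the full risk set, while the Efron approximation removes a growing fraction of the tied subjects for each successive event. The Python below is only a sketch of that arithmetic for a single event time, not what PROC PHREG actually runs; the function name and the toy numbers are mine, made up for illustration.

```python
import math


def tied_event_log_lik(risk_scores, tied_scores, method="efron"):
    """Log partial-likelihood contribution of ONE event time with ties.

    risk_scores: exp(linear predictor) for every subject still at risk
                 at this time, including those who fail now.
    tied_scores: the same quantity for the subjects who fail now.
    (Hypothetical helper for illustration only.)
    """
    d = len(tied_scores)
    risk_total = sum(risk_scores)
    tied_total = sum(tied_scores)
    numerator = sum(math.log(s) for s in tied_scores)

    if method == "breslow":
        # Every tied event is compared against the full risk set.
        denominator = d * math.log(risk_total)
    elif method == "efron":
        # The k-th tied event sees the risk set minus k/d of the tied group.
        denominator = sum(
            math.log(risk_total - (k / d) * tied_total) for k in range(d)
        )
    else:
        raise ValueError(f"unknown ties method: {method!r}")

    return numerator - denominator


# Toy case: four subjects at risk, all with score 1.0, two fail together.
at_risk = [1.0, 1.0, 1.0, 1.0]
failed = [1.0, 1.0]
print(round(tied_event_log_lik(at_risk, failed, "breslow"), 4))  # -2.7726
print(round(tied_event_log_lik(at_risk, failed, "efron"), 4))    # -2.4849
```

With no ties (one failure per event time) the two rules give the same number, so the choice only matters when the data have ties. It is also worth knowing that the defaults differ between packages: PROC PHREG defaults to Breslow, while coxph in R's survival package defaults to Efron, so the same copied-and-pasted model can give slightly different estimates depending on where you run it.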

The same is true for R. You can Google a solution without ever knowing all the arguments a function could take, and you’ll still get results out. You just did what the nice man (or woman) on the website told you to. I see it all the time with users of Stata, a program that sits somewhere between something like JMP and SAS or R (no offense, Stata users; I also know some very good Stata coders). It’s still mindless, still ritualized…just a little more work.


Filed under: Epidemiology, R, SAS, Soapbox
