I’ve been travelling for the past few days (for the R/Finance 2010 conference in Chicago), so I missed much of the reaction to AnnMaria De Mars’ article last week in which she claimed that "R is an epic fail". Understandably, that inflammatory statement provoked many reactions from the R community on Twitter and in the blogosphere. (I suspect the fact that she was attending a SAS conference when she wrote the post only added fuel to the fire.) Yihui Xie was the first to bring attention to the article, Drew Conway followed up with a detailed and well-reasoned response, and Tal Galili provided a great round-up of other responses along with his own commentary.
Now that I’m back at my desk and have had a chance to reflect on the post and its responses, it seems that the disconnect is less about what R does and more about who uses it. According to De Mars, "The vast majority of people are NOT programmers. They are used to looking at things and clicking on things." And for the most part, I think that’s true: most of the people who use R are statisticians, and there are more non-statisticians than statisticians in the world. And R does have a steep learning curve today.
But I think the point that’s being missed here is that different communities have a different concept of what statistical analysis is. It’s almost a generational difference: if your statistical education was based around SAS or SPSS, it seems that your view of statistics is a very procedural one: for this kind of problem, fit that kind of model, and look at these results. That’s a worldview that’s easily accommodated by the procedural nature of SAS programming, or the regimented nature of point-and-click GUIs such as one finds in SPSS.
But there’s a newer — and, for the most part, younger — cohort of statisticians out there now, who have a different view of statistics. For them, statistics is less a set of rigid procedures and more a fluid process. A process where trying out different data transformations and models is encouraged. Where innovative visualizations of data are created. Where cross-discipline fertilization is commonplace, and models from (say) biostatistics are successfully applied to marketing data. I meet members of this cohort far more often in industries where innovation is particularly valued (Web 2.0 and social-media companies, hedge funds, drug discovery, marketing optimization) than in "traditional" venues for statisticians. And most of them have been trained in R.
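To make that concrete, here is a minimal sketch of what that fluid, exploratory style looks like in R. It uses only base R and the built-in mtcars dataset, and it's purely illustrative rather than any particular analysis mentioned above:

# A quick exploratory pass at the built-in mtcars data:
# try a transformation, compare two simple models, and plot as you go.
data(mtcars)

plot(mpg ~ wt, data = mtcars)                 # look at the raw relationship first

fit_raw <- lm(mpg ~ wt, data = mtcars)        # linear fit on the original scale
fit_log <- lm(log(mpg) ~ wt, data = mtcars)   # try a log-transformed response

summary(fit_raw)$r.squared                    # compare how well each model fits
summary(fit_log)$r.squared

plot(log(mpg) ~ wt, data = mtcars)            # visualize the transformed fit
abline(fit_log)

The point isn’t this particular model; it’s that each step invites the next question, which is exactly the kind of workflow a rigidly procedural or point-and-click tool makes awkward.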
Can these two communities ever be bridged? I think it’s inevitable. The need for data analysts is only going to grow, and since the way we think about data analysis and statistics has changed, the tools we use in the future will need to accommodate that change. R’s flexibility (the programming language) and innovation (the exponential growth in add-on packages), both driven by the open-source community, are what make it attractive to organizations where innovation is paramount. (And given competitive forces, innovation in data analysis will soon be paramount for most.) But it’s true: there will always be more non-programmers than programmers, and they, too, are going to need a software platform that supports a freer, less procedural style of data analysis. "Statistics 2.0", if you will.
So why can’t R be that platform? What if we could take the innovation and flexibility of R and make it available to non-programmers? I think it’s possible: in fact, making R accessible to more users (and therefore customers, natch) has been a focus area for REvolution since its inception. As it happens, I’m looking forward to sharing some exciting developments in an extensible, flexible user interface for large-scale data analysis that we’ve been prototyping for R. R is a lot more than a programming language, and it will certainly become much more in the future. In fact, I’d second Joe Dunn in saying:
"I will not be surprised if in ten years R is the standard for statistical data analysis, much as Linux has supplanted commercial UNIX and gone on to explore territory that its predecessor never touched (look at Ubuntu). R may not be the next big thing, but R is certainly a big thing that is forthcoming."
AnnMaria’s blog: The Next Big Thing