R is for R Origin Story

[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
AT&T Bell Laboratories holds an important place in the history of statistics, and one of the key parts of that story is the development of S, a language for statistical computing.

Prior to 1975 or so, statistical researchers at Bell Labs used Fortran for their statistical computing. But from 1975 to 1976, John Chambers, Rick Becker, and Allan Wilks developed a language written in Fortran that would allow for more interactive analysis. Though they threw around many names, including SCS (Statistical Computing System), they settled on S – this is, after all, the same institution that brought you the C programming language, so single letter names were kind of a thing. In 1988, they created New S, switching some of the internal functions from Fortran to C.

Out of S grew S-Plus, a commercial statistical computing language, which also arrived on the scene in 1988. And in 1993, an open source version of S and S-Plus appeared: R. The syntax of New S, S-Plus, and R is very similar – in fact, much of the same code will run in all three languages. But because R is open source, anyone who wants it can access it and its source code, under the GNU General Public License.

BTW, these are my GNU coding buddies – their names are Gus (left) and Gary (right), and yes, I can tell the difference between them. Gary has a wider face.
R is an interpreted language with features of object-oriented and functional programming. It, of course, can be used as a big calculator, but there are much more powerful things you can do with R. Personally, I haven’t even begun to scratch the surface of some of these capabilities – both on this blog and as an R user. For instance, I still struggle with some of the programming concepts like creating loops.
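
To make those ideas a little more concrete, here is a minimal sketch in R of the calculator use, a loop, and the functional and object-oriented flavors mentioned above. The variable names and the "gnu" class are made up purely for illustration; they don't come from any package.

```r
# R as a big calculator: an expression is evaluated as soon as you enter it
2 + 3 * 4

# A for loop that fills in the squares of 1 through 5
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

# The functional version: apply an anonymous function to each element
squares_fn <- sapply(1:5, function(i) i^2)

# Arithmetic is vectorized, so often no loop is needed at all
squares_vec <- (1:5)^2

# A taste of the object-oriented side: print() is a generic function that
# picks a method based on the object's class (the "gnu" class is hypothetical)
buddy <- structure(list(name = "Gus"), class = "gnu")
print.gnu <- function(x, ...) cat("GNU named", x$name, "\n")
print(buddy)
```

All three ways of computing the squares give the same result, which is part of why loops come up less often in R than in many other languages.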

R is maintained by the R Development Core Team, and the R Foundation that supports the project is a not-for-profit organization. Under the terms of the GNU GPL, anyone who wants can access the source code, make changes, and even distribute their own version, though they must then make their altered source code available as well. The purpose is to put power back in the hands of the people using the software and language. And anyone who wants can develop R packages that extend the capabilities of R.

I’ve had colleagues criticize this open source nature of R, because of concerns about quality. Anyone can write an R package. But this is a collaborative project and anyone can access the source code, meaning that if there are issues, someone is very likely to find them and fix them (or weed out the bad). Not to mention that, knock on wood, I’ve never had any major issues with bugs in R, and yet I’ve purchased expensive proprietary software riddled with bugs. (Some of us may remember the SPSS version 16 release, which would often refuse to save your data files, or save them and then delete them. Sometimes you wouldn’t know it until you went to open your data, only to find the file missing. That was probably what made me finally dump SPSS back in 2009. And it’s also why, when I went back to using proprietary statistical software during my time at VA, I opted to learn Stata on my own rather than go back to SPSS. I use SPSS once or twice a week in my current job, but I’m trying to move us toward using R whenever possible.)

What are some things you should know about working with R?
Sound off, R users: what do you think people should know about working with R?
