R is the easiest language to speak badly

[This article was first published on mages' blog, and kindly contributed to R-bloggers.]

I am amazed by the number of comments I received on my recent blog entry about “by”, “apply” and friends. I had started my post by pointing out that R is a language. Well, indeed, I have come to the conclusion that it is a language with lots of irregular expressions and dialects. It feels a bit like German or French, where you have to learn and memorise the different articles. German has three singular definite articles: der (masculine), die (feminine) and das (neuter); French has two: le (masculine) and la (feminine). Of course there is no mapping between them, and how do you explain that a girl in German is neuter (das Mädchen), while manhood is feminine (die Männlichkeit)?

Back to R. As I found out, there are lots of different ways to calculate means on subsets of data. I begin to wonder why so many different interfaces and functions have been developed over the years, and also why I didn’t use the aggregate function more often in the past.
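To make the point concrete, here is a small sketch of four base R routes to the same per-species means, using the built-in iris data. The object names (m_tapply and so on) are my own, chosen only for illustration, and this is by no means a complete list of the options.

```r
# Mean sepal width per species, computed four different ways in base R.

# tapply(): apply a function to a vector, split by a factor
m_tapply <- tapply(iris$Sepal.Width, iris$Species, mean)

# split() + sapply(): split the vector into a list, then summarise each piece
m_sapply <- sapply(split(iris$Sepal.Width, iris$Species), mean)

# by(): split a data frame by a factor and apply a function to each subset
m_by <- by(iris, iris$Species, function(d) mean(d$Sepal.Width))

# aggregate(): formula interface, returns a data frame
m_agg <- aggregate(Sepal.Width ~ Species, data = iris, FUN = mean)

# Each call returns a different kind of object, but the numbers agree
print(m_tapply)
print(m_agg)
```

The results hold the same three numbers, yet each arrives in a different container: an array, a named vector, a "by" object and a data frame. That is probably a good part of why it feels like learning dialects.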

Can we blame internet search engines? Why should I learn a programming language properly when I can find approximate answers to my problem online? I may not end up with the best answer, but with something that will work after all: “Don’t know why, but it works.”

And sometimes the help files can be more difficult to understand than the code in the examples. Hence, I end up playing around with the example code until it works, and only then do I try to figure out how it works. That was my experience with reshape.

Maybe this is a bit harsh. It is always up to the individual to improve their language skills, but you can get drunk in a pub as well, knowing only how to order beer. I think it was George Bernard Shaw who said: “R is the easiest language to speak badly.” No, actually he said: “English is the easiest language to speak badly.” Maybe that explains the success of English and R?

Reading helps. More and more books have been published on R in recent years, and not only in English. But which should you pick? Xi’an’s review of The Art of R Programming suggests that it might be a good start.

Back to aggregate. Has anyone noticed that the formula interface of aggregate differs from that of summaryBy?

    aggregate(cbind(Sepal.Width, Petal.Width) ~ Species, data=iris, FUN=mean)
         Species Sepal.Width Petal.Width
    1     setosa       3.428       0.246
    2 versicolor       2.770       1.326
    3  virginica       2.974       2.026

versus

    library(doBy)  # summaryBy comes from the doBy package
    summaryBy(Sepal.Width + Petal.Width ~ Species, data=iris, FUN=mean)
         Species Sepal.Width.mean Petal.Width.mean
    1     setosa            3.428            0.246
    2 versicolor            2.770            1.326
    3  virginica            2.974            2.026

And another slightly more complex example:

    aggregate(cbind(ncases, ncontrols) ~ alcgp + tobgp, data = esoph, FUN=sum)
    summaryBy(ncases + ncontrols ~ alcgp + tobgp, data = esoph, FUN=sum)
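
The difference matters because “+” on the left-hand side is not an error in aggregate; it just means something else. A minimal sketch in base R only (so no doBy needed), with the object names separate and summed being my own: as far as I can tell, aggregate evaluates the left-hand side as ordinary arithmetic, so the two variables are added row by row before the mean is taken.

```r
# cbind() on the left-hand side: one aggregated column per variable
separate <- aggregate(cbind(Sepal.Width, Petal.Width) ~ Species,
                      data = iris, FUN = mean)

# "+" on the left-hand side: the two variables are summed row by row first,
# so only a single value column (the mean of the sums) comes back
summed <- aggregate(Sepal.Width + Petal.Width ~ Species,
                    data = iris, FUN = mean)

print(separate)
print(summed)

# The single column equals the sum of the two separate means
print(all.equal(summed[[2]], separate$Sepal.Width + separate$Petal.Width))
```

So copying a summaryBy formula into aggregate will run without complaint and quietly answer a different question, which is exactly the kind of false friend you get between dialects.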
