Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
When you first learnt about data frames in R, I’m sure that, like me, you thought “This is a lot of hassle having to type the names of data frames over and over in order to access each column”.
    library(MASS)
    anorexia$wtDiff <- anorexia$Postwt - anorexia$Prewt  # I have to type anorexia how many times?
Indeed, any time you see chunks of code repeated over and over, it's a sign that they need rewriting. So the first time you discovered the attach function was a blissful moment. Ah, the hours you would save by not typing data frame names! Alas, those hours were more than made up for by the hundreds of hours you spent debugging the impenetrable bugs that were a side effect of attach.
    attach(anorexia)
    anorexia$wtDiff <- Postwt - Prew  # Deliberate typo!
    detach(anorexia)
In the above snippet of code, the typo causes execution to halt at the second line, so the call to detach never happens. Then, after you've fixed the typo and run the code again, anorexia is on your search path twice. This is problematic because when you detach it, one copy of the data frame is still on the search path. Cue wailing and gnashing of teeth as you waste half an hour trying to find the bug.
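To see the duplicate for yourself, here is a minimal sketch, assuming a fresh R session in which anorexia has not already been attached. The variable names n_after_two_attaches and n_after_one_detach are made up for illustration.

```r
library(MASS)

# Simulate the failed run followed by the rerun: attach() is called twice,
# but detach() was never reached the first time.
attach(anorexia)
attach(anorexia)  # R prints a message about masked objects here

# Count how many entries on the search path are called "anorexia".
n_after_two_attaches <- sum(search() == "anorexia")

# A single detach() removes only one of the two copies.
detach(anorexia)
n_after_one_detach <- sum(search() == "anorexia")

# Clean up the copy that was left behind.
detach(anorexia)

print(c(n_after_two_attaches, n_after_one_detach))
```

Calling search() at any point is a quick way to check whether a stale copy of a data frame is still lurking on the search path.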
Today we're going to look at three functions that let you manipulate data frames without the nasty side effects of attach: with, within and transform.
For adding (or overwriting) a column to a data frame, like in the above example, any of the three functions is perfectly adequate; they just have slightly different syntaxes. with often has the most concise formulation, though there isn't much in it.
    anorexia$wtDiff <- with(anorexia, Postwt - Prewt)
    anorexia <- within(anorexia, wtDiff2 <- Postwt - Prewt)
    anorexia <- transform(anorexia, wtDiff3 = Postwt - Prewt)
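The syntax differs because the functions return different things: with gives back the value of the expression, whereas within and transform give back a modified copy of the whole data frame. The following is a small sketch of that difference. It works on a copy called anx (a name chosen here for illustration) so that the original data is left alone.

```r
# Take a clean copy straight from the MASS package.
anx <- MASS::anorexia

# with() returns whatever the expression evaluates to: here, a numeric vector.
diff_vec <- with(anx, Postwt - Prewt)

# within() and transform() return a new data frame with the extra column.
anx_within    <- within(anx, wtDiff <- Postwt - Prewt)
anx_transform <- transform(anx, wtDiff = Postwt - Prewt)

print(class(diff_vec))       # a plain vector
print(class(anx_within))     # a data frame
print(class(anx_transform))  # a data frame

# None of the three calls changes anx itself; nothing happens until you assign.
print("wtDiff" %in% names(anx))
```

This is why the with version assigns into a single column, while the other two overwrite the whole data frame.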
For multiple changes to the data frame, all three functions can still be used, but now the syntax for with is more cumbersome. I tend to favour within or transform in these situations.
    fahrenheit_to_celsius <- function(f) (f - 32) / 1.8

    # with: build a list of new columns, then assign them all at once
    airquality[c("cTemp", "logOzone", "MonthName")] <- with(airquality, list(
      fahrenheit_to_celsius(Temp),
      log(Ozone),
      month.abb[Month]
    ))

    # within: ordinary assignments inside a block
    airquality <- within(airquality, {
      cTemp2     <- fahrenheit_to_celsius(Temp)
      logOzone2  <- log(Ozone)
      MonthName2 <- month.abb[Month]
    })

    # transform: one named argument per new column
    airquality <- transform(airquality,
      cTemp3     = fahrenheit_to_celsius(Temp),
      logOzone3  = log(Ozone),
      MonthName3 = month.abb[Month]
    )
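One difference worth knowing about when making multiple changes: within runs its block from top to bottom, so a later line can use a column created on an earlier line, whereas transform evaluates all of its arguments against the original data frame. The sketch below illustrates this with a hypothetical second column, kTemp (temperature in kelvin), which is not part of the examples above. It assumes there is no variable called cTemp in your workspace; if there is, transform will quietly pick that up instead of failing.

```r
aq <- datasets::airquality

# within(): kTemp can be built from cTemp, which was created one line earlier.
aq_within <- within(aq, {
  cTemp <- (Temp - 32) / 1.8
  kTemp <- cTemp + 273.15
})

# transform(): cTemp does not exist yet when kTemp is evaluated,
# so this call fails. tryCatch() captures the error message instead of halting.
res <- tryCatch(
  transform(aq, cTemp = (Temp - 32) / 1.8, kTemp = cTemp + 273.15),
  error = function(e) conditionMessage(e)
)
print(res)

# One workaround is to nest two transform() calls.
aq_transform <- transform(
  transform(aq, cTemp = (Temp - 32) / 1.8),
  kTemp = cTemp + 273.15
)

print(all.equal(aq_within$kTemp, aq_transform$kTemp))
```

So if the new columns depend on each other, within tends to be the tidier choice; if they are independent, it comes down to taste.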
The most important lesson to take away from this is that if you are manipulating data frames, then with, within and transform can be used almost interchangeably, and all of them should be used in preference to attach. For further refinement, I prefer with for single updates to data frames, and within or transform for multiple updates.
Tagged: data-manipulation, data-transformation, r, with